Define your data schema once. Validate at scale. Stay columnar.

Built for DataFrames, powered by Pydantic, Polars, and SQLAlchemy.
Flycatcher is a DataFrame-native schema layer for Python. Define your data model once and generate optimized representations for every part of your stack:
- 🎯 Pydantic models for API validation & serialization
- ⚡ Polars validators for blazing-fast bulk validation
- 🗄️ SQLAlchemy tables for typed database access
Built for modern data workflows: validate millions of rows at high speed, keep schema drift at zero, and stay columnar end-to-end.

## ❓ Why Flycatcher?
Modern Python data projects need row-level validation (Pydantic), efficient bulk operations (Polars), and typed database queries (SQLAlchemy). Maintaining a separate schema for each layer leads to duplication, drift, and manual juggling of row-oriented and columnar paradigms.

Flycatcher solves this: one schema definition → three optimized outputs.
```python
from flycatcher import Schema, Integer, String, Float, col, model_validator

class ProductSchema(Schema):
    id = Integer(primary_key=True)
    name = String(min_length=3, max_length=100)
    price = Float(gt=0)
    discount_price = Float(gt=0, nullable=True)

    @model_validator
    def check_discount():
        # Cross-field validation with the col() DSL
        return (
            col('discount_price') < col('price'),
            "Discount price must be less than regular price"
        )

# Generate three optimized representations
ProductModel = ProductSchema.to_pydantic()              # → Pydantic BaseModel
ProductValidator = ProductSchema.to_polars_validator()  # → Polars DataFrame validator
ProductTable = ProductSchema.to_sqlalchemy()            # → SQLAlchemy Table
```
Flycatcher lets you stay DataFrame-native without giving up the speed of Polars, the ergonomic validation of Pydantic, or the Pythonic power of SQLAlchemy.
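To make the single-source-of-truth idea concrete, here is a stdlib-only miniature of the pattern. None of these names are Flycatcher API; this is a hypothetical sketch of how one field spec can drive both a row-level check and a SQL DDL string:

```python
import re

# Hypothetical field spec (not Flycatcher internals): one definition,
# consumed by both a validator and a DDL generator.
FIELDS = {
    "id":    {"sql": "INTEGER PRIMARY KEY", "check": lambda v: isinstance(v, int)},
    "email": {"sql": "TEXT NOT NULL",
              "check": lambda v: re.match(r"^[^@]+@[^@]+\.[^@]+$", str(v)) is not None},
}

def validate_row(row: dict) -> bool:
    """Row-level check, analogous to what a generated Pydantic model does."""
    return all(spec["check"](row.get(name)) for name, spec in FIELDS.items())

def ddl(table: str) -> str:
    """DDL string, analogous to what a generated SQLAlchemy Table implies."""
    cols = ", ".join(f"{name} {spec['sql']}" for name, spec in FIELDS.items())
    return f"CREATE TABLE {table} ({cols})"

print(validate_row({"id": 1, "email": "alice@example.com"}))  # True
print(ddl("users"))
```

Because every consumer reads the same spec, a constraint changed in one place propagates everywhere, which is the drift problem Flycatcher addresses at library scale.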
## 🚀 Quick Start

### Installation

```bash
pip install flycatcher
# or
uv add flycatcher
```
### Define Your Schema

```python
from flycatcher import Schema, Integer, String, Boolean, Datetime

class UserSchema(Schema):
    id = Integer(primary_key=True)
    username = String(min_length=3, max_length=50, unique=True)
    email = String(pattern=r'^[^@]+@[^@]+\.[^@]+$', unique=True, index=True)
    age = Integer(ge=13, le=120)
    is_active = Boolean(default=True)
    created_at = Datetime()
```
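The email pattern in the schema above is an ordinary regex, so you can exercise it directly with stdlib `re` to see what it accepts and rejects:

```python
import re

# Same regex as the schema's email field above.
EMAIL = re.compile(r'^[^@]+@[^@]+\.[^@]+$')

print(bool(EMAIL.match("alice@example.com")))  # True
print(bool(EMAIL.match("not-an-email")))       # False: no '@' or '.' structure
print(bool(EMAIL.match("a@b@c.com")))          # False: [^@]+ forbids a second '@'
```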
### Use Pydantic for Row-Level Validation

Perfect for APIs, forms, and single-record validation:

```python
from datetime import datetime, timezone

User = UserSchema.to_pydantic()

# Constraints are validated automatically via Pydantic
user = User(
    id=1,
    username="alice",
    email="alice@example.com",
    age=25,
    created_at=datetime.now(timezone.utc),  # datetime.utcnow() is deprecated in 3.12+
)

# Serialize to JSON/dict
print(user.model_dump_json())
```
### Use Polars for Bulk Validation

Perfect for DataFrame-level validation:

```python
import polars as pl

UserValidator = UserSchema.to_polars_validator()

# Validate 1M+ rows with blazing speed
df = pl.read_csv("users.csv")
validated_df = UserValidator.validate(df, strict=True)
validated_df.write_parquet("validated_users.parquet")
```
### Use SQLAlchemy for Database Operations

Perfect for typed queries and database interactions:

```python
from sqlalchemy import create_engine

UserTable = UserSchema.to_sqlalchemy(table_name="users")
engine = create_engine("postgresql://localhost/mydb")

# Type-safe queries
with engine.connect() as conn:
    result = conn.execute(
        UserTable.select()
        .where(UserTable.c.is_active == True)
        .where(UserTable.c.age >= 18)
    )
    for row in result:
        print(row)
```
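For intuition, the query above corresponds to plain SQL of the kind sketched below with stdlib `sqlite3`. The table name and columns mirror the `UserSchema` example, but this is illustrative, not Flycatcher or SQLAlchemy API:

```python
import sqlite3

# In-memory database standing in for the Postgres example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, "
    "age INTEGER, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO users (id, username, age, is_active) VALUES (?, ?, ?, ?)",
    [(1, "alice", 25, 1), (2, "bob", 15, 1), (3, "carol", 40, 0)],
)

# Equivalent of: select().where(is_active == True).where(age >= 18)
rows = conn.execute(
    "SELECT username FROM users WHERE is_active = 1 AND age >= 18"
).fetchall()
print(rows)  # [('alice',)]
```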
## ✨ Key Features

### Rich Field Types & Constraints

| Field Type | Constraints | Example |
|---|---|---|
| `Integer()` | `ge`, `gt`, `le`, `lt`, `multiple_of` | `age = Integer(ge=0, le=120)` |
| `Float()` | `ge`, `gt`, `le`, `lt` | `price = Float(gt=0)` |
| `String()` | `min_length`, `max_length`, `pattern` | `email = String(pattern=r'^[^@]+@...')` |
| `Boolean()` | - | `is_active = Boolean(default=True)` |
| `Datetime()` | - | `created_at = Datetime()` |
| `Date()` | - | `birth_date = Date()` |

All fields support (validation): `nullable`, `default`, `description`

SQLAlchemy-specific: `primary_key`, `unique`, `index`, `autoincrement`
### Custom & Cross-Field Validation

Use the `col()` DSL for powerful field-level and cross-field validation that works across both Pydantic and Polars:

```python
from flycatcher import Schema, Integer, Datetime, col, model_validator

class BookingSchema(Schema):
    check_in = Datetime()
    check_out = Datetime()
    nights = Integer(ge=1)

    @model_validator
    def check_dates():
        return (
            col('check_out') > col('check_in'),
            "Check-out must be after check-in"
        )

    @model_validator
    def check_minimum_stay():
        # For advanced operations like .dt.month, use the explicit Polars format
        import polars as pl
        return {
            'polars': (
                (~pl.col('check_in').dt.month().is_in([7, 8])) | (pl.col('nights') >= 3),
                "Minimum stay in July and August is 3 nights"
            ),
            'pydantic': lambda v: (
                v.check_in.month not in [7, 8] or v.nights >= 3,
                "Minimum stay in July and August is 3 nights"
            )
        }
```
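The pydantic-branch logic above reduces to plain-Python checks. A minimal stdlib sketch of the same two booking rules (illustrative only, not the generated validator):

```python
from datetime import datetime

def check_booking(check_in: datetime, check_out: datetime, nights: int) -> list:
    """Plain-Python version of the two booking rules above."""
    errors = []
    if not check_out > check_in:
        errors.append("Check-out must be after check-in")
    if check_in.month in (7, 8) and nights < 3:
        errors.append("Minimum stay in July and August is 3 nights")
    return errors

print(check_booking(datetime(2024, 7, 1), datetime(2024, 7, 2), 1))
# ['Minimum stay in July and August is 3 nights']
```

Note that the dual-format validator expresses exactly this logic twice: once vectorized over columns for Polars, once per-record for Pydantic.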
### Validation Modes

Polars validation supports flexible error handling:

```python
# Strict mode: raise on validation errors (default)
validated_df = UserValidator.validate(df, strict=True)

# Non-strict mode: filter out invalid rows
valid_df = UserValidator.validate(df, strict=False)

# Show violations for debugging
validated_df = UserValidator.validate(df, strict=True, show_violations=True)
```
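The semantics of the two modes can be sketched in a few lines of stdlib Python (this is an assumption about the behavior described above, not Flycatcher's implementation): strict mode raises if any row fails, non-strict mode silently drops failing rows.

```python
def validate_rows(rows, check, strict=True):
    """Partition rows by a predicate; raise in strict mode if any fail."""
    valid, violations = [], []
    for row in rows:
        (valid if check(row) else violations).append(row)
    if strict and violations:
        raise ValueError(f"{len(violations)} row(s) failed validation")
    return valid

rows = [{"age": 25}, {"age": -1}, {"age": 40}]
print(validate_rows(rows, lambda r: r["age"] >= 0, strict=False))
# [{'age': 25}, {'age': 40}]
```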
## 🎯 Complete Example: ETL Pipeline

```python
import polars as pl
from flycatcher import Schema, Integer, Float, String, Datetime, col, model_validator
from sqlalchemy import create_engine

# 1. Define schema once
class OrderSchema(Schema):
    order_id = Integer(primary_key=True)
    customer_email = String(pattern=r'^[^@]+@[^@]+\.[^@]+$', index=True)
    amount = Float(gt=0)
    tax = Float(ge=0)
    total = Float(gt=0)
    created_at = Datetime()

    @model_validator
    def check_total():
        return (
            col('total') == col('amount') + col('tax'),
            "Total must equal amount + tax"
        )

# 2. Extract & validate with Polars (handles millions of rows)
OrderValidator = OrderSchema.to_polars_validator()
df = pl.read_csv("orders.csv")
validated_df = OrderValidator.validate(df, strict=True)

# 3. Load to database with SQLAlchemy
OrderTable = OrderSchema.to_sqlalchemy(table_name="orders")
engine = create_engine("postgresql://localhost/analytics")
with engine.connect() as conn:
    conn.execute(OrderTable.insert(), validated_df.to_dicts())
    conn.commit()
```
✅ Result: validated millions of rows, enforced business rules, and loaded to the database, all from one schema definition.
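One caveat worth knowing: `col('total') == col('amount') + col('tax')` is exact floating-point equality. If your source data was produced with rounding, a tolerance comparison may be what you actually want; the sketch below uses stdlib `math.isclose` outside of Flycatcher to illustrate the pitfall:

```python
import math

# The classic float-precision example: 0.1 + 0.2 is not exactly 0.3.
amount, tax, total = 0.1, 0.2, 0.3

print(total == amount + tax)                            # False: sum is 0.30000000000000004
print(math.isclose(total, amount + tax, rel_tol=1e-9))  # True
```

In practice, either round both sides to cents before validating or express the rule with an explicit tolerance.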
## 🏗️ Design Philosophy

One schema, three representations. Each optimized for its use case.

```
       Schema Definition
              │
     ┌────────┼────────┐
     │        │        │
 Pydantic   Polars  SQLAlchemy
     │        │        │
   APIs      ETL    Database
```
### What Flycatcher Does

- ✅ Single source of truth for schema definitions
- ✅ Generate optimized representations for different use cases
- ✅ Keep runtimes separate (no ORM ↔ DataFrame conversions)
- ✅ Use stable public APIs (Pydantic, Polars, SQLAlchemy)

### What Flycatcher Doesn't Do

- ❌ Mix row-oriented and columnar paradigms
- ❌ Create a "unified runtime" (that would be slow)
- ❌ Reinvent validation logic (it delegates to proven libraries when possible)
- ❌ Depend on internal APIs
## ⚠️ Current Limitations (v0.1.0)

Flycatcher v0.1.0 is an alpha release. The core functionality works, but some advanced features are planned for future versions.

### Polars DSL

The `col()` DSL supports basic operations (`>`, `<`, `==`, `+`, `-`, `*`, `/`, `&`, `|`), but advanced Polars operations require the explicit format:

- ❌ `.is_null()`, `.is_not_null()` - use explicit Polars: `pl.col('field').is_null()`
- ❌ `.str.contains()`, `.str.startswith()` - use explicit Polars or field constraints
- ❌ `.dt.month`, `.dt.year` - use the explicit Polars format
- ❌ `.is_in([...])` - use the explicit Polars format

Workaround: use the explicit format in `@model_validator`:

```python
@model_validator
def check():
    return {
        'polars': (pl.col('field').is_null(), "Message"),
        'pydantic': lambda v: (v.field is None, "Message")
    }
```
### Pydantic Features

- ❌ `@field_validator` - only `@model_validator` is supported (coming in v0.2.0)
- ❌ Field aliases and computed fields (coming in v0.2.0+)
- ❌ Custom serialization options (coming in v0.2.0+)

Workaround: use `@model_validator` for all validation needs.
### SQLAlchemy Features

- ❌ Foreign key relationships - must be added manually after table generation (coming in v0.3.0+)
- ❌ Composite primary keys - only single-field primary keys are supported (coming in v0.3.0+)
- ❌ Function-based defaults (e.g., `default=func.now()`) - only literal defaults are supported

Workaround: add relationships and composite keys manually in SQLAlchemy after table generation.
### Field Types

- ❌ Enum, UUID, JSON, Array field types (coming in v0.3.0+)
- ❌ Numeric/Decimal field type (coming in v0.3.0+)

Workaround: use `String` with pattern validation or handle these types manually.
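For example, until a dedicated UUID field type exists, a UUID column could be approximated with a `String` pattern. The regex below is an assumption (a plausible value for a hypothetical `String(pattern=...)`), checked here against stdlib `uuid`:

```python
import re
import uuid

# Hypothetical pattern for a String(pattern=...) workaround; matches the
# canonical lowercase 8-4-4-4-12 hex form produced by uuid.uuid4().
UUID_RE = re.compile(
    r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
)

value = str(uuid.uuid4())
print(bool(UUID_RE.match(value)))   # True
print(bool(UUID_RE.match("nope")))  # False
```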
## 📊 Comparison

| Feature | Flycatcher | SQLModel | Patito |
|---|---|---|---|
| Pydantic support | ✅ | ✅ | ✅ |
| Polars support | ✅ | ❌ | ✅ |
| SQLAlchemy support | ✅ | ✅ | ❌ |
| DataFrame-level DB ops | 🚧 (v0.2) | ❌ | ❌ |
| Cross-field validation | ✅ | ⚠️ (Pydantic only) | ⚠️ (Polars only) |
| Single schema definition | ✅ | ⚠️ (Pydantic + ORM hybrid) | ⚠️ (Pydantic + Polars hybrid) |
Flycatcher is the only library that generates optimized representations for all three systems while keeping them properly separated.
## 📚 Documentation
- Getting Started - Installation and basics
- Tutorials - Step-by-step guides
- How-To Guides - Solve specific problems
- API Reference - Complete API documentation
- Explanations - Deep dives and concepts
## 🛣️ Roadmap

### v0.1.0 (Released) 🎉

- Core schema definition with metaclass
- Field types with constraints (Integer, String, Float, Boolean, Datetime, Date)
- Pydantic model generator
- Polars DataFrame validator with bulk validation
- SQLAlchemy table generator
- Cross-field validators with DSL (`col()`)
- Test suite with 70%+ coverage
- Complete documentation site
- PyPI publication

### v0.2.0 (In Progress) 🚧

Theme: enhanced validation and database operations

- `@field_validator` support in addition to the existing `@model_validator`
- Enhanced Polars DSL: `.is_null()`, `.is_not_null()`, `.str.contains()`, `.str.startswith()`, `.dt.month`, `.dt.year`, `.is_in([...])`
- Pydantic enhancements: field aliases, computed fields, custom serialization
- Enable inheritance of `Schema` to create subclasses with different fields

For more details, see the GitHub milestone for v0.2.0.

### v0.3.0 (Planned)

- DataFrame-level queries (`Schema.query()`)
- Bulk write operations (`Schema.insert()`, `Schema.update()`, `Schema.upsert()`)
- Complete ETL loop staying columnar end-to-end
- Add PascalCase metaclass
- Additional Pydantic validation modes (`mode='before'`, `mode='wrap'`)

For more details, see the GitHub milestone for v0.3.0.

### v0.4.0+ (Future)

Theme: advanced field types and relationships

- Additional field types: Enum, UUID, JSON, Array, Numeric/Decimal, Time, Binary, Interval
- SQLAlchemy relationships: foreign keys, composite primary keys
- SQLAlchemy function-based defaults (e.g., `default=func.now()`)
- JOIN support in queries
- Aggregations (GROUP BY, COUNT, SUM)
- Schema migrations helper
## 🤝 Contributing

Contributions are welcome! Please see our [Contributing Guide] for details.

## 📄 License

MIT License - see LICENSE for details.

## 💬 Community

- GitHub Issues - bug reports and feature requests
- GitHub Discussions - questions and community discussion
- Documentation - full guides and API reference

Built with ❤️ for the DataFrame generation

⭐ Star us on GitHub | 📖 Read the docs | 🐛 Report a bug