
Define your data schema once. Validate at scale. Stay columnar.

Project description


Define your schema once. Validate at scale. Stay columnar.

Built for DataFrames, powered across Pydantic, Polars, and SQLAlchemy.

Python 3.12+ | License: MIT


Flycatcher is a DataFrame-native schema layer for Python. Define your data model once and generate optimized representations for every part of your stack:

  • 🎯 Pydantic models for API validation & serialization
  • ⚡ Polars validators for blazing-fast bulk validation
  • 🗄️ SQLAlchemy tables for typed database access

Built for modern data workflows: Validate millions of rows at high speed, keep schema drift at zero, and stay columnar end-to-end.

❓ Why Flycatcher?

Modern Python data projects need row-level validation (Pydantic), efficient bulk operations (Polars), and typed database queries (SQLAlchemy). But maintaining multiple schemas across this stack can lead to duplication, drift, and manual juggling between row-oriented and columnar paradigms.

Flycatcher solves this: One schema definition → three optimized outputs.

from flycatcher import Schema, Integer, String, Float, col, model_validator

class ProductSchema(Schema):
    id = Integer(primary_key=True)
    name = String(min_length=3, max_length=100)
    price = Float(gt=0)
    discount_price = Float(gt=0, nullable=True)

    @model_validator
    def check_discount():
        # Cross-field validation with DSL
        return (
            col('discount_price') < col('price'),
            "Discount price must be less than regular price"
        )

# Generate three optimized representations
ProductModel = ProductSchema.to_pydantic()         # → Pydantic BaseModel
ProductValidator = ProductSchema.to_polars_validator() # → Polars DataFrame validator
ProductTable = ProductSchema.to_sqlalchemy()       # → SQLAlchemy Table

Flycatcher lets you stay DataFrame-native without giving up the speed of Polars, the ergonomic validation of Pydantic, or the Pythonic power of SQLAlchemy.
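To make the "one definition, several outputs" idea concrete, here is a toy sketch in plain Python. It is not Flycatcher's implementation: FieldSpec, make_row_checker, and make_ddl are hypothetical names invented for illustration, and the Python-to-SQL type mapping is a simplifying assumption. A single list of field specs drives both a row-level checker and a CREATE TABLE statement, so the two cannot drift apart.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical, simplified field description (not Flycatcher's Field classes)."""
    name: str
    py_type: type
    sql_type: str
    gt: Optional[float] = None
    min_length: Optional[int] = None
    nullable: bool = False
    primary_key: bool = False


def make_row_checker(fields: list[FieldSpec]) -> Callable[[dict[str, Any]], list[str]]:
    """Build a function that returns every error message for a single record."""
    def check(row: dict[str, Any]) -> list[str]:
        errors: list[str] = []
        for f in fields:
            value = row.get(f.name)
            if value is None:
                if not f.nullable:
                    errors.append(f"{f.name}: value is required")
                continue
            if not isinstance(value, f.py_type):
                errors.append(f"{f.name}: expected {f.py_type.__name__}")
                continue
            if f.gt is not None and not value > f.gt:
                errors.append(f"{f.name}: must be > {f.gt}")
            if f.min_length is not None and len(value) < f.min_length:
                errors.append(f"{f.name}: shorter than {f.min_length}")
        return errors
    return check


def make_ddl(table: str, fields: list[FieldSpec]) -> str:
    """Render the same specs as a CREATE TABLE statement."""
    columns = []
    for f in fields:
        parts = [f.name, f.sql_type]
        if f.primary_key:
            parts.append("PRIMARY KEY")
        elif not f.nullable:
            parts.append("NOT NULL")
        if f.gt is not None:
            parts.append(f"CHECK ({f.name} > {f.gt})")
        columns.append(" ".join(parts))
    return f"CREATE TABLE {table} ({', '.join(columns)})"


# One definition drives both outputs.
PRODUCT_FIELDS = [
    FieldSpec("id", int, "INTEGER", primary_key=True),
    FieldSpec("name", str, "VARCHAR(100)", min_length=3),
    FieldSpec("price", float, "DOUBLE PRECISION", gt=0),
]

check_product = make_row_checker(PRODUCT_FIELDS)
print(check_product({"id": 1, "name": "Mug", "price": 9.5}))  # []
print(check_product({"id": 2, "name": "X", "price": 0.0}))
# ['name: shorter than 3', 'price: must be > 0']
print(make_ddl("products", PRODUCT_FIELDS))
```

Flycatcher goes further by targeting real Pydantic, Polars, and SQLAlchemy objects, but the design choice is the same: constraints live in one place and each generator only translates them.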


🚀 Quick Start

Installation

pip install flycatcher
# or
uv add flycatcher

Define Your Schema

from flycatcher import Schema, Integer, String, Boolean, Datetime

class UserSchema(Schema):
    id = Integer(primary_key=True)
    username = String(min_length=3, max_length=50, unique=True)
    email = String(pattern=r'^[^@]+@[^@]+\.[^@]+$', unique=True, index=True)
    age = Integer(ge=13, le=120)
    is_active = Boolean(default=True)
    created_at = Datetime()
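The email pattern in UserSchema is a deliberately loose check: exactly one @, at least one character before it, and a dot with characters on both sides in the domain part. It does not, for example, reject spaces. A quick way to see what it accepts, using only Python's re module (this exercises the regex itself, not how Flycatcher applies it):

```python
import re

# The same pattern passed to String(pattern=...) in UserSchema above.
EMAIL_PATTERN = r'^[^@]+@[^@]+\.[^@]+$'


def looks_like_email(value: str) -> bool:
    """Return True if the whole value matches the loose email pattern."""
    return re.fullmatch(EMAIL_PATTERN, value) is not None


for candidate in [
    "alice@example.com",         # accepted
    "alice@mail.example.co.uk",  # accepted: several dots in the domain
    "alice@example",             # rejected: no dot after the @
    "alice@@example.com",        # rejected: two @ signs
    "alice example.com",         # rejected: no @ at all
    "bob smith@x.y",             # accepted: spaces are not excluded
]:
    print(candidate, looks_like_email(candidate))
```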

Use Pydantic for Row-Level Validation

Perfect for APIs, forms, and single-record validation:

from datetime import datetime, timezone

User = UserSchema.to_pydantic()

# Validates constraints automatically via Pydantic
user = User(
    id=1,
    username="alice",
    email="alice@example.com",
    age=25,
    created_at=datetime.now(timezone.utc)  # timezone-aware; datetime.utcnow() is deprecated since Python 3.12
)

# Serialize to JSON/dict
print(user.model_dump_json())

Use Polars for Bulk Validation

Perfect for DataFrame-level validation:

import polars as pl

UserValidator = UserSchema.to_polars_validator()

# Validate 1M+ rows with blazing speed
df = pl.read_csv("users.csv")
validated_df = UserValidator.validate(df, strict=True)

validated_df.write_parquet("validated_users.parquet")

Use SQLAlchemy for Database Operations

Perfect for typed queries and database interactions:

from sqlalchemy import create_engine

UserTable = UserSchema.to_sqlalchemy(table_name="users")

engine = create_engine("postgresql://localhost/mydb")

# Type-safe queries
with engine.connect() as conn:
    result = conn.execute(
        UserTable.select()
        .where(UserTable.c.is_active.is_(True))
        .where(UserTable.c.age >= 18)
    )
    for row in result:
        print(row)
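Under the hood, a query like this is ordinary parameterized SQL. The sketch below runs an equivalent filter with Python's built-in sqlite3 module against an in-memory table whose columns mirror UserSchema. It only illustrates the SQL shape: it does not use the table generated by Flycatcher, the sample rows are made up, and the SQL that SQLAlchemy emits for PostgreSQL may differ in syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Columns mirror UserSchema; SQLite stores booleans as 0/1.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE, "
    "email TEXT UNIQUE, age INTEGER, is_active BOOLEAN, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, username, email, age, is_active, created_at) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "alice", "alice@example.com", 25, True, "2024-01-01T00:00:00+00:00"),
        (2, "bob", "bob@example.com", 16, True, "2024-01-02T00:00:00+00:00"),
        (3, "carol", "carol@example.com", 40, False, "2024-01-03T00:00:00+00:00"),
    ],
)

# Same filter as the SQLAlchemy query: active users aged 18 or over.
rows = conn.execute(
    "SELECT id, username FROM users WHERE is_active = ? AND age >= ?",
    (True, 18),
).fetchall()
print(rows)  # [(1, 'alice')]
conn.close()
```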

✨ Key Features

Rich Field Types & Constraints

| Field Type | Constraints | Example |
|---|---|---|
| Integer() | ge, gt, le, lt, multiple_of | age = Integer(ge=0, le=120) |
| Float() | ge, gt, le, lt | price = Float(gt=0) |
| String() | min_length, max_length, pattern | email = String(pattern=r'^[^@]+@...') |
| Boolean() | (none) | is_active = Boolean(default=True) |
| Datetime() | (none) | created_at = Datetime() |
| Date() | (none) | birth_date = Date() |

Options supported by all fields: nullable, default, description

SQLAlchemy-specific options: primary_key, unique, index, autoincrement

Custom & Cross-Field Validation

Use the col() DSL for powerful field-level and cross-field validation that works across both Pydantic and Polars:

from flycatcher import Schema, Integer, Datetime, col, model_validator

class BookingSchema(Schema):
    check_in = Datetime()
    check_out = Datetime()
    nights = Integer(ge=1)

    @model_validator
    def check_dates():
        return (
            col('check_out') > col('check_in'),
            "Check-out must be after check-in"
        )

    @model_validator
    def check_minimum_stay():
        # For advanced operations like .dt.month, use explicit Polars format
        import polars as pl
        return {
            'polars': (
                (~pl.col('check_in').dt.month().is_in([7, 8])) | (pl.col('nights') >= 3),
                "Minimum stay in July and August is 3 nights"
            ),
            'pydantic': lambda v: (
                v.check_in.month not in [7, 8] or v.nights >= 3,
                "Minimum stay in July and August is 3 nights"
            )
        }
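One way a single expression can serve both a row-level and a columnar validator is to build an expression tree and interpret it two ways. The toy Col/Expr code below is a hypothetical stand-in for Flycatcher's col(), written in plain Python to show the idea; the real DSL targets Pydantic and Polars rather than dicts and lists. Python's comparison operators return a new node instead of a bool, the same trick Polars and SQLAlchemy expressions rely on.

```python
from __future__ import annotations

import operator
from datetime import date
from typing import Any, Callable


class Expr:
    """Toy expression node that can be evaluated per row or per column."""

    def __init__(self, row_fn: Callable[[dict], Any], col_fn: Callable[[dict], list]):
        self.row_fn = row_fn  # evaluates against one record: {"field": value}
        self.col_fn = col_fn  # evaluates against columns: {"field": [values...]}

    def _binary(self, other: Any, op: Callable[[Any, Any], Any]) -> Expr:
        if isinstance(other, Expr):
            def row_fn(row):
                return op(self.row_fn(row), other.row_fn(row))

            def col_fn(cols):
                return [op(a, b) for a, b in zip(self.col_fn(cols), other.col_fn(cols))]
        else:  # a plain constant is broadcast across every row
            def row_fn(row):
                return op(self.row_fn(row), other)

            def col_fn(cols):
                return [op(a, other) for a in self.col_fn(cols)]
        return Expr(row_fn, col_fn)

    def __gt__(self, other: Any) -> Expr:
        return self._binary(other, operator.gt)

    def __ge__(self, other: Any) -> Expr:
        return self._binary(other, operator.ge)

    def __lt__(self, other: Any) -> Expr:
        return self._binary(other, operator.lt)

    def __add__(self, other: Any) -> Expr:
        return self._binary(other, operator.add)


def col(name: str) -> Expr:
    """Reference a field by name (a hypothetical stand-in for flycatcher.col)."""
    return Expr(lambda row: row[name], lambda cols: list(cols[name]))


rule = col("check_out") > col("check_in")

# Row-level, as a Pydantic model validator would see one record:
print(rule.row_fn({"check_in": date(2024, 7, 1), "check_out": date(2024, 7, 4)}))  # True

# Columnar, producing one boolean per row like a Polars mask:
print(rule.col_fn({
    "check_in": [date(2024, 7, 1), date(2024, 8, 10)],
    "check_out": [date(2024, 7, 4), date(2024, 8, 9)],
}))  # [True, False]
```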

Validation Modes

Polars validation supports flexible error handling:

# Strict mode: Raise on validation errors (default)
validated_df = UserValidator.validate(df, strict=True)

# Non-strict mode: Filter out invalid rows
valid_df = UserValidator.validate(df, strict=False)

# Show violations for debugging
validated_df = UserValidator.validate(df, strict=True, show_violations=True)
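The two modes differ only in what happens to the rows that fail. Below is a minimal plain-Python sketch of that contract over column-oriented data (a dict of equal-length lists), assuming one boolean mask per rule has already been computed. validate_columns and ValidationFailed are hypothetical names, not Flycatcher's API.

```python
from typing import Any


class ValidationFailed(ValueError):
    """Raised in strict mode; carries the indices of the failing rows."""

    def __init__(self, bad_rows: list[int]):
        super().__init__(f"{len(bad_rows)} row(s) failed validation: {bad_rows}")
        self.bad_rows = bad_rows


def validate_columns(cols: dict[str, list[Any]], masks: list[list[bool]],
                     strict: bool = True) -> dict[str, list[Any]]:
    """AND the per-rule masks together, then raise (strict) or drop failing rows."""
    n_rows = len(next(iter(cols.values())))
    keep = [all(mask[i] for mask in masks) for i in range(n_rows)]
    bad_rows = [i for i, ok in enumerate(keep) if not ok]
    if bad_rows and strict:
        raise ValidationFailed(bad_rows)
    return {name: [v for v, ok in zip(values, keep) if ok] for name, values in cols.items()}


users = {"username": ["alice", "bo", "carol"], "age": [25, 30, 11]}
masks = [
    [len(u) >= 3 for u in users["username"]],  # like min_length=3
    [13 <= a <= 120 for a in users["age"]],     # like ge=13, le=120
]

print(validate_columns(users, masks, strict=False))
# {'username': ['alice'], 'age': [25]}

try:
    validate_columns(users, masks, strict=True)
except ValidationFailed as exc:
    print(exc)  # 2 row(s) failed validation: [1, 2]
```

Combining all rules into one mask before deciding keeps the work columnar: each rule is evaluated once over the whole column rather than row by row.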

🎯 Complete Example: ETL Pipeline

import polars as pl
from flycatcher import Schema, Integer, Float, String, Datetime, col, model_validator
from sqlalchemy import create_engine

# 1. Define schema once
class OrderSchema(Schema):
    order_id = Integer(primary_key=True)
    customer_email = String(pattern=r'^[^@]+@[^@]+\.[^@]+$', index=True)
    amount = Float(gt=0)
    tax = Float(ge=0)
    total = Float(gt=0)
    created_at = Datetime()

    @model_validator
    def check_total():
        return (
            col('total') == col('amount') + col('tax'),
            "Total must equal amount + tax"
        )

# 2. Extract & Validate with Polars (handles millions of rows)
OrderValidator = OrderSchema.to_polars_validator()
df = pl.read_csv("orders.csv")
validated_df = OrderValidator.validate(df, strict=True)

# 3. Load to database with SQLAlchemy
OrderTable = OrderSchema.to_sqlalchemy(table_name="orders")
engine = create_engine("postgresql://localhost/analytics")

# Create the table if it does not already exist
OrderTable.create(engine, checkfirst=True)

with engine.connect() as conn:
    conn.execute(OrderTable.insert(), validated_df.to_dicts())
    conn.commit()

✅ Result: millions of rows validated, business rules enforced, and data loaded into the database, all from one schema definition.
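One caution about check_total: it compares floats with ==, and binary floating point cannot represent most decimal amounts exactly, so a row that is correct to the cent can still fail. The plain-Python sketch below shows the effect and one common workaround, comparing within a tolerance. The function names are hypothetical; in Flycatcher a tolerance check would probably need the explicit Polars/Pydantic format, since abs() is not among the col() operations listed under Current Limitations (an assumption based on that list).

```python
import math


def total_matches_exact(amount: float, tax: float, total: float) -> bool:
    """What check_total expresses: exact float equality."""
    return total == amount + tax


def total_matches_close(amount: float, tax: float, total: float, tol: float = 0.005) -> bool:
    """Tolerance-based comparison: accept differences up to half a cent."""
    return math.isclose(total, amount + tax, rel_tol=0.0, abs_tol=tol)


print(0.1 + 0.2)                                 # 0.30000000000000004
print(total_matches_exact(0.1, 0.2, 0.3))        # False
print(total_matches_close(0.1, 0.2, 0.3))        # True
print(total_matches_close(10.00, 1.00, 11.02))   # False: off by two cents
```

Storing money as integer cents avoids the problem entirely; a dedicated Numeric/Decimal field type is listed under Current Limitations as planned for v0.3.0+.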


🏗️ Design Philosophy

One schema, three representations. Each optimized for its use case.

        Schema Definition
               ↓
    ┌──────────┼──────────┐
    ↓          ↓          ↓
Pydantic    Polars    SQLAlchemy
   ↓          ↓          ↓
 APIs       ETL      Database

What Flycatcher Does

✅ Single source of truth for schema definitions
✅ Generate optimized representations for different use cases
✅ Keep runtimes separate (no ORM ↔ DataFrame conversions)
✅ Use stable public APIs (Pydantic, Polars, SQLAlchemy)

What Flycatcher Doesn't Do

❌ Mix row-oriented and columnar paradigms
❌ Create a "unified runtime" (that would be slow)
❌ Reinvent validation logic (delegates to proven libraries when possible)
❌ Depend on internal APIs


⚠️ Current Limitations (v0.1.0)

Flycatcher v0.1.0 is an alpha release. The core functionality is in place, but some advanced features are planned for future versions:

Polars DSL

The col() DSL supports basic operators (>, <, ==, +, -, *, /, &, |); more advanced Polars operations require the explicit format:

  • ❌ .is_null(), .is_not_null() - Use explicit Polars: pl.col('field').is_null()
  • ❌ .str.contains(), .str.startswith() - Use explicit Polars or field constraints
  • ❌ .dt.month, .dt.year - Use explicit Polars format
  • ❌ .is_in([...]) - Use explicit Polars format

Workaround: Use the explicit format in @model_validator:

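import polars as pl
from flycatcher import model_validator

# Inside a Schema subclass that has a field named 'field':
@model_validator
def check():
    return {
        'polars': (pl.col('field').is_null(), "Message"),
        'pydantic': lambda v: (v.field is None, "Message")
    }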
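The explicit format is simply a dict keyed by backend. As a rough illustration of how a schema could route each validator to the right runtime, here is a hypothetical dispatcher in plain Python; resolve_rule and BACKENDS are invented names, and Flycatcher's actual internals may differ. For the DSL form it returns the (expression, message) tuple unchanged, leaving translation to each backend.

```python
from types import SimpleNamespace
from typing import Any

BACKENDS = ("polars", "pydantic")


def resolve_rule(result: Any, backend: str) -> Any:
    """Return the part of a validator's result that a given backend should use.

    Hypothetical helper: accepts the shared DSL form, an (expression, message)
    tuple, or the explicit form, a dict keyed by backend name.
    """
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if isinstance(result, dict):
        if backend not in result:
            raise KeyError(f"explicit validator has no {backend!r} entry")
        return result[backend]
    if isinstance(result, tuple) and len(result) == 2:
        return result  # DSL form: one expression, translated per backend later
    raise TypeError("validator must return a (check, message) tuple or a dict")


# Explicit form mirroring the workaround above; the Polars entry is a placeholder string here.
explicit = {
    "polars": ("pl.col('field').is_null()", "Message"),
    "pydantic": lambda v: (v.field is None, "Message"),
}

pydantic_check = resolve_rule(explicit, "pydantic")
print(pydantic_check(SimpleNamespace(field=None)))  # (True, 'Message')
print(resolve_rule(explicit, "polars")[1])          # Message
```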

Pydantic Features

  • ❌ @field_validator - Only @model_validator is supported (coming in v0.2.0)
  • ❌ Field aliases and computed fields (coming in v0.2.0+)
  • ❌ Custom serialization options (coming in v0.2.0+)

Workaround: Use @model_validator for all validation needs.

SQLAlchemy Features

  • ❌ Foreign key relationships - Must be added manually after table generation (coming in v0.3.0+)
  • ❌ Composite primary keys - Only single-field primary keys supported (coming in v0.3.0+)
  • ❌ Function-based defaults (e.g., default=func.now()) - Only literal defaults supported

Workaround: Add relationships and composite keys manually in SQLAlchemy after table generation.

Field Types

  • ❌ Enum, UUID, JSON, Array field types (coming in v0.3.0+)
  • ❌ Numeric/Decimal field type (coming in v0.3.0+)

Workaround: Use String with pattern validation or manual handling.
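For example, UUIDs and small enumerations can be approximated with String(pattern=...) until dedicated types land. The patterns below are illustrative assumptions (canonical hyphenated UUID text, and a made-up set of order statuses), and the field names in the comments are hypothetical. They are tested here with Python's re and uuid modules rather than through Flycatcher.

```python
import re
import uuid

# Canonical 8-4-4-4-12 hex form, e.g. what str(uuid.uuid4()) produces.
UUID_PATTERN = r'^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'

# Hypothetical enum values for an order status column.
STATUS_PATTERN = r'^(pending|paid|refunded)$'

# In a schema these might be used as (hypothetical field names):
#   order_ref = String(pattern=UUID_PATTERN)
#   status = String(pattern=STATUS_PATTERN)


def matches(pattern: str, value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


print(matches(UUID_PATTERN, str(uuid.uuid4())))  # True
print(matches(UUID_PATTERN, "not-a-uuid"))       # False
print(matches(STATUS_PATTERN, "paid"))           # True
print(matches(STATUS_PATTERN, "shipped"))        # False
```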


📊 Comparison

| Feature | Flycatcher | SQLModel | Patito |
|---|---|---|---|
| Pydantic support | ✅ | ✅ | ✅ |
| Polars support | ✅ | ❌ | ✅ |
| SQLAlchemy support | ✅ | ✅ | ❌ |
| DataFrame-level DB ops | 🚧 (v0.2) | ❌ | ❌ |
| Cross-field validation | ✅ | ⚠️ (Pydantic only) | ⚠️ (Polars only) |
| Single schema definition | ✅ | ⚠️ (Pydantic + ORM hybrid) | ⚠️ (Pydantic + Polars hybrid) |

Among the libraries compared here, Flycatcher is the only one that generates optimized representations for all three systems while keeping them properly separated.


📚 Documentation


🛣️ Roadmap

v0.1.0 (Released) 🚀

  • Core schema definition with metaclass
  • Field types with constraints (Integer, String, Float, Boolean, Datetime, Date)
  • Pydantic model generator
  • Polars DataFrame validator with bulk validation
  • SQLAlchemy table generator
  • Cross-field validators with DSL (col())
  • Test suite with 70%+ coverage
  • Complete documentation site
  • PyPI publication

v0.2.0 (In Progress) 🚧

Theme: Enhanced validation and database operations

  • @field_validator support in addition to existing @model_validator
  • Enhanced Polars DSL: .is_null(), .is_not_null(), .str.contains(), .str.startswith(), .dt.month, .dt.year, .is_in([...])
  • Pydantic enhancements: field aliases, computed fields, custom serialization
  • Enable inheritance of Schema to create subclasses with different fields
  • For more details, see the GitHub Milestone for v0.2.0

v0.3.0 (Planned)

  • DataFrame-level queries (Schema.query())
  • Bulk write operations (Schema.insert(), Schema.update(), Schema.upsert())
  • Complete ETL loop staying columnar end-to-end
  • Add PascalCase metaclass
  • Additional Pydantic validation modes (mode='before', mode='wrap')
  • For more details, see the GitHub Milestone for v0.3.0

v0.4.0+ (Future)

Theme: Advanced field types and relationships

  • Additional field types: Enum, UUID, JSON, Array, Numeric/Decimal, Time, Binary, Interval
  • SQLAlchemy relationships: Foreign keys, composite primary keys
  • SQLAlchemy function-based defaults (e.g., default=func.now())
  • JOIN support in queries
  • Aggregations (GROUP BY, COUNT, SUM)
  • Schema migrations helper

🤝 Contributing

Contributions are welcome! Please see our Contributing Guide for details.


📄 License

MIT License - see LICENSE for details.


💬 Community


Built with ❤️ for the DataFrame generation

⭐ Star us on GitHub | 📖 Read the docs | 🐛 Report a bug

Project details


Download files


Source Distribution

flycatcher-0.1.0.tar.gz (319.3 kB)

Uploaded Source

Built Distribution


flycatcher-0.1.0-py3-none-any.whl (23.3 kB)

Uploaded Python 3

File details

Details for the file flycatcher-0.1.0.tar.gz.

File metadata

  • Download URL: flycatcher-0.1.0.tar.gz
  • Upload date:
  • Size: 319.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for flycatcher-0.1.0.tar.gz
Algorithm Hash digest
SHA256 98b0fdb0be9ada6cc8956b65667db3f46cd78c8b1598e9dc879180bfd1028730
MD5 94b4a1cc391bdd9cc85ffbacde24c253
BLAKE2b-256 1068bfc6eacbb8bfc83ff92ad7dea0327863b7dc36a487888fe6b7fe07976f98


Provenance

The following attestation bundles were made for flycatcher-0.1.0.tar.gz:

Publisher: publish.yml on mrmcmullan/flycatcher

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flycatcher-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: flycatcher-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 23.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for flycatcher-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0c93727734053fd38aa089eaf49c9c88eb7a88a4843132b353d6c81d6bb652c4
MD5 2423ee88c6cacf7931315d9d63d9e28e
BLAKE2b-256 61e839dbfe3d509ddd2b2f9ec42fea97e3b7e73db7db92eaef0ca52542ca1dbe


Provenance

The following attestation bundles were made for flycatcher-0.1.0-py3-none-any.whl:

Publisher: publish.yml on mrmcmullan/flycatcher

