Skip to main content

Convert Polars DataFrames to lists of Pydantic models with schema inference

Project description

❄️ Articuno ❄️

Convert Polars DataFrames to Pydantic models — and optionally generate clean Python code from them.

A blazing-fast tool for schema inference, data validation, and model generation powered by Polars and Pydantic.


📋 Table of Contents


🚀 Features

  • 🔍 Infer Pydantic models directly from polars.DataFrame schemas
  • 🧪 Validate data by converting DataFrame rows to Pydantic instances
  • 🧱 Supports nested Structs, Lists, Nullable fields, and advanced types
  • 🧬 Generate Python model code from dynamic models using datamodel-code-generator
  • 🦜 Optional Patito integration for declarative, constraint-rich models and advanced validation
  • 🎨 Generate Patito model code alongside Pydantic for flexible schema workflows

📦 Installation

pip install articuno

🛠 Usage

1. Convert a DataFrame to Pydantic Models

import polars as pl
from articuno import df_to_pydantic

df = pl.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [30, 25],
    "is_active": [True, False],
})

models = df_to_pydantic(df)

print(models[0])
print(models[0].dict())

Output:

name='Alice' age=30 is_active=True
{'name': 'Alice', 'age': 30, 'is_active': True}

2. Infer a Model Only

from articuno import infer_pydantic_model

model = infer_pydantic_model(df, model_name="UserModel")
print(model.schema_json(indent=2))

Output (snippet):

{
  "title": "UserModel",
  "type": "object",
  "properties": {
    "name": { "title": "Name", "type": "string" },
    "age": { "title": "Age", "type": "integer" },
    "is_active": { "title": "Is Active", "type": "boolean" }
  },
  "required": ["name", "age", "is_active"]
}

3. Generate Python Source Code from a Model

from articuno import generate_pydantic_class_code

code = generate_pydantic_class_code(model, model_name="UserModel")
print(code)

Output:

from pydantic import BaseModel

class UserModel(BaseModel):
    name: str
    age: int
    is_active: bool

Or write it to a file:

generate_pydantic_class_code(model, output_path="user_model.py")

🧬 Example: Nested Structs

nested_df = pl.DataFrame({
    "user": pl.Series([
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25},
    ], dtype=pl.Struct([
        ("name", pl.Utf8),
        ("age", pl.Int64),
    ]))
})

models = df_to_pydantic(nested_df)
print(models[0])
print(models[0].user)
print(models[0].user.name)

Output:

AutoModel(user=AutoModel_user_Struct(name='Alice', age=30))
AutoModel_user_Struct(name='Alice', age=30)
Alice

🦜 Patito Integration (Optional)

Articuno can optionally generate and validate models using Patito, a declarative schema validation library with advanced constraints.

How it works:

  • Use infer_patito_model alongside infer_pydantic_model to create Patito models from Polars DataFrames.
  • Generate Patito model source code with generate_patito_class_code.
  • Validate and enforce schemas with Patito's constraint system for tighter data rules.

Example:

from articuno import infer_patito_model

patito_model = infer_patito_model(df, model_name="UserPatitoModel")
print(patito_model.schema_json(indent=2))

Patito integration is optional and requires installing Patito:

pip install patito

⏰ When to Use Articuno

  • ✅ You use Polars and want type-safe modeling
  • ✅ You dynamically load or transform tabular data
  • ✅ You want to generate sharable Python classes
  • ✅ You want to validate Polars DataFrames using Pydantic rules
  • ✅ You want optional advanced validation with Patito

⚙️ Supported Type Mappings

Polars Type Pydantic Type
pl.Int*, pl.UInt* int
pl.Float* float
pl.Utf8 str
pl.Boolean bool
pl.Date datetime.date
pl.Datetime datetime.datetime
pl.Duration datetime.timedelta
pl.List List[...]
pl.Struct Nested Pydantic model
pl.Null Optional[...]

🧩 Integration Ideas

  • 🔐 Use for FastAPI or Litestar API schemas
  • 🧼 Use in ETL pipelines to enforce schema contracts
  • 📄 Use to generate Pydantic models from data exports
  • 🔀 Use with polars.read_json / read_parquet to auto-model nested data
  • 🦜 Use Patito models for advanced schema validation where needed

🧪 Development & Testing

git clone https://github.com/your-username/articuno
cd articuno
pip install -e ".[dev]"
pytest

🧙‍♂️ FastAPI Integration (Decorator + CLI Bootstrap)

Articuno makes it easy to generate response_models for your FastAPI endpoints that return polars.DataFrames — no need to manually define Pydantic models.

🧩 Step 1: Add the Decorator

Use the @infer_response_model decorator on your FastAPI endpoint. Provide:

  • a name for the generated Pydantic model,
  • an example input dict to simulate a call to your endpoint,
  • an optional path to your models.py file (defaults to models.py next to the FastAPI app file).
from fastapi import FastAPI
from articuno.decorator import infer_response_model
import polars as pl

app = FastAPI()

@infer_response_model(
    name="UserModel",
    example_input={"limit": 2},
    models_path="models.py"  # Optional, relative to this file by default
)
@app.get("/users")
def get_users(limit: int):
    return pl.DataFrame({
        "name": ["Alice", "Bob"],
        "age": [30, 25],
    }).head(limit)

The decorator registers the endpoint for the CLI to analyze later without changing runtime behavior.

⚙️ Step 2: Run the CLI Bootstrap

After writing or modifying your endpoints, run:

articuno bootstrap app/main.py

This will:

  1. Import and call all decorated endpoints with the example input
  2. Infer a Pydantic model from the returned DataFrame
  3. Write the model to the specified models.py file
  4. Update your FastAPI app to use the generated response models

🛠 CLI Options

Usage: cli.py bootstrap [OPTIONS] APP_PATH

Arguments:
  APP_PATH                Path to your FastAPI app file (e.g., app/main.py)

Options:
  --models-path PATH      Optional output path for models.py (defaults to same folder as app)
  --dry-run               Preview changes without writing files
  --help                  Show this message and exit

📜 Patito vs Articuno

Feature Patito Articuno
Polars–Pydantic bridge ✅ Declarative schema ✅ Dynamic inference
Validation constraints ✅ Unique, bounds ⚠️ Basic types, nullables
Nested Structs ❌ Not supported ✅ Fully recursive
Code generation ✅ via datamodel-code-gen
Example/mock data .examples

Patito is ideal for static schema validation with custom constraints and ETL pipelines.

Articuno excels at dynamic schema inference, nested model generation, and code export for API use cases.


License

MIT © 2025 Odos Matthews

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

articuno-0.4.5.tar.gz (13.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

articuno-0.4.5-py3-none-any.whl (11.2 kB view details)

Uploaded Python 3

File details

Details for the file articuno-0.4.5.tar.gz.

File metadata

  • Download URL: articuno-0.4.5.tar.gz
  • Upload date:
  • Size: 13.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for articuno-0.4.5.tar.gz
Algorithm Hash digest
SHA256 aaa93a0109a6793ea5a49414da9bbfff365f46ed3d191c59afe06371032bdc1d
MD5 b7eabd9a36ea896c3e0e95e5dd4cc98c
BLAKE2b-256 e5fd7df3c3e38cda02d58a339eb733db21383be4bb156db285f76df291089f55

See more details on using hashes here.

File details

Details for the file articuno-0.4.5-py3-none-any.whl.

File metadata

  • Download URL: articuno-0.4.5-py3-none-any.whl
  • Upload date:
  • Size: 11.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for articuno-0.4.5-py3-none-any.whl
Algorithm Hash digest
SHA256 1f6a0b5993d9002a0f0b043a5333c7d16cf6a3b4620ae6d798f791e9f9a29670
MD5 b859207a3e1817ff5b46133d261cbd26
BLAKE2b-256 ed4b9f27433eb7e6178d1c05cfd01fdb3aa4ccf4725b08293be7670bd3135af2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page