Convert Polars DataFrames to lists of Pydantic models with schema inference
❄️ Articuno ❄️
Convert Polars DataFrames to Pydantic models — and optionally generate clean Python code from them.
A blazing-fast tool for schema inference, data validation, and model generation powered by Polars and Pydantic.
📋 Table of Contents
- 🚀 Features
- 📦 Installation
- 🛠 Usage
- 🧬 Example: Nested Structs
- 🦜 Patito Integration (Optional)
- ⏰ When to Use Articuno
- ⚙️ Supported Type Mappings
- 🧩 Integration Ideas
- 🧪 Development & Testing
- 🧙‍♂️ FastAPI Integration (Decorator + CLI Bootstrap)
- 🛠 CLI Options
- 📜 Patito vs Articuno
- License
🚀 Features
- 🔍 Infer Pydantic models directly from `polars.DataFrame` schemas
- 🧪 Validate data by converting DataFrame rows to Pydantic instances
- 🧱 Supports nested Structs, Lists, nullable fields, and date/time types
- 🧬 Generate Python model code from dynamic models using datamodel-code-generator
- 🦜 Optional Patito integration for declarative, constraint-rich models and advanced validation
- 🎨 Generate Patito model code alongside Pydantic for flexible schema workflows
📦 Installation
```bash
pip install articuno
```
Optional extras:
```bash
pip install datamodel-code-generator  # for code generation
pip install patito                    # for Patito support
```
🛠 Usage
1. Convert a DataFrame to Pydantic Models
```python
import polars as pl
from articuno import df_to_pydantic

df = pl.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [30, 25],
    "is_active": [True, False],
})

models = df_to_pydantic(df)
print(models[0])
print(models[0].model_dump())  # Pydantic v2 replacement for the deprecated .dict()
```
Output:
```
name='Alice' age=30 is_active=True
{'name': 'Alice', 'age': 30, 'is_active': True}
```
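Conceptually, `df_to_pydantic` turns column-oriented data into one validated object per row. The dependency-free sketch below illustrates that idea and is not Articuno's implementation. It assumes the columns are already a plain dict of lists, which is the shape Polars returns from `DataFrame.to_dict(as_series=False)`. In place of Pydantic validation it uses a standard-library dataclass with a hand-written type check. `User` and `columns_to_rows` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass
class User:
    # Hypothetical stand-in for the model Articuno would infer.
    name: str
    age: int
    is_active: bool

    def __post_init__(self) -> None:
        # Minimal runtime type check standing in for Pydantic validation.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
                )


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Transpose column-oriented data into one dict per row."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


columns = {"name": ["Alice", "Bob"], "age": [30, 25], "is_active": [True, False]}
users = [User(**row) for row in columns_to_rows(columns)]
print(users[0])  # User(name='Alice', age=30, is_active=True)
```

With Polars itself, `df.rows(named=True)` yields the same list of row dicts directly, so only the validation step remains.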
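2. Infer a Model Only
```python
import json
from articuno import infer_pydantic_model

model = infer_pydantic_model(df, model_name="UserModel")
# model_json_schema() returns a dict, so serialize it to pretty-print
print(json.dumps(model.model_json_schema(), indent=2))
```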
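`model_json_schema()` returns a plain dict that describes the model in JSON Schema terms. The sketch below builds a dict of roughly that shape from a field-name-to-Python-type mapping, to show what the inferred model exposes. It is a simplified illustration, not Pydantic's generator, and `sketch_json_schema` is a hypothetical name. Real Pydantic output also covers defaults, nested `$defs`, and nullable fields.

```python
import json
from typing import Dict

# Simplified Python-type-to-JSON-Schema mapping (assumption: flat, scalar fields only).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_json_schema(model_name: str, fields: Dict[str, type]) -> dict:
    """Build a JSON-Schema-like dict in which every field is required."""
    properties = {
        name: {"title": name.replace("_", " ").title(), "type": JSON_TYPES[py_type]}
        for name, py_type in fields.items()
    }
    return {
        "properties": properties,
        "required": list(fields),
        "title": model_name,
        "type": "object",
    }


schema = sketch_json_schema("UserModel", {"name": str, "age": int, "is_active": bool})
print(json.dumps(schema, indent=2))
```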
3. Generate Python Source Code from a Model
```python
from articuno import generate_class_code

code = generate_class_code(model, model_name="UserModel")
print(code)
```
Output:
```python
from pydantic import BaseModel

class UserModel(BaseModel):
    name: str
    age: int
    is_active: bool
```
Or write it to a file:
```python
generate_class_code(model, output_path="user_model.py")
```
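Code generation amounts to rendering a field mapping as class source text. Articuno delegates this step to datamodel-code-generator. The hand-rolled sketch below only shows the idea for flat models, and `render_model_source` is a hypothetical helper. Calling `compile()` checks that the generated text is valid Python without needing Pydantic installed.

```python
from typing import Dict


def render_model_source(model_name: str, fields: Dict[str, str]) -> str:
    """Render a flat Pydantic class definition from field names and annotation strings."""
    lines = ["from pydantic import BaseModel", "", "", f"class {model_name}(BaseModel):"]
    if not fields:
        lines.append("    pass")  # an empty class body still needs a statement
    lines.extend(f"    {name}: {annotation}" for name, annotation in fields.items())
    return "\n".join(lines) + "\n"


source = render_model_source("UserModel", {"name": "str", "age": "int", "is_active": "bool"})
compile(source, "user_model.py", "exec")  # raises SyntaxError if the text is not valid Python
print(source)
```

Writing the result to disk is then a single `pathlib.Path("user_model.py").write_text(source)` call.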
🧬 Example: Nested Structs
```python
nested_df = pl.DataFrame({
    "user": pl.Series([
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25},
    ], dtype=pl.Struct({"name": pl.Utf8, "age": pl.Int64})),
})

models = df_to_pydantic(nested_df)
print(repr(models[0]))
print(repr(models[0].user))
print(models[0].user.name)
```
Output:
```
AutoModel(user=AutoModel_0_Struct(name='Alice', age=30))
AutoModel_0_Struct(name='Alice', age=30)
Alice
```
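Nested structs become nested model classes. Each Struct field gets its own class, which is built recursively before its parent. The sketch below shows that recursion, using the standard library's `dataclasses.make_dataclass` in place of Pydantic's `create_model`. Several details are assumptions rather than Articuno's own: the schema format (a dict whose nested dicts stand for Struct fields), the helper names `build_model` and `instantiate`, and the `<Parent>_<field>` class-naming scheme.

```python
from dataclasses import fields, is_dataclass, make_dataclass
from typing import Any, Dict


def build_model(name: str, schema: Dict[str, Any]) -> type:
    """Create a dataclass for `schema`; nested dicts become nested classes."""
    field_specs = []
    for field_name, field_type in schema.items():
        if isinstance(field_type, dict):
            # Struct field: build the child class first, named after its parent.
            field_type = build_model(f"{name}_{field_name}", field_type)
        field_specs.append((field_name, field_type))
    return make_dataclass(name, field_specs)


def instantiate(cls: type, data: Dict[str, Any]) -> Any:
    """Recursively turn nested dicts into instances of the generated classes."""
    kwargs = {}
    for f in fields(cls):
        value = data[f.name]
        if is_dataclass(f.type) and isinstance(value, dict):
            value = instantiate(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


rows = [
    {"user": {"name": "Alice", "age": 30}},
    {"user": {"name": "Bob", "age": 25}},
]
AutoModel = build_model("AutoModel", {"user": {"name": str, "age": int}})
models = [instantiate(AutoModel, row) for row in rows]
print(models[0])  # AutoModel(user=AutoModel_user(name='Alice', age=30))
```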
🦜 Patito Integration (Optional)
Articuno can optionally generate and validate models using Patito, a declarative schema validation library with advanced constraints.
How it works:
- Use `df_to_patito(df)` to convert rows into Patito instances
- Use `infer_patito_model(df)` to generate a reusable Patito model class

Example:
```python
import json
from articuno import infer_patito_model

patito_model = infer_patito_model(df, model_name="UserPatitoModel")
# Patito models are Pydantic models, so the Pydantic v2 schema API applies
print(json.dumps(patito_model.model_json_schema(), indent=2))
```
Patito integration is optional and requires installing Patito:
```bash
pip install patito
```
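Patito adds DataFrame-level constraints, such as uniqueness and numeric bounds, that per-row type inference alone does not capture. The sketch below is a dependency-free illustration of what such checks involve; it does not use Patito's API. It validates column-oriented data against a hypothetical `ColumnRule` spec and collects every violation instead of stopping at the first one.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ColumnRule:
    # Hypothetical constraint spec, loosely modelled on Patito-style field options.
    unique: bool = False
    ge: Optional[float] = None  # inclusive lower bound
    le: Optional[float] = None  # inclusive upper bound


def check_constraints(columns: Dict[str, List[Any]], rules: Dict[str, ColumnRule]) -> List[str]:
    """Return human-readable constraint violations (an empty list means valid)."""
    errors = []
    for name, rule in rules.items():
        values = columns[name]
        if rule.unique and len(set(values)) != len(values):
            errors.append(f"{name}: values are not unique")
        for i, value in enumerate(values):
            if rule.ge is not None and value < rule.ge:
                errors.append(f"{name}[{i}]: {value} < {rule.ge}")
            if rule.le is not None and value > rule.le:
                errors.append(f"{name}[{i}]: {value} > {rule.le}")
    return errors


columns = {"name": ["Alice", "Bob", "Bob"], "age": [30, 25, -1]}
rules = {"name": ColumnRule(unique=True), "age": ColumnRule(ge=0, le=150)}
print(check_constraints(columns, rules))
```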
⏰ When to Use Articuno
- ✅ You use Polars and want type-safe modeling
- ✅ You dynamically load or transform tabular data
- ✅ You want to generate shareable Python classes
- ✅ You want to validate Polars DataFrames using Pydantic rules
- ✅ You want optional advanced validation with Patito
⚙️ Supported Type Mappings
| Polars Type | Pydantic Type |
|---|---|
| `pl.Int*`, `pl.UInt*` | `int` |
| `pl.Float*` | `float` |
| `pl.Utf8` | `str` |
| `pl.Boolean` | `bool` |
| `pl.Date` | `datetime.date` |
| `pl.Datetime` | `datetime.datetime` |
| `pl.Duration` | `datetime.timedelta` |
| `pl.List` | `List[...]` |
| `pl.Struct` | Nested Pydantic model |
| `pl.Null` | `Optional[...]` |
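The mapping in this table is recursive: list element types and struct fields are mapped the same way as top-level columns. The sketch below is a minimal version of that recursion. It represents dtypes as Polars type-name strings and tuples rather than real Polars objects. Mapping `Null` to `Optional[Any]` and Structs to `TypedDict`s are illustrative choices; Articuno builds nested Pydantic models instead. `to_python_type` is a hypothetical name.

```python
import datetime
from typing import Any, List, Optional, TypedDict, Union, get_type_hints

# A dtype is a Polars type name ("Int64", "Utf8", ...) or a tuple:
#   ("List", inner_dtype) or ("Struct", {field_name: dtype, ...})
DType = Union[str, tuple]

SCALARS = {
    "Utf8": str,
    "String": str,
    "Boolean": bool,
    "Date": datetime.date,
    "Datetime": datetime.datetime,
    "Duration": datetime.timedelta,
}


def to_python_type(dtype: DType, name: str = "Model", nullable: bool = False) -> Any:
    """Map a Polars dtype name (or nested tuple) to a Python/typing annotation."""
    if isinstance(dtype, tuple):
        kind, inner = dtype
        if kind == "List":
            result = List[to_python_type(inner, name)]
        elif kind == "Struct":
            # Each struct field is mapped recursively; nested structs get derived names.
            result = TypedDict(
                name, {k: to_python_type(v, f"{name}_{k}") for k, v in inner.items()}
            )
        else:
            raise ValueError(f"unsupported nested dtype: {kind}")
    elif dtype == "Null":
        return Optional[Any]  # a Null column carries no value type information
    elif dtype.startswith(("Int", "UInt")):
        result = int
    elif dtype.startswith("Float"):
        result = float
    elif dtype in SCALARS:
        result = SCALARS[dtype]
    else:
        raise ValueError(f"unsupported dtype: {dtype}")
    return Optional[result] if nullable else result


user = to_python_type(("Struct", {"name": "Utf8", "age": "Int64"}), name="User")
print(get_type_hints(user))  # {'name': <class 'str'>, 'age': <class 'int'>}
```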
🧩 Integration Ideas
- 🔐 Use for FastAPI or Litestar API schemas
- 🧼 Use in ETL pipelines to enforce schema contracts
- 📄 Use to generate Pydantic models from data exports
- 🔀 Use with `polars.read_json` / `read_parquet` to auto-model nested data
- 🦜 Use Patito models for advanced schema validation where needed
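To enforce a schema contract in an ETL pipeline, compare the schema you expect with the one you actually received and fail loudly on any drift. The sketch below assumes both schemas are `{column: dtype_name}` dicts of strings. With Polars, `{name: str(dtype) for name, dtype in df.schema.items()}` produces such a dict, though the exact dtype strings depend on the Polars version. `diff_schema` is a hypothetical name.

```python
from typing import Dict, List


def diff_schema(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Compare an expected {column: dtype} contract with an observed schema."""
    problems = []
    for column, dtype in expected.items():
        if column not in actual:
            problems.append(f"missing column {column!r}")
        elif actual[column] != dtype:
            problems.append(f"{column!r}: expected {dtype}, got {actual[column]}")
    # Sort extras so the report is deterministic.
    for column in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected column {column!r}")
    return problems


expected = {"name": "String", "age": "Int64", "is_active": "Boolean"}
actual = {"name": "String", "age": "Float64", "email": "String"}
for problem in diff_schema(expected, actual):
    print(problem)
```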
🧪 Development & Testing
```bash
git clone https://github.com/your-username/articuno
cd articuno
pip install -e ".[dev]"
pytest
```
🧙‍♂️ FastAPI Integration (Decorator + CLI Bootstrap)
Articuno can generate `response_model` classes for FastAPI endpoints that return `polars.DataFrame`s, so you don't have to define the Pydantic models by hand.
🧩 Step 1: Add the Decorator
Use the `@infer_response_model` decorator on your FastAPI endpoint. Provide:
- a name for the generated Pydantic model,
- an example input dict to simulate a call to your endpoint,
- an optional path to your `models.py` file (defaults to `models.py` next to the FastAPI app file).
```python
from fastapi import FastAPI
from articuno.decorator import infer_response_model
import polars as pl

app = FastAPI()

@infer_response_model(
    name="UserModel",
    example_input={"limit": 2},
    models_path="models.py",  # Optional, relative to this file by default
)
@app.get("/users")
def get_users(limit: int):
    return pl.DataFrame({
        "name": ["Alice", "Bob"],
        "age": [30, 25],
    }).head(limit)
```
The decorator registers the endpoint for the CLI to analyze later without changing runtime behavior.
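A decorator that records an endpoint for later analysis without changing its runtime behavior can be as simple as appending metadata to a module-level registry and returning the function untouched. The sketch below shows that pattern. `REGISTRY`, `register_example`, and `RegisteredEndpoint` are hypothetical and do not describe Articuno's internals. A dict of lists stands in for the returned DataFrame.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class RegisteredEndpoint:
    # Hypothetical metadata a bootstrap step would need later.
    func: Callable[..., Any]
    model_name: str
    example_input: Dict[str, Any]
    models_path: Optional[str]


REGISTRY: List[RegisteredEndpoint] = []


def register_example(name: str, example_input: Dict[str, Any], models_path: Optional[str] = None):
    """Record the endpoint and its example input, then return it unchanged."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY.append(RegisteredEndpoint(func, name, example_input, models_path))
        return func  # no wrapper, so calling the endpoint behaves exactly as before
    return decorator


@register_example(name="UserModel", example_input={"limit": 2})
def get_users(limit: int) -> Dict[str, list]:
    return {"name": ["Alice", "Bob"][:limit], "age": [30, 25][:limit]}


entry = REGISTRY[0]
print(entry.model_name, entry.func(**entry.example_input))
```

Returning the original function rather than a wrapper is what keeps the endpoint's runtime behavior identical. It also means the decorator does not interfere with `@app.get` registering the same function.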
⚙️ Step 2: Run the CLI Bootstrap
After writing or modifying your endpoints, run:
```bash
articuno bootstrap app/main.py
```
This will:
- Import and call all decorated endpoints with the example input
- Infer a Pydantic model from the returned DataFrame
- Write the model to the specified models.py file
- Update your FastAPI app to use the generated response models
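The first three bootstrap steps listed above can be sketched without FastAPI or Polars, under several simplifying assumptions:
- A dict of lists stands in for the returned DataFrame.
- Types are guessed from each column's first non-null value rather than read from a Polars schema.
- The fourth step, patching the app's `response_model`, is left out.

`bootstrap`, `infer_fields`, and the entry tuple format are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry entry: (model name, endpoint function, example input).
Entry = Tuple[str, Callable[..., Dict[str, List[Any]]], Dict[str, Any]]


def infer_fields(columns: Dict[str, List[Any]]) -> Dict[str, str]:
    """Guess an annotation per column from its first non-null value."""
    fields = {}
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        annotation = type(non_null[0]).__name__ if non_null else "Any"
        if len(non_null) < len(values):
            annotation = f"Optional[{annotation}]"  # nulls present, so the field is optional
        fields[name] = annotation
    return fields


def bootstrap(entries: List[Entry]) -> str:
    """Call each endpoint with its example input and render one class per endpoint."""
    blocks = ["from typing import Any, Optional", "", "from pydantic import BaseModel"]
    for model_name, func, example_input in entries:
        columns = func(**example_input)  # step 1: call with the example input
        fields = infer_fields(columns)   # step 2: infer the model's fields
        body = [f"    {name}: {annotation}" for name, annotation in fields.items()]
        blocks += ["", "", f"class {model_name}(BaseModel):", *(body or ["    pass"])]
    return "\n".join(blocks) + "\n"      # step 3: text destined for models.py


def get_users(limit: int) -> Dict[str, List[Any]]:
    return {"name": ["Alice", "Bob"][:limit], "age": [30, None][:limit]}


models_source = bootstrap([("UserModel", get_users, {"limit": 2})])
print(models_source)
```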
🛠 CLI Options
```
Usage: articuno bootstrap [OPTIONS] APP_PATH

Arguments:
  APP_PATH  Path to your FastAPI app file (e.g., app/main.py)

Options:
  --models-path PATH  Optional output path for models.py (defaults to same folder as app)
  --dry-run           Preview changes without writing files
  --help              Show this message and exit
```
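This page does not say which CLI framework Articuno uses. The `argparse` sketch below merely mirrors the documented arguments to show how they fit together, including how the `--models-path` default can be derived from `APP_PATH`. `build_parser` and `resolve_models_path` are hypothetical names.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the documented `articuno bootstrap` interface."""
    parser = argparse.ArgumentParser(prog="articuno")
    subcommands = parser.add_subparsers(dest="command", required=True)
    bootstrap = subcommands.add_parser("bootstrap", help="Generate response models")
    bootstrap.add_argument("app_path", help="Path to your FastAPI app file (e.g., app/main.py)")
    bootstrap.add_argument("--models-path", default=None, help="Output path for models.py")
    bootstrap.add_argument("--dry-run", action="store_true", help="Preview changes without writing files")
    return parser


def resolve_models_path(args: argparse.Namespace) -> Path:
    """Use --models-path if given, otherwise models.py next to the app file."""
    if args.models_path:
        return Path(args.models_path)
    return Path(args.app_path).parent / "models.py"


args = build_parser().parse_args(["bootstrap", "app/main.py", "--dry-run"])
print(args.app_path, args.dry_run, resolve_models_path(args))
```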
📜 Patito vs Articuno
| Feature | Patito | Articuno |
|---|---|---|
| Polars–Pydantic bridge | ✅ Declarative schema | ✅ Dynamic inference |
| Validation constraints | ✅ Unique, bounds | ⚠️ Basic types, nullables |
| Nested Structs | ❌ Not supported | ✅ Fully recursive |
| Code generation | ❌ | ✅ via datamodel-code-generator |
| Example/mock data | ✅ `.examples` | ❌ |
Patito is ideal for static schema validation with custom constraints and ETL pipelines.
Articuno excels at dynamic schema inference, nested model generation, and code export for API use cases.
License
MIT © 2025 Odos Matthews