Convert Polars or Pandas DataFrames to lists of Pydantic models with schema inference

❄️ Articuno ❄️

Convert Polars or Pandas DataFrames to Pydantic models with schema inference, and generate clean Python class code.


✨ Features

  • Infer Pydantic models dynamically from Polars or Pandas DataFrames
  • Infer Pydantic models and instances directly from iterables of dictionaries
  • Supports nested structs, optional fields, and common data types
  • Supports PyArrow-backed Pandas columns (e.g., int64[pyarrow], string[pyarrow])
  • force_optional flag to make every field optional regardless of the data
  • Configurable max_scan parameter to limit schema inference to the first N records of an iterable
  • Generate clean Python model code using datamodel-code-generator
  • Lightweight core package; Polars and Pandas support are installed as optional extras

📦 Installation

Install the core package:

pip install articuno

Add optional dependencies as needed:

  • Polars support:
    pip install "articuno[polars]"

  • Pandas support (with optional PyArrow support):
    pip install "articuno[pandas]"

  • Full install:
    pip install "articuno[polars,pandas]"
    

🚀 Usage

🔍 DataFrame-based Inference

Infer a model from a Polars or Pandas DataFrame; df_to_pydantic returns a list of model instances, one per row:

from articuno import df_to_pydantic
import polars as pl

df = pl.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "score": [95.5, 88.0, 92.3]
})

instances = df_to_pydantic(df, model_name="UserModel")
print(instances[0])  # id=1 name='Alice' score=95.5

Or just get the model class:

from articuno import infer_pydantic_model
Model = infer_pydantic_model(df, model_name="UserModel")
print(Model.schema_json(indent=2))
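A DataFrame stores its data by column, while each Pydantic instance represents one row. Both libraries can already emit row dictionaries (Polars via DataFrame.to_dicts(), Pandas via DataFrame.to_dict(orient="records")). The stdlib-only sketch below performs the same column-to-row pivot on a plain dict of lists, to show the row shape that per-row model construction works from. columns_to_records is a hypothetical helper for illustration, not part of Articuno.

```python
from typing import Any


def columns_to_records(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Pivot column-oriented data (name -> values) into one dict per row.

    Hypothetical helper for illustration; not part of Articuno's API.
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have unequal lengths: {sorted(lengths)}")
    names = list(columns)
    # zip(*...) walks all columns in lockstep, producing one tuple per row.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


rows = columns_to_records({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "score": [95.5, 88.0, 92.3],
})
print(rows[0])  # {'id': 1, 'name': 'Alice', 'score': 95.5}
```

Validating lengths up front matters because zip silently truncates to the shortest column, which would drop rows without any error.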

🧰 Iterable-of-Dicts Inference

Infer schemas and instantiate models directly from iterables of dicts (e.g., SQL query results, JSON records):

from articuno import (
    df_to_pydantic,
    infer_pydantic_model,
    dicts_to_pydantic,
    infer_generic_model,
)

# Sample records
dicts = [
    {"id": 1, "value": "foo"},
    {"id": 2, "value": "bar"},
    # ...
]

# Convert to Pydantic instances (infers the schema from the first 1000 records by default)
instances = df_to_pydantic(dicts)
for obj in instances:
    print(obj)

# Get only the model class, with a custom name and scan limit
ModelClass = infer_pydantic_model(
    dicts,
    model_name="RecModel",
    max_scan=500
)
print(ModelClass.schema_json(indent=2))

# Lazy generator of instances
for obj in dicts_to_pydantic(dicts, max_scan=200):
    print(obj)

# Generic model inference
GenericModel = infer_generic_model(dicts, model_name="GenModel")
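The max_scan trade-off can be illustrated with a small stdlib-only sketch. It buffers only the first max_scan records, infers a schema from them (a field is marked optional when it is missing or None in any scanned record, and nested dicts become nested schemas), then yields every record lazily without re-reading the source. infer_schema and infer_then_iterate are hypothetical names; this shows the general technique, not Articuno's actual implementation, and it takes the type of the first non-null value instead of reconciling mixed types.

```python
from itertools import chain, islice
from typing import Any, Iterable, Iterator

# A schema maps field name -> (type name or nested schema, is_optional).
Schema = dict[str, tuple[Any, bool]]


def infer_schema(records: list[dict[str, Any]]) -> Schema:
    """Infer a field schema from already-buffered records (hypothetical helper)."""
    seen: dict[str, list[Any]] = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, []).append(value)

    schema: Schema = {}
    for key, values in seen.items():
        present = [v for v in values if v is not None]
        # Optional if the key is missing from, or None in, any scanned record.
        optional = len(present) < len(records)
        if present and all(isinstance(v, dict) for v in present):
            field: Any = infer_schema(present)  # nested struct -> nested schema
        elif present:
            field = type(present[0]).__name__  # first non-null value decides
        else:
            field = "Any"  # only None was seen in the scanned window
        schema[key] = (field, optional)
    return schema


def infer_then_iterate(
    records: Iterable[dict[str, Any]], max_scan: int = 1000
) -> tuple[Schema, Iterator[dict[str, Any]]]:
    """Infer from the first max_scan records, then yield all records lazily."""
    source = iter(records)
    head = list(islice(source, max_scan))  # only this prefix is held in memory
    return infer_schema(head), chain(head, source)


records = [
    {"id": 1, "value": "foo", "meta": {"source": "db"}},
    {"id": 2, "value": None, "meta": {"source": "api"}},
    {"id": 3},  # never scanned when max_scan=2, so its missing keys go unnoticed
]
schema, rows = infer_then_iterate(records, max_scan=2)
print(schema)
# {'id': ('int', False), 'value': ('str', True), 'meta': ({'source': ('str', False)}, False)}
for row in rows:
    print(row)
```

The third record lacks value and meta, yet the inferred schema marks meta as required: that is the cost of a small scan window, and a larger max_scan (or force_optional) avoids it.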

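🌟 PyArrow-backed Pandas Columns

Articuno also accepts Pandas columns that use PyArrow-backed dtypes (these dtypes require pyarrow to be installed):
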
import pandas as pd
from articuno import infer_pydantic_model

df = pd.DataFrame({
    "id": pd.Series([1, 2, 3], dtype="int64[pyarrow]"),
    "name": pd.Series(["A", "B", "C"], dtype="string[pyarrow]")
})
Model = infer_pydantic_model(df, model_name="ArrowUser")
print(Model.schema_json(indent=2))

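🔥 Force Optional Fields

Make every field optional, even when the scanned data contains no missing values (df here is any DataFrame, such as the ones above):

from articuno import infer_pydantic_model, df_to_pydantic

Model = infer_pydantic_model(df, force_optional=True)
instances = df_to_pydantic(df, force_optional=True)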
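The effect of force_optional=True can be pictured as wrapping every inferred annotation in Optional[...], so None becomes a valid value for every field even if the scanned data had no nulls. The snippet below is a stdlib-only illustration with a hypothetical force_all_optional helper; it does not reproduce Articuno's implementation, and whether a field may also be omitted entirely depends on the default value the generated model assigns, which this sketch does not model.

```python
from typing import Any, Optional


def force_all_optional(annotations: dict[str, Any]) -> dict[str, Any]:
    """Wrap each annotation in Optional[...] (hypothetical helper)."""
    # Optional[X] is Union[X, None]; typing flattens nested unions, so
    # wrapping an annotation that is already Optional leaves it unchanged.
    return {name: Optional[tp] for name, tp in annotations.items()}


inferred = {"id": int, "name": str, "score": Optional[float]}
relaxed = force_all_optional(inferred)
for field_name, annotation in relaxed.items():
    print(field_name, annotation)
```

Because Optional is idempotent, the transformation is safe to apply on top of fields that inference already marked as nullable.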

🧾 Generate Code

Render an inferred model class as Python source code (built on datamodel-code-generator):

from articuno.codegen import generate_class_code
code = generate_class_code(Model)
print(code)
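Articuno delegates code generation to datamodel-code-generator, so the exact output format is determined by that tool. To show the general idea of turning a model's fields back into source text, here is a hand-rolled stdlib sketch that renders a Pydantic-style class from a mapping of field names to annotation strings. render_model_source is hypothetical and its output will differ from generate_class_code.

```python
import keyword


def render_model_source(name: str, fields: dict[str, str]) -> str:
    """Render a Pydantic-style class definition as Python source text.

    fields maps attribute names to annotation strings, e.g. {"id": "int"}.
    Hypothetical helper for illustration; not Articuno's generate_class_code.
    """
    uses_optional = any(ann.startswith("Optional[") for ann in fields.values())
    lines = ["from typing import Optional"] if uses_optional else []
    lines += ["from pydantic import BaseModel", "", "", f"class {name}(BaseModel):"]
    if not fields:
        lines.append("    pass")
    for attr, ann in fields.items():
        if not attr.isidentifier() or keyword.iskeyword(attr):
            # A fuller generator would alias such names (e.g. "first name").
            raise ValueError(f"not usable as a field name: {attr!r}")
        # Give Optional fields an explicit None default so they can be omitted.
        default = " = None" if ann.startswith("Optional[") else ""
        lines.append(f"    {attr}: {ann}{default}")
    return "\n".join(lines) + "\n"


source = render_model_source(
    "UserModel", {"id": "int", "name": "str", "score": "Optional[float]"}
)
print(source)
```

Rejecting column names that are not valid identifiers keeps the sketch honest; DataFrame columns such as "first name" or "class" are common, and a production generator has to map them to safe attribute names.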

⚙️ Supported Type Mappings

Polars Type          Pandas Type (incl. PyArrow)     Pydantic Type
pl.Int*, pl.UInt*    int64, Int64, int64[pyarrow]    int
pl.Float*            float64, float64[pyarrow]       float
pl.Utf8              object, string[pyarrow]         str
pl.Boolean           bool, bool[pyarrow]             bool
pl.Date              datetime64[ns]                  datetime.date
pl.Datetime          datetime64[ns]                  datetime.datetime
pl.Duration          timedelta64[ns]                 datetime.timedelta
pl.List              list                            List[...]
pl.Struct            dict                            Nested model
pl.Null              None, NaN                       Optional[...]
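The Pandas column of the table can be read as a lookup keyed on the dtype name, where the PyArrow-backed variants (int64[pyarrow], string[pyarrow], ...) share the Python type of their base dtype. The stdlib sketch below encodes a subset of those rows; python_type_for_dtype and the suffix-stripping rule are assumptions for illustration, not Articuno's code. datetime64[ns] is left out because the table maps it to either datetime.date or datetime.datetime depending on the Polars side, and unknown dtypes simply raise.

```python
import datetime

# A subset of the Pandas column of the table above (illustrative only).
_BASE_DTYPES: dict[str, type] = {
    "int64": int,
    "Int64": int,  # Pandas nullable integer dtype
    "float64": float,
    "object": str,  # per the table; object columns can hold any Python object
    "string": str,
    "bool": bool,
    "timedelta64[ns]": datetime.timedelta,
}
_ARROW_SUFFIX = "[pyarrow]"


def python_type_for_dtype(dtype_name: str) -> type:
    """Map a Pandas dtype name to a Python type (hypothetical helper)."""
    base = dtype_name
    if base.endswith(_ARROW_SUFFIX):
        # "int64[pyarrow]" -> "int64": Arrow-backed dtypes reuse the base entry.
        base = base[: -len(_ARROW_SUFFIX)]
    if base not in _BASE_DTYPES:
        raise TypeError(f"no mapping for dtype {dtype_name!r}")
    return _BASE_DTYPES[base]


for name in ["int64[pyarrow]", "string[pyarrow]", "bool[pyarrow]", "timedelta64[ns]"]:
    print(name, "->", python_type_for_dtype(name).__name__)
```

Nullability is a separate step: per the last table row, a column containing None or NaN would have its mapped type wrapped in Optional[...].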

🛠️ Development

pip install "articuno[dev]"
pytest

📄 License

MIT © Odos Matthews

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

articuno-0.8.0-py3-none-any.whl (11.2 kB)

Uploaded Python 3

File details

Details for the file articuno-0.8.0-py3-none-any.whl.

File metadata

  • Download URL: articuno-0.8.0-py3-none-any.whl
  • Upload date:
  • Size: 11.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for articuno-0.8.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        a88dca800722e9857a2ce73321bf1f940079c8bfb7d964345ce84c7de3b34c11
MD5           40d54bb8cf453002e99dc02adcd4fb1f
BLAKE2b-256   22e713f31c0b58d240cd18b59005f086316fb612d2aafc7560e4ea03119cfb75
