
Convert Polars or Pandas DataFrames to lists of Pydantic models with schema inference


❄️ Articuno ❄️

Convert Polars or Pandas DataFrames to Pydantic models with schema inference — and generate clean Python class code.


✨ Features

  • Infer Pydantic models dynamically from Polars or Pandas DataFrames
  • Infer Pydantic models and instances directly from iterables of dictionaries
  • Supports nested structs, optional fields, and common data types
  • Supports PyArrow-backed Pandas columns (e.g., int64[pyarrow], string[pyarrow])
  • Optional force_optional flag to make all fields optional regardless of data
  • Configurable max_scan parameter to limit schema inference to the first N records of an iterable
  • Generate clean Python model code using datamodel-code-generator
  • Lightweight design: Polars and Pandas support are optional extras, so you install only the backend you need
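The inference features above rest on one idea: inspect the values in each field and choose an annotation. The following is a minimal, stdlib-only sketch of that idea for a list of records. It assumes that nested dicts become a `TypedDict`, lists become `List[...]`, and a field that is None or missing in any record becomes `Optional[...]`. The helper names `infer_value_type` and `infer_schema` are hypothetical; this is not articuno's implementation.

```python
from typing import Any, Dict, Iterable, List, Optional, TypedDict


def infer_value_type(value: Any, name: str = "Nested") -> Any:
    """Map a single non-None value to a type annotation."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return bool
    if isinstance(value, (int, float, str)):
        return type(value)
    if isinstance(value, list):
        # Element type comes from the first non-None element; empty lists fall back to Any
        inner = next((infer_value_type(v, name) for v in value if v is not None), Any)
        return List[inner]
    if isinstance(value, dict):
        # Nested "struct": infer a sub-schema and wrap it in a TypedDict class
        return TypedDict(name, infer_schema([value]))
    return Any


def infer_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Infer {field: annotation}; None or missing values make a field Optional."""
    types: Dict[str, Any] = {}
    present: Dict[str, int] = {}  # how many records contain each key
    has_none = set()
    total = 0
    for record in records:
        total += 1
        for key, value in record.items():
            present[key] = present.get(key, 0) + 1
            if value is None:
                has_none.add(key)
            elif key not in types:
                # First non-None value wins; later conflicting types are ignored in this sketch
                types[key] = infer_value_type(value, name=key.title())
    schema: Dict[str, Any] = {}
    for key in present:
        annotation = types.get(key, Any)  # a field that is always None gets Any
        if key in has_none or present[key] < total:
            annotation = Optional[annotation]
        schema[key] = annotation
    return schema
```

Turning the resulting mapping into an actual Pydantic class is one more step, for example with `pydantic.create_model`.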

📦 Installation

Install the core package:

pip install articuno

Add optional dependencies as needed:

  • Polars support:
    pip install "articuno[polars]"
    
  • Pandas support (with optional PyArrow support):
    pip install "articuno[pandas]"
    
  • Full install:
    pip install "articuno[polars,pandas]"
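Because Polars and Pandas support are installed as extras, it can be useful to check which backends are importable in the current environment. A small stdlib sketch using `importlib.util.find_spec`, which locates a module without importing it; `available_backends` is a hypothetical helper, not part of articuno.

```python
from importlib.util import find_spec
from typing import Dict


def available_backends() -> Dict[str, bool]:
    """Report which optional DataFrame libraries are importable, without importing them."""
    return {name: find_spec(name) is not None for name in ("polars", "pandas", "pyarrow")}


if __name__ == "__main__":
    print(available_backends())
```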
    

🚀 Usage

🔍 DataFrame-based Inference

Infer models from Polars or Pandas DataFrames:

from articuno import df_to_pydantic
import polars as pl

df = pl.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "score": [95.5, 88.0, 92.3]
})

instances = df_to_pydantic(df, model_name="UserModel")
print(instances[0])  # id=1 name='Alice' score=95.5

Or just get the model class:

from articuno import infer_pydantic_model
Model = infer_pydantic_model(df, model_name="UserModel")
print(Model.schema_json(indent=2))

🧰 Iterable-of-Dicts Inference

Infer schemas and instantiate models directly from iterables of dicts (e.g., SQL query results, JSON records):

from articuno import (
    df_to_pydantic,
    infer_pydantic_model,
    dicts_to_pydantic,
    infer_generic_model,
)

# Sample records
dicts = [
    {"id": 1, "value": "foo"},
    {"id": 2, "value": "bar"},
    # ...
]

# Convert to Pydantic instances (scans first 1000 by default)
instances = df_to_pydantic(dicts)
for obj in instances:
    print(obj)

# Get only the model class, with a custom name and scan limit
ModelClass = infer_pydantic_model(
    dicts,
    model_name="RecModel",
    max_scan=500
)
print(ModelClass.schema_json(indent=2))

# Lazy generator of instances
for obj in dicts_to_pydantic(dicts, max_scan=200):
    print(obj)

# Generic model inference
GenericModel = infer_generic_model(dicts, model_name="GenModel")
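Limiting inference to the first `max_scan` records matters most for one-shot iterables such as database cursors or generators, where the records read for inference are consumed. A general way to keep a conversion lazy is to buffer the scanned sample and chain it back in front of the remaining records. The sketch below shows that technique with `itertools`; the helper `peek_for_schema` is hypothetical, and whether `dicts_to_pydantic` works exactly this way is an assumption.

```python
from itertools import chain, islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, Any]


def peek_for_schema(
    records: Iterable[Record], max_scan: int = 1000
) -> Tuple[List[Record], Iterator[Record]]:
    """Split off up to max_scan records for schema inference without losing them.

    Returns (sample, all_records): `sample` holds the first max_scan records,
    and `all_records` lazily yields the sample followed by the rest of the input.
    """
    iterator = iter(records)                   # works for lists, generators, DB cursors
    sample = list(islice(iterator, max_scan))  # only these are inspected for the schema
    return sample, chain(sample, iterator)     # put the sample back in front of the rest


if __name__ == "__main__":
    def query_rows():
        # Stand-in for a one-shot source such as a database cursor
        for i in range(5):
            yield {"id": i, "value": f"row{i}"}

    sample, rows = peek_for_schema(query_rows(), max_scan=2)
    print(len(sample))                  # 2 records used for inference
    print([row["id"] for row in rows])  # all 5 records are still yielded
```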

🌟 PyArrow-backed Pandas Columns

import pandas as pd
from articuno import infer_pydantic_model

df = pd.DataFrame({
    "id": pd.Series([1, 2, 3], dtype="int64[pyarrow]"),
    "name": pd.Series(["A", "B", "C"], dtype="string[pyarrow]")
})
Model = infer_pydantic_model(df, model_name="ArrowUser")
print(Model.schema_json(indent=2))

🔥 Force Optional Fields

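Pass force_optional=True to make every field optional, regardless of whether the data contains nulls:

from articuno import infer_pydantic_model, df_to_pydantic

# df is any Polars or Pandas DataFrame, e.g. one from the examples above
Model = infer_pydantic_model(df, force_optional=True)
models = df_to_pydantic(df, force_optional=True)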
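Conceptually, force_optional=True turns every inferred annotation into `Optional[...]`. A minimal sketch of that transformation, assuming the schema is a `{field: annotation}` mapping like the one a schema-inference step would produce; `force_all_optional` is a hypothetical helper, not the library's code.

```python
from typing import Any, Dict, Optional, Tuple


def force_all_optional(schema: Dict[str, Any]) -> Dict[str, Tuple[Any, None]]:
    """Make every field Optional[...] with a default of None.

    The (annotation, default) tuple shape is what pydantic.create_model accepts
    for field definitions. Wrapping an already-Optional annotation is harmless:
    Optional[Optional[X]] collapses to Optional[X].
    """
    return {name: (Optional[annotation], None) for name, annotation in schema.items()}
```

With Pydantic installed, `create_model("UserModel", **force_all_optional({"id": int}))` (imported from `pydantic`) should produce a model that can be instantiated with no arguments, leaving `id` as None.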

🧾 Generate Code

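Render a model class as Python source code (generated with datamodel-code-generator):

from articuno.codegen import generate_class_code

# Model is a Pydantic model class, e.g. one returned by infer_pydantic_model
code = generate_class_code(Model)
print(code)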
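According to the feature list, code generation uses datamodel-code-generator, which takes care of imports, nesting, and naming. To show what model code means at its simplest, here is a toy renderer that turns a `{field: annotation}` mapping into a Pydantic class definition. `annotation_text` and `render_model_source` are hypothetical names, and only builtin types and `typing` constructs render correctly.

```python
from typing import Any, Dict, List, Optional


def annotation_text(annotation: Any) -> str:
    """Render one annotation as source text (toy: builtins and typing constructs only)."""
    if isinstance(annotation, type) and annotation.__module__ == "builtins":
        return annotation.__name__  # int -> "int"
    return repr(annotation)  # Optional[str] -> "typing.Optional[str]"


def render_model_source(model_name: str, schema: Dict[str, Any]) -> str:
    """Render a Pydantic class definition from a {field: annotation} mapping."""
    lines = [
        "import typing",
        "",
        "from pydantic import BaseModel",
        "",
        "",
        f"class {model_name}(BaseModel):",
    ]
    if not schema:
        lines.append("    pass")  # an empty class body still needs a statement
    for name, annotation in schema.items():
        lines.append(f"    {name}: {annotation_text(annotation)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_model_source("UserModel", {"id": int, "tags": List[str], "score": Optional[float]}))
```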

⚙️ Supported Type Mappings

Polars type         Pandas type (incl. PyArrow)    Pydantic type
pl.Int*, pl.UInt*   int64, Int64, int64[pyarrow]   int
pl.Float*           float64, float64[pyarrow]      float
pl.Utf8             object, string[pyarrow]        str
pl.Boolean          bool, bool[pyarrow]            bool
pl.Date             datetime64[ns]                 datetime.date
pl.Datetime         datetime64[ns]                 datetime.datetime
pl.Duration         timedelta64[ns]                datetime.timedelta
pl.List             list                           List[...]
pl.Struct           dict                           Nested model
pl.Null             None, NaN                      Optional[...]
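The Pandas column of the table can be read as a lookup keyed on dtype names, where a [pyarrow] suffix does not change the resulting Python type. A sketch of such a lookup for a subset of the rows, assuming dtype names come from `str(series.dtype)`; the dictionary and `pandas_dtype_to_python` are hypothetical, and date/datetime columns are left out here.

```python
import datetime
from typing import Any, Dict

# Hypothetical lookup covering a subset of the Pandas column in the table above.
# Keys are dtype names as returned by str(series.dtype).
PANDAS_DTYPE_TO_PYTHON: Dict[str, Any] = {
    "int64": int,
    "Int64": int,      # nullable integer extension dtype
    "float64": float,
    "object": str,     # the table maps object to str; object columns can hold other values too
    "string": str,
    "bool": bool,
    "timedelta64[ns]": datetime.timedelta,
}

PYARROW_SUFFIX = "[pyarrow]"


def pandas_dtype_to_python(dtype_name: str) -> Any:
    """Look up a Python type for a dtype name, treating 'X[pyarrow]' the same as 'X'."""
    if dtype_name.endswith(PYARROW_SUFFIX):
        dtype_name = dtype_name[: -len(PYARROW_SUFFIX)]
    return PANDAS_DTYPE_TO_PYTHON.get(dtype_name, Any)  # unknown dtypes fall back to Any
```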

🛠️ Development

pip install "articuno[dev]"
pytest

📄 License

MIT © Odos Matthews

Download files

Source Distribution

articuno-0.7.0.tar.gz (11.2 kB)

Built Distribution

articuno-0.7.0-py3-none-any.whl (12.0 kB)

File details

Details for the file articuno-0.7.0.tar.gz.

File metadata

  • Download URL: articuno-0.7.0.tar.gz
  • Size: 11.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for articuno-0.7.0.tar.gz
Algorithm Hash digest
SHA256 00c548885d0fc8563c61ad3d335fcc738f9b46ae4b4a0401fc582d5c4b8b9794
MD5 aa41356c44c50d370f50f380f009e609
BLAKE2b-256 cac624680f53dc93d5091fd1c0ff904efe25a9d7c20f7010ade3a785bfe77211

File details

Details for the file articuno-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: articuno-0.7.0-py3-none-any.whl
  • Size: 12.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for articuno-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a276045d9025488037d52d2e957569cd27945c2793288da07282fe51fcdcf1c9
MD5 90e9057824dd8bbc574638add2f0b7c6
BLAKE2b-256 40143cb8f5991fca93e651ac5b1e72a4366303a71f9b36c296948e7dbb0a9de5
