Infer validated, frontend-ready field schemas from pandas DataFrames.

MLSchema

Turn pandas DataFrames into validated, front-end-ready field schemas.

mlschema is a lightweight Python SDK for deriving JSON-serialisable field contracts from tabular data. It is designed for model inputs, prediction forms, review tools, annotation workflows, dashboards, and any frontend that needs to render fields from a pandas.DataFrame without hand-writing the same schema twice.

It pairs naturally with mlform, but the generated schema is plain JSON-compatible data and can be consumed by any frontend or service layer.

Why MLSchema

DataFrame columns already carry useful contract information: names, dtypes, categories, nullability, dates, numeric values, and structured pairs. MLSchema turns that information into a validated field list.

Instead of maintaining separate form definitions beside the data pipeline, use infer_schema(df) as the baseline and refine only what is genuinely product-specific: labels, bounds, defaults, units, placeholders, UI hints, or custom field kinds.

import pandas as pd

from mlschema import infer_schema

df = pd.DataFrame(
    {
        "name": ["Ada", "Linus", "Grace"],
        "score": [98.5, 86.0, 91.0],
        "role": pd.Categorical(["engineer", "engineer", "scientist"]),
        "active": [True, False, True],
    }
)

schema = infer_schema(df)

[
  {
    "kind": "text",
    "label": "name",
    "required": true
  },
  {
    "kind": "number",
    "label": "score",
    "required": true,
    "step": 0.1
  },
  {
    "kind": "category",
    "label": "role",
    "required": true,
    "options": ["engineer", "scientist"]
  },
  {
    "kind": "boolean",
    "label": "active",
    "required": true
  }
]

Key Features

Function-first API: infer_schema(df).
Builtin inference for text, number, category, boolean, date, and two-axis series fields.
Pydantic v2 validation before any schema is returned.
JSON-serialisable field-list output for frontend and service integration.
Field refinements through overrides.
Domain-specific behaviour through custom builders.
New frontend contracts through strict custom kinds.
Typed public API (py.typed), checked with Pyright and Ruff, and tested with pytest in CI.

Requirements

Python >=3.14,<3.15
pandas >=3.0.3,<4.0.0
pydantic >=2.13.4,<3.0.0

Installation

uv add mlschema

Alternative package managers:

pip install mlschema

poetry add mlschema

Pin a version when reproducible environments matter:

uv add "mlschema==0.2.0"

Quick Start

import pandas as pd

from mlschema import infer_schema

df = pd.DataFrame(
    {
        "customer": ["Ada", "Linus", "Grace"],
        "age": [42, 55, 38],
        "tier": pd.Categorical(["pro", "free", "pro"], categories=["free", "pro"]),
        "created": pd.date_range("2024-01-01", periods=3),
    }
)

schema = infer_schema(df)

The result can be returned from an API, stored as a contract, passed to a form renderer, or used in tests to detect schema drift.
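
As one way to use the field list in a drift test, the sketch below compares a stored contract with a freshly inferred one. It assumes both are plain lists of dictionaries shaped like the output in this README, and it keys fields by label, which matches the column name unless overridden. `diff_fields` is a hypothetical helper, not part of mlschema, and the `current` list is a hand-written stand-in for an `infer_schema(df)` result.

```python
import json


def diff_fields(stored: list[dict], current: list[dict]) -> dict[str, list[str]]:
    """Compare two field lists keyed by label and report the differences."""
    old = {field["label"]: field for field in stored}
    new = {field["label"]: field for field in current}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Stored contract, e.g. loaded from a JSON file committed next to the tests.
stored = json.loads('[{"kind": "number", "label": "age", "required": true, "step": 1}]')

# Hand-written stand-in for a fresh infer_schema(df) result:
# "age" gained missing values and a new "customer" column appeared.
current = [
    {"kind": "number", "label": "age", "required": False, "step": 1},
    {"kind": "text", "label": "customer", "required": True},
]

drift = diff_fields(stored, current)
print(drift)
```

A test can then assert that all three lists are empty and fail with the diff when they are not.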

MLSchema works best when DataFrame dtypes are deliberate. Numeric columns should use numeric dtypes, categorical columns should use category, date columns should use pandas datetime dtypes, and boolean columns should use boolean dtypes. Ambiguous object columns fall back to text.

Schema Output

The canonical output is a field list.

There is no top-level envelope by default. MLSchema returns the contract directly:

[
  {
    "kind": "text",
    "label": "customer",
    "required": true
  },
  {
    "kind": "number",
    "label": "age",
    "required": true,
    "step": 1
  },
  {
    "kind": "category",
    "label": "tier",
    "required": true,
    "options": ["free", "pro"]
  },
  {
    "kind": "date",
    "label": "created",
    "required": true
  }
]

Each field includes:

kind: the frontend discriminator.
label: the human-readable label, inferred from the column name unless overridden.
required: true when the source column contains no missing values.
kind-specific metadata, such as step, options, field1, field2, or validation bounds.

Optional values set to None are omitted from the output.
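
The effect of that rule can be illustrated with a small sketch. `drop_none` is a hypothetical helper that mimics the observable result on a plain dictionary; it is not how mlschema is known to implement the behaviour.

```python
def drop_none(field: dict) -> dict:
    """Return a copy of the field without None values, recursing into nested fields."""
    return {
        key: drop_none(value) if isinstance(value, dict) else value
        for key, value in field.items()
        if value is not None
    }


# A number field where the optional bounds and unit were never set.
raw = {
    "kind": "number",
    "label": "age",
    "required": True,
    "step": 1,
    "min": None,
    "max": None,
    "unit": None,
}

clean = drop_none(raw)
print(clean)
```

Note that only `None` is dropped: falsy values such as `0` or `False` are kept, which matters for fields like `"min": 0` or `"required": false`.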

Builtin Kinds

Builtin kinds are enabled by default and resolved in a fixed order.

Kind	Detection	Notes
`series`	Non-null cells are 2-element tuples, lists, or dictionaries.	Infers `field1` and `field2` recursively.
`boolean`	`bool`, `boolean`	Emits a boolean field contract.
`category`	`category`	Emits `options` from categorical categories.
`date`	`datetime64[ns]`, `datetime64[us]`, `datetime64`	Emits a date field contract.
`number`	`int64`, `int32`, `float64`, `float32`	Emits `step: 1` for integer columns and `step: 0.1` for float columns.
`text`	fallback	Claims columns not handled by earlier kinds.

The order matters. series runs before text because it detects pair-shaped object cells by content. text runs last as the safe fallback.
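
The fixed order in the table can be sketched as a chain of checks. This is an illustration only: it works on dtype names as strings and on a plain list of cells rather than on a pandas Series, it treats only `None` as missing, and `resolve_builtin` and `is_pair` are hypothetical names that are not part of mlschema.

```python
BOOLEAN_DTYPES = {"bool", "boolean"}
DATE_DTYPES = {"datetime64[ns]", "datetime64[us]", "datetime64"}
INTEGER_DTYPES = {"int64", "int32"}
FLOAT_DTYPES = {"float64", "float32"}


def is_pair(cell: object) -> bool:
    """True for 2-element tuples, lists, or dictionaries."""
    return isinstance(cell, (tuple, list, dict)) and len(cell) == 2


def resolve_builtin(dtype: str, cells: list) -> dict:
    """Return the kind-specific part of a field, checking kinds in table order."""
    non_null = [cell for cell in cells if cell is not None]
    # series is detected by content, so it must run before the dtype checks.
    if non_null and all(is_pair(cell) for cell in non_null):
        return {"kind": "series"}
    if dtype in BOOLEAN_DTYPES:
        return {"kind": "boolean"}
    if dtype == "category":
        return {"kind": "category"}
    if dtype in DATE_DTYPES:
        return {"kind": "date"}
    if dtype in INTEGER_DTYPES:
        return {"kind": "number", "step": 1}
    if dtype in FLOAT_DTYPES:
        return {"kind": "number", "step": 0.1}
    # text claims everything the earlier kinds did not.
    return {"kind": "text"}


print(resolve_builtin("object", [("2024-01-01", 23.5), None]))
print(resolve_builtin("float64", [98.5, 86.0]))
print(resolve_builtin("object", ["Ada", "Linus"]))
```

If the text check ran first, the pair-shaped object column in the first call would be claimed as text, which is why the fallback sits at the end.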

Series Columns

A series field represents a two-axis value stored in a single DataFrame column, such as timestamp-value readings.

import pandas as pd

from mlschema import infer_schema

df = pd.DataFrame(
    {
        "readings": [
            (pd.Timestamp("2024-01-01"), 23.5),
            (pd.Timestamp("2024-01-02"), 24.1),
            (pd.Timestamp("2024-01-03"), 22.8),
        ],
    }
)

schema = infer_schema(df)

[
  {
    "kind": "series",
    "label": "readings",
    "required": true,
    "field1": {
      "kind": "date",
      "label": "field1",
      "required": true
    },
    "field2": {
      "kind": "number",
      "label": "field2",
      "required": true,
      "step": 0.1
    }
  }
]

Supported cell shapes are:

(timestamp, value)
[timestamp, value]
{"timestamp": timestamp, "value": value}

Nested series are rejected. Cardinality constraints such as minPoints and maxPoints can be added with overrides.
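
To make the three cell shapes concrete, the sketch below normalises each one into a `(first, second)` pair and rejects nested pairs. It is a simplified illustration under stated assumptions: `split_pair` is a hypothetical helper, dictionary values are taken in insertion order, "nested" is interpreted as a pair whose element is itself a tuple, list, or dictionary, and ISO date strings stand in for timestamps so that the example needs no third-party imports.

```python
def split_pair(cell: object) -> tuple[object, object]:
    """Normalise a supported series cell into a (first, second) tuple."""
    if isinstance(cell, dict):
        items = list(cell.values())  # assumption: axes follow insertion order
    elif isinstance(cell, (tuple, list)):
        items = list(cell)
    else:
        raise ValueError(f"unsupported cell type: {type(cell).__name__}")

    if len(items) != 2:
        raise ValueError(f"expected 2 elements, got {len(items)}")

    for item in items:
        if isinstance(item, (tuple, list, dict)):
            raise ValueError("nested series are not supported")

    return items[0], items[1]


print(split_pair(("2024-01-01", 23.5)))
print(split_pair(["2024-01-02", 24.1]))
print(split_pair({"timestamp": "2024-01-03", "value": 22.8}))
```

All three calls produce the same pair layout, which is what lets `field1` and `field2` be inferred independently of how the cell was stored.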

Refining Fields With Overrides

Inference provides the structural baseline. Production interfaces often need clearer labels, ranges, defaults, units, placeholders, or UI metadata.

schema = infer_schema(
    df,
    overrides={
        "age": {
            "label": "Age",
            "description": "Customer age in years.",
            "min": 0,
            "max": 120,
            "step": 1,
            "unit": "years",
        },
        "tier": {
            "label": "Plan",
            "defaultValue": "pro",
        },
    },
)

Overrides are applied after inference and before final validation. Missing columns and invalid constraints fail explicitly instead of producing a broken schema.
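
The ordering described above (infer, then override, then validate) can be sketched with plain dictionaries. This is not mlschema's implementation: `apply_overrides` is a hypothetical helper, fields are keyed by column name for the sake of the example, and `KeyError` and `ValueError` stand in for the library's own exceptions.

```python
def apply_overrides(fields: dict[str, dict], overrides: dict[str, dict]) -> dict[str, dict]:
    """Merge per-column overrides into inferred fields, failing on bad input."""
    missing = sorted(overrides.keys() - fields.keys())
    if missing:
        raise KeyError(f"overrides target unknown columns: {missing}")

    merged = {
        column: {**field, **overrides.get(column, {})}
        for column, field in fields.items()
    }

    # A minimal stand-in for final validation: bounds must be consistent.
    for column, field in merged.items():
        low, high = field.get("min"), field.get("max")
        if low is not None and high is not None and low > high:
            raise ValueError(f"{column}: min {low} is greater than max {high}")
    return merged


inferred = {
    "age": {"kind": "number", "label": "age", "required": True, "step": 1},
    "tier": {"kind": "category", "label": "tier", "required": True, "options": ["free", "pro"]},
}

refined = apply_overrides(inferred, {"age": {"label": "Age", "min": 0, "max": 120}})
print(refined["age"])
```

Override keys win over inferred keys, and columns without an override pass through unchanged.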

Extending MLSchema

Use a custom builder when an existing kind is correct, but the column needs domain-aware metadata.

from pandas import Series

from mlschema import FieldContext, infer_schema

def money_builder(series: Series, ctx: FieldContext) -> dict | None:
    # Only claim the amount_eur column; returning None passes the column to the next builder.
    if ctx.name != "amount_eur":
        return None

    return {
        "kind": "number",
        "label": "Amount",
        "required": ctx.required,
        "step": 0.01,
        "unit": "EUR",
        "min": 0,
    }

schema = infer_schema(df, builders=[money_builder])

Use a custom kind when the frontend needs a new field discriminator and a dedicated validation model.

from typing import Literal

from pandas import Series

from mlschema import BaseField, FieldContext, infer_schema, kind

class DurationField(BaseField):
    kind: Literal["duration"] = "duration"
    unit: Literal["seconds"] = "seconds"
    minSeconds: int
    maxSeconds: int

def duration_builder(series: Series, ctx: FieldContext) -> dict | None:
    # Only claim timedelta columns; every other dtype falls through to later builders.
    if ctx.dtype not in {"timedelta64[ns]", "timedelta64[us]"}:
        return None

    # Bounds come from the observed data, truncated to whole seconds.
    return {
        "kind": "duration",
        "label": ctx.name,
        "required": ctx.required,
        "unit": "seconds",
        "minSeconds": int(series.min().total_seconds()),
        "maxSeconds": int(series.max().total_seconds()),
    }

schema = infer_schema(
    df,
    kinds=[
        kind(model=DurationField, infer=duration_builder),
    ],
)

Resolution is predictable. Builders are tried in this order:

1. user builders
2. custom kind builders
3. builtin builders

The first builder returning a field dictionary owns the column.
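
The first-match rule can be sketched as a single loop over the three tiers. The example is deliberately simplified: builders here take a column name and a dtype string instead of the `(series, ctx)` signature shown above, and `resolve`, `number_builtin`, and `text_fallback` are hypothetical names. `LookupError` stands in for the library's own error when no builder matches.

```python
from typing import Callable, Optional

# Simplified builder signature for this sketch: (column name, dtype) -> field or None.
Builder = Callable[[str, str], Optional[dict]]


def money_builder(name: str, dtype: str) -> Optional[dict]:
    if name != "amount_eur":
        return None
    return {"kind": "number", "label": "Amount", "step": 0.01, "unit": "EUR"}


def duration_builder(name: str, dtype: str) -> Optional[dict]:
    if not dtype.startswith("timedelta64"):
        return None
    return {"kind": "duration", "label": name, "unit": "seconds"}


def number_builtin(name: str, dtype: str) -> Optional[dict]:
    if dtype not in {"int64", "int32", "float64", "float32"}:
        return None
    return {"kind": "number", "label": name}


def text_fallback(name: str, dtype: str) -> Optional[dict]:
    return {"kind": "text", "label": name}


def resolve(
    name: str,
    dtype: str,
    user: list[Builder],
    custom: list[Builder],
    builtin: list[Builder],
) -> dict:
    """Try user, custom kind, then builtin builders; the first non-None result wins."""
    for builder in [*user, *custom, *builtin]:
        field = builder(name, dtype)
        if field is not None:
            return field
    raise LookupError(f"no builder matched column {name!r}")


tiers = ([money_builder], [duration_builder], [number_builtin, text_fallback])
print(resolve("amount_eur", "float64", *tiers))
print(resolve("elapsed", "timedelta64[ns]", *tiers))
print(resolve("note", "object", *tiers))
```

Although `amount_eur` is a float column that the builtin number builder would also accept, the user builder runs first and therefore owns it.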

Validation And Errors

MLSchema validates the generated contract before returning it.

Common errors include:

Error	Meaning
`EmptyDataFrameError`	The input DataFrame has no rows or no columns.
`FieldBuilderError`	A builder returned an invalid payload or omitted `kind`, no builder matched a column, or an override targeted a missing column.
`UnknownFieldKindError`	A builder emitted a kind with no registered field model.
`FieldKindAlreadyRegisteredError`	Duplicate kind names were registered.
`FieldKindError`	`kind()` received an invalid field model.
`pydantic.ValidationError`	The final field payload violates its Pydantic model.

Library exceptions are available from mlschema.core.exceptions and re-exported from mlschema.core.

With mlform

MLSchema focuses on inference and validation. mlform can consume the generated field list to render interactive forms.

The split is intentional: Python owns the data contract; the frontend owns rendering, interaction, and submission.

Documentation

Documentation: https://ulloasp.github.io/mlschema/
Usage guide: https://ulloasp.github.io/mlschema/usage/
Schema standard: https://ulloasp.github.io/mlschema/schema-standard/
API reference: https://ulloasp.github.io/mlschema/reference/
Changelog: https://ulloasp.github.io/mlschema/changelog/

Tooling And Quality

MIT-licensed package distributed as wheel and sdist.
Built with Hatchling.
Typed with py.typed.
Tested with pytest and pytest-cov.
Checked with ruff and pyright.
CI provided by GitHub Actions.

Contributing

Contributions are welcome.

Useful commands for local development:

uv sync
uv run pre-commit install
uv run pytest

Project links:

Issues: https://github.com/UlloaSP/mlschema/issues
Discussions: https://github.com/UlloaSP/mlschema/discussions
Contributing guide: https://github.com/UlloaSP/mlschema/blob/main/CONTRIBUTING.md

Security

Please report security concerns privately by emailing pablo.ulloa.santin@udc.es.

The disclosure process is documented in SECURITY.md.

License

Released under the MIT License.

License: https://github.com/UlloaSP/mlschema/blob/main/LICENSE
Third-party notices: https://github.com/UlloaSP/mlschema/blob/main/THIRD_PARTY_LICENSES.md

Made by Pablo Ulloa Santin and contributors.

Release history

Version	Released
0.2.0 (this version)	Jun 1, 2026
0.1.6	Apr 21, 2026
0.1.5	Apr 17, 2026
0.1.4	Apr 17, 2026
0.1.3	Apr 16, 2026
0.1.2	Oct 29, 2025
0.1.1	Oct 16, 2025
0.1.0	Sep 16, 2025

Download files

File	Type	Tags	Size	Uploaded
`mlschema-0.2.0.tar.gz`	Source distribution	Source	54.1 kB	Jun 1, 2026
`mlschema-0.2.0-py3-none-any.whl`	Built distribution	Python 3	35.1 kB	Jun 1, 2026

Both files were uploaded via twine/6.1.0 CPython/3.13.12, without Trusted Publishing.

Hashes for mlschema-0.2.0.tar.gz

Algorithm	Hash digest
SHA256	`041c9c771a319b1e786da46acae1ecc2f40939650121495486c95f10c736689f`
MD5	`40daa34c27808a335d4a177b12764332`
BLAKE2b-256	`b012167f360dd55ab9ad0bd6f24ca714b6a721e6f0e2cb72af2f33168fdeca4b`

Hashes for mlschema-0.2.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`1ecdc6b9dda6963873e35cd733c5e04755d4d19b93c13f6b84ccf6533a0106c5`
MD5	`b1b76053319a4be27dee4afdfb5ca13d`
BLAKE2b-256	`54bb07f1bc9ee55aa8abf68b907befcfe2b365b04d1aee08588d6b21599d6419`
