Lightweight orchestration layer that turns pandas DataFrames into front-end-ready JSON schemas, engineered to pair seamlessly with [mlform](https://github.com/UlloaSP/mlform).

MLSchema

Overview

mlschema accelerates form and contract generation by automatically deriving JSON field definitions from tabular data. The library applies a strategy-driven pipeline on top of pandas, validating every payload with Pydantic before it reaches your UI tier or downstream services.

Converts analytics data into stable JSON schemas in a few lines of code.
Keeps inference logic server-side; no external services or background workers required.
Ships with production-tested strategies for text, numeric, categorical, boolean, temporal, and two-axis series data.
Designed for synchronous use alongside mlform, yet fully usable on its own.

Key Features

Strategy registry that lets you opt into only the field types you want to expose.
Pydantic v2 models guarantee structural validity and embed domain-specific constraints.
Normalized dtype matching covers both pandas extension types and NumPy dtypes.
Deterministic JSON output (fields / reports / explanations) suitable for form engines and low-code tooling.
Fully typed public API with strict static analysis (Pyright) and comprehensive tests.

Requirements

Python >= 3.14, < 3.15
pandas >= 2.3.3, < 3.0.0
pydantic >= 2.12.3, < 3.0.0

All transitive dependencies are resolved automatically by your package manager.

Installation

uv add mlschema

Alternative package managers:

pip install mlschema
poetry add mlschema
conda install -c conda-forge mlschema
pipenv install mlschema

Pin a version (for example mlschema==0.1.3) when you need deterministic environments.

Quick Start

import pandas as pd
from mlschema import MLSchema
from mlschema.strategies import TextStrategy, NumberStrategy, CategoryStrategy

df = pd.DataFrame(
    {
        "name": ["Ada", "Linus", "Grace"],
        "score": [98.5, 86.0, 91.0],
        "role": pd.Categorical(["engineer", "engineer", "scientist"]),
    }
)

builder = MLSchema()
builder.register(TextStrategy())      # fallback for unsupported dtypes
builder.register(NumberStrategy())
builder.register(CategoryStrategy())

schema = builder.build(df)

Schema Output

The payload is ready to serialise to JSON and inject into your UI or downstream service:

{
  "fields": [
    {"title": "name", "required": true, "type": "text"},
    {"title": "score", "required": true, "type": "number", "step": 0.1},
    {"title": "role", "required": true, "type": "category", "options": ["engineer", "scientist"]}
  ],
  "reports": [],
  "explanations": []
}
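
As a minimal sketch of the serialisation step, the snippet below copies the payload shown above into a literal dict and round-trips it through the standard json module. It assumes the result of build() behaves like a plain dict; if your version returns a JSON string or a model object, adapt accordingly.

```python
import json

# Literal copy of the payload shown above, standing in for the result of build().
schema = {
    "fields": [
        {"title": "name", "required": True, "type": "text"},
        {"title": "score", "required": True, "type": "number", "step": 0.1},
        {
            "title": "role",
            "required": True,
            "type": "category",
            "options": ["engineer", "scientist"],
        },
    ],
    "reports": [],
    "explanations": [],
}

# ensure_ascii=False keeps non-ASCII column titles readable in the output.
payload = json.dumps(schema, indent=2, ensure_ascii=False)

# Round-tripping confirms that nothing was lost during serialisation.
restored = json.loads(payload)
field_types = {field["title"]: field["type"] for field in restored["fields"]}
print(field_types)  # {'name': 'text', 'score': 'number', 'role': 'category'}
```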

TextStrategy acts as the default fallback. Make sure it is registered when you want unsupported columns to degrade gracefully.

Series columns

Columns where each cell is a 2-element compound value, that is a tuple (v1, v2), a list [v1, v2], or a dict {"key1": v1, "key2": v2}, are claimed automatically by SeriesStrategy once it is registered. Sub-field schemas are inferred from the element dtypes via the registered strategies:

import pandas as pd
from datetime import date
from mlschema import MLSchema
from mlschema.strategies import (
    CategoryStrategy,
    DateStrategy,
    NumberStrategy,
    SeriesStrategy,
    TextStrategy,
)

df = pd.DataFrame({
    "sensor_id": pd.Categorical(["A", "B", "C"]),
    "readings": [
        (date(2024, 1, 1), 23.5),
        (date(2024, 1, 2), 24.1),
        (date(2024, 1, 3), 22.8),
    ],
})

builder = MLSchema()
builder.register(TextStrategy())
builder.register(NumberStrategy())
builder.register(CategoryStrategy())  # needed for the categorical sensor_id column
builder.register(DateStrategy())
builder.register(SeriesStrategy())    # claims compound-cell columns automatically

schema = builder.build(df)

{
  "fields": [
    {"title": "sensor_id", "required": true, "type": "category", "options": ["A", "B", "C"]},
    {
      "title": "readings", "required": true, "type": "series",
      "field1": {"title": "field1", "required": true, "type": "date", "step": 1},
      "field2": {"title": "field2", "required": true, "type": "number", "step": 0.1}
    }
  ],
  "reports": [],
  "explanations": []
}

min_points and max_points can be set directly on SeriesField to document cardinality constraints; they are not inferred from data.

How It Works

Registry orchestration – MLSchema keeps an in-memory registry of field strategies, keyed by a logical type_name and one or more pandas dtypes.
Inference pipeline – each DataFrame column is normalised, matched against the registry, and dispatched to the first compatible strategy.
Schema materialisation – strategies merge required metadata (title, type, required) with data-driven attributes, then dump the result through a Pydantic model.
Structured output – the service returns the canonical {"fields": [...], "reports": [], "explanations": []} payload that feeds mlform or any form rendering layer.
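
The four steps above can be illustrated with a toy registry written in plain Python. This is a conceptual sketch, not the mlschema implementation: ToyRegistry, normalise, and FALLBACK_TYPE are hypothetical names, dtypes are passed as strings instead of being read from a DataFrame, required is hard-coded, and normalisation is reduced to trimming and lowercasing.

```python
from typing import Any

FALLBACK_TYPE = "text"  # assumption: the fallback is identified by its type name


def normalise(dtype: str) -> str:
    """Simplified normalisation: trim and lowercase the dtype name."""
    return dtype.strip().lower()


class ToyRegistry:
    """Hypothetical registry; a teaching aid, not the mlschema implementation."""

    def __init__(self) -> None:
        self._type_names: set[str] = set()
        self._by_dtype: dict[str, str] = {}

    def register(self, type_name: str, dtypes: tuple[str, ...]) -> None:
        keys = [normalise(d) for d in dtypes]
        # Validate first so that a rejected registration leaves no partial state.
        if type_name in self._type_names:
            raise ValueError(f"type name {type_name!r} is already registered")
        for key in keys:
            if key in self._by_dtype:
                raise ValueError(f"dtype {key!r} is already registered")
        self._type_names.add(type_name)
        for key in keys:
            self._by_dtype[key] = type_name

    def resolve(self, dtype: str) -> str:
        """Return the type name for a dtype, falling back to text if registered."""
        key = normalise(dtype)
        if key in self._by_dtype:
            return self._by_dtype[key]
        if FALLBACK_TYPE in self._type_names:
            return FALLBACK_TYPE
        raise LookupError(f"no strategy for dtype {dtype!r} and no fallback")

    def build(self, columns: dict[str, str]) -> dict[str, Any]:
        """Produce the canonical payload shape; required is hard-coded here."""
        fields = [
            {"title": title, "required": True, "type": self.resolve(dtype)}
            for title, dtype in columns.items()
        ]
        return {"fields": fields, "reports": [], "explanations": []}


registry = ToyRegistry()
registry.register("text", ("object", "string"))
registry.register("number", ("int64", "float64"))
registry.register("category", ("category",))

# Column name -> dtype name, standing in for a real DataFrame.
columns = {"name": "object", "score": "Float64", "role": "category", "seen": "bool"}
schema = registry.build(columns)
print([f["type"] for f in schema["fields"]])  # ['text', 'number', 'category', 'text']
```

The unregistered bool column degrades to text because a fallback is present; without the text registration, resolve would raise instead, which mirrors the role the fallback strategy plays in the description above.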

Built-in Strategies

| Strategy class | `type` name | Supported pandas dtypes | Additional attributes |
| --- | --- | --- | --- |
| `TextStrategy` | `text` | `object`, `string` | `defaultValue` (from `BaseField`), `minLength`, `maxLength`, `pattern`, `placeholder` |
| `NumberStrategy` | `number` | `int64`, `int32`, `float64`, `float32` | `defaultValue` (from `BaseField`), `min`, `max`, `step`, `unit`, `placeholder` |
| `CategoryStrategy` | `category` | `category` | `defaultValue` (from `BaseField`), `options` |
| `BooleanStrategy` | `boolean` | `bool`, `boolean` | `defaultValue` (from `BaseField`) |
| `DateStrategy` | `date` | `datetime64[ns]`, `datetime64` | `defaultValue` (from `BaseField`), `min`, `max`, `step` |
| `SeriesStrategy` | `series` | content-based (2-element cells) | `field1`, `field2`, `min_points`, `max_points` |

Register only the strategies you need. Duplicate registrations raise explicit errors; use MLSchema.update() to swap implementations at runtime.

SeriesStrategy uses content-based detection instead of dtype matching: it claims any object column whose cells are all 2-element tuples, lists, or dicts, and infers the sub-field schemas from the element dtypes via the registry.
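
To make the content-based rule concrete, here is a minimal sketch of such a check in plain Python. The function names are hypothetical and the real detection logic may differ, for example in how it treats missing values or empty columns; this version treats both as non-matching.

```python
from collections.abc import Iterable
from typing import Any


def is_two_element_cell(cell: Any) -> bool:
    """True for a tuple, list, or dict holding exactly two items."""
    return isinstance(cell, (tuple, list, dict)) and len(cell) == 2


def looks_like_series_column(cells: Iterable[Any]) -> bool:
    """True when the column is non-empty and every cell is a 2-element compound."""
    materialised = list(cells)
    return bool(materialised) and all(is_two_element_cell(c) for c in materialised)


def split_cell(cell: Any) -> tuple[Any, Any]:
    """Return the two sub-values; dict cells yield their values in insertion order."""
    first, second = cell.values() if isinstance(cell, dict) else cell
    return first, second


readings = [(1, 23.5), [2, 24.1], {"day": 3, "temp": 22.8}]
print(looks_like_series_column(readings))    # True
print(looks_like_series_column(["ab", "cd"]))  # False: strings are not compound cells
print([split_cell(c) for c in readings])     # [(1, 23.5), (2, 24.1), (3, 22.8)]
```

Once the two sub-values are separated, each position can be treated as its own column and matched against the registered strategies, which is how the sub-field schemas described above would be obtained.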

Extending MLSchema

Create bespoke field types by pairing a custom Pydantic model with a strategy implementation:

from typing import Literal
from pandas import Series
from mlschema.core import BaseField, Strategy


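class RatingField(BaseField):
    type: Literal["rating"] = "rating"
    # Floats, so that bounds taken from a float64 column pass validation.
    min: float | None = None
    max: float | None = None
    precision: float = 0.5


class RatingStrategy(Strategy):
    def __init__(self) -> None:
        super().__init__(
            type_name="rating",
            schema_cls=RatingField,
            # float64 is also claimed by NumberStrategy; registering both
            # would likely trip the duplicate-dtype guard.
            dtypes=("float64",),
        )

    def attributes_from_series(self, series: Series) -> dict:
        # Derive the bounds from the data without mutating the Series.
        return {
            "min": float(series.min()),
            "max": float(series.max()),
        }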

Use Strategy.dtypes to advertise the pandas dtypes your strategy understands.
Avoid mutating the incoming Series; treat it as read-only.
Reserved keys (title, type, required, description) are populated by the base class.

Reference the full guide at https://ulloasp.github.io/mlschema/usage/ for end-to-end patterns.

Validation & Error Handling

EmptyDataFrameError – raised when the DataFrame has no rows or columns.
FallbackStrategyMissingError – triggered if an unsupported dtype is encountered without a registered fallback.
StrategyNameAlreadyRegisteredError / StrategyDtypeAlreadyRegisteredError – guard against duplicate registrations.
Pydantic ValidationError / PydanticCustomError – surface invalid field constraints early (min/max, regex patterns, date ranges, etc.).

All exceptions derive from mlschema.core.MLSchemaError, making it straightforward to trap library-level failures.
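
A common way to use the shared base class is to wrap schema generation in one handler. The sketch below defines local stand-ins for MLSchemaError and EmptyDataFrameError so that it runs without mlschema installed; in real code you would import the base class from mlschema.core. The helper build_or_default is hypothetical.

```python
class MLSchemaError(Exception):
    """Stand-in for mlschema.core.MLSchemaError so the sketch runs on its own."""


class EmptyDataFrameError(MLSchemaError):
    """Stand-in for the error raised on a DataFrame with no rows or columns."""


def build_or_default(build, frame, default=None):
    """Call build(frame); return `default` on any library-level failure."""
    try:
        return build(frame)
    except MLSchemaError as exc:
        # Only library errors are handled; anything else propagates unchanged.
        print(f"schema generation failed: {type(exc).__name__}: {exc}")
        return default


def failing_build(frame):
    """Simulates a build call that rejects an empty input."""
    raise EmptyDataFrameError("no rows or columns")


empty_payload = {"fields": [], "reports": [], "explanations": []}
result = build_or_default(failing_build, frame=None, default=empty_payload)
print(result)  # {'fields': [], 'reports': [], 'explanations': []}
```

Catching only the base class keeps unrelated bugs, such as a KeyError in your own code, visible instead of silently replacing them with the default payload.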

Tooling & Quality

Distributed as an MIT-licensed wheel and sdist built with Hatchling.
Strict typing (pyright) and linting (ruff) shipped with the repo.
Test suite powered by pytest and pytest-cov; coverage reports live alongside the source tree.
py.typed marker ensures type information propagates to downstream projects.

Resources

Documentation portal: https://ulloasp.github.io/mlschema/
API reference: https://ulloasp.github.io/mlschema/reference/
Changelog: https://ulloasp.github.io/mlschema/changelog/
Issue tracker: https://github.com/UlloaSP/mlschema/issues
Discussions: https://github.com/UlloaSP/mlschema/discussions
mlform (optional form renderer): https://github.com/UlloaSP/mlform

Contributing

Community contributions are welcome. Review the guidelines and pick an issue to get started:

Contribution guide: https://github.com/UlloaSP/mlschema/blob/main/CONTRIBUTING.md
Good first issues: https://github.com/UlloaSP/mlschema/labels/good%20first%20issue
Development workflow: uv sync, uv run pre-commit install, uv run pytest

Security

Please report security concerns privately by emailing pablo.ulloa.santin@udc.es. The coordinated disclosure process is documented at https://github.com/UlloaSP/mlschema/blob/main/SECURITY.md.

License

Released under the MIT License. Complete terms and third-party attributions are available at:

License: https://github.com/UlloaSP/mlschema/blob/main/LICENSE
Third-party notices: https://github.com/UlloaSP/mlschema/blob/main/THIRD_PARTY_LICENSES.md

Made by Pablo Ulloa Santin and the MLSchema community.

Release history

0.2.0: Jun 1, 2026
0.1.6: Apr 21, 2026 (this version)
0.1.5: Apr 17, 2026
0.1.4: Apr 17, 2026
0.1.3: Apr 16, 2026
0.1.2: Oct 29, 2025
0.1.1: Oct 16, 2025
0.1.0: Sep 16, 2025

Download files

Source distribution

mlschema-0.1.6.tar.gz (63.9 kB, tag: Source), uploaded Apr 21, 2026 via twine/6.1.0 CPython/3.13.12, without Trusted Publishing.

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `95a88fda34459bcef9598b1211165ce2f1f6e055ccccdc0a58b1f6e32690b1df` |
| MD5 | `6cd600bbdf3ed69548da6d558042d0c9` |
| BLAKE2b-256 | `3e743bede75ea6276e6a223fb493d7169783a4e7b8967cdc2bfb1b41f43c04ac` |

Built distribution

mlschema-0.1.6-py3-none-any.whl (38.5 kB, tag: Python 3), uploaded Apr 21, 2026 via twine/6.1.0 CPython/3.13.12, without Trusted Publishing.

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `21b8d9f168db85084b6a170a9d492bd997ff7439a32faed8420318413c564c36` |
| MD5 | `b1e415f3c314e280c468db32431ffddf` |
| BLAKE2b-256 | `7e013d2c45d93ed866c1e984f4113849fd7f70183d35338f3728334fd90bd144` |
