Lightweight orchestration layer that turns pandas DataFrames into front-end-ready JSON schemas, engineered to pair seamlessly with [mlform](https://github.com/UlloaSP/mlform).

MLSchema

Overview

mlschema accelerates form and contract generation by automatically deriving JSON field definitions from tabular data. The library applies a strategy-driven pipeline on top of pandas, validating every payload with Pydantic before it reaches your UI tier or downstream services.

  • Converts analytics data into stable JSON schemas in a few lines of code.
  • Keeps inference logic server-side; no external services or background workers required.
  • Ships with production-tested strategies for text, numeric, categorical, boolean, and temporal data.
  • Designed for synchronous use alongside mlform, yet fully usable on its own.

Key Features

  • Strategy registry that lets you opt into only the field types you want to expose.
  • Pydantic v2 models guarantee structural validity and embed domain-specific constraints.
  • Normalized dtype matching covers both pandas extension types and NumPy dtypes.
  • Deterministic JSON output (inputs / outputs) suitable for form engines and low-code tooling.
  • Fully typed public API with strict static analysis (Pyright) and comprehensive tests.

Requirements

  • Python >= 3.14, < 3.15
  • pandas >= 2.3.3, < 3.0.0
  • pydantic >= 2.12.3, < 3.0.0

All transitive dependencies are resolved automatically by your package manager.

Installation

uv add mlschema

Alternative package managers:

  • pip install mlschema
  • poetry add mlschema
  • conda install -c conda-forge mlschema
  • pipenv install mlschema

Pin a version (for example mlschema==0.1.2) when you need deterministic environments.

Quick Start

import pandas as pd
from mlschema import MLSchema
from mlschema.strategies import TextStrategy, NumberStrategy, CategoryStrategy

df = pd.DataFrame(
  {
    "name": ["Ada", "Linus", "Grace"],
    "score": [98.5, 86.0, 91.0],
    "role": pd.Categorical(["engineer", "engineer", "scientist"]),
  }
)

builder = MLSchema()
builder.register(TextStrategy())      # fallback for unsupported dtypes
builder.register(NumberStrategy())
builder.register(CategoryStrategy())

schema = builder.build(df)

Schema Output

The payload is ready to serialise to JSON and inject into your UI or downstream service:

{
  "inputs": [
    {"title": "name", "required": true, "type": "text"},
    {"title": "score", "required": true, "type": "number", "step": 0.1},
    {"title": "role", "required": true, "type": "category", "options": ["engineer", "scientist"]}
  ],
  "outputs": []
}

TextStrategy acts as the default fallback. Make sure it is registered when you want unsupported columns to degrade gracefully.
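
Serialising the payload needs nothing beyond the standard library. The sketch below copies the documented output into a literal and assumes, as an illustration only, that builder.build(df) returns a plain dict of this shape; the helper to_json is hypothetical.

```python
import json

# Literal copy of the documented payload. In real code this would be the value
# returned by builder.build(df), assumed here to be a plain dict.
schema = {
    "inputs": [
        {"title": "name", "required": True, "type": "text"},
        {"title": "score", "required": True, "type": "number", "step": 0.1},
        {
            "title": "role",
            "required": True,
            "type": "category",
            "options": ["engineer", "scientist"],
        },
    ],
    "outputs": [],
}


def to_json(payload: dict) -> str:
    """Encode the payload with sorted keys so repeated runs give identical text."""
    return json.dumps(payload, indent=2, sort_keys=True)


encoded = to_json(schema)
decoded = json.loads(encoded)  # round-trips back to an equal dict
```

Sorting keys is a choice made for this sketch: it keeps diffs and cached responses stable without changing the meaning of the payload.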

How It Works

  1. Registry orchestration – MLSchema keeps an in-memory registry of field strategies, keyed by a logical type_name and one or more pandas dtypes.
  2. Inference pipeline – each DataFrame column is normalised, matched against the registry, and dispatched to the first compatible strategy.
  3. Schema materialisation – strategies merge required metadata (title, type, required) with data-driven attributes, then dump the result through a Pydantic model.
  4. Structured output – the service returns the canonical {"inputs": [...], "outputs": []} payload that feeds mlform or any form rendering layer.
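
The four steps above can be sketched in plain Python. This is not the library's implementation: SketchStrategy and SketchRegistry are hypothetical stand-ins, columns are passed as a name-to-dtype-string mapping instead of a DataFrame, lower-casing is only an assumed form of normalisation, and required is hard-coded to True.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchStrategy:
    """Hypothetical stand-in for a field strategy."""

    type_name: str
    dtypes: tuple[str, ...]


class SketchRegistry:
    """Hypothetical registry: dtype lookup with a named fallback."""

    def __init__(self, fallback_type: str = "text") -> None:
        self._by_dtype: dict[str, SketchStrategy] = {}
        self._by_name: dict[str, SketchStrategy] = {}
        self._fallback_type = fallback_type

    def register(self, strategy: SketchStrategy) -> None:
        # Step 1: key the strategy by its type name and by each dtype it handles.
        self._by_name[strategy.type_name] = strategy
        for dtype in strategy.dtypes:
            self._by_dtype[dtype.lower()] = strategy

    def resolve(self, dtype: str) -> SketchStrategy:
        # Step 2: normalise the dtype name, then look it up; fall back if unknown.
        strategy = self._by_dtype.get(dtype.strip().lower())
        if strategy is None:
            strategy = self._by_name.get(self._fallback_type)
        if strategy is None:
            raise LookupError(f"no strategy or fallback for dtype {dtype!r}")
        return strategy

    def build(self, columns: dict[str, str]) -> dict:
        # Steps 3 and 4: emit the required metadata in the canonical envelope.
        inputs = [
            {"title": name, "required": True, "type": self.resolve(dtype).type_name}
            for name, dtype in columns.items()
        ]
        return {"inputs": inputs, "outputs": []}


registry = SketchRegistry()
registry.register(SketchStrategy("text", ("object", "string")))
registry.register(SketchStrategy("number", ("int64", "float64")))
payload = registry.build(
    {"name": "object", "score": "float64", "seen": "datetime64[ns]"}
)
```

In this sketch the unregistered datetime64[ns] column falls through to the text strategy, mirroring the fallback behaviour described for TextStrategy.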

Built-in Strategies

Strategy class   | type name | Supported pandas dtypes        | Additional attributes
TextStrategy     | text      | object, string                 | minLength, maxLength, pattern, value, placeholder
NumberStrategy   | number    | int64, int32, float64, float32 | min, max, step, value, unit, placeholder
CategoryStrategy | category  | category                       | options, value
BooleanStrategy  | boolean   | bool, boolean                  | value
DateStrategy     | date      | datetime64[ns], datetime64     | min, max, value, step

Register only the strategies you need. Duplicate registrations raise explicit errors; use MLSchema.update() to swap implementations at runtime.
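
The register-versus-update distinction can be illustrated with a small stand-in. Everything here is hypothetical: TinyRegistry, DuplicateStrategyError, and the method signatures are assumptions made for the sketch and may differ from MLSchema's real register() and update().

```python
class DuplicateStrategyError(ValueError):
    """Stand-in for the library's duplicate-registration errors."""


class TinyRegistry:
    """Hypothetical registry showing strict register() and permissive update()."""

    def __init__(self) -> None:
        self._strategies: dict[str, object] = {}

    def register(self, type_name: str, strategy: object) -> None:
        # Refuse to overwrite silently: a duplicate is treated as a caller bug.
        if type_name in self._strategies:
            raise DuplicateStrategyError(f"{type_name!r} is already registered")
        self._strategies[type_name] = strategy

    def update(self, type_name: str, strategy: object) -> None:
        # Explicit replacement: the caller has opted into swapping.
        self._strategies[type_name] = strategy

    def get(self, type_name: str) -> object:
        return self._strategies[type_name]


tiny = TinyRegistry()
tiny.register("number", "default number strategy")

try:
    tiny.register("number", "another number strategy")
    duplicate_rejected = False
except DuplicateStrategyError:
    duplicate_rejected = True

tiny.update("number", "custom number strategy")
```

Keeping the two operations separate means an accidental double registration fails loudly, while an intentional swap stays a one-line call.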

Extending MLSchema

Create bespoke field types by pairing a custom Pydantic model with a strategy implementation:

from typing import Literal
from pandas import Series
from mlschema.core import BaseField, Strategy


class RatingField(BaseField):
  type: Literal["rating"] = "rating"
  min: float | None = None  # float, to match the values returned by attributes_from_series
  max: float | None = None
  precision: float = 0.5


class RatingStrategy(Strategy):
  def __init__(self) -> None:
    super().__init__(
      type_name="rating",
      schema_cls=RatingField,
      dtypes=("float64",),
    )

  def attributes_from_series(self, series: Series) -> dict:
    return {
      "min": float(series.min()),
      "max": float(series.max()),
    }

  • Use Strategy.dtypes to advertise the pandas dtypes your strategy understands.
  • Avoid mutating the incoming Series; treat it as read-only.
  • Reserved keys (title, type, required, description) are populated by the base class.

Reference the full guide at https://ulloasp.github.io/mlschema/usage/ for end-to-end patterns.

Validation & Error Handling

  • EmptyDataFrameError – raised when the DataFrame has no rows or columns.
  • FallbackStrategyMissingError – triggered if an unsupported dtype is encountered without a registered fallback.
  • StrategyNameAlreadyRegisteredError / StrategyDtypeAlreadyRegisteredError – guard against duplicate registrations.
  • Pydantic ValidationError / PydanticCustomError – surface invalid field constraints early (min/max, regex patterns, date ranges, etc.).

All exceptions derive from mlschema.core.MLSchemaError, making it straightforward to trap library-level failures.

Tooling & Quality

  • Distributed as an MIT-licensed wheel and sdist built with Hatchling.
  • Strict typing (pyright) and linting (ruff) shipped with the repo.
  • Test suite powered by pytest and pytest-cov; coverage reports live alongside the source tree.
  • py.typed marker ensures type information propagates to downstream projects.

Contributing

Community contributions are welcome. Review the contribution guidelines in the project repository and pick an issue to get started.

Security

Please report security concerns privately by emailing pablo.ulloa.santin@udc.es. The coordinated disclosure process is documented at https://github.com/UlloaSP/mlschema/blob/main/SECURITY.md.

License

Released under the MIT License. Complete terms and third-party attributions are available at:


Made by Pablo Ulloa Santin and the MLSchema community.
