Infer JSON schemas from sample data

philiprehberger-schema-infer

Infer JSON schemas from sample data.

Installation

pip install philiprehberger-schema-infer

Usage

Infer schema from samples

from philiprehberger_schema_infer import infer

samples = [
    {"name": "Alice", "age": 30, "active": True},
    {"name": "Bob", "age": 25, "email": "bob@test.com"},
]

schema = infer(samples)
# {
#   "type": "object",
#   "properties": {
#     "name": {"type": "string", "minLength": 3, "maxLength": 5},
#     "age": {"type": "integer", "minimum": 25, "maximum": 30},
#     "active": {"type": "boolean"},
#     "email": {"type": "string", "format": "email", ...}
#   },
#   "required": ["age", "name"]
# }
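
The constraint values in the commented output follow directly from the samples: the names are 3 and 5 characters long, and the ages span 25 to 30. The following is a minimal standalone sketch of that bookkeeping. It does not import the package and is not its actual implementation; the helper name `summarize_field` is hypothetical.

```python
def summarize_field(samples, key):
    """Collect simple constraints for one field across the samples that contain it."""
    values = [s[key] for s in samples if key in s]
    if not values:
        return {}
    # bool is a subclass of int in Python, so exclude it from the integer check.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return {"type": "integer", "minimum": min(values), "maximum": max(values)}
    if all(isinstance(v, str) for v in values):
        lengths = [len(v) for v in values]
        return {"type": "string", "minLength": min(lengths), "maxLength": max(lengths)}
    # Other types are left unconstrained in this sketch.
    return {}


samples = [
    {"name": "Alice", "age": 30, "active": True},
    {"name": "Bob", "age": 25, "email": "bob@test.com"},
]

name_schema = summarize_field(samples, "name")
age_schema = summarize_field(samples, "age")
```

Here `name_schema` and `age_schema` carry the same bounds as the `name` and `age` entries shown above.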

Full JSON Schema output

from philiprehberger_schema_infer import to_json_schema

schema = to_json_schema(samples)
# {
#   "$schema": "https://json-schema.org/draft/2020-12/schema",
#   "type": "object",
#   "properties": { ... },
#   "required": [...]
# }
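
Conceptually, the only difference from `infer()` output is the added `$schema` key. A hedged sketch of such a wrapper, using only the standard library (the function name `with_schema_uri` is hypothetical, and the package may build its output differently):

```python
import json

DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def with_schema_uri(schema):
    """Return a copy of `schema` with the "$schema" key placed first."""
    return {"$schema": DRAFT_2020_12, **schema}


# A hand-written stand-in for an inferred schema.
inferred = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

document = with_schema_uri(inferred)
text = json.dumps(document, indent=2)  # ready to write to a .json file
```

The input dict is left untouched, so the same inferred schema can still be passed on to other code.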

Single value type inference

from philiprehberger_schema_infer import infer_type

infer_type([1, 2, 3])
# {"type": "array", "items": {"type": "integer"}}
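
To illustrate the idea behind single-value inference, here is a small recursive sketch that maps Python values to JSON Schema type names. It is an assumption-laden illustration, not the package's code: `sketch_infer_type` is a hypothetical name, and describing array items by the first element only is a simplification made for this example.

```python
def sketch_infer_type(value):
    """Map a Python value to a minimal JSON Schema fragment (illustrative only)."""
    if value is None:
        return {"type": "null"}
    # Check bool before int: isinstance(True, int) is True in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        if not value:
            return {"type": "array"}
        # Simplification: the first element stands in for all items.
        return {"type": "array", "items": sketch_infer_type(value[0])}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: sketch_infer_type(v) for k, v in value.items()},
        }
    raise TypeError(f"unsupported type: {type(value).__name__}")


result = sketch_infer_type([1, 2, 3])
```

The ordering of the `bool` and `int` checks matters; swapping them would report `True` as an integer.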

Schema strictness levels

Control how aggressively fields are marked required and constraints are applied:

from philiprehberger_schema_infer import infer

# Loose: no required fields, no numeric/string constraints
schema = infer(samples, strictness="loose")

# Normal (default): fields in all samples are required, constraints included
schema = infer(samples, strictness="normal")

# Strict: all fields required, additionalProperties set to False
schema = infer(samples, strictness="strict")
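
The three levels can be pictured as different ways of choosing the `required` list. The sketch below is a standalone approximation of the descriptions in the comments above, not the package's implementation. It assumes that "all fields" in strict mode means every key seen in any sample; `sketch_strictness` is a hypothetical helper.

```python
def sketch_strictness(samples, strictness="normal"):
    """Return the schema keys that depend on the strictness level."""
    if strictness not in ("loose", "normal", "strict"):
        raise ValueError(f"unknown strictness: {strictness!r}")
    key_sets = [set(sample) for sample in samples]
    if strictness == "loose" or not key_sets:
        return {}
    if strictness == "normal":
        # Only keys present in every sample are required.
        return {"required": sorted(set.intersection(*key_sets))}
    # Strict: every key seen anywhere is required, and extra keys are rejected.
    return {"required": sorted(set.union(*key_sets)), "additionalProperties": False}


samples = [
    {"name": "Alice", "age": 30, "active": True},
    {"name": "Bob", "age": 25, "email": "bob@test.com"},
]

loose = sketch_strictness(samples, "loose")
normal = sketch_strictness(samples, "normal")
strict = sketch_strictness(samples, "strict")
```

With these samples, `normal` requires only `age` and `name`, while `strict` requires all four keys.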

Custom format detection

Register domain-specific regex patterns for format detection:

from philiprehberger_schema_infer import register_format, infer_type

register_format("phone", r"^\+\d{1,3}-\d{3,14}$")
register_format("credit-card", r"^\d{4}-\d{4}-\d{4}-\d{4}$")

infer_type("+1-5551234567")
# {"type": "string", "format": "phone"}
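
A format registry of this kind can be approximated with a dict of compiled regular expressions. The following standalone sketch shows the general technique with the standard `re` module. The names `sketch_register_format` and `sketch_detect_format` are hypothetical, and the first-registered-wins matching order is an assumption rather than documented package behavior.

```python
import re

# Registered formats, in insertion order.
_FORMATS = {}


def sketch_register_format(name, pattern):
    """Compile and store a regex under a format name."""
    _FORMATS[name] = re.compile(pattern)


def sketch_detect_format(value):
    """Return the first registered format whose pattern matches the whole string."""
    for name, regex in _FORMATS.items():
        if regex.fullmatch(value):
            return name
    return None


sketch_register_format("phone", r"^\+\d{1,3}-\d{3,14}$")
sketch_register_format("credit-card", r"^\d{4}-\d{4}-\d{4}-\d{4}$")

phone = sketch_detect_format("+1-5551234567")
card = sketch_detect_format("1234-5678-9012-3456")
plain = sketch_detect_format("hello")
```

Using `fullmatch` keeps a pattern from matching only a prefix of the value, even if a registered pattern omits the `^` and `$` anchors.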

Merge schemas

Combine multiple inferred schemas with union/intersection logic for required fields:

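from philiprehberger_schema_infer import infer, merge_schemas

schema_a = infer([{"id": 1, "name": "Alice"}])
schema_b = infer([{"id": 2, "email": "bob@test.com"}])
schema_c = infer([{"id": 3, "active": True}])
merged = merge_schemas(schema_a, schema_b, schema_c)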
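
The description above does not say which rule applies to which part of the schema. One plausible reading, consistent with a merged schema that accepts any of the inputs, is a union of properties and an intersection of required fields. The sketch below implements that reading only. It is an assumption, not the package's algorithm, and `sketch_merge` is a hypothetical name.

```python
def sketch_merge(*schemas):
    """Merge object schemas: union of properties, intersection of required keys."""
    if len(schemas) < 2:
        raise ValueError("need at least two schemas")
    properties = {}
    for schema in schemas:
        for key, sub in schema.get("properties", {}).items():
            # Simplification: the first definition of a property wins.
            properties.setdefault(key, sub)
    required = set(schemas[0].get("required", []))
    for schema in schemas[1:]:
        required &= set(schema.get("required", []))
    return {"type": "object", "properties": properties, "required": sorted(required)}


a = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id", "name"],
}
b = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "email": {"type": "string"}},
    "required": ["id", "email"],
}

merged_sketch = sketch_merge(a, b)
```

Only `id` stays required here, because it is the only key that both inputs require.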

API

Function                                          Description
infer(samples, *, strictness="normal")            Infer JSON Schema from a list of dicts. Supports "loose", "normal", and "strict" levels.
infer_type(value)                                 Infer schema type for a single value
merge_schemas(*schemas)                           Merge two or more schemas into one accepting any of them
register_format(name, pattern)                    Register a custom regex pattern for string format detection
to_json_schema(samples, *, strictness="normal")   Wraps infer() output with $schema URI for draft 2020-12

Development

pip install -e .
python -m pytest tests/ -v

Support

If you find this package useful, consider giving it a star on GitHub — it helps motivate continued maintenance and development.

License

MIT
