philiprehberger-schema-infer
Infer JSON schemas from sample data.
Installation
```bash
pip install philiprehberger-schema-infer
```
Usage
```python
from philiprehberger_schema_infer import infer

samples = [
    {"name": "Alice", "age": 30, "active": True},
    {"name": "Bob", "age": 25, "email": "bob@test.com"},
]

schema = infer(samples)
# {
#     "type": "object",
#     "properties": {
#         "name": {"type": "string", "minLength": 3, "maxLength": 5},
#         "age": {"type": "integer", "minimum": 25, "maximum": 30},
#         "active": {"type": "boolean"},
#         "email": {"type": "string", "format": "email", ...}
#     },
#     "required": ["age", "name"]
# }
```
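To make the rules behind this output concrete, here is a minimal standalone sketch. It does not call or reproduce the library; it assumes that a field is required when it appears in every sample and that `minLength`/`maxLength` and `minimum`/`maximum` come from the observed extremes, which is consistent with the output above. The function name `sketch_infer` is hypothetical, and format detection (such as `"email"`) is omitted.

```python
from typing import Any


def sketch_infer(samples: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical re-implementation for illustration; not the library's code."""
    # Collect every value observed for each field, across all samples.
    seen: dict[str, list[Any]] = {}
    for sample in samples:
        for key, value in sample.items():
            seen.setdefault(key, []).append(value)

    properties: dict[str, dict[str, Any]] = {}
    for key, values in seen.items():
        # bool is checked before int because True/False are ints in Python.
        if all(isinstance(v, bool) for v in values):
            properties[key] = {"type": "boolean"}
        elif all(isinstance(v, int) and not isinstance(v, bool) for v in values):
            properties[key] = {"type": "integer", "minimum": min(values), "maximum": max(values)}
        elif all(isinstance(v, str) for v in values):
            lengths = [len(v) for v in values]
            properties[key] = {"type": "string", "minLength": min(lengths), "maxLength": max(lengths)}
        else:
            properties[key] = {}  # mixed or unsupported types: accept anything

    # A field is required only if every sample contains it.
    required = sorted(k for k in seen if all(k in s for s in samples))
    return {"type": "object", "properties": properties, "required": required}


samples = [
    {"name": "Alice", "age": 30, "active": True},
    {"name": "Bob", "age": 25, "email": "bob@test.com"},
]
print(sketch_infer(samples)["required"])  # ['age', 'name']
```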
Full JSON Schema output
```python
from philiprehberger_schema_infer import to_json_schema

schema = to_json_schema(samples)
# {
#     "$schema": "https://json-schema.org/draft/2020-12/schema",
#     "type": "object",
#     "properties": { ... },
#     "required": [...]
# }
```
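According to the API table, `to_json_schema()` wraps the `infer()` output with the draft 2020-12 `$schema` URI. A minimal sketch of that wrapping step, assuming it is a plain key prepend (the helper name `wrap_with_schema_uri` is hypothetical):

```python
from typing import Any

DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def wrap_with_schema_uri(schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: build a new dict with "$schema" first, then the inferred keywords."""
    return {"$schema": DRAFT_2020_12, **schema}


inferred = {"type": "object", "properties": {}, "required": []}
print(list(wrap_with_schema_uri(inferred)))  # ['$schema', 'type', 'properties', 'required']
```

Because Python dicts preserve insertion order, `"$schema"` ends up as the first key, as in the output above, and the input dict is left unmodified.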
Single value type inference
```python
from philiprehberger_schema_infer import infer_type

infer_type([1, 2, 3])
# {"type": "array", "items": {"type": "integer"}}
```
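A sketch of how single-value inference can work, using a hypothetical `sketch_infer_type` function rather than the library's code. One detail worth copying in any implementation: `bool` must be checked before `int`, because `True` and `False` are instances of `int` in Python. Emitting an array schema without `items` for mixed lists is an assumption of this sketch.

```python
from typing import Any


def sketch_infer_type(value: Any) -> dict[str, Any]:
    """Hypothetical sketch of single-value inference; not the library's code."""
    # bool must come before int: isinstance(True, int) is True.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        item_schemas = [sketch_infer_type(v) for v in value]
        # Use a single "items" schema only when every element agrees.
        if item_schemas and all(s == item_schemas[0] for s in item_schemas):
            return {"type": "array", "items": item_schemas[0]}
        return {"type": "array"}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: sketch_infer_type(v) for k, v in value.items()},
        }
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(sketch_infer_type([1, 2, 3]))  # {'type': 'array', 'items': {'type': 'integer'}}
```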
Schema strictness levels
Control how aggressively fields are marked required and constraints are applied:
```python
from philiprehberger_schema_infer import infer

# Loose: no required fields, no numeric/string constraints
schema = infer(samples, strictness="loose")

# Normal (default): fields present in every sample are required, constraints included
schema = infer(samples, strictness="normal")

# Strict: all fields required, additionalProperties set to False
schema = infer(samples, strictness="strict")
```
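The three levels can be pictured as different rules for building `required`. The sketch below is an assumption based only on the comments above (hypothetical `sketch_required`; constraint handling for `"loose"` is not shown), and it reads "all fields" in strict mode as every field seen in any sample.

```python
from typing import Any


def sketch_required(samples: list[dict[str, Any]], strictness: str = "normal") -> dict[str, Any]:
    """Hypothetical sketch of how each level could set "required"; not the library's code."""
    all_keys = sorted({k for s in samples for k in s})
    if strictness == "loose":
        # Nothing is required.
        return {"required": []}
    if strictness == "normal":
        # Only fields present in every sample are required.
        return {"required": [k for k in all_keys if all(k in s for s in samples)]}
    if strictness == "strict":
        # Every observed field is required and unknown fields are rejected.
        return {"required": all_keys, "additionalProperties": False}
    raise ValueError(f"unknown strictness: {strictness!r}")


samples = [
    {"name": "Alice", "age": 30, "active": True},
    {"name": "Bob", "age": 25, "email": "bob@test.com"},
]
for level in ("loose", "normal", "strict"):
    print(level, sketch_required(samples, level))
```

Under that reading, the strict schema would reject both sample records, since neither contains all four fields, so strict mode is most useful when the samples are already uniform.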
Custom format detection
Register domain-specific regex patterns for format detection:
```python
from philiprehberger_schema_infer import register_format, infer_type

register_format("phone", r"^\+\d{1,3}-\d{3,14}$")
register_format("credit-card", r"^\d{4}-\d{4}-\d{4}-\d{4}$")

infer_type("+1-5551234567")
# {"type": "string", "format": "phone"}
```
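Format detection of this kind amounts to a name-to-regex registry consulted for each string value. A minimal sketch with hypothetical names (`sketch_register_format`, `sketch_string_schema`); the first-match-wins precedence and the use of `re.fullmatch` are assumptions, not documented library behavior.

```python
import re
from typing import Any

# Hypothetical registry: format name -> compiled pattern, in registration order.
_FORMATS: dict[str, re.Pattern] = {}


def sketch_register_format(name: str, pattern: str) -> None:
    _FORMATS[name] = re.compile(pattern)


def sketch_string_schema(value: str) -> dict[str, Any]:
    schema: dict[str, Any] = {"type": "string"}
    # fullmatch requires the whole string to match, even without ^/$ anchors.
    # The first registered pattern that matches wins (an assumed precedence rule).
    for name, regex in _FORMATS.items():
        if regex.fullmatch(value):
            schema["format"] = name
            break
    return schema


sketch_register_format("phone", r"^\+\d{1,3}-\d{3,14}$")
sketch_register_format("credit-card", r"^\d{4}-\d{4}-\d{4}-\d{4}$")
print(sketch_string_schema("+1-5551234567"))  # {'type': 'string', 'format': 'phone'}
```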
Merge schemas
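Combine two or more inferred schemas into one that accepts any of them, using union/intersection logic for the required fields:
```python
from philiprehberger_schema_infer import merge_schemas

# schema_a, schema_b, schema_c: schemas previously returned by infer()
merged = merge_schemas(schema_a, schema_b, schema_c)
```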
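For a merged schema to accept data matching any input schema, its properties must cover every input, and a field can stay required only if all inputs require it. A minimal sketch under that assumption (hypothetical `sketch_merge`; combining conflicting property schemas with `anyOf` is this sketch's own choice, not documented library behavior):

```python
from typing import Any


def sketch_merge(*schemas: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical merge of object schemas; not the library's implementation."""
    if not schemas:
        raise ValueError("need at least one schema")

    # Union of properties: every field seen in any input schema is kept.
    properties: dict[str, Any] = {}
    for schema in schemas:
        for key, prop in schema.get("properties", {}).items():
            if key not in properties:
                properties[key] = prop
                continue
            current = properties[key]
            options = current["anyOf"] if "anyOf" in current else [current]
            if prop not in options:
                # Conflicting definitions: accept either one.
                properties[key] = {"anyOf": options + [prop]}

    # Intersection of required: a field stays required only if every input requires it.
    required = set.intersection(*(set(s.get("required", [])) for s in schemas))
    return {"type": "object", "properties": properties, "required": sorted(required)}


a = {"type": "object", "properties": {"id": {"type": "integer"}, "name": {"type": "string"}}, "required": ["id", "name"]}
b = {"type": "object", "properties": {"id": {"type": "string"}}, "required": ["id"]}
print(sketch_merge(a, b)["required"])  # ['id']
```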
Confidence scores
Analyze how consistently a type was observed across samples for each field:
```python
from philiprehberger_schema_infer import infer_with_confidence

samples = [
    {"name": "Alice", "value": 42},
    {"name": "Bob", "value": "hello"},
    {"name": "Carol", "value": 99},
]

result = infer_with_confidence(samples)
# {
#     "name": {"type": "string", "confidence": 1.0},
#     "value": {"type": ..., "confidence": 0.67}
# }
```
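One plausible definition consistent with the 0.67 above (two of the three `value` entries are integers) is the share of a field's observations that have its most common type. A sketch under that assumption, with hypothetical function names; reporting the majority type is also an assumption, since the README elides it.

```python
from collections import Counter
from typing import Any


def _json_type(value: Any) -> str:
    # bool before int, since bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    return "array" if isinstance(value, list) else "object"


def sketch_confidence(samples: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Hypothetical: confidence = share of observations with the most common type."""
    observed: dict[str, Counter] = {}
    for sample in samples:
        for key, value in sample.items():
            observed.setdefault(key, Counter())[_json_type(value)] += 1

    result = {}
    for key, counts in observed.items():
        top_type, top_count = counts.most_common(1)[0]
        result[key] = {"type": top_type, "confidence": round(top_count / sum(counts.values()), 2)}
    return result


samples = [
    {"name": "Alice", "value": 42},
    {"name": "Bob", "value": "hello"},
    {"name": "Carol", "value": 99},
]
print(sketch_confidence(samples))
```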
TypeScript interface output
Generate TypeScript interfaces from sample data:
```python
from philiprehberger_schema_infer import to_typescript

samples = [
    {"name": "Alice", "age": 30, "active": True},
    {"name": "Bob", "age": 25},
]

print(to_typescript(samples, name="User"))
# interface User {
#   active?: boolean;
#   age: number;
#   name: string;
# }
```
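A sketch of how such an interface can be generated, assuming that fields missing from any sample become optional (`?`) and that integers and floats both map to `number`. `sketch_to_typescript` is a hypothetical name, the two-space indentation is a guess, and nested arrays and objects are out of scope.

```python
from typing import Any


def _ts_type(value: Any) -> str:
    # bool is checked first because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    return "unknown"  # arrays/objects are out of scope for this sketch


def sketch_to_typescript(samples: list[dict[str, Any]], name: str) -> str:
    """Hypothetical generator; the library's output format may differ."""
    field_types: dict[str, set[str]] = {}
    for sample in samples:
        for key, value in sample.items():
            field_types.setdefault(key, set()).add(_ts_type(value))

    lines = [f"interface {name} {{"]
    for key in sorted(field_types):
        # Fields missing from some samples become optional ("?").
        optional = "" if all(key in s for s in samples) else "?"
        ts = " | ".join(sorted(field_types[key]))
        lines.append(f"  {key}{optional}: {ts};")
    lines.append("}")
    return "\n".join(lines)


samples = [
    {"name": "Alice", "age": 30, "active": True},
    {"name": "Bob", "age": 25},
]
print(sketch_to_typescript(samples, name="User"))
```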
Inference from a .jsonl file
Infer a schema directly from a JSON Lines file without loading it manually:
```python
from philiprehberger_schema_infer import infer_from_jsonl

schema = infer_from_jsonl("events.jsonl")

# Skip lines that aren't valid JSON objects instead of raising
schema = infer_from_jsonl("events.jsonl", skip_invalid=True)
```
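Reading JSON Lines comes down to parsing each line independently with `json.loads`. The sketch below (hypothetical `read_jsonl_objects`) shows one way `skip_invalid` could behave: it skips unparseable lines and non-object values when set, and raises otherwise. Tolerating blank lines is an assumption. The resulting list could then be passed to `infer()`.

```python
import json
import os
import tempfile
from collections.abc import Iterator
from typing import Any


def read_jsonl_objects(path: str, skip_invalid: bool = False) -> Iterator[dict[str, Any]]:
    """Hypothetical reader: yield one dict per non-blank line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # assumed: blank lines are tolerated
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                if skip_invalid:
                    continue
                raise
            if not isinstance(obj, dict):
                if skip_invalid:
                    continue
                raise ValueError(f"line {lineno}: expected a JSON object, got {type(obj).__name__}")
            yield obj


# Usage: write a small file with one bad line and one non-object line.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    tmp.write('{"event": "click"}\nnot json\n[1, 2]\n{"event": "view"}\n')
print(list(read_jsonl_objects(tmp.name, skip_invalid=True)))  # [{'event': 'click'}, {'event': 'view'}]
os.remove(tmp.name)
```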
Python dataclass output
Generate Python dataclass definitions from sample data:
```python
from philiprehberger_schema_infer import to_dataclass

samples = [
    {"name": "Alice", "age": 30, "email": "alice@test.com"},
    {"name": "Bob", "age": 25},
]

print(to_dataclass(samples, name="User"))
# @dataclass
# class User:
#     age: int
#     name: str
#     email: str | None = None
```
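A sketch of dataclass source generation with a hypothetical `sketch_to_dataclass`, assuming optional fields are those missing from some sample. It also suggests why `email` comes last in the output above even though the other fields are alphabetical: a dataclass raises `TypeError` if a field without a default follows one with a default, so required fields have to be emitted first. Whether the library orders fields for this reason is an assumption.

```python
from typing import Any


def _py_type(value: Any) -> str:
    # bool before int: True and False are ints in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    return "Any"


def sketch_to_dataclass(samples: list[dict[str, Any]], name: str) -> str:
    """Hypothetical generator of dataclass source text; not the library's code."""
    field_types: dict[str, str] = {}
    for sample in samples:
        for key, value in sample.items():
            # Simplification: the first observed type wins.
            field_types.setdefault(key, _py_type(value))

    required = sorted(k for k in field_types if all(k in s for s in samples))
    optional = sorted(k for k in field_types if k not in required)

    lines = ["@dataclass", f"class {name}:"]
    # Fields without defaults must come before fields with defaults.
    lines += [f"    {k}: {field_types[k]}" for k in required]
    lines += [f"    {k}: {field_types[k]} | None = None" for k in optional]
    if not field_types:
        lines.append("    pass")  # keep the class body valid when there are no fields
    return "\n".join(lines)


samples = [
    {"name": "Alice", "age": 30, "email": "alice@test.com"},
    {"name": "Bob", "age": 25},
]
print(sketch_to_dataclass(samples, name="User"))
```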
API
| Function / Class | Description |
|---|---|
| `infer(samples, *, strictness="normal")` | Infer a JSON Schema from a list of dicts. Supports `"loose"`, `"normal"`, and `"strict"` levels. |
| `infer_from_jsonl(path, *, strictness="normal", skip_invalid=False)` | Infer a schema from a `.jsonl` file |
| `infer_type(value)` | Infer the schema type for a single value |
| `infer_with_confidence(samples)` | Infer types with per-field confidence scores indicating type consistency |
| `merge_schemas(*schemas)` | Merge two or more schemas into one that accepts any of them |
| `register_format(name, pattern)` | Register a custom regex pattern for string format detection |
| `to_dataclass(samples, *, name, strictness)` | Generate a Python dataclass definition from sample data |
| `to_json_schema(samples, *, strictness="normal")` | Wrap `infer()` output with the `$schema` URI for draft 2020-12 |
| `to_typescript(samples, *, name, strictness)` | Generate a TypeScript interface definition from sample data |
Development
```bash
pip install -e .
python -m pytest tests/ -v
```