# SlimSchema

Token-efficient schema language for LLMs, with validation and conversion. Compact schemas for LLM-generated JSON.
SlimSchema provides six core functions:

- `to_data(json, schema)` - Validate JSON against a schema
- `from_data(examples)` - Infer a schema from JSON examples
- `to_schema(input)` - Normalize any format to the Schema IR
- `to_pydantic(schema)` - Convert to a Pydantic `BaseModel`
- `to_msgspec(schema)` - Convert to a msgspec `Struct`
- `apply_patch(data, patches)` - Apply JSON Patch mutations (RFC 6902)
## Quick Start

```bash
pip install slimschema
```
### Define Your Schema

Choose your preferred format - all three work identically:

**Option 1: YAML**

```yaml
name: str{1..100}
email: email
age: 18..120
country: str{2..2}
status: active | inactive | pending
```
**Option 2: Pydantic**

```python
from pydantic import BaseModel, Field
from typing import Literal

class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    email: str
    age: int = Field(ge=18, le=120)
    country: str = Field(min_length=2, max_length=2)
    status: Literal["active", "inactive", "pending"]
```
**Option 3: msgspec**

```python
import msgspec
from typing import Literal

class User(msgspec.Struct):
    name: str
    email: str
    age: int
    country: str
    status: Literal["active", "inactive", "pending"]
```
## Core API

### 1. Validate JSON with `to_data()`
```python
from slimschema import to_data

schema = """
name: str{1..100}
email: email
age: 18..120
country: str{2..2}
status: active | inactive | pending
"""

# Valid JSON
json_response = """
<json>
{
  "name": "Alice",
  "email": "alice@example.com",
  "age": 30,
  "country": "US",
  "status": "active"
}
</json>
"""

user, error = to_data(json_response, schema)
print(user["name"])  # "Alice"
print(error)         # None
```
Invalid JSON produces clear errors:

```python
bad_json = """
{
  "name": "Bob",
  "email": "not-an-email",
  "age": 150,
  "country": "USA",
  "status": "unknown"
}
"""

user, error = to_data(bad_json, schema)
print(error)
# "Expected `str` matching regex '^[^@]+@[^@]+\.[^@]+$' - at `$.email`"
```
### 2. Infer a Schema with `from_data()`

Enums are detected by repetition - if the same value appears multiple times across the examples, the field becomes an enum:
```python
from slimschema import from_data, to_yaml

examples = [
    {"name": "Alice", "status": "active"},
    {"name": "Bob", "status": "inactive"},
    {"name": "Charlie", "status": "active"},
    {"name": "Diana", "status": "inactive"},
]

schema = from_data(examples, name="User")
print(to_yaml(schema))
```

Output:

```yaml
# User
name: str
status: active | inactive
```
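Conceptually, this kind of inference collects each field's values across the examples and promotes low-cardinality, repeating string fields to enums. A minimal pure-Python sketch of the idea (an illustration, not SlimSchema's actual implementation; `infer_enum_fields` and `max_cardinality` are hypothetical names):

```python
def infer_enum_fields(examples, max_cardinality=5):
    """String fields whose values repeat and stay under a cardinality cap become enums."""
    fields = {}
    for example in examples:
        for key, value in example.items():
            fields.setdefault(key, []).append(value)

    schema = {}
    for key, values in fields.items():
        distinct = set(values)
        repeats = len(values) > len(distinct)  # at least one value appeared twice
        if repeats and len(distinct) <= max_cardinality and all(isinstance(v, str) for v in distinct):
            schema[key] = " | ".join(sorted(distinct))  # enum of the observed values
        else:
            schema[key] = type(values[0]).__name__      # fall back to the value's type
    return schema

examples = [
    {"name": "Alice", "status": "active"},
    {"name": "Bob", "status": "inactive"},
    {"name": "Charlie", "status": "active"},
]
infer_enum_fields(examples)  # {'name': 'str', 'status': 'active | inactive'}
```

With only unique names, `name` stays `str`, while the repeating `status` values collapse into an enum.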
### 3. Convert Formats with `to_schema()`, `to_pydantic()`, `to_msgspec()`

All schema formats are interchangeable:
```python
from slimschema import to_schema, to_pydantic, to_msgspec, to_yaml

# Start with YAML
yaml_schema = """
name: str
age: 18..120
status: active | inactive
"""

# Convert to Pydantic (class-based API)
pydantic_model = to_pydantic(yaml_schema)
user = pydantic_model(name="Alice", age=30, status="active")

# Convert to msgspec (functional API with msgspec.convert)
import msgspec
msgspec_struct = to_msgspec(yaml_schema)
user = msgspec.convert({"name": "Bob", "age": 25, "status": "active"}, type=msgspec_struct)

# Convert between formats
from pydantic import BaseModel
from typing import Literal

class Product(BaseModel):
    name: str
    status: Literal["draft", "active"]

schema = to_schema(Product)
yaml_output = to_yaml(schema)
print(yaml_output)
# # Product
# name: str
# status: draft | active
```
### 4. Apply Patches with `apply_patch()`

Mutate data using JSON Patch (RFC 6902). Paths use JSON Pointer syntax (`/field`, `/nested/field`, `/array/0`).
```python
from slimschema import apply_patch

data = {"name": "Bob", "age": 30, "tags": ["user"]}
patches = [
    {"op": "replace", "path": "/name", "value": "Alice"},
    {"op": "add", "path": "/email", "value": "alice@example.com"},
    {"op": "add", "path": "/tags/-", "value": "admin"},  # append with -
    {"op": "remove", "path": "/age"},
]

result = apply_patch(data, patches)
# {"name": "Alice", "email": "alice@example.com", "tags": ["user", "admin"]}
```

Operations: `add`, `remove`, `replace`, `move` (rename/relocate), `copy`, `test` (conditional).
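Per RFC 6902, `move` behaves like a `remove` of the source followed by an `add` at the target, and `test` asserts a value matches before the rest of the patch takes effect. A minimal pure-Python sketch of those two operations on top-level keys (an illustration of the semantics, not the library's code; `apply_simple_patch` is a hypothetical helper):

```python
def apply_simple_patch(data, patches):
    """Apply a tiny subset of RFC 6902 (move, test) over top-level keys only."""
    result = dict(data)  # work on a shallow copy
    for patch in patches:
        op, path = patch["op"], patch["path"].lstrip("/")
        if op == "move":
            # move = remove the "from" key, then add its value at "path"
            source = patch["from"].lstrip("/")
            result[path] = result.pop(source)
        elif op == "test":
            # test aborts the whole patch if the value does not match
            if result.get(path) != patch["value"]:
                raise ValueError(f"test failed at /{path}")
        else:
            raise NotImplementedError(op)
    return result

doc = apply_simple_patch(
    {"nick": "Bob", "role": "user"},
    [
        {"op": "test", "path": "/role", "value": "user"},   # guard: only patch users
        {"op": "move", "from": "/nick", "path": "/name"},   # rename the field
    ],
)
# doc == {"role": "user", "name": "Bob"}
```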
YAML Syntax Reference
# Basic types
name: str
age: int
price: float
active: bool
# String constraints
username: str{3..20} # length constraint
email: email # format validator
url: url
uuid: uuid
date: date # YYYY-MM-DD
datetime: datetime # ISO 8601
sku: /^[A-Z]{3}-\d{4}$/ # regex pattern
# Numeric constraints
age: 18..120 # int range
price: 0.01..99999.99 # float range
quantity: 1.. # min only (no max)
# Literals (enums)
status: draft | active | archived
role: admin | user
# Arrays
tags: [str]
scores: [int]
items: [str{1..50}]
# Optional fields
bio?: str
updated?: datetime
# Comments
name: str # User's full name
## Configure Inference

`from_data()` can be configured to control enum detection, range detection, and more:

```python
from slimschema import InferenceConfig, from_data, to_yaml

# Disable enum detection
config = InferenceConfig(detect_enums=False)
schema = from_data(data, config=config)

# Allow up to 10 unique values for enums (default is 5)
config = InferenceConfig(enum_max_cardinality=10)
schema = from_data(data, config=config)
```

Other options: `detect_ranges`, `detect_formats`, `max_samples`, `max_nesting_depth`, `int_range_max_delta`, `float_range_max_delta`.
## Generate LLM Prompts

Create prompts with embedded schemas for structured output:

```python
from slimschema import to_prompt, to_prompt_compact

# Default: <output>```json...```</output> (most robust)
prompt = to_prompt(schema)

# Customize tagging and fencing
prompt = to_prompt(
    schema,
    instruction="Extract user data from the text.",
    tag="xml",            # "xml" or "none"
    tag_name="output",    # Tag name
    fence="fenced",       # "fenced" or "none"
    format_label="json",  # "json", "xml", "csv", "yaml"
)

# Compact version (no instruction text)
prompt = to_prompt_compact(schema)
```
Tagging strategies:

- `tag="xml"` + `fence="fenced"`: ```` <output>```json...```</output> ```` (recommended)
- `tag="none"` + `fence="fenced"`: ```` ```json...``` ````
- `tag="xml"` + `fence="none"`: `<output>...</output>`
## Robust Extraction

`to_data()` extracts structured data from LLM responses with multiple fallback strategies.

Supports 4 formats: JSON, CSV, XML, YAML.

Tagging strategies are tried in priority order:

1. XML-wrapped fence: ```` <output>```json...```</output> ````
2. Fence alone: ```` ```json...``` ````
3. XML tag alone: `<json>...</json>`
4. Raw format detection

Special handling:

- JSONL/JSON-ND: newline-delimited JSON objects (with or without commas)
- CSV: auto-delimiter detection (`,`, `;`, `\t`, `|`)
- Case-insensitive: all tags and fence labels
- Flexible fencing: 3-10 backticks supported
```python
from slimschema import to_data

# Works with any tagging strategy
response = '<output>```json\n{"name": "Alice"}\n```</output>'
data, error = to_data(response, schema)

# JSONL support
response = """<json>
{"name": "Alice"}
{"name": "Bob"}
</json>"""
data, error = to_data(response, schema)  # Returns list
```

See `docs/extraction.md` for complete documentation.
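The fence-stripping fallback described above can be sketched with a short regex that tolerates 3-10 backticks and a case-insensitive format label (an illustration of the general idea, not SlimSchema's implementation; `strip_fence` is a hypothetical name):

```python
import re

def strip_fence(text):
    """Strip a Markdown code fence: 3-10 backticks, optional format label."""
    pattern = re.compile(
        r"(`{3,10})\s*"                  # opening fence: 3-10 backticks
        r"(?:json|yaml|csv|xml)?\s*\n"   # optional label, case-insensitive via re.IGNORECASE
        r"(.*?)"                         # fenced body (non-greedy)
        r"\1",                           # closing fence must repeat the opening run
        re.DOTALL | re.IGNORECASE,
    )
    match = pattern.search(text)
    return match.group(2).strip() if match else text.strip()

strip_fence('```JSON\n{"name": "Alice"}\n```')  # '{"name": "Alice"}'
strip_fence('{"name": "Alice"}')                # unfenced input passes through
```

Backreferencing the opening backtick run (`\1`) ensures a five-backtick fence is only closed by five backticks, which is what makes variable-length fencing safe to support.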
## Installation

```bash
# Basic installation
pip install slimschema

# With XML extraction support
pip install slimschema[xml]
```

All core dependencies (msgspec, pydantic, ruamel.yaml) are included.
## License

MIT