
SlimSchema

Compact schemas for LLM-generated JSON.

SlimSchema provides 6 core functions:

  • to_data(json, schema) - Validate JSON against a schema
  • from_data(examples) - Infer schema from JSON examples
  • to_schema(input) - Normalize any format to Schema IR
  • to_pydantic(schema) - Convert to Pydantic BaseModel
  • to_msgspec(schema) - Convert to msgspec Struct
  • apply_patch(data, patches) - Apply JSON Patch mutations (RFC 6902)

Quick Start

pip install slimschema

Define Your Schema

Choose your preferred format - all three work identically:

Option 1: YAML
name: str{1..100}
email: email
age: 18..120
country: str{2..2}
status: active | inactive | pending
Option 2: Pydantic
from pydantic import BaseModel, Field
from typing import Literal

class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    email: str
    age: int = Field(ge=18, le=120)
    country: str = Field(min_length=2, max_length=2)
    status: Literal["active", "inactive", "pending"]
Option 3: msgspec
import msgspec
from typing import Literal

class User(msgspec.Struct):
    name: str
    email: str
    age: int
    country: str
    status: Literal["active", "inactive", "pending"]

Core API

1. Validate JSON with to_data()

from slimschema import to_data

schema = """
name: str{1..100}
email: email
age: 18..120
country: str{2..2}
status: active | inactive | pending
"""

# Valid JSON
json_response = """
<json>
{
    "name": "Alice",
    "email": "alice@example.com",
    "age": 30,
    "country": "US",
    "status": "active"
}
</json>
"""

user, error = to_data(json_response, schema)
print(user["name"])  # "Alice"
print(error)  # None

JSON that is well-formed but violates the schema produces clear errors:

bad_json = """
{
    "name": "Bob",
    "email": "not-an-email",
    "age": 150,
    "country": "USA",
    "status": "unknown"
}
"""

user, error = to_data(bad_json, schema)
print(error)
# "Expected `str` matching regex '^[^@]+@[^@]+\.[^@]+$' - at `$.email`"

2. Infer Schema with from_data()

Enums are detected by repetition: if the same values recur across multiple examples and the number of distinct values stays small, the field becomes an enum:

from slimschema import from_data, to_yaml

examples = [
    {"name": "Alice", "status": "active"},
    {"name": "Bob", "status": "inactive"},
    {"name": "Charlie", "status": "active"},
    {"name": "Diana", "status": "inactive"},
]

schema = from_data(examples, name="User")
print(to_yaml(schema))

Output:

# User
name: str
status: active | inactive
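The repetition rule can be sketched in plain Python. This is a hypothetical simplification, not SlimSchema's actual inference code (infer_enum is an illustrative helper; the cutoff of 5 mirrors the default enum_max_cardinality described under Configure Inference):

```python
def infer_enum(values, max_cardinality=5):
    """Return sorted distinct values if they look like an enum, else None.

    A field looks like an enum when its distinct values are few and
    at least one value repeats across the examples.
    """
    distinct = set(values)
    has_repeats = len(distinct) < len(values)
    if has_repeats and len(distinct) <= max_cardinality:
        return sorted(distinct)
    return None

statuses = ["active", "inactive", "active", "inactive"]
names = ["Alice", "Bob", "Charlie", "Diana"]

print(infer_enum(statuses))  # ['active', 'inactive']
print(infer_enum(names))     # None (all values unique)
```

Because every name in the examples is unique, name stays str, while status collapses to an enum.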

3. Convert Formats with to_schema(), to_pydantic(), to_msgspec()

All schema formats are interchangeable:

from slimschema import to_schema, to_pydantic, to_msgspec, to_yaml

# Start with YAML
yaml_schema = """
name: str
age: 18..120
status: active | inactive
"""

# Convert to Pydantic (class-based API)
pydantic_model = to_pydantic(yaml_schema)
user = pydantic_model(name="Alice", age=30, status="active")

# Convert to msgspec (functional API with msgspec.convert)
import msgspec
msgspec_struct = to_msgspec(yaml_schema)
user = msgspec.convert({"name": "Bob", "age": 25, "status": "active"}, type=msgspec_struct)

# Convert between formats
from pydantic import BaseModel, Field
from typing import Literal

class Product(BaseModel):
    name: str
    status: Literal["draft", "active"]

schema = to_schema(Product)
yaml_output = to_yaml(schema)
print(yaml_output)
# # Product
# name: str
# status: draft | active

4. Apply Patches with apply_patch()

Mutate data using JSON Patch (RFC 6902). Paths use JSON Pointer syntax (/field, /nested/field, /array/0).

from slimschema import apply_patch

data = {"name": "Bob", "age": 30, "tags": ["user"]}

patches = [
    {"op": "replace", "path": "/name", "value": "Alice"},
    {"op": "add", "path": "/email", "value": "alice@example.com"},
    {"op": "add", "path": "/tags/-", "value": "admin"},  # append with -
    {"op": "remove", "path": "/age"}
]

result = apply_patch(data, patches)
# {"name": "Alice", "email": "alice@example.com", "tags": ["user", "admin"]}

Operations: add, remove, replace, move (rename/relocate), copy, test (conditional).
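The move, copy, and test operations follow standard RFC 6902 semantics. A minimal pure-Python sketch over flat, top-level paths (illustrative only, independent of SlimSchema's apply_patch) shows how they behave:

```python
import copy

def apply_flat_patch(data, patches):
    """RFC 6902 move/copy/test on a flat dict -- an illustrative sketch."""
    result = copy.deepcopy(data)
    for p in patches:
        key = p["path"].lstrip("/")
        op = p["op"]
        if op == "test":
            # test: abort the whole patch if the value at path doesn't match
            if result.get(key) != p["value"]:
                raise ValueError(f"test failed at {p['path']}")
        elif op == "move":
            # move: remove the value at "from", re-add it at "path"
            result[key] = result.pop(p["from"].lstrip("/"))
        elif op == "copy":
            # copy: duplicate the value at "from" to "path"
            result[key] = result[p["from"].lstrip("/")]
    return result

data = {"nickname": "Ali", "role": "admin"}
out = apply_flat_patch(data, [
    {"op": "test", "path": "/role", "value": "admin"},
    {"op": "move", "from": "/nickname", "path": "/name"},
    {"op": "copy", "from": "/role", "path": "/previous_role"},
])
# out == {"role": "admin", "name": "Ali", "previous_role": "admin"}
```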

YAML Syntax Reference

# Basic types
name: str
age: int
price: float
active: bool

# String constraints
username: str{3..20}          # length constraint
email: email                  # format validator
url: url
uuid: uuid
date: date                    # YYYY-MM-DD
datetime: datetime            # ISO 8601
sku: /^[A-Z]{3}-\d{4}$/      # regex pattern

# Numeric constraints
age: 18..120                  # int range
price: 0.01..99999.99         # float range
quantity: 1..                 # min only (no max)

# Literals (enums)
status: draft | active | archived
role: admin | user

# Arrays
tags: [str]
scores: [int]
items: [str{1..50}]

# Optional fields
bio?: str
updated?: datetime

# Comments
name: str  # User's full name
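To make the brace notation concrete, here is a hypothetical parser for the str{MIN..MAX} form (a sketch of the notation only, not SlimSchema's grammar):

```python
import re

def parse_str_constraint(spec):
    """Parse 'str{MIN..MAX}' into (min_length, max_length), or None."""
    m = re.fullmatch(r"str\{(\d+)\.\.(\d+)\}", spec.strip())
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

print(parse_str_constraint("str{3..20}"))  # (3, 20)
print(parse_str_constraint("str{2..2}"))   # (2, 2)
print(parse_str_constraint("email"))       # None
```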

Configure Inference

from_data() can be configured to control enum detection, range detection, and more:

from slimschema import InferenceConfig, from_data, to_yaml

# Disable enum detection
config = InferenceConfig(detect_enums=False)
schema = from_data(data, config=config)

# Allow up to 10 unique values for enums (default is 5)
config = InferenceConfig(enum_max_cardinality=10)
schema = from_data(data, config=config)

Other options: detect_ranges, detect_formats, max_samples, max_nesting_depth, int_range_max_delta, float_range_max_delta.
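One plausible reading of int_range_max_delta is a cap on the spread between the smallest and largest observed values before range inference falls back to plain int. The sketch below is a guess at that heuristic, not the library's implementation:

```python
def infer_int_range(values, max_delta=100):
    """Hypothetical sketch: emit 'lo..hi' only when the spread is small."""
    lo, hi = min(values), max(values)
    if hi - lo <= max_delta:
        return f"{lo}..{hi}"
    return "int"  # spread too wide; don't guess a range

print(infer_int_range([18, 25, 30, 64]))  # '18..64'
print(infer_int_range([3, 1_000_000]))    # 'int'
```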

Generate LLM Prompts

Create prompts with embedded schemas for structured output:

from slimschema import to_prompt, to_prompt_compact

# Default: <output>```json...```</output> (most robust)
prompt = to_prompt(schema)

# Customize tagging and fencing
prompt = to_prompt(
    schema,
    instruction="Extract user data from the text.",
    tag="xml",           # "xml" or "none"
    tag_name="output",   # Tag name
    fence="fenced",      # "fenced" or "none"
    format_label="json"  # "json", "xml", "csv", "yaml"
)

# Compact version (no instruction text)
prompt = to_prompt_compact(schema)

Tagging strategies:

  • tag="xml" + fence="fenced": <output>```json...```</output> (recommended)
  • tag="none" + fence="fenced": ```json...```
  • tag="xml" + fence="none": <output>...</output>
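The strategies compose as an optional XML tag wrapped around an optional fence. A plain-Python sketch of that composition (illustrative, not to_prompt's internals):

```python
FENCE = "`" * 3  # a literal triple backtick

def wrap(payload, tag="xml", tag_name="output", fence="fenced", format_label="json"):
    """Compose the tag and fence wrappers around a payload."""
    text = payload
    if fence == "fenced":
        text = f"{FENCE}{format_label}\n{text}\n{FENCE}"
    if tag == "xml":
        text = f"<{tag_name}>{text}</{tag_name}>"
    return text

wrap('{"name": "Alice"}')                # tag + fence (recommended)
wrap('{"name": "Alice"}', tag="none")    # fence alone
wrap('{"name": "Alice"}', fence="none")  # tag alone
```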

Robust Extraction

to_data() extracts structured data from LLM responses with multiple fallback strategies:

Supports 4 formats: JSON, CSV, XML, YAML

Multiple tagging strategies (priority order):

  1. XML wrapped fence: <output>```json...```</output>
  2. Fence alone: ```json...```
  3. XML tag alone: <json>...</json>
  4. Raw format detection

Special handling:

  • JSONL/JSON-ND: Newline-delimited JSON objects (with or without commas)
  • CSV: Auto-delimiter detection (comma, semicolon, tab, pipe)
  • Case-insensitive: All tags and fence labels
  • Flexible fencing: 3-10 backticks supported

from slimschema import to_data

# Works with any tagging strategy
response = '<output>```json\n{"name": "Alice"}\n```</output>'
data, error = to_data(response, schema)

# JSONL support
response = """<json>
{"name": "Alice"}
{"name": "Bob"}
</json>"""
data, error = to_data(response, schema)  # Returns list
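The fallback order can be sketched with regular expressions (a simplified illustration, not the actual extractor):

```python
import re

TICKS = "`" * 3  # a literal triple backtick

def extract_payload(text):
    """Try the tagging strategies in priority order; return the inner payload."""
    patterns = [
        rf"<output>\s*{TICKS}\w*\n(.*?)\n{TICKS}\s*</output>",  # 1. XML-wrapped fence
        rf"{TICKS}\w*\n(.*?)\n{TICKS}",                         # 2. fence alone
        r"<json>(.*?)</json>",                                  # 3. XML tag alone
    ]
    for pat in patterns:
        m = re.search(pat, text, re.DOTALL | re.IGNORECASE)
        if m:
            return m.group(1).strip()
    return text.strip()  # 4. raw format detection

payload = extract_payload('<json>{"a": 1}</json>')  # '{"a": 1}'
```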

See docs/extraction.md for complete documentation.

Installation

# Basic installation
pip install slimschema

# With XML extraction support
pip install slimschema[xml]

All core dependencies (msgspec, pydantic, ruamel.yaml) are included.

License

MIT
