
SlimSchema

Compact schemas for LLM-generated JSON.

SlimSchema provides 6 core functions:

  • to_data(json, schema) - Validate JSON against a schema
  • from_data(examples) - Infer schema from JSON examples
  • to_schema(input) - Normalize any format to Schema IR
  • to_pydantic(schema) - Convert to Pydantic BaseModel
  • to_msgspec(schema) - Convert to msgspec Struct
  • apply_patch(data, patches) - Apply JSON Patch mutations (RFC 6902)

Quick Start

pip install slimschema

Define Your Schema

Choose your preferred format - all three work identically:

Option 1: YAML
name: str{1..100}
email: email
age: 18..120
country: str{2..2}
status: active | inactive | pending
Option 2: Pydantic
from pydantic import BaseModel, Field
from typing import Literal

class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    email: str
    age: int = Field(ge=18, le=120)
    country: str = Field(min_length=2, max_length=2)
    status: Literal["active", "inactive", "pending"]
Option 3: msgspec
import msgspec
from typing import Literal

class User(msgspec.Struct):
    name: str
    email: str
    age: int
    country: str
    status: Literal["active", "inactive", "pending"]

Core API

1. Validate JSON with to_data()

from slimschema import to_data

schema = """
name: str{1..100}
email: email
age: 18..120
country: str{2..2}
status: active | inactive | pending
"""

# Valid JSON
json_response = """
<json>
{
    "name": "Alice",
    "email": "alice@example.com",
    "age": 30,
    "country": "US",
    "status": "active"
}
</json>
"""

user, error = to_data(json_response, schema)
print(user["name"])  # "Alice"
print(error)  # None

JSON that is well-formed but violates the schema produces clear errors:

bad_json = """
{
    "name": "Bob",
    "email": "not-an-email",
    "age": 150,
    "country": "USA",
    "status": "unknown"
}
"""

user, error = to_data(bad_json, schema)
print(error)
# "Expected `str` matching regex '^[^@]+@[^@]+\.[^@]+$' - at `$.email`"

2. Infer Schema with from_data()

Enums are detected by repetition: if the same values recur across multiple examples and the number of distinct values stays small, the field becomes an enum:

from slimschema import from_data, to_yaml

examples = [
    {"name": "Alice", "status": "active"},
    {"name": "Bob", "status": "inactive"},
    {"name": "Charlie", "status": "active"},
    {"name": "Diana", "status": "inactive"},
]

schema = from_data(examples, name="User")
print(to_yaml(schema))

Output:

# User
name: str
status: active | inactive
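The repetition rule can be sketched in plain Python. This is a hypothetical simplification, not SlimSchema's actual inference code (infer_enum is an illustrative helper; the cutoff of 5 mirrors the default enum_max_cardinality described under Configure Inference):

```python
def infer_enum(values, max_cardinality=5):
    """Return sorted distinct values if they look like an enum, else None.

    A field looks like an enum when its distinct values are few and
    at least one value repeats across the examples.
    """
    distinct = set(values)
    has_repeats = len(distinct) < len(values)
    if has_repeats and len(distinct) <= max_cardinality:
        return sorted(distinct)
    return None

statuses = ["active", "inactive", "active", "inactive"]
names = ["Alice", "Bob", "Charlie", "Diana"]

print(infer_enum(statuses))  # ['active', 'inactive']
print(infer_enum(names))     # None (all values unique)
```

Because every name in the examples is unique, name stays str, while status collapses to an enum.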

3. Convert Formats with to_schema(), to_pydantic(), to_msgspec()

All schema formats are interchangeable:

from slimschema import to_schema, to_pydantic, to_msgspec, to_yaml

# Start with YAML
yaml_schema = """
name: str
age: 18..120
status: active | inactive
"""

# Convert to Pydantic (class-based API)
pydantic_model = to_pydantic(yaml_schema)
user = pydantic_model(name="Alice", age=30, status="active")

# Convert to msgspec (functional API with msgspec.convert)
import msgspec
msgspec_struct = to_msgspec(yaml_schema)
user = msgspec.convert({"name": "Bob", "age": 25, "status": "active"}, type=msgspec_struct)

# Convert between formats
from pydantic import BaseModel, Field
from typing import Literal

class Product(BaseModel):
    name: str
    status: Literal["draft", "active"]

schema = to_schema(Product)
yaml_output = to_yaml(schema)
print(yaml_output)
# # Product
# name: str
# status: draft | active

4. Apply Patches with apply_patch()

Mutate data using JSON Patch (RFC 6902). Paths use JSON Pointer syntax (/field, /nested/field, /array/0).

from slimschema import apply_patch

data = {"name": "Bob", "age": 30, "tags": ["user"]}

patches = [
    {"op": "replace", "path": "/name", "value": "Alice"},
    {"op": "add", "path": "/email", "value": "alice@example.com"},
    {"op": "add", "path": "/tags/-", "value": "admin"},  # append with -
    {"op": "remove", "path": "/age"}
]

result = apply_patch(data, patches)
# {"name": "Alice", "email": "alice@example.com", "tags": ["user", "admin"]}

Operations: add, remove, replace, move (rename/relocate), copy, test (conditional).
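The move, copy, and test operations follow standard RFC 6902 semantics. A minimal pure-Python sketch over flat, top-level paths (illustrative only, independent of SlimSchema's apply_patch) shows how they behave:

```python
import copy

def apply_flat_patch(data, patches):
    """RFC 6902 move/copy/test on a flat dict -- an illustrative sketch."""
    result = copy.deepcopy(data)
    for p in patches:
        key = p["path"].lstrip("/")
        op = p["op"]
        if op == "test":
            # test: abort the whole patch if the value at path doesn't match
            if result.get(key) != p["value"]:
                raise ValueError(f"test failed at {p['path']}")
        elif op == "move":
            # move: remove the value at "from", re-add it at "path"
            result[key] = result.pop(p["from"].lstrip("/"))
        elif op == "copy":
            # copy: duplicate the value at "from" to "path"
            result[key] = result[p["from"].lstrip("/")]
    return result

data = {"nickname": "Ali", "role": "admin"}
out = apply_flat_patch(data, [
    {"op": "test", "path": "/role", "value": "admin"},
    {"op": "move", "from": "/nickname", "path": "/name"},
    {"op": "copy", "from": "/role", "path": "/previous_role"},
])
# out == {"role": "admin", "name": "Ali", "previous_role": "admin"}
```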

YAML Syntax Reference

# Basic types
name: str
age: int
price: float
active: bool

# String constraints
username: str{3..20}          # length constraint
email: email                  # format validator
url: url
uuid: uuid
date: date                    # YYYY-MM-DD
datetime: datetime            # ISO 8601
sku: /^[A-Z]{3}-\d{4}$/      # regex pattern

# Numeric constraints
age: 18..120                  # int range
price: 0.01..99999.99         # float range
quantity: 1..                 # min only (no max)

# Literals (enums)
status: draft | active | archived
role: admin | user

# Arrays
tags: [str]
scores: [int]
items: [str{1..50}]

# Optional fields
bio?: str
updated?: datetime

# Comments
name: str  # User's full name
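To make the brace notation concrete, here is a hypothetical parser for the str{MIN..MAX} form (a sketch of the notation only, not SlimSchema's grammar):

```python
import re

def parse_str_constraint(spec):
    """Parse 'str{MIN..MAX}' into (min_length, max_length), or None."""
    m = re.fullmatch(r"str\{(\d+)\.\.(\d+)\}", spec.strip())
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

print(parse_str_constraint("str{3..20}"))  # (3, 20)
print(parse_str_constraint("str{2..2}"))   # (2, 2)
print(parse_str_constraint("email"))       # None
```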

Configure Inference

from_data() can be configured to control enum detection, range detection, and more:

from slimschema import InferenceConfig, from_data, to_yaml

# Disable enum detection
config = InferenceConfig(detect_enums=False)
schema = from_data(data, config=config)

# Allow up to 10 unique values for enums (default is 5)
config = InferenceConfig(enum_max_cardinality=10)
schema = from_data(data, config=config)

Other options: detect_ranges, detect_formats, max_samples, max_nesting_depth, int_range_max_delta, float_range_max_delta.
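One plausible reading of int_range_max_delta is a cap on the spread between the smallest and largest observed values before range inference falls back to plain int. The sketch below is a guess at that heuristic, not the library's implementation:

```python
def infer_int_range(values, max_delta=100):
    """Hypothetical sketch: emit 'lo..hi' only when the spread is small."""
    lo, hi = min(values), max(values)
    if hi - lo <= max_delta:
        return f"{lo}..{hi}"
    return "int"  # spread too wide; don't guess a range

print(infer_int_range([18, 25, 30, 64]))  # '18..64'
print(infer_int_range([3, 1_000_000]))    # 'int'
```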

Generate LLM Prompts

Create prompts with embedded schemas for structured output:

from slimschema import to_prompt, to_prompt_compact

# Default: <output>```json...```</output> (most robust)
prompt = to_prompt(schema)

# Customize tagging and fencing
prompt = to_prompt(
    schema,
    instruction="Extract user data from the text.",
    tag="xml",           # "xml" or "none"
    tag_name="output",   # Tag name
    fence="fenced",      # "fenced" or "none"
    format_label="json"  # "json", "xml", "csv", "yaml"
)

# Compact version (no instruction text)
prompt = to_prompt_compact(schema)

Tagging strategies:

  • tag="xml" + fence="fenced": <output>```json...```</output> (recommended)
  • tag="none" + fence="fenced": ```json...```
  • tag="xml" + fence="none": <output>...</output>
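The strategies compose as an optional XML tag wrapped around an optional fence. A plain-Python sketch of that composition (illustrative, not to_prompt's internals):

```python
FENCE = "`" * 3  # a literal triple backtick

def wrap(payload, tag="xml", tag_name="output", fence="fenced", format_label="json"):
    """Compose the tag and fence wrappers around a payload."""
    text = payload
    if fence == "fenced":
        text = f"{FENCE}{format_label}\n{text}\n{FENCE}"
    if tag == "xml":
        text = f"<{tag_name}>{text}</{tag_name}>"
    return text

wrap('{"name": "Alice"}')                # tag + fence (recommended)
wrap('{"name": "Alice"}', tag="none")    # fence alone
wrap('{"name": "Alice"}', fence="none")  # tag alone
```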

Robust Extraction

to_data() extracts structured data from LLM responses with multiple fallback strategies:

Supports 4 formats: JSON, CSV, XML, YAML

Multiple tagging strategies (priority order):

  1. XML wrapped fence: <output>```json...```</output>
  2. Fence alone: ```json...```
  3. XML tag alone: <json>...</json>
  4. Raw format detection

Special handling:

  • JSONL/JSON-ND: Newline-delimited JSON objects (with or without commas)
  • CSV: Auto-delimiter detection (comma, semicolon, tab, pipe)
  • Case-insensitive: All tags and fence labels
  • Flexible fencing: 3-10 backticks supported

from slimschema import to_data

# Works with any tagging strategy
response = '<output>```json\n{"name": "Alice"}\n```</output>'
data, error = to_data(response, schema)

# JSONL support
response = """<json>
{"name": "Alice"}
{"name": "Bob"}
</json>"""
data, error = to_data(response, schema)  # Returns list
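The fallback order can be sketched with regular expressions (a simplified illustration, not the actual extractor):

```python
import re

TICKS = "`" * 3  # a literal triple backtick

def extract_payload(text):
    """Try the tagging strategies in priority order; return the inner payload."""
    patterns = [
        rf"<output>\s*{TICKS}\w*\n(.*?)\n{TICKS}\s*</output>",  # 1. XML-wrapped fence
        rf"{TICKS}\w*\n(.*?)\n{TICKS}",                         # 2. fence alone
        r"<json>(.*?)</json>",                                  # 3. XML tag alone
    ]
    for pat in patterns:
        m = re.search(pat, text, re.DOTALL | re.IGNORECASE)
        if m:
            return m.group(1).strip()
    return text.strip()  # 4. raw format detection

payload = extract_payload('<json>{"a": 1}</json>')  # '{"a": 1}'
```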

See docs/extraction.md for complete documentation.

Installation

# Basic installation
pip install slimschema

# With XML extraction support
pip install slimschema[xml]

All core dependencies (msgspec, pydantic, ruamel.yaml) are included.

License

MIT
