Skip to main content

A simple, LLM-friendly schema definition library for converting string syntax to structured schemas

Project description

String Schema

A simple, LLM-friendly schema definition library for Python that converts intuitive string syntax into structured data schemas.

๐Ÿš€ Quick Start

Installation

pip install string-schema

30-Second Example

from string_schema import string_to_json_schema

# Define a schema with simple, readable syntax
schema = string_to_json_schema("name:string, email:email, age:int?")

# Get a complete JSON Schema ready for validation
print(schema)
# {
#   "type": "object",
#   "properties": {
#     "name": {"type": "string"},
#     "email": {"type": "string", "format": "email"},
#     "age": {"type": "integer"}
#   },
#   "required": ["name", "email"]
# }

Create Pydantic Models Directly

from string_schema import string_to_model

# Create Pydantic model from string syntax
UserModel = string_to_model("name:string, email:email, age:int?")

# Use immediately
user = UserModel(name="Alice", email="alice@example.com")
print(user.model_dump_json())  # {"name": "Alice", "email": "alice@example.com"}

๐ŸŽฏ What String Schema Does

String Schema takes human-readable text descriptions and converts them into structured schemas for data validation, extraction, and API documentation. Perfect for LLM data extraction, API development, and configuration validation.

Input: Human-readable string syntax Output: JSON Schema, Pydantic models, or OpenAPI specifications

๐Ÿš€ Core Functions & Use Cases

๐Ÿ“ Schema Conversion Matrix

Forward Conversions (Source โ†’ Target):

Function Input Output Use Case
string_to_json_schema() String syntax JSON Schema dict Main conversion - string to JSON Schema
string_to_model() String syntax Pydantic model class Direct path - string to Pydantic model
string_to_model_code() String syntax Python code string Code generation - for templates
string_to_openapi() String syntax OpenAPI schema dict Direct path - string to OpenAPI
json_schema_to_model() JSON Schema dict Pydantic model class When you already have JSON Schema
json_schema_to_openapi() JSON Schema dict OpenAPI schema dict When you already have JSON Schema

Reverse Conversions (Target โ†’ Source):

Function Input Output Use Case
model_to_string() Pydantic model String syntax Schema introspection - model to string
model_to_json_schema() Pydantic model JSON Schema dict Export - model to JSON Schema
json_schema_to_string() JSON Schema dict String syntax Migration - JSON Schema to string
openapi_to_string() OpenAPI schema String syntax Import - OpenAPI to string
openapi_to_json_schema() OpenAPI schema JSON Schema dict Conversion - OpenAPI to JSON Schema

๐Ÿ” Data Validation Functions

Function Input Output Use Case
validate_to_dict() Data + schema Validated dict API responses - clean dicts
validate_to_model() Data + schema Pydantic model Business logic - typed models
validate_string_syntax() String syntax Validation result Check syntax and get feedback

๐ŸŽจ Function Decorators

Decorator Purpose Returns Use Case
@returns_dict() Auto-validate to dict Validated dict API endpoints
@returns_model() Auto-validate to model Pydantic model Business logic

๐Ÿ”ง Utility Functions

Function Purpose Returns Use Case
get_model_info() Model introspection Model details dict Debugging & analysis
validate_schema_compatibility() Schema validation Compatibility info Schema validation

๐ŸŽฏ Key Scenarios

  • ๐Ÿค– LLM Data Extraction: Define extraction schemas that LLMs can easily follow
  • ๐Ÿ”ง API Development: Generate Pydantic models and OpenAPI docs from simple syntax
  • โœ… Data Validation: Create robust validation schemas with minimal code
  • ๐Ÿ“‹ Configuration: Define and validate application configuration schemas
  • ๐Ÿ”„ Data Transformation: Convert between different schema formats

โœ… Validate Data

from string_schema import validate_to_dict, validate_to_model

# Example raw data (from API, user input, etc.)
raw_data = {
    "name": "John Doe",
    "email": "john@example.com",
    "age": "25",  # String that needs conversion
    "extra_field": "ignored"  # Will be filtered out
}

# Validate to clean dictionaries (perfect for API responses)
user_dict = validate_to_dict(raw_data, "name:string, email:email, age:int?")
print(user_dict)  # {"name": "John Doe", "email": "john@example.com", "age": 25}

# Validate to typed models (perfect for business logic)
user_model = validate_to_model(raw_data, "name:string, email:email, age:int?")
print(user_model.name)  # "John Doe" - Full type safety
print(user_model.age)   # 25 - Converted to int

๐ŸŽจ Function Decorators

from string_schema import returns_dict, returns_model
import uuid

# Auto-validate function returns to dicts
@returns_dict("id:string, name:string, active:bool")
def create_user(name):
    # Input: "Alice"
    # Function returns unvalidated dict
    return {"id": str(uuid.uuid4()), "name": name, "active": True, "extra": "ignored"}
    # Output: {"id": "123e4567-...", "name": "Alice", "active": True}
    # Note: 'extra' field filtered out, types validated

# Auto-validate function returns to models
@returns_model("name:string, email:string")
def process_user(raw_input):
    # Input: {"name": "Bob", "email": "bob@test.com", "junk": "data"}
    # Function returns unvalidated dict
    return {"name": raw_input["name"], "email": raw_input["email"], "junk": "data"}
    # Output: UserModel(name="Bob", email="bob@test.com")
    # Note: Returns typed Pydantic model, 'junk' field filtered out

# Usage examples:
user_dict = create_user("Alice")  # Returns validated dict
user_model = process_user({"name": "Bob", "email": "bob@test.com"})  # Returns Pydantic model
print(user_model.name)  # "Bob" - Full type safety

๐ŸŒ FastAPI Integration

from string_schema import string_to_model, returns_dict

# Create models for FastAPI
UserRequest = string_to_model("name:string, email:email")

@app.post("/users")
@returns_dict("id:int, name:string, email:string")
def create_user_endpoint(user: UserRequest):
    return {"id": 123, "name": user.name, "email": user.email}

Features: Arrays [{name:string}], nested objects {profile:{bio:text?}}, enums, constraints, decorators.

๐Ÿ“– Complete Documentation

๐ŸŽ“ More Examples

๐ŸŒฑ Simple Example - Basic User Data

from string_schema import string_to_json_schema

# Define schema using intuitive string syntax
schema = string_to_json_schema("""
name:string(min=1, max=100),
email:email,
age:int(0, 120)?,
active:bool
""")

# Result: JSON Schema dictionary ready for validation
print(schema)
# {
#   "type": "object",
#   "properties": {
#     "name": {"type": "string", "minLength": 1, "maxLength": 100},
#     "email": {"type": "string", "format": "email"},
#     "age": {"type": "integer", "minimum": 0, "maximum": 120},
#     "active": {"type": "boolean"}
#   },
#   "required": ["name", "email", "active"]
# }

๐ŸŒณ Moderately Complex Example - Product Data

# Product with arrays and nested objects
schema = string_to_json_schema("""
{
    id:uuid,
    name:string,
    price:number(min=0),
    tags:[string]?,
    inventory:{
        in_stock:bool,
        quantity:int?
    }
}
""")

# Result: Complete JSON Schema with nested objects and arrays
print(schema)  # Full JSON Schema ready for validation

๐Ÿ“š For more examples and advanced syntax, see our detailed documentation

๐Ÿ”„ Output Formats & Results

๐Ÿ Pydantic Models (Python Classes)

from string_schema import string_to_model

# Direct conversion: String syntax โ†’ Pydantic model
UserModel = string_to_model("name:string, email:email, active:bool")

# Use the model for validation
user = UserModel(name="John Doe", email="john@example.com", active=True)
print(user.model_dump_json())  # {"name": "John Doe", "email": "john@example.com", "active": true}

๐Ÿ”ง Code Generation (For Templates & Tools)

from string_schema import string_to_model_code

# Generate Pydantic model code as a string
code = string_to_model_code("User", "name:string, email:email, active:bool")
print(code)

# Output:
# from pydantic import BaseModel
# from typing import Optional
#
# class User(BaseModel):
#     name: str
#     email: str
#     active: bool

# Perfect for code generators, templates, or saving to files
with open('models.py', 'w') as f:
    f.write(code)

๐ŸŒ OpenAPI Schemas (API Documentation)

from string_schema import string_to_openapi

# Direct conversion: String syntax โ†’ OpenAPI schema
openapi_schema = string_to_openapi("name:string, email:email")
print(openapi_schema)
# {
#   "type": "object",
#   "properties": {
#     "name": {"type": "string"},
#     "email": {"type": "string", "format": "email"}
#   },
#   "required": ["name", "email"]
# }

๐Ÿ”„ Reverse Conversions (Universal Schema Converter)

String Schema provides complete bidirectional conversion between all schema formats!

โš ๏ธ Information Loss Notice: Reverse conversions (from JSON Schema/OpenAPI/Pydantic back to string syntax) may lose some information due to format differences. However, the resulting schemas are designed to cover the most common use cases and maintain functional equivalence for typical validation scenarios.

๐Ÿ” Schema Introspection

from string_schema import model_to_string, string_to_model

# Create a model first
UserModel = string_to_model("name:string, email:email, active:bool")

# Reverse engineer it back to string syntax
schema_string = model_to_string(UserModel)
print(f"Model schema: {schema_string}")
# Output: "name:string, email:email, active:bool"

๐Ÿ“ฆ Migration & Import

from string_schema import json_schema_to_string

# Example: Convert existing JSON Schema to String Schema syntax
json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "format": "email"},
        "age": {"type": "integer"}
    },
    "required": ["name", "email"]
}

simple_syntax = json_schema_to_string(json_schema)
print(f"Converted: {simple_syntax}")
# Output: "name:string, email:email, age:int?"

๐Ÿ”ง Schema Comparison & Analysis

from string_schema import string_to_model, model_to_string

# Create two versions of a model
UserV1 = string_to_model("name:string, email:email")
UserV2 = string_to_model("name:string, email:email, active:bool")

# Compare them
v1_str = model_to_string(UserV1)
v2_str = model_to_string(UserV2)

print("Schema changes:")
print(f"V1: {v1_str}")  # "name:string, email:email"
print(f"V2: {v2_str}")  # "name:string, email:email, active:bool"

๐ŸŽจ String Syntax Reference

Basic Types

  • string, int, number, bool โ†’ Basic data types
  • email, url, datetime, date, uuid, phone โ†’ Special validated types

Field Modifiers

  • field_name:type โ†’ Required field
  • field_name:type? โ†’ Optional field
  • field_name:type(constraints) โ†’ Field with validation

Common Patterns

  • string(min=1, max=100) โ†’ Length constraints
  • int(0, 120) โ†’ Range constraints
  • [string] โ†’ Simple arrays
  • [{name:string, email:email}] โ†’ Object arrays
  • status:enum(active, inactive) โ†’ Enum values
  • id:string|uuid โ†’ Union types

๐Ÿ“– Complete syntax guide: See docs/string-syntax.md for full reference

โœ… Validation

from string_schema import validate_string_syntax

# Validate your schema syntax
result = validate_string_syntax("name:string, email:email, age:int?")

print(f"Valid: {result['valid']}")  # True
print(f"Features used: {result['features_used']}")  # ['basic_types', 'optional_fields']
print(f"Field count: {len(result['parsed_fields'])}")  # 3

# Example with invalid syntax
bad_result = validate_string_syntax("name:invalid_type")
print(f"Valid: {bad_result['valid']}")  # False
print(f"Errors: {bad_result['errors']}")  # ['Unknown type: invalid_type']

๐Ÿ—๏ธ Common Use Cases

๐Ÿค– LLM Data Extraction

from string_schema import string_to_json_schema

# Define what data to extract
schema = string_to_json_schema("company:string, employees:[{name:string, email:email}], founded:int?")
# Use schema in LLM prompts for structured data extraction

๐Ÿ”ง FastAPI Development

from string_schema import string_to_model

# Create model for FastAPI
UserModel = string_to_model("name:string, email:email")

# Use in FastAPI endpoints
@app.post("/users/")
async def create_user(user: UserModel):
    return {"id": 123, "name": user.name, "email": user.email}

๐Ÿ—๏ธ Code Generation & Templates

from string_schema import string_to_model_code

# Generate model code
code = string_to_model_code("User", "name:string, email:email")
print(code)
# Output: Complete Pydantic model class as string

# Save to file
with open('user_model.py', 'w') as f:
    f.write(code)

๐Ÿ“‹ Configuration Validation

from string_schema import string_to_json_schema

# Validate app configuration
config_schema = string_to_json_schema("database:{host:string, port:int}, debug:bool")
# Use for validating config files

๐Ÿ“š Documentation

๐Ÿ“‹ Example Schemas

Ready-to-use schema examples are available in the examples/ directory:

# Import example schemas if needed
from string_schema.examples.presets import user_schema, product_schema
from string_schema.examples.recipes import create_ecommerce_product_schema

# Or better yet, use string syntax directly:
user_schema = string_to_json_schema("name:string, email:email, age:int?")

๐Ÿงช Testing

The library includes comprehensive tests covering all functionality:

# Run tests (requires pytest)
pip install pytest
pytest tests/

# Results: 66 tests passed, 3 skipped (Pydantic tests when not installed)

๐Ÿค Contributing

Contributions are welcome! The codebase is well-organized and documented:

string_schema/
โ”œโ”€โ”€ core/          # Core functionality (fields, builders, validators)
โ”œโ”€โ”€ parsing/       # String parsing and syntax
โ”œโ”€โ”€ integrations/  # Pydantic, JSON Schema, OpenAPI
โ””โ”€โ”€ examples/      # Built-in schemas and recipes

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


String Schema - Making data validation simple, intuitive, and LLM-friendly! ๐Ÿš€

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

string_schema-0.1.3.tar.gz (73.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

string_schema-0.1.3-py3-none-any.whl (52.1 kB view details)

Uploaded Python 3

File details

Details for the file string_schema-0.1.3.tar.gz.

File metadata

  • Download URL: string_schema-0.1.3.tar.gz
  • Upload date:
  • Size: 73.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for string_schema-0.1.3.tar.gz
Algorithm Hash digest
SHA256 11480d2acf0f23eaa3e830b188337fc04877e142959b857820602c88683f3102
MD5 dd0dc3edb94c6e8de9bfb2913eb05a2f
BLAKE2b-256 87013128e478c828e5bb847dadf5aa8f2f771b439c75dcc666f34f9f9cbb9159

See more details on using hashes here.

File details

Details for the file string_schema-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: string_schema-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 52.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for string_schema-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 b5115d6012029d11341ca49ea5d79d35f57efef06ea107ca4ed9c67b2dfae6f7
MD5 cf38bd4bf03b9822ccdc4957cdfa4d9a
BLAKE2b-256 05d6dcb5202e092d7aed295b0613337350f7fd070b1fd7a2bc0eca99cbd0a004

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page