A Python package for data contract management with five core services: contract parsing, metadata storage, Pydantic generation, JSON Schema conversion, and runtime validation


PyCharter

Data contract management and validation for Python: define schemas, enforce quality, and run pipelines with contracts.

⚡ Quick start (2 minutes)

Install, define a tiny contract, and validate one record. Run the install command in a terminal, then paste the Python snippet into a script or REPL:

pip install pycharter

from pycharter import from_dict, validate

# 1. Define a minimal contract (schema)
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# 2. Build a validator and validate one record
Person = from_dict(schema, "Person")
result = validate(Person, {"name": "Alice", "age": 30})

print(result.is_valid)   # True
print(result.data.name) # Alice

Invalid data returns result.is_valid == False and result.errors with details. Next: use a contract file with Validator.from_file("contract.yaml"), or add coercion/validation rules and store contracts in a contract store (see Concepts and Installation).

What is PyCharter?

PyCharter is a data-contract-as-code platform that brings structure, validation, and governance to data pipelines and applications. At its core, PyCharter automatically converts JSON Schemas into fully functional Pydantic models, fully supporting the JSON Schema Draft 2020-12 standard, including all standard validation keywords (minLength, maxLength, pattern, enum, minimum, maximum, etc.), while also providing extensions for pre-validation coercion and post-validation checks. All validation logic is stored as data (not Python code). PyCharter also includes a quality assurance module for monitoring data quality, tracking violations, and generating quality reports.

It provides:

Data Contract Definition & Management: Define formal agreements that specify data structure, quality rules, and governance policies
Schema Registry: Centralized storage and versioning of data schemas with support for schema evolution
Data Quality Enforcement: Coercion rules (data transformation) and validation rules (business constraints) to ensure data integrity
Data Governance: Track ownership, stewardship, and enforce data policies across your organization
Metadata Management: Store and retrieve data about data (schemas, ownership, rules, lineage)
Runtime Validation: Validate data against contracts in production pipelines, APIs, and data processes

Data Terminology

PyCharter implements key data management concepts:

Data Contract: Formal specification of data structure, quality rules, and governance policies that define the "contract" between data producers and consumers
Schema Registry: Centralized repository for storing and versioning data schemas, enabling schema discovery and evolution tracking
Data Quality: Coercion (pre-validation transformation) and validation (post-validation checks) rules that ensure data integrity
Data Governance: Ownership tracking, stewardship management, and policy enforcement for data assets
Schema Evolution: Versioning and migration capabilities that allow data structures to evolve over time while maintaining backward compatibility
Metadata Management: Storage and retrieval of data about data, including schemas, ownership information, governance rules, and lineage

Use Cases

Data Pipeline Validation: Ensure data conforms to contracts before processing in pipelines
API Contract Enforcement: Validate API request/response data against defined contracts
Data Integration: Standardize data formats across systems and services
Compliance & Governance: Track ownership, enforce data policies, and maintain audit trails
Schema Registry: Centralized schema management for microservices and data platforms
Data Quality Assurance: Catch data quality issues early in the pipeline through automated validation

📐 Concepts

A short mental model so you know what to reach for.

Concept	What it is	When you use it
Schema	The shape of the data (JSON Schema): types, required fields, nested objects.	When you only need structure (e.g. “this field is string, that one is integer”).
Data contract	Schema + coercion rules (e.g. string → int) + validation rules (e.g. min/max, allowed values) + optional metadata (ownership, governance).	When you want one artifact that defines structure, transforms, and business rules.
Metadata store	A database (SQLite, PostgreSQL, etc.) that stores contracts (and their versions) so many apps can reuse them.	When you have multiple services or pipelines and want a single source of truth.
ETL validation	Validating data after extract (source) and before load (target) in a pipeline, using a schema or contract.	When you run ETL and want to reject or quarantine bad rows at stage boundaries.

Flow from “just validate” to “contracts in a store and ETL”:

  ┌─────────────────────────────────────────────────────────────────────────┐
  │  Option A: No database                                                   │
  │  Schema/contract in code or YAML file  →  Validator  →  validate(data)  │
  └─────────────────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────────────────┐
  │  Option B: With contract store                                           │
  │  Contract in DB  →  Validator(store=..., schema_id=...)  →  validate()  │
  └─────────────────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────────────────┐
  │  Option C: ETL pipeline                                                  │
  │  Extract  →  [optional: validate with contract]  →  Transform  →         │
  │  [optional: validate with contract]  →  Load                            │
  └─────────────────────────────────────────────────────────────────────────┘

Start with Option A (Quick start above); add contract store when you need versioned, shared contracts; add ETL validation when you run pipelines and want contract checks at extract/load.
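To make the "data contract" row of the table concrete, here is a minimal plain-Python sketch of the order of operations a contract implies: coerce first, check the schema next, then apply business rules. It does not use PyCharter and is not its implementation; the contract layout and the names `COERCIONS`, `RULES`, `TYPES`, and `apply_contract` are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable

# Hypothetical mini-contract: schema + coercion rules + validation rules.
contract = {
    "schema": {
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        "required": ["name", "age"],
    },
    "coercion_rules": {"age": "to_int"},
    "validation_rules": {"age": "non_negative"},
}

# Illustrative registries; these names are assumptions, not PyCharter built-ins.
COERCIONS: dict[str, Callable[[Any], Any]] = {"to_int": int}
RULES: dict[str, Callable[[Any], bool]] = {"non_negative": lambda v: v >= 0}
TYPES = {"string": str, "integer": int}


def apply_contract(contract: dict, record: dict) -> tuple[dict, list[str]]:
    """Return (coerced_record, errors); an empty error list means valid."""
    data, errors = dict(record), []
    # 1. Coercion runs before any validation.
    for field, name in contract["coercion_rules"].items():
        if field in data:
            try:
                data[field] = COERCIONS[name](data[field])
            except (TypeError, ValueError):
                errors.append(f"{field}: cannot coerce {data[field]!r}")
    # 2. Structural checks from the schema (required fields, types).
    schema = contract["schema"]
    for field in schema["required"]:
        if field not in data:
            errors.append(f"{field}: missing required field")
    for field, spec in schema["properties"].items():
        if field in data and not isinstance(data[field], TYPES[spec["type"]]):
            errors.append(f"{field}: expected {spec['type']}")
    # 3. Business rules run only on fields with no earlier error.
    for field, name in contract["validation_rules"].items():
        has_error = any(e.startswith(f"{field}:") for e in errors)
        if field in data and not has_error and not RULES[name](data[field]):
            errors.append(f"{field}: failed rule {name}")
    return data, errors


ok_data, ok_errors = apply_contract(contract, {"name": "Alice", "age": "30"})
bad_data, bad_errors = apply_contract(contract, {"name": "Bob", "age": "-5"})
```

In this sketch the string "30" is coerced to an integer before the type check, which is why a record that would fail a schema-only check can pass under a contract.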

✨ Features

🚀 Dynamic Model Generation - Convert JSON schemas to Pydantic models at runtime
📋 JSON Schema Compliant - Full support for JSON Schema Draft 2020-12 standard
🔄 Type Coercion - Automatic type conversion before validation (e.g., string → integer)
✅ Custom Validators - Built-in and extensible validation rules
🏗️ Nested Structures - Full support for nested objects and arrays
📦 Multiple Input Formats - Load schemas from dicts, JSON strings, files, or URLs
🎯 Type Safe - Full type hints and Pydantic v2 compatibility
🔧 Extensible - Register custom coercion and validation functions
📊 Data-Driven - All validation logic stored as JSON data, not Python code
📝 Data Contract Management - Complete lifecycle management for data contracts with versioning
💾 Schema Registry - Centralized schema storage with support for PostgreSQL, SQLite, MongoDB, Redis, and InMemory
🏛️ Data Governance - Track ownership, stewardship, and enforce governance policies
🔍 Metadata Management - Store and query metadata about your data assets
📈 Schema Evolution - Version schemas and track changes over time
✅ Quality Assurance - Monitor data quality, calculate metrics, track violations, and generate reports
📊 Quality Metrics - Calculate quality scores, accuracy, completeness, and violation rates
🚨 Threshold Alerting - Set quality thresholds and get alerts when quality degrades
📐 Ontology & Semantic Layer - Concept schemes, concept types, relationships, field bindings, and a diagrammatic ontology workspace in the UI
📊 Pipeline Diagram Editor - FigJam-style visual editor for ETL pipelines (extract/transform/load nodes with labels and annotations) in the Pipelines section of the UI

📦 Installation

Core Library

pip install pycharter

With API Support

pip install pycharter[api]

This installs FastAPI and Uvicorn for running the REST API server.

Full install (all extras)

pip install pycharter[all]

Installs api, worker, ui, pipeline, postgres, docs, ontology, lineage, streaming, and messaging (Kafka + RabbitMQ). Use this when you want the whole toolkit; for smaller installs, pick individual extras below.

With UI Support

pip install pycharter[ui]

This installs the Python dependencies together with pre-built UI static files, so the UI ships inside the package (similar to how Apache Airflow bundles its web UI).

After installation, you can immediately start the UI:

pycharter ui serve    # Production mode (uses pre-built static files)

For development (if you have the source code):

cd src/pycharter/ui
npm install          # Install Node.js dependencies
cd ../../../       # back to repository root (src/pycharter/ui → repo root)
pycharter ui dev     # Development mode with hot reload

Note: When installed from pip, the UI works immediately without Node.js. For development with hot reload, Node.js is required; see the UI README for setup.

Database setup (for contract store, API, and seed data)

If you use the contract store (SQLite or PostgreSQL), the REST API, or seed data, initialize the database once:

# Initialize database schema (SQLite default: sqlite:///pycharter.db)
pycharter db init

# Seed reference data (owners, domains, systems, environments, data feeds, compliance frameworks, tags)
pycharter db seed

Default seed directory (when you omit the path) is the bundled data/seed inside the pycharter package (src/pycharter/data/seed in the repo). It loads reference data, contracts, pipelines, and semantic seed YAML in one step. To use a custom seed directory: pycharter db seed /path/to/seed [database_url].

The data/aviation_examples/ tree at the repository root holds sample contract artifacts (files on disk) for tutorials and demos. It is not loaded automatically by pycharter db seed; import those contracts through the Web UI or the REST API as needed. See Seed data and sample files.

Use pycharter db init --force to drop and recreate all tables (SQLite only; destroys existing data). See Configuration Guide for connection options and migrations.

🚀 Quick Start

Quick Start: ETL Pipelines

Build and run ETL pipelines programmatically (with the | operator) or from YAML configs. Pipeline run() is async; use asyncio.run() from scripts or await in async code.

import asyncio
from pycharter import Pipeline, HTTPExtractor, PostgresLoader, Rename, AddField

# Programmatic pipeline
pipeline = (
    Pipeline(HTTPExtractor(url="https://api.example.com/data"))
    | Rename({"old": "new"})
    | AddField("processed_at", "now()")
    | PostgresLoader(connection_string="...", table="users")
)
result = asyncio.run(pipeline.run())

# Config-driven: explicit files
pipeline = Pipeline.from_config_files(
    extract="configs/extract.yaml",
    load="configs/load.yaml",
    variables={"API_KEY": "secret"}
)

# Config-driven: directory (extract.yaml, transform.yaml, load.yaml)
pipeline = Pipeline.from_config_dir("pipelines/users/")

# Config-driven: single file
pipeline = Pipeline.from_config_file("pipelines/users/pipeline.yaml")

result = asyncio.run(pipeline.run())

See ETL Pipelines under Core Services for error handling (error_context, ErrorMode) and variable resolution (PipelineContext(variables={...})).
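For readers curious how `|` chaining combined with an async `run()` can work in Python, the following stand-alone sketch builds it on `__or__`. It is not PyCharter's implementation; `MiniPipeline`, `ListExtractor`, `RenameKeys`, and `ListLoader` are hypothetical stand-ins for the real pipeline, extractor, transformer, and loader classes.

```python
import asyncio


class ListExtractor:
    """Hypothetical extractor that returns records from an in-memory list."""

    def __init__(self, rows):
        self.rows = rows

    async def extract(self):
        return [dict(row) for row in self.rows]


class RenameKeys:
    """Hypothetical transformer that renames keys according to a mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def transform(self, row):
        return {self.mapping.get(key, key): value for key, value in row.items()}


class ListLoader:
    """Hypothetical loader that appends rows to a target list."""

    def __init__(self, target):
        self.target = target

    async def load(self, rows):
        self.target.extend(rows)


class MiniPipeline:
    def __init__(self, extractor, steps=(), loader=None):
        self.extractor = extractor
        self.steps = tuple(steps)
        self.loader = loader

    def __or__(self, other):
        # `pipeline | x` returns a new pipeline; an object with load() ends the chain.
        if hasattr(other, "load"):
            return MiniPipeline(self.extractor, self.steps, other)
        return MiniPipeline(self.extractor, self.steps + (other,), self.loader)

    async def run(self):
        rows = await self.extractor.extract()
        for step in self.steps:
            rows = [step.transform(row) for row in rows]
        if self.loader is not None:
            await self.loader.load(rows)
        return len(rows)


sink = []
pipeline = (
    MiniPipeline(ListExtractor([{"old": 1}, {"old": 2}]))
    | RenameKeys({"old": "new"})
    | ListLoader(sink)
)
loaded = asyncio.run(pipeline.run())
```

Returning a new pipeline from `__or__` rather than mutating the left operand keeps partially built pipelines reusable, which is one reason this pattern suits a chaining API.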

Quick Start: Convenience Functions (One-off Use)

from pycharter import from_dict, validate

# Define your JSON schema
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"}
    },
    "required": ["name", "age"]
}

# Generate a Pydantic model (convenience function)
Person = from_dict(schema, "Person")

# Validate data
result = validate(Person, {"name": "Alice", "age": 30, "email": "alice@example.com"})
if result.is_valid:
    print(f"Valid: {result.data.name}")  # Output: Valid: Alice

Production Use: Validator Class (Recommended)

For production code with multiple validations, use the Validator class for better performance. Create validators via factory methods or from a contract store:

from pycharter import Validator

# From directory (expects schema.yaml, coercion_rules.yaml, validation_rules.yaml)
validator = Validator.from_dir("data/contracts/user")

# From explicit files (any filenames)
validator = Validator.from_files(
    schema="schemas/user.yaml",
    coercion_rules="rules/coercion.yaml",
    validation_rules="rules/validation.yaml"
)

# From a single contract file
validator = Validator.from_file("user_contract.yaml")

# From dictionaries
validator = Validator.from_dict(schema={...}, coercion_rules={...}, validation_rules={...})

# From contract store (store must be a connected contract store, e.g. SQLiteContractStore)
validator = Validator(store=store, schema_id="user_schema_v1")

# Validate multiple records efficiently (model is cached)
result1 = validator.validate({"name": "Alice", "age": 30})
result2 = validator.validate({"name": "Bob", "age": 25})

# Batch validation
results = validator.validate_batch([data1, data2, data3])

With Metadata Store

from pycharter import Validator, SQLiteContractStore

# Connect to contract store
store = SQLiteContractStore("metadata.db")
store.connect()

# Create validator from store
validator = Validator(store=store, schema_id="user_schema_v1")

# Validate data
result = validator.validate({"name": "Alice", "age": 30})

Using ValidatorBuilder (Fluent API)

ValidatorBuilder provides a chainable interface for configuring validators with quality checks, state-specific rules, and metrics tracking:

from pycharter import ValidatorBuilder

validator = (
    ValidatorBuilder()
    .from_dir("contracts/order")
    .with_quality_checks(thresholds={"completeness": 0.95})
    .with_state_rules({
        "DRAFT":  {"optional": ["filled_qty", "fill_price"]},
        "FILLED": {"required": ["filled_qty", "fill_price"]},
    })
    .strict()
    .build()
)

result = validator.validate({"symbol": "AAPL", "order_qty": 100})

# State-aware validation: different required fields per FSM state
result_draft  = validator.validate_for_state({"symbol": "AAPL", "order_qty": 100}, "DRAFT")
result_filled = validator.validate_for_state(
    {"symbol": "AAPL", "order_qty": 100, "filled_qty": 100, "fill_price": 150.0}, "FILLED"
)

validate_for_state(data, state) applies per-state overrides (fields become optional in DRAFT, required in FILLED) on top of the base schema — useful when the same entity has different validation rules at different lifecycle stages.
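One way to picture per-state overrides is as a rewrite of the base schema's `required` list before validation. The sketch below shows that idea in plain Python; it is an assumption about the general technique, not PyCharter's internal logic, and `schema_for_state` is a hypothetical helper.

```python
import copy


def schema_for_state(base_schema: dict, state_rules: dict, state: str) -> dict:
    """Return a copy of base_schema whose `required` list reflects the state."""
    schema = copy.deepcopy(base_schema)
    overrides = state_rules.get(state, {})
    # Fields marked optional for this state are dropped from `required`.
    required = [
        name for name in schema.get("required", [])
        if name not in overrides.get("optional", [])
    ]
    # Fields marked required for this state are appended if missing.
    for name in overrides.get("required", []):
        if name not in required:
            required.append(name)
    schema["required"] = required
    return schema


base = {
    "type": "object",
    "properties": {
        "symbol": {"type": "string"},
        "order_qty": {"type": "integer"},
        "filled_qty": {"type": "integer"},
        "fill_price": {"type": "number"},
    },
    "required": ["symbol", "order_qty"],
}
rules = {
    "DRAFT": {"optional": ["filled_qty", "fill_price"]},
    "FILLED": {"required": ["filled_qty", "fill_price"]},
}

draft_required = schema_for_state(base, rules, "DRAFT")["required"]
filled_required = schema_for_state(base, rules, "FILLED")["required"]
```

Because the helper works on a deep copy, the base schema stays unchanged and can be reused for every state.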

Domain Lifecycle & FSM Integration

PyCharter supports linking data contracts to finite state machines (e.g. PyStator) via a lifecycle binding convention in contract metadata:

# In your contract YAML
metadata:
  governance_rules:
    lifecycle:
      state_machine_name: order_management
      machine_version: "1.0.0"
      state_field: status          # field in your data that holds the FSM state
      entity_id_field: order_id

from pycharter import check_state_alignment, validate_lifecycle_binding, get_lifecycle_binding

# Validate that contract enum values match your FSM states exactly
fsm_states = {"PENDING", "OPEN", "FILLED", "CANCELLED"}
result = check_state_alignment(contract_dict, fsm_states, state_field="status")
# result.aligned: bool, result.missing_from_contract: set, result.missing_from_fsm: set

# Validate the lifecycle binding structure
errors = validate_lifecycle_binding(contract_dict.get("metadata", {}))
# Returns [] if valid, or a list of error strings

# Read the binding
binding = get_lifecycle_binding(contract_dict.get("metadata", {}))
# binding.state_machine_name, binding.machine_version, binding.state_field, ...
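The alignment check described above amounts to two set differences. The following sketch reproduces that logic without PyCharter; `Alignment` and `compare_states` are hypothetical names that mirror the result fields mentioned in the comments (`aligned`, `missing_from_contract`, `missing_from_fsm`), and the real function may differ in details.

```python
from dataclasses import dataclass, field


@dataclass
class Alignment:
    """Hypothetical result type mirroring the fields named above."""
    aligned: bool
    missing_from_contract: set = field(default_factory=set)
    missing_from_fsm: set = field(default_factory=set)


def compare_states(contract_enum: set, fsm_states: set) -> Alignment:
    # States the FSM knows about but the contract enum omits.
    missing_from_contract = fsm_states - contract_enum
    # Enum values in the contract that the FSM does not define.
    missing_from_fsm = contract_enum - fsm_states
    return Alignment(
        aligned=not (missing_from_contract or missing_from_fsm),
        missing_from_contract=missing_from_contract,
        missing_from_fsm=missing_from_fsm,
    )


fsm_states = {"PENDING", "OPEN", "FILLED", "CANCELLED"}
contract_enum = {"PENDING", "OPEN", "FILLED", "REJECTED"}
report = compare_states(contract_enum, fsm_states)
```

Here the report flags `CANCELLED` as missing from the contract and `REJECTED` as missing from the FSM, so `aligned` is false until both sides agree.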

📐 API Organization

PyCharter's API is organized into three tiers to help you choose the right approach for your use case:

Tier 1: Primary Interfaces (⭐ Recommended for Production)

Classes that provide the best performance and most features:

Validator - Primary validation interface (use for multiple validations); create via from_dir(), from_files(), from_file(), from_dict() or from store
Pipeline - ETL pipeline (programmatic or config-driven); create via from_config_files(), from_config_dir(), from_config_file() or constructor
QualityCheck - Primary quality assurance interface
ContractStoreClient - Base class for contract stores

When to use: Production code, batch processing, when you need to validate multiple records or run ETL pipelines.

Tier 2: Convenience Functions (Quick Start)

Functions that make common tasks easy and discoverable:

Input helpers: from_dict(), from_file(), from_json(), from_url()
Output helpers: to_dict(), to_file(), to_json()
Validation helpers: validate_with_store(), validate_with_contract()
Contract helpers: parse_contract_file(), build_contract()

When to use: Quick scripts, one-off validations, exploratory work, learning the library.

Tier 3: Low-Level Utilities

Functions for when you already have models or need fine-grained control:

validate() - Validate with existing Pydantic model
validate_batch() - Batch validate with existing model
model_to_schema() - Core conversion function

When to use: Advanced use cases, when you've already generated models, custom workflows.

Choosing the Right Approach

Use Case	Recommended Approach	Example
Production pipeline with multiple validations	`Validator` class	`validator = Validator(store=store, schema_id="schema"); validator.validate(data)`
Quick one-off validation	Convenience function	`validate_with_contract("contract.yaml", data)`
You already have a model	Low-level function	`validate(UserModel, data)`
Batch processing	`Validator.validate_batch()`	`validator.validate_batch([data1, data2, data3])`

🏗️ Core Services & Data Production Journey

PyCharter provides eight core services that work together to support a complete data production journey, from contract specification to quality assurance. Each service plays a critical role in managing data contracts and ensuring data quality throughout your pipeline.

The Data Production Journey

The typical data production workflow follows this path:

1. Data Contract Specification
   ↓
2. Contract Parsing
   ↓
3. Metadata Storage
   ↓
4. Pydantic Model Generation
   ↓
5. Runtime Validation
   ↓
6. Quality Assurance & Monitoring

1. 📄 Contract Parser (`pycharter.contract_parser`)

Purpose: Reads and decomposes data contract files into structured metadata components.

When to Use: At the beginning of your data production journey, when you have data contract files (YAML or JSON) that need to be processed and understood.

How It Works:

Accepts data contract files containing schema definitions, governance rules, ownership information, and metadata
Decomposes the contract into distinct components: schema, governance_rules, ownership, and metadata
Returns a ContractMetadata object that separates concerns and makes each component accessible
Extracts and tracks versions of all components

Example:

from pycharter import parse_contract_file, ContractMetadata

# Parse a contract file (YAML or JSON)
metadata = parse_contract_file("data_contract.yaml")

# Access decomposed components
schema = metadata.schema              # JSON Schema definition
governance = metadata.governance_rules # Governance policies
ownership = metadata.ownership         # Owner/team information
metadata_info = metadata.metadata      # Additional metadata
versions = metadata.versions          # Component versions

Contribution to Journey: The contract parser is the entry point that takes raw contract specifications and prepares them for downstream processing. It ensures that contracts are properly structured and that all components (schema, governance, ownership) are separated for independent handling.
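As a rough illustration of what "decomposing" a contract means, the sketch below splits an already-loaded contract dictionary into its components and collects their versions. It assumes, purely for illustration, that components live under top-level keys named after the attributes above and that each component carries its own `version` key; `Parts` and `decompose` are hypothetical and are not PyCharter's parser.

```python
from dataclasses import dataclass, field

# Assumed top-level keys; the real contract layout may differ.
COMPONENTS = ("schema", "governance_rules", "ownership", "metadata")


@dataclass
class Parts:
    """Hypothetical container for the decomposed components."""
    schema: dict = field(default_factory=dict)
    governance_rules: dict = field(default_factory=dict)
    ownership: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)


def decompose(contract: dict) -> Parts:
    # Missing components become empty dicts so callers need no None checks.
    parts = {name: dict(contract.get(name) or {}) for name in COMPONENTS}
    # Record a version only for components that declare one.
    versions = {
        name: body["version"] for name, body in parts.items() if "version" in body
    }
    return Parts(versions=versions, **parts)


raw = {
    "schema": {"type": "object", "version": "1.0.0",
               "properties": {"name": {"type": "string"}}},
    "ownership": {"owner": "data-team"},
    "metadata": {"version": "1.1.0", "description": "User contract"},
}
parts = decompose(raw)
```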

1b. 🏗️ Contract Builder (`pycharter.contract_builder`)

Purpose: Constructs consolidated data contracts from separate artifacts (schema, coercion rules, validation rules, metadata).

When to Use: When you have separate artifacts stored independently and need to combine them into a single consolidated contract for runtime validation or distribution.

How It Works:

Takes separate artifacts (schema, coercion rules, validation rules, metadata, ownership, governance rules)
Keeps coercion and validation rules alongside the raw schema (they are not merged into it; the Validator merges them at validation time)
Tracks versions of all components
Produces a consolidated contract suitable for runtime validation
Can build from artifacts directly or retrieve from contract store

Example:

from pycharter import build_contract, build_contract_from_store, ContractArtifacts

# Build from separate artifacts
artifacts = ContractArtifacts(
    schema={"type": "object", "version": "1.0.0", "properties": {...}},
    coercion_rules={"version": "1.0.0", "rules": {"age": "coerce_to_integer"}},
    validation_rules={"version": "1.0.0", "rules": {"age": {"is_positive": {...}}}},
    metadata={"version": "1.0.0", "description": "User contract"},
    ownership={"owner": "data-team", "team": "engineering"},
)

contract = build_contract(artifacts)
# Contract now has:
# - schema (RAW - rules NOT merged into it)
# - coercion_rules, validation_rules (separate)
# - metadata, ownership, governance_rules
# - versions tracking all components

# Or build from contract store
contract = build_contract_from_store(store, "user_schema_v1")

# Use for validation - Validator merges rules internally
from pycharter import validate_with_contract
result = validate_with_contract(contract, {"name": "Alice", "age": "30"})

Contribution to Journey: The contract builder is the consolidation layer that combines separate artifacts (stored independently in the database) into a single contract artifact. The contract contains raw schema + separate rules (not merged). The Validator class handles merging internally during validation, keeping the contract structure clear and editable.

2. 💾 Contract Store Client (`pycharter.contract_store`)

Purpose: Manages persistent storage and retrieval of decomposed metadata in databases.

When to Use: After parsing contracts, when you need to store metadata components (schemas, governance rules, ownership) in a database for versioning, querying, and governance.

How It Works:

Provides methods to store and retrieve schemas, governance rules, ownership information, and metadata
Supports versioning and querying of stored metadata
Multiple implementations available: PostgreSQL, SQLite, MongoDB, Redis, and In-Memory (for testing)

Available Implementations:

PostgresContractStore - For PostgreSQL databases (recommended for production)
SQLiteContractStore - For SQLite databases (great for development and small deployments)
MongoDBContractStore - For MongoDB databases
RedisContractStore - For Redis databases
InMemoryContractStore - For testing and development (no persistence)

Example:

from pycharter import SQLiteContractStore, parse_contract_file

# Parse contract
metadata = parse_contract_file("contract.yaml")

# Use SQLite contract store (or PostgresContractStore, MongoDBContractStore, RedisContractStore, etc.)
store = SQLiteContractStore("metadata.db")
store.connect()

# Store decomposed components
schema_id = store.store_schema("user_schema", metadata.schema, version="1.0")

# Merge ownership and governance into metadata before storing
# Ownership and governance are part of metadata, not separate entities
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    owner = metadata.ownership.get("owner")
    metadata_dict["business_owners"] = [owner] if owner else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules

# Store metadata once with all information (ownership and governance included)
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)

# Store coercion and validation rules
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")

# Retrieve later
stored_schema = store.get_schema(schema_id)
coercion_rules = store.get_coercion_rules(schema_id)
validation_rules = store.get_validation_rules(schema_id)

Contribution to Journey: The contract store is the persistence layer that ensures contracts and their components are versioned, searchable, and accessible across your organization. It enables governance, audit trails, and schema evolution tracking.

See Configuration Guide for database setup and initialization instructions.
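To show what versioned storage implies at its simplest, here is a small in-memory sketch that keeps every version of a schema and returns the latest by default. It is a conceptual illustration only; `TinyVersionedStore` is hypothetical, its method signatures are simplified, and the "latest" rule (numeric comparison of dotted versions) is an assumption rather than documented PyCharter behavior.

```python
from typing import Optional


class TinyVersionedStore:
    """Hypothetical in-memory store that keeps every version of each schema."""

    def __init__(self) -> None:
        self._schemas: dict = {}

    def store_schema(self, schema_id: str, schema: dict, version: str) -> None:
        versions = self._schemas.setdefault(schema_id, {})
        if version in versions:
            # Stored versions are treated as immutable in this sketch.
            raise ValueError(f"{schema_id} version {version} already stored")
        versions[version] = schema

    def get_schema(self, schema_id: str, version: Optional[str] = None) -> dict:
        versions = self._schemas[schema_id]
        if version is None:
            # Compare dotted versions numerically, so "1.10" sorts after "1.2".
            version = max(
                versions, key=lambda v: tuple(int(part) for part in v.split("."))
            )
        return versions[version]


store_demo = TinyVersionedStore()
store_demo.store_schema("user_schema", {"required": ["name"]}, version="1.2")
store_demo.store_schema("user_schema", {"required": ["name", "age"]}, version="1.10")

latest = store_demo.get_schema("user_schema")
pinned = store_demo.get_schema("user_schema", version="1.2")
```

Pinning a version lets a consumer keep validating against the schema it was built for while producers publish newer ones.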

3. 🏭 Pydantic Generator (`pycharter.pydantic_generator`)

Purpose: Dynamically generates fully-functional Pydantic models from JSON Schema definitions.

When to Use: After storing schemas (or directly from parsed contracts), when you need to generate Python models for type-safe data validation and processing.

How It Works:

Takes JSON Schema definitions (from contracts or contract store)
Programmatically generates Pydantic model classes at runtime
Supports all JSON Schema Draft 2020-12 features plus custom coercions and validations
Can generate models from dictionaries, JSON strings, files, or URLs
Optionally generates Python files with model definitions

Example:

from pycharter import from_dict, generate_model_file, parse_contract_file, ContractStoreClient

# Option 1: Generate from parsed contract
metadata = parse_contract_file("contract.yaml")
UserModel = from_dict(metadata.schema, "User")

# Option 2: Generate from stored schema
client = ContractStoreClient(...)
schema = client.get_schema("user_schema_v1")
UserModel = from_dict(schema, "User")

# Option 3: Generate and save to file
generate_model_file(schema, "user_model.py", "User")

Contribution to Journey: The Pydantic generator is the transformation engine that converts declarative JSON Schema definitions into executable Python models. It bridges the gap between contract specifications (data) and runtime validation (code), enabling type-safe data processing.

4. 🔄 JSON Schema Converter (`pycharter.json_schema_converter`)

Purpose: Converts existing Pydantic models back into JSON Schema format (reverse conversion).

When to Use: When you have existing Pydantic models and need to generate JSON Schema definitions, or when you want to round-trip between schemas and models.

How It Works:

Takes Pydantic model classes as input
Generates JSON Schema dictionaries that represent the model structure
Preserves validation rules, types, and constraints
Can output to dictionaries, JSON strings, or files

Example:

from pycharter import to_dict, to_file, to_json
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool = True

# Convert to JSON Schema
schema = to_dict(Product)
json_string = to_json(Product)
to_file(Product, "product_schema.json")

# Now you can use the schema with other services
ProductModel = from_dict(schema, "Product")  # Round-trip

Contribution to Journey: The JSON Schema converter enables bidirectional conversion between models and schemas. It's useful for:

Generating schemas from existing code
Round-trip validation (schema → model → schema)
Integrating with systems that require JSON Schema format
Documenting existing models as schemas

5. ✅ Runtime Validator (`pycharter.runtime_validator`)

Purpose: Lightweight utility for validating data against generated Pydantic models in production data pipelines.

When to Use: In your data processing scripts, ETL pipelines, API endpoints, or any place where you need to validate incoming data against contract specifications.

API Organization:

PyCharter provides validation through three tiers:

Tier 1: Validator Class (⭐ PRIMARY INTERFACE - Recommended for production)
- Best performance for multiple validations (model is cached)
- Supports all data sources (contract files, directories, stores, dictionaries)
- Reusable instance for batch processing
Tier 2: Convenience Functions (Quick start - one-off validations)
- validate_with_store() - Quick validation with contract store
- validate_with_contract() - Quick validation with contract file/dict
- get_model_from_store() / get_model_from_contract() - Get model for reuse
Tier 3: Low-Level Functions (When you already have a model)
- validate() - Validate single record with existing model
- validate_batch() - Batch validate with existing model

How It Works:

Takes a Pydantic model (generated from a schema) and raw data
Validates data against the model's constraints
Returns a ValidationResult with validation status, validated data, and errors
Supports single record and batch validation
Can be used in strict mode (raises exceptions) or lenient mode (returns results)
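The difference between strict and lenient mode can be sketched in a few lines of plain Python. This is an illustration of the pattern only: `Result`, `run_validation`, and `check_person` are hypothetical, and PyCharter's own ValidationResult and exception types are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Result:
    """Hypothetical result object: status, validated data, and errors."""
    is_valid: bool
    data: Optional[Any] = None
    errors: list = field(default_factory=list)


def run_validation(check: Callable[[dict], list], record: dict,
                   strict: bool = False) -> Result:
    errors = check(record)
    if errors and strict:
        # Strict mode: fail loudly so the caller cannot ignore bad data.
        raise ValueError("; ".join(errors))
    if errors:
        # Lenient mode: report the problems and let the caller decide.
        return Result(is_valid=False, errors=errors)
    return Result(is_valid=True, data=record)


def check_person(record: dict) -> list:
    errors = []
    if not isinstance(record.get("name"), str):
        errors.append("name: expected string")
    if not isinstance(record.get("age"), int):
        errors.append("age: expected integer")
    return errors


lenient = run_validation(check_person, {"name": "Alice"})
try:
    run_validation(check_person, {"name": "Alice"}, strict=True)
    strict_raised = False
except ValueError:
    strict_raised = True
```

Lenient mode suits batch jobs that quarantine bad rows; strict mode suits code paths where continuing with invalid data would be worse than stopping.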

Example - Validator Class (Recommended):

from pycharter import Validator, SQLiteContractStore

# Option 1: From directory (schema.yaml, coercion_rules.yaml, validation_rules.yaml)
validator = Validator.from_dir("data/contracts/user")
result = validator.validate({"name": "Alice", "age": 30})

# Option 2: From explicit files
validator = Validator.from_files(schema="schemas/user.yaml", coercion_rules="rules/coercion.yaml")
result = validator.validate({"name": "Alice", "age": 30})

# Option 3: From single contract file
validator = Validator.from_file("user_contract.yaml")
result = validator.validate({"name": "Alice", "age": 30})

# Option 4: From contract store (with database)
store = SQLiteContractStore("metadata.db")
store.connect()
validator = Validator(store=store, schema_id="user_schema_v1")
result = validator.validate({"name": "Alice", "age": 30})

# Batch validation (efficient - model cached)
results = validator.validate_batch([data1, data2, data3])

Example - Convenience Functions (Quick Start):

from pycharter import validate_with_store, validate_with_contract, SQLiteContractStore

# Quick validation with store
store = SQLiteContractStore("metadata.db")
store.connect()
result = validate_with_store(store, "user_schema_v1", {"name": "Alice", "age": 30})

# Quick validation with contract file (no database)
result = validate_with_contract("user_contract.yaml", {"name": "Alice", "age": 30})

Example - Low-Level (When You Have a Model):

from pycharter import from_dict, validate, validate_batch

# Generate model
UserModel = from_dict(schema, "User")

# Validate single record
result = validate(UserModel, {"name": "Alice", "age": 30})

# Batch validate
results = validate_batch(UserModel, [data1, data2, data3])

Performance Tips:

⚡ For multiple validations: Use Validator class (model is cached)
⚡ For one-off validations: Convenience functions are fine
⚡ For batch processing: Use Validator.validate_batch() or validate_batch()

Contribution to Journey: The runtime validator is the enforcement layer that ensures data quality in production. It validates actual data against contract specifications, catching violations early and preventing bad data from propagating through your systems. It supports both database-backed workflows (for production systems with contract stores) and contract-based workflows (for simpler use cases without database dependencies).

5b. 🔄 Pipelines (`pycharter.pipeline_generator`)

Purpose: Build and run ETL pipelines programmatically (with the | operator) or from YAML configs. No assumptions about project layout—you specify file paths or use a directory with standard filenames.

When to Use: When you need to extract, transform, and load data from config-driven or code-defined pipelines (HTTP, files, databases, cloud storage → transforms → Postgres, files, cloud).

How It Works:

Programmatic: Pipeline(extractor) | transformer | loader; chain with |; call await pipeline.run().
Config-driven: Load from explicit files (from_config_files), from a directory with extract.yaml, transform.yaml, load.yaml (from_config_dir), or from a single pipeline.yaml (from_config_file).
Variables: Pass PipelineContext(variables={"API_KEY": "x"}) or variables={...} in factory methods; ${VAR} and ${VAR:-default} in configs are resolved from these (no built-in CONTRACT_DIR).
Async: run() is async; use asyncio.run(pipeline.run()) in scripts or await pipeline.run() in async code.
Error handling: Optional error_context with ErrorMode (STRICT, LENIENT, COLLECT) controls whether extraction/load failures raise or are collected in result.errors.
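
The variable syntax described above can be illustrated with a small standalone resolver. This is a minimal sketch and not PyCharter's implementation: the helper name resolve_placeholders, the regular expression, and the choice to raise KeyError for an unresolved variable are all assumptions made for illustration.

```python
import re

# Hypothetical helper (not part of PyCharter): matches ${VAR} and ${VAR:-default}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def resolve_placeholders(text: str, variables: dict[str, str]) -> str:
    """Replace placeholders using only the explicitly supplied variables."""

    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in variables:
            return str(variables[name])
        if default is not None:
            return default  # the ${VAR:-default} form supplies a fallback
        raise KeyError(f"Unresolved variable: {name}")  # assumed behavior

    return _PLACEHOLDER.sub(_substitute, text)


config_line = "url: https://api.example.com/users?key=${API_KEY}&limit=${LIMIT:-100}"
resolved = resolve_placeholders(config_line, {"API_KEY": "x"})
print(resolved)
```

Here API_KEY comes from the supplied mapping, while LIMIT falls back to its inline default because no value was passed for it.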

Example:

import asyncio
from pycharter import Pipeline, PipelineContext, HTTPExtractor, PostgresLoader, Rename, AddField

# Programmatic
pipeline = (
    Pipeline(HTTPExtractor(url="https://api.example.com/users"))
    | Rename({"userName": "name"})
    | AddField("processed_at", "now()")
    | PostgresLoader(connection_string="...", table="users")
)
result = asyncio.run(pipeline.run())

# Config-driven (explicit files)
pipeline = Pipeline.from_config_files(
    extract="configs/extract.yaml",
    load="configs/load.yaml",
    variables={"API_KEY": "secret"}
)

# Config-driven (directory: extract.yaml, transform.yaml, load.yaml)
pipeline = Pipeline.from_config_dir("pipelines/users/")

# Config-driven (single file)
pipeline = Pipeline.from_config_file("pipelines/users/pipeline.yaml")

result = asyncio.run(pipeline.run())

Exceptions: Pipeline and config loading use PyCharter’s exception hierarchy: PyCharterError (base), ConfigError, ConfigValidationError, ExpressionError. See Exceptions under API Reference.

See pycharter/pipeline_generator/ASYNC_AND_EXECUTION.md for async usage and error modes.

6. 🔍 Quality Assurance (`pycharter.quality`)

Purpose: Data quality assurance pipeline that polices data according to data contracts, calculates quality metrics, tracks violations, and generates quality reports.

When to Use: When you need to:

Monitor data quality over time
Calculate quality scores and metrics
Track and manage data quality violations
Set quality thresholds and get alerts
Generate quality reports for governance

How It Works:

Validates data against contracts (using Runtime Validator)
Calculates quality metrics (accuracy, completeness, violation rates)
Tracks violations for audit and remediation
Checks quality thresholds and generates alerts
Produces comprehensive quality reports
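
To make the metric and threshold steps concrete, here is a minimal sketch in plain Python. The Thresholds class is a stand-in for QualityThresholds, and the scoring rule (share of valid records on a 0-100 scale) is an assumption; the source does not state how PyCharter computes its overall score.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical stand-in for QualityThresholds."""
    min_overall_score: float = 95.0
    max_violation_rate: float = 0.05


def summarize(valid_flags: list[bool], thresholds: Thresholds) -> dict:
    """Turn per-record validation outcomes into metrics and a pass/fail flag."""
    total = len(valid_flags)
    violations = valid_flags.count(False)
    violation_rate = violations / total if total else 0.0
    # Assumed scoring rule: percentage of records that validated.
    overall_score = 100.0 * (total - violations) / total if total else 100.0
    passed = (
        overall_score >= thresholds.min_overall_score
        and violation_rate <= thresholds.max_violation_rate
    )
    return {
        "overall_score": overall_score,
        "violation_rate": violation_rate,
        "violations": violations,
        "passed": passed,
    }


flags = [True] * 97 + [False] * 3  # 97 valid records, 3 violations
report = summarize(flags, Thresholds())
print(report)
```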

Example:

from pycharter import QualityCheck, QualityCheckOptions, QualityThresholds

# Define quality thresholds
thresholds = QualityThresholds(
    min_overall_score=95.0,
    max_violation_rate=0.05
)

# Run quality check
check = QualityCheck(store=store)
report = check.run(
    schema_id="user_schema_v1",
    data="data/users.json",
    options=QualityCheckOptions(
        calculate_metrics=True,
        record_violations=True,
        check_thresholds=True,
        thresholds=thresholds
    )
)

print(f"Quality Score: {report.quality_score.overall_score:.2f}/100")
print(f"Passed: {report.passed}")

Contribution to Journey: The quality assurance module is the policing layer that ensures data quality is maintained throughout the pipeline. It provides metrics, tracking, and alerting capabilities that transform PyCharter from a contract management tool into a complete data quality assurance platform.

See Quality Module README for detailed documentation.

Complete Workflow Example

Here's how all services work together in a complete data production journey:

from pycharter import (
    parse_contract_file,
    SQLiteContractStore,
    from_dict,
    Validator,
    to_dict
)

# Step 1: Parse contract specification
metadata = parse_contract_file("user_contract.yaml")

# Step 2: Store metadata in database
store = SQLiteContractStore("metadata.db")
store.connect()
schema_id = store.store_schema("user", metadata.schema, version="1.0")

# Merge ownership and governance into metadata before storing
# Ownership and governance are part of metadata, not separate entities
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    owner = metadata.ownership.get("owner")
    metadata_dict["business_owners"] = [owner] if owner else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules

# Store metadata once with all information (ownership and governance included)
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)

# Store coercion and validation rules
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")

# Step 3: Generate Pydantic model from stored schema
schema = store.get_schema(schema_id)
UserModel = from_dict(schema, "User")

# Step 4: (Optional) Convert model back to schema for documentation
schema_doc = to_dict(UserModel)

# Step 5: Validate data in production pipeline
# Option A: Using Validator class (recommended for production)
validator = Validator(store=store, schema_id=schema_id)

def process_user_data(raw_data):
    result = validator.validate(raw_data)
    if result.is_valid:
        # Process validated data
        return result.data
    else:
        # Handle validation errors
        raise ValueError(f"Invalid data: {result.errors}")

# Option B: Using convenience function (quick start)
from pycharter import validate_with_store

def process_user_data_quick(raw_data):
    result = validate_with_store(store, schema_id, raw_data)
    if result.is_valid:
        return result.data
    else:
        raise ValueError(f"Invalid data: {result.errors}")

7. 🌐 REST API (`pycharter.api`)

Purpose: Expose all PyCharter services as REST API endpoints.

When to Use: When you need to use PyCharter from non-Python applications, microservices, or want to provide a web-based interface.

How It Works:

Provides HTTP endpoints for all core services
Uses FastAPI for automatic OpenAPI/Swagger documentation
Supports both store-based and contract-based operations
Handles request/response validation with Pydantic models
Located at the root level (api/) as a separate application
All endpoints are async-ready for better performance

Example:

# Start the API server (uses PYCHARTER_DATABASE_URL or sqlite:///pycharter.db)
pycharter api

# With host/port
pycharter api --host 0.0.0.0 --port 8080

Endpoints (see Swagger for full list):

Contracts: POST /api/v1/contracts/parse, POST /api/v1/contracts/build
Metadata: POST /api/v1/metadata/schemas, GET /api/v1/metadata/schemas/{schema_id}
Validation: POST /api/v1/validation/validate, POST /api/v1/validation/validate-batch
Quality: POST /api/v1/quality/check, GET /api/v1/quality/metrics
ETL: POST /api/v1/etl/run (extract/transform/load YAML; optional pipeline quality checks)
Pipeline runs: GET /api/v1/runs, GET /api/v1/runs/stats, GET /api/v1/pipelines
Ontology: GET /api/v1/semantic/schemes, concepts, relationships, semantic health, and related ontology endpoints

Documentation:

Swagger UI: http://localhost:8002/docs (default port 8002)
ReDoc: http://localhost:8002/redoc

See src/pycharter/api/README.md for complete API documentation.

8. 🌐 Web UI (`pycharter.ui`)

Purpose: Browser-based interface for contracts, pipelines, quality, and ontology. Served by pycharter ui serve (production) or pycharter ui dev (development with hot reload).

Main sections:

Contracts — Registry, diagram/form editor, and validation workspace
Pipelines — Overview dashboard, pipeline runs, ETL generator (YAML panels), and Pipeline diagram (FigJam-style visual editor for extract/transform/load nodes)
Quality — Quality metrics, thresholds, and violation tracking
Ontology — Concepts, concept schemes, diagrammatic workspace (ReactFlow), proposals, and semantic health
Documentation — In-app API playground and docs

Configure the API base URL in the UI Settings page. Authentication uses PYCHARTER_AUTH_USERS (see Configuration Guide).

Service Integration Summary

Service	Input	Output	Journey Stage
Contract Parser	Contract files (YAML/JSON)	`ContractMetadata`	Contract Specification → Parsing
Contract Builder	Separate artifacts or Store	Consolidated contract	Storage → Consolidation
Metadata Store	`ContractMetadata`	Stored metadata (DB)	Parsing → Storage
Pydantic Generator	JSON Schema	Pydantic models	Storage → Model Generation
JSON Schema Converter	Pydantic models	JSON Schema	(Bidirectional)
Runtime Validator	Pydantic models + Data	`ValidationResult`	Model Generation → Validation
ETL Pipelines	Config files or code	`PipelineResult`	Extract → Transform → Load
Quality Assurance	Contract + Data	`QualityReport`	Validation → Quality Monitoring

Each service is designed to be independent yet composable, allowing you to use them individually or together as part of a complete data contract management system.

📖 Documentation

Full documentation (Python API, tutorials, guides): https://optophi.github.io/pycharter/
Serve docs locally: pycharter docs serve (default: http://127.0.0.1:5002). Build static site: pycharter docs build. Requires pip install pycharter[docs].
Configuration Guide - Database connection, pycharter db init / upgrade / seed, migrations, and variable injection
Data Journey Guide - Data production journey: contract specification → storage → validation → quality
Database ERD - Database schema and entity relationship diagrams
Examples & Notebooks - Tutorials and guides (ETL, contracts, validation, quality, contract store, schema conversion); optional Marimo for interactive .py notebooks
REST API - API endpoints and usage (install with pip install pycharter[api])

📚 Usage Examples

Basic Usage

Using Convenience Functions (Quick Start):

from pycharter import from_dict, from_json, from_file

# From dictionary
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "title": {"type": "string"},
        "published": {"type": "boolean", "default": False}
    }
}
Article = from_dict(schema, "Article")

# From JSON string
schema_json = '{"type": "object", "version": "1.0.0", "properties": {"name": {"type": "string"}}}'
User = from_json(schema_json, "User")

# From file
Product = from_file("product_schema.json", "Product")

Using Validator Class (Production):

from pycharter import Validator

# From directory or single file
validator = Validator.from_dir("data/contracts/article")
# or: validator = Validator.from_file("article_contract.yaml")
result = validator.validate({"title": "My Article", "published": True})

Nested Objects

from pycharter import from_dict

schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "zipcode": {"type": "string"}
            }
        }
    }
}

Person = from_dict(schema, "Person")
person = Person(
    name="Alice",
    address={
        "street": "123 Main St",
        "city": "New York",
        "zipcode": "10001"
    }
)

print(person.address.city)  # Output: New York

Arrays and Collections

from pycharter import from_dict

schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "tags": {
            "type": "array",
            "items": {"type": "string"}
        },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"}
                }
            }
        }
    }
}

Cart = from_dict(schema, "Cart")
cart = Cart(
    tags=["python", "pydantic"],
    items=[
        {"name": "Apple", "price": 1.50},
        {"name": "Banana", "price": 0.75}
    ]
)

print(cart.items[0].name)  # Output: Apple

Coercion and Validation

PyCharter supports coercion (pre-validation transformation) and validation (post-validation checks):

from pycharter import from_dict

schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "flight_number": {
            "type": "integer",
            "coercion": "coerce_to_integer"  # Convert string/float to int
        },
        "destination": {
            "type": "string",
            "coercion": "coerce_to_string",
            "validations": {
                "min_length": {"threshold": 3},
                "max_length": {"threshold": 3},
                "no_capital_characters": None,
                "only_allow": {"allowed_values": ["abc", "def", "ghi"]}
            }
        },
        "distance": {
            "type": "number",
            "coercion": "coerce_to_float",
            "validations": {
                "greater_than_or_equal_to": {"threshold": 0}
            }
        }
    }
}

Flight = from_dict(schema, "Flight")

# Coercion happens automatically
flight = Flight(
    flight_number="123",    # Coerced to int: 123
    destination="abc",      # Passes all validations
    distance="100.5"        # Coerced to float: 100.5
)

📋 Standard JSON Schema Support

PyCharter supports all standard JSON Schema Draft 2020-12 validation keywords:

Keyword	Type	Description	Example
`minLength`	string	Minimum string length	`{"minLength": 3}`
`maxLength`	string	Maximum string length	`{"maxLength": 10}`
`pattern`	string	Regular expression pattern	`{"pattern": "^[a-z]+$"}`
`enum`	any	Allowed values	`{"enum": ["a", "b", "c"]}`
`const`	any	Single allowed value	`{"const": "fixed"}`
`minimum`	number	Minimum value (inclusive)	`{"minimum": 0}`
`maximum`	number	Maximum value (inclusive)	`{"maximum": 100}`
`exclusiveMinimum`	number	Minimum value (exclusive)	`{"exclusiveMinimum": 0}`
`exclusiveMaximum`	number	Maximum value (exclusive)	`{"exclusiveMaximum": 100}`
`multipleOf`	number	Must be multiple of	`{"multipleOf": 2}`
`minItems`	array	Minimum array length	`{"minItems": 1}`
`maxItems`	array	Maximum array length	`{"maxItems": 10}`
`uniqueItems`	array	Array items must be unique	`{"uniqueItems": true}`

All schemas are validated against the JSON Schema standard before processing, ensuring compliance.
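
As an illustration of what a few of these keywords mean for a single value, the following sketch checks a subset of them by hand. It is not how PyCharter or the jsonschema library implements validation; check_keywords is a hypothetical helper written only for this example.

```python
import re


def check_keywords(value, rules: dict) -> list[str]:
    """Return violations for a small subset of standard keywords."""
    errors: list[str] = []
    if isinstance(value, str):
        if "minLength" in rules and len(value) < rules["minLength"]:
            errors.append(f"shorter than minLength {rules['minLength']}")
        if "maxLength" in rules and len(value) > rules["maxLength"]:
            errors.append(f"longer than maxLength {rules['maxLength']}")
        # JSON Schema patterns are not implicitly anchored, hence search().
        if "pattern" in rules and re.search(rules["pattern"], value) is None:
            errors.append(f"does not match pattern {rules['pattern']}")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"below minimum {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"above maximum {rules['maximum']}")
        if "exclusiveMinimum" in rules and value <= rules["exclusiveMinimum"]:
            errors.append(f"not above exclusiveMinimum {rules['exclusiveMinimum']}")
        if "exclusiveMaximum" in rules and value >= rules["exclusiveMaximum"]:
            errors.append(f"not below exclusiveMaximum {rules['exclusiveMaximum']}")
    if "enum" in rules and value not in rules["enum"]:
        errors.append("not one of the enum values")
    return errors


print(check_keywords("ab", {"minLength": 3, "pattern": "^[a-z]+$"}))
print(check_keywords(100, {"maximum": 100, "exclusiveMaximum": 100}))
```

The second call shows the difference between the inclusive and exclusive bounds: 100 satisfies maximum but violates exclusiveMaximum.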

🔧 Built-in Coercions (PyCharter Extensions)

Coercion	Description
`coerce_to_string`	Convert int, float, bool, datetime, dict, list to string
`coerce_to_integer`	Convert float, string (numeric), bool, datetime to int
`coerce_to_float`	Convert int, string (numeric), bool to float
`coerce_to_boolean`	Convert int, string to bool
`coerce_to_datetime`	Convert string (ISO format), timestamp to datetime
`coerce_to_date`	Convert string (date format), datetime to date (date only, no time)
`coerce_to_uuid`	Convert string to UUID
`coerce_to_lowercase`	Convert string to lowercase
`coerce_to_uppercase`	Convert string to uppercase
`coerce_to_stripped_string`	Strip leading and trailing whitespace from string
`coerce_to_list`	Convert single value to list `[value]` (preserves None)
`coerce_empty_to_null`	Convert empty strings/lists/dicts to None (useful for nullable fields)
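
The descriptions in the table can be read as small pure functions. The sketch below re-implements three of them from those descriptions alone; the function bodies are assumptions and may differ from PyCharter's built-ins in edge cases.

```python
def coerce_to_list(value):
    """Wrap a single value in a list; None and existing lists pass through."""
    if value is None or isinstance(value, list):
        return value
    return [value]


def coerce_empty_to_null(value):
    """Turn empty strings, lists, and dicts into None; keep everything else."""
    if isinstance(value, (str, list, dict)) and len(value) == 0:
        return None
    return value


def coerce_to_stripped_string(value):
    """Strip surrounding whitespace from strings; leave other types alone."""
    return value.strip() if isinstance(value, str) else value


print(coerce_to_list("python"))
print(coerce_empty_to_null(""))
print(coerce_to_stripped_string("  abc  "))
```

Note that coerce_empty_to_null leaves falsy but non-empty values such as 0 and False untouched, which matters for nullable numeric and boolean fields.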

✅ Built-in Validations (PyCharter Extensions)

Validation	Description	Configuration
`min_length`	Minimum length for strings/arrays	`{"threshold": N}`
`max_length`	Maximum length for strings/arrays	`{"threshold": N}`
`only_allow`	Only allow specific values	`{"allowed_values": [...]}`
`greater_than_or_equal_to`	Numeric minimum	`{"threshold": N}`
`less_than_or_equal_to`	Numeric maximum	`{"threshold": N}`
`is_positive`	Value must be positive	`{"threshold": 0}`
`no_capital_characters`	No uppercase letters	`null`
`no_special_characters`	Only alphanumeric and spaces	`null`
`non_empty_string`	String must not be empty	`null`
`matches_regex`	String must match regex pattern	`{"pattern": "..."}`
`is_email`	String must be valid email address	`null`
`is_url`	String must be valid URL	`null`
`is_alphanumeric`	Only alphanumeric characters (no spaces/special)	`null`
`is_numeric_string`	String must be numeric (digits, optional decimal)	`null`
`is_unique`	All items in array must be unique	`null`

Note: PyCharter extensions (coercion and validations) are optional and can be used alongside standard JSON Schema keywords. All validation logic is stored as data in the JSON schema, making it fully data-driven.

🎨 Custom Coercions and Validations

Extend PyCharter with your own coercion and validation functions:

from pycharter.shared.coercions import register_coercion
from pycharter.shared.validations import register_validation

# Register custom coercion
def coerce_to_uppercase(data):
    if isinstance(data, str):
        return data.upper()
    return data

register_coercion("coerce_to_uppercase", coerce_to_uppercase)

# Register custom validation
def must_be_positive(threshold=0):
    def _validate(value, info):
        if value <= threshold:
            raise ValueError(f"Value must be > {threshold}")
        return value
    return _validate

register_validation("must_be_positive", must_be_positive)

📖 API Reference

PyCharter's API is organized into three tiers to help you choose the right approach:

Tier 1: Primary Interfaces (Classes - Best Performance)

Validator - Primary validation interface (recommended for production)

from pycharter import Validator

# Create validator via factory methods or store
validator = Validator.from_dir("data/contracts/user")
validator = Validator.from_files(schema="schema.yaml", coercion_rules="coercion.yaml")
validator = Validator.from_file("contract.yaml")
validator = Validator.from_dict(schema={...}, coercion_rules={...})
validator = Validator(store=store, schema_id="user_schema")  # from contract store

# Validate data
result = validator.validate(data)
results = validator.validate_batch([data1, data2])
model = validator.get_model()  # Get the generated Pydantic model

# State-aware validation (per FSM lifecycle state)
result = validator.validate_for_state(data, state="DRAFT")
result = validator.validate_for_state(data, state="FILLED",
    state_rules={"FILLED": {"required": ["filled_qty"]}})

ValidatorBuilder - Fluent API for building validators

from pycharter import ValidatorBuilder

validator = (
    ValidatorBuilder()
    .from_dir("contracts/order")          # or .from_file(), .from_files(), .from_dict(), .from_store()
    .with_state_rules({                   # optional: per-state required/optional overrides
        "DRAFT":  {"optional": ["filled_qty"]},
        "FILLED": {"required": ["filled_qty"]},
    })
    .with_quality_checks(thresholds={"completeness": 0.95})  # optional: quality metrics
    .strict()                             # optional: raise on validation failure
    .build()
)

QualityCheck - Primary quality assurance interface

from pycharter import QualityCheck, QualityCheckOptions

check = QualityCheck(store=store)
report = check.run(schema_id="user_schema", data=data, options=QualityCheckOptions(...))

ContractStoreClient - Base class for contract stores

from pycharter import ContractStoreClient, SQLiteContractStore, PostgresContractStore

store = SQLiteContractStore("metadata.db")
store.connect()

Tier 2: Convenience Functions (Quick Start)

Pydantic Generator - Input type helpers

from_dict(schema: dict, model_name: str = "DynamicModel") - Create model from dictionary
from_json(json_string: str, model_name: str = "DynamicModel") - Create model from JSON string
from_file(file_path: str, model_name: str = None) - Create model from file (JSON/YAML)
from_url(url: str, model_name: str = "DynamicModel") - Create model from URL
generate_model(schema: dict, model_name: str = "DynamicModel") - Advanced: more control
generate_model_file(schema: dict, output_path: str, model_name: str = "DynamicModel") - Generate and save to file

JSON Schema Converter - Output type helpers

to_dict(model: Type[BaseModel], ...) - Convert model to JSON Schema dictionary
to_file(model: Type[BaseModel], file_path: str, ...) - Convert model to file
to_json(model: Type[BaseModel], ...) - Convert model to JSON string
model_to_schema(model: Type[BaseModel], ...) - Advanced: core conversion function

Runtime Validator - Data source helpers

validate_with_store(store, schema_id, data, ...) - Quick validation with contract store
validate_batch_with_store(store, schema_id, data_list, ...) - Batch validation with store
validate_with_contract(contract, data, ...) - Quick validation with contract file/dict
validate_batch_with_contract(contract, data_list, ...) - Batch validation with contract
get_model_from_store(store, schema_id, ...) - Get model from contract store
get_model_from_contract(contract, ...) - Get model from contract
validate_input(contract, ...) - Decorator for function input validation
validate_output(contract, ...) - Decorator for function output validation
validate_with_contract_decorator(contract, ...) - Decorator for contract-based validation

Contract Management

parse_contract(contract_dict: dict) - Parse contract dictionary
parse_contract_file(file_path: str) - Parse contract file (YAML/JSON)
build_contract(artifacts: ContractArtifacts) - Build contract from artifacts
build_contract_from_store(store, schema_id, ...) - Build contract from contract store

Tier 3: Low-Level Utilities (When You Have Models)

validate(model: Type[BaseModel], data: dict, strict: bool = False) - Validate single record
validate_batch(model: Type[BaseModel], data_list: List[dict], strict: bool = False) - Batch validate
ValidationResult - Result class with is_valid, data, and errors attributes

Metadata Store Implementations

InMemoryContractStore() - In-memory store (testing/development)
SQLiteContractStore(database_path: str) - SQLite database
PostgresContractStore(connection_string: str) - PostgreSQL database
MongoDBContractStore(connection_string: str) - MongoDB database
RedisContractStore(connection_string: str) - Redis database

Domain Lifecycle (FSM Integration)

from pycharter import (
    check_state_alignment,
    validate_lifecycle_binding,
    get_lifecycle_binding,
    get_domain_entity_info,
    DEFAULT_STATE_FIELD,
)

# Check that contract enum values match FSM states
result = check_state_alignment(contract_dict, fsm_states={"PENDING","OPEN","FILLED"},
                               state_field="status")
# result.aligned, result.missing_from_contract, result.missing_from_fsm

# Validate lifecycle binding structure in contract metadata
errors = validate_lifecycle_binding(metadata_dict)

# Read binding (state_machine_name, machine_version, state_field, entity_id_field)
binding = get_lifecycle_binding(metadata_dict)

DEFAULT_STATE_FIELD  # "status" — the conventional field name for FSM state
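
The idea behind the alignment check can be sketched with plain set arithmetic. This is an illustration only: the Alignment class and compare_states function are hypothetical, and the sketch assumes the state enum sits directly under properties in the contract dictionary, which may not match PyCharter's contract layout.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical result type mirroring the attributes named above."""
    missing_from_contract: set[str]
    missing_from_fsm: set[str]

    @property
    def aligned(self) -> bool:
        return not (self.missing_from_contract or self.missing_from_fsm)


def compare_states(contract: dict, fsm_states: set[str], state_field: str = "status") -> Alignment:
    """Compare the enum values of the state field with the FSM's states."""
    field_schema = contract.get("properties", {}).get(state_field, {})
    enum_values = set(field_schema.get("enum", []))
    return Alignment(
        missing_from_contract=fsm_states - enum_values,
        missing_from_fsm=enum_values - fsm_states,
    )


contract = {
    "type": "object",
    "properties": {"status": {"type": "string", "enum": ["PENDING", "OPEN", "CANCELLED"]}},
}
alignment = compare_states(contract, fsm_states={"PENDING", "OPEN", "FILLED"})
print(alignment.aligned, alignment.missing_from_contract, alignment.missing_from_fsm)
```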

Exceptions

PyCharter uses a small exception hierarchy for config and pipeline errors. Catch PyCharterError to handle any PyCharter failure:

PyCharterError - Base for all PyCharter exceptions
ConfigError - Config loading/parsing failures (missing file, invalid YAML)
ConfigValidationError - Schema validation failures (e.g. missing required type field)
ConfigLoadError - Config file load errors
ExpressionError - Expression evaluation failures (e.g. invalid syntax in AddField)

Pipeline run(error_context=...) supports ErrorMode: STRICT (raise on failure), LENIENT (log and continue), COLLECT (append to result.errors). Import from pycharter.shared.errors.
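
The three modes differ only in what happens when a single record fails. The sketch below shows that difference with a local stand-in enum and a hypothetical load_records loop; it does not import or reproduce PyCharter's ErrorMode or pipeline internals.

```python
import logging
from enum import Enum


class ErrorMode(Enum):
    """Local stand-in for illustration; not imported from pycharter."""
    STRICT = "strict"
    LENIENT = "lenient"
    COLLECT = "collect"


def load_records(records, load_one, mode: ErrorMode):
    """Apply load_one to each record and handle failures according to mode."""
    loaded, errors = 0, []
    for record in records:
        try:
            load_one(record)
            loaded += 1
        except Exception as exc:
            if mode is ErrorMode.STRICT:
                raise  # stop at the first failure
            if mode is ErrorMode.COLLECT:
                errors.append(f"{record!r}: {exc}")  # keep for later inspection
            else:
                logging.warning("Skipping %r: %s", record, exc)  # log and continue
    return loaded, errors


def to_int(record):
    return int(record)


print(load_records([1, "x", 3], to_int, ErrorMode.COLLECT))
print(load_records([1, "x", 3], to_int, ErrorMode.LENIENT))
```

With STRICT, the same input would raise the ValueError from the second record instead of returning.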

🎯 Design Principles & Requirements

PyCharter is designed to meet the following core requirements:

✅ JSON Schema Standard Compliance

All schemas must abide by conventional JSON Schema syntax and qualify as valid JSON Schema:

Validation: All schemas are validated against the JSON Schema Draft 2020-12 standard before processing
Standard Keywords: Full support for all standard validation keywords (minLength, pattern, enum, minimum, maximum, etc.)
Compliance: Uses the jsonschema library for validation, with a graceful fallback when it is not installed

✅ Data-Driven Validation Logic

All schema information and complex field validation logic is stored as data, not Python code:

Coercion: Referenced by name (string) in JSON: "coercion": "coerce_to_integer"
Validations: Referenced by name with configuration (dict) in JSON: "validations": {"min_length": {"threshold": 3}}
No Code Required: Validation rules are defined entirely in JSON schema files
Example: {"coercion": "coerce_to_string", "validations": {"min_length": {"threshold": 3}}}

✅ Dynamic Pydantic Model Generation

Models are created dynamically at runtime from JSON schemas:

Runtime Generation: Uses pydantic.create_model() to generate models on-the-fly
Dynamic Validators: Field validators are dynamically attached using field_validator decorators
Multiple Sources: Models can be created from dicts, JSON strings, files, or URLs
No Static Code: All models are generated from data, not pre-defined classes

✅ Nested Schema Support

Full support for nested object schemas and complex structures:

Recursive Processing: Nested objects are recursively processed into their own Pydantic models
Arrays of Objects: Arrays containing nested objects are fully supported
Deep Nesting: Deeply nested structures work correctly with full type safety
Type Safety: Each nested object becomes its own typed Pydantic model
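
The recursion described above can be sketched with the standard library alone. In this illustration dataclasses.make_dataclass stands in for pydantic.create_model so the example has no dependencies; build_type and the naming scheme for nested types are assumptions, not PyCharter's generator.

```python
from dataclasses import field, fields, is_dataclass, make_dataclass
from typing import Any

_SCALARS = {"string": str, "integer": int, "number": float, "boolean": bool}


def build_type(schema: dict, name: str):
    """Recursively map a JSON Schema fragment to a Python type."""
    kind = schema.get("type")
    if kind == "object":
        # Each nested object becomes its own generated class.
        members = [
            (prop, build_type(sub, f"{name}_{prop}"), field(default=None))
            for prop, sub in schema.get("properties", {}).items()
        ]
        return make_dataclass(name, members)
    if kind == "array":
        return list[build_type(schema.get("items", {}), f"{name}Item")]
    return _SCALARS.get(kind, Any)


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
}

Person = build_type(schema, "Person")
person_fields = {f.name: f.type for f in fields(Person)}
print(person_fields)
```

Unlike a Pydantic model, these dataclasses do not validate or convert nested dictionaries on construction; the sketch only shows how one class per nested object is produced.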

✅ Extension Fields

Custom fields can be added to JSON Schema to extend functionality:

coercion: Pre-validation type conversion (e.g., string → integer)
validations: Post-validation custom rules
Optional: Extensions work alongside standard JSON Schema keywords
Separated: Extensions are clearly distinguished from standard JSON Schema

✅ Complex Field Validation

Support for both standard and custom field validators:

Standard Validators: minLength, pattern, enum, minimum, maximum, etc. (JSON Schema standard)
Custom Validators: Extensible validation rules via validations field
Validation Order: Coercion → Standard Validation → Pydantic Validation → Custom Validations
Factory Pattern: Validators are factory functions that return validation functions
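
The validation order and the factory pattern can be shown together in a few lines of plain Python. This is a sketch under stated assumptions: the COERCIONS and VALIDATIONS registries and process_field are hypothetical, and a simple isinstance check stands in for the standard and Pydantic validation stages.

```python
def min_length(threshold: int):
    """Factory: takes configuration, returns the function that checks a value."""
    def _validate(value, info=None):
        if len(value) < threshold:
            raise ValueError(f"Length must be >= {threshold}")
        return value
    return _validate


# Hypothetical registries keyed by the names used in the schema.
COERCIONS = {"coerce_to_string": str}
VALIDATIONS = {"min_length": min_length}


def process_field(value, spec: dict):
    # 1. Coercion runs first, before any type checking.
    if "coercion" in spec:
        value = COERCIONS[spec["coercion"]](value)
    # 2. Type check (stand-in for standard and Pydantic validation).
    if spec.get("type") == "string" and not isinstance(value, str):
        raise TypeError("expected a string")
    # 3. Custom validations run last, each built from its stored configuration.
    for name, config in spec.get("validations", {}).items():
        value = VALIDATIONS[name](**(config or {}))(value)
    return value


spec = {
    "type": "string",
    "coercion": "coerce_to_string",
    "validations": {"min_length": {"threshold": 3}},
}
print(process_field(12345, spec))
```

Because the rules are looked up by name and configured from a dictionary, the spec itself stays plain data, which is the point of the data-driven design described earlier.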

🚀 Development Setup

Quick Setup

# Run setup script
./scripts/setup.sh

# Activate environment
source venv/bin/activate

# Run tests
pytest

Using Make

make install-dev    # Install package and dev dependencies
make test          # Run tests
make format        # Format with Ruff (ruff format + ruff check --fix)
make lint          # Check formatting and lint with Ruff (no writes)
make type-check    # Run mypy on src/pycharter
make check         # Run format, lint, type-check, and test

Building the package: Run make clean && make build for a reliable build (clears stale egg-info; see Publishing).

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=pycharter --cov-report=html

# Run specific test file
pytest tests/test_converter.py

# Run tests matching a pattern
pytest -k "coercion"

📦 Publishing to PyPI

Automatic publishing via GitHub Releases (Trusted Publishing - no tokens needed!):

# 1. Update version in pyproject.toml
# version = "0.0.21"

# 2. Commit and push
git add pyproject.toml
git commit -m "Bump version to 0.0.21"
git push

# 3. Create GitHub Release (automatically publishes to PyPI)
gh release create v0.0.21 --title "v0.0.21" --notes "Release notes"

The workflow automatically:

✅ Builds UI
✅ Builds Python package
✅ Publishes to PyPI (using Trusted Publishing)

Local build (reliable): clean first to avoid stale build artifacts, then build:

make clean && make build

Core package builds without Node.js; the UI is included when built (see Publishing guide).

📋 JSON Schema Compliance

PyCharter is fully compliant with the JSON Schema Draft 2020-12 standard:

All schemas are validated against the standard before processing
Full support for all standard keywords (minLength, maxLength, pattern, enum, minimum, maximum, etc.)
Optional extensions (coercion and validations) work alongside standard keywords
Strict mode available to enforce standard-only schemas

🔗 Requirements

Python 3.11+
Pydantic >= 2.0.0
jsonschema >= 4.0.0 (optional, for enhanced validation)

See pyproject.toml for full dependencies and optional extras (api, ui, dev, etl, etc.).

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

See CONTRIBUTING.md. Report security issues per SECURITY.md. Community expectations: CODE_OF_CONDUCT.md.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Links

Homepage: https://github.com/optophi/pycharter
Repository: GitHub
Issues: GitHub Issues
Documentation: Configuration & guides · API docs

Made with ❤️ for the Python community

Release history

0.0.57 - Apr 19, 2026 (this version)
0.0.56 - Apr 18, 2026
0.0.54 - Apr 16, 2026
0.0.45 - Mar 19, 2026
0.0.40 - Mar 5, 2026
0.0.35 - Feb 13, 2026
0.0.30 - Feb 10, 2026
0.0.25 - Jan 29, 2026
0.0.20 - Jan 22, 2026
0.0.10 - Dec 22, 2025
0.0.3 - Nov 25, 2025
0.0.2 - Nov 19, 2025
0.0.1 - Nov 17, 2025

Download files

Source Distribution

pycharter-0.0.57.tar.gz (2.6 MB)

Uploaded Apr 19, 2026 Source

Built Distribution

pycharter-0.0.57-py3-none-any.whl (2.9 MB)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file pycharter-0.0.57.tar.gz.

File metadata

Download URL: pycharter-0.0.57.tar.gz
Upload date: Apr 19, 2026
Size: 2.6 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for pycharter-0.0.57.tar.gz
Algorithm	Hash digest
SHA256	`84c2826486cfcd4e2e25df91251933f9888591fe80f8bdad3d4e48fa16d673c0`
MD5	`a9eb5a4adeb1a01a4c9f604321c4d1dc`
BLAKE2b-256	`e91154914c99fbba5741b389e74f90801eb7b488d76bd16045eb965b2efd2b23`

Provenance

The following attestation bundles were made for pycharter-0.0.57.tar.gz:

Publisher: publish.yml on optophi/pycharter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pycharter-0.0.57.tar.gz
- Subject digest: 84c2826486cfcd4e2e25df91251933f9888591fe80f8bdad3d4e48fa16d673c0
- Sigstore transparency entry: 1340108766
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: optophi/pycharter@88b7d003dda8b96b88802403d9ef9bd0f35bf884
- Branch / Tag: refs/tags/v0.0.57
- Owner: https://github.com/optophi
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@88b7d003dda8b96b88802403d9ef9bd0f35bf884
- Trigger Event: push

File details

Details for the file pycharter-0.0.57-py3-none-any.whl.

File metadata

Download URL: pycharter-0.0.57-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 2.9 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for pycharter-0.0.57-py3-none-any.whl
Algorithm	Hash digest
SHA256	`07cb484d021f50aad559754111ead9a3f4faaae8f0ba3986e57bbaf3d9765f17`
MD5	`eaefd441143d668a3ff641fe5fc982c9`
BLAKE2b-256	`82fcd11dc07b37f98a7a1724b5acc9f65f8839df0aef47bcef86c96baef81c9e`

Provenance

The following attestation bundles were made for pycharter-0.0.57-py3-none-any.whl:

Publisher: publish.yml on optophi/pycharter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pycharter-0.0.57-py3-none-any.whl
- Subject digest: 07cb484d021f50aad559754111ead9a3f4faaae8f0ba3986e57bbaf3d9765f17
- Sigstore transparency entry: 1340108770
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: optophi/pycharter@88b7d003dda8b96b88802403d9ef9bd0f35bf884
- Branch / Tag: refs/tags/v0.0.57
- Owner: https://github.com/optophi
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@88b7d003dda8b96b88802403d9ef9bd0f35bf884
- Trigger Event: push

