
A Python package for data contract management with five core services: contract parsing, metadata storage, Pydantic generation, JSON Schema conversion, and runtime validation

PyCharter

Data contract management and validation for Python: define schemas, enforce quality, and run pipelines with contracts.

Python 3.11+ | License: MIT | Ruff


Quick start (2 minutes)

Install the package, define a tiny contract, and validate one record. Run the install command in a terminal, then run the Python snippet in a script or REPL:

pip install pycharter

from pycharter import from_dict, validate

# 1. Define a minimal contract (schema)
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# 2. Build a validator and validate one record
Person = from_dict(schema, "Person")
result = validate(Person, {"name": "Alice", "age": 30})

print(result.is_valid)   # True
print(result.data.name) # Alice

Invalid data returns result.is_valid == False and result.errors with details. Next: use a contract file with Validator.from_file("contract.yaml"), or add coercion/validation rules and store contracts in a contract store (see Concepts and Installation).


What is PyCharter?

PyCharter is a data-contract-as-code platform that brings structure, validation, and governance to data pipelines and applications. At its core, PyCharter converts JSON Schemas into fully functional Pydantic models. It supports the JSON Schema Draft 2020-12 standard, including all standard validation keywords (minLength, maxLength, pattern, enum, minimum, maximum, etc.), and adds extensions for pre-validation coercion and post-validation checks. All validation logic is stored as data, not Python code. PyCharter also includes a quality assurance module for monitoring data quality, tracking violations, and generating quality reports.

It provides:

  • Data Contract Definition & Management: Define formal agreements that specify data structure, quality rules, and governance policies
  • Schema Registry: Centralized storage and versioning of data schemas with support for schema evolution
  • Data Quality Enforcement: Coercion rules (data transformation) and validation rules (business constraints) to ensure data integrity
  • Data Governance: Track ownership, stewardship, and enforce data policies across your organization
  • Metadata Management: Store and retrieve data about data (schemas, ownership, rules, lineage)
  • Runtime Validation: Validate data against contracts in production pipelines, APIs, and data processes
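
The coercion-then-validation flow described above, with rules kept as data, can be sketched in plain Python. This is not PyCharter's implementation: the rule names `coerce_to_integer` and `is_positive` mirror names used later in this README, while the `COERCIONS` and `VALIDATIONS` registries and the `apply_rules` function are hypothetical names chosen for illustration.

```python
# Illustrative sketch only: rules live in plain data, functions live in registries.
# COERCIONS, VALIDATIONS, and apply_rules are hypothetical, not PyCharter APIs.

COERCIONS = {"coerce_to_integer": int}
VALIDATIONS = {"is_positive": lambda value: value > 0}


def apply_rules(record, coercion_rules, validation_rules):
    """Coerce fields first, then run validation checks; return (data, errors)."""
    data = dict(record)
    errors = []
    failed = set()  # fields whose coercion failed are not validated further

    for field, rule_name in coercion_rules.items():
        if field not in data:
            continue
        try:
            data[field] = COERCIONS[rule_name](data[field])
        except (TypeError, ValueError):
            errors.append(f"{field}: {rule_name} failed")
            failed.add(field)

    for field, rule_names in validation_rules.items():
        if field not in data or field in failed:
            continue
        for rule_name in rule_names:
            if not VALIDATIONS[rule_name](data[field]):
                errors.append(f"{field}: {rule_name} failed")

    return data, errors


# The rules themselves are plain data and could be loaded from JSON or YAML.
coercion_rules = {"age": "coerce_to_integer"}
validation_rules = {"age": ["is_positive"]}

clean, clean_errors = apply_rules({"name": "Alice", "age": "30"}, coercion_rules, validation_rules)
bad, bad_errors = apply_rules({"name": "Bob", "age": "-5"}, coercion_rules, validation_rules)
```

Because the rules are only names and parameters, changing a contract means editing data rather than redeploying code; the functions behind the names stay in one registry.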

Data Terminology

PyCharter implements key data management concepts:

  • Data Contract: Formal specification of data structure, quality rules, and governance policies that define the "contract" between data producers and consumers
  • Schema Registry: Centralized repository for storing and versioning data schemas, enabling schema discovery and evolution tracking
  • Data Quality: Coercion (pre-validation transformation) and validation (post-validation checks) rules that ensure data integrity
  • Data Governance: Ownership tracking, stewardship management, and policy enforcement for data assets
  • Schema Evolution: Versioning and migration capabilities that allow data structures to evolve over time while maintaining backward compatibility
  • Metadata Management: Storage and retrieval of data about data, including schemas, ownership information, governance rules, and lineage

Use Cases

  • Data Pipeline Validation: Ensure data conforms to contracts before processing in pipelines
  • API Contract Enforcement: Validate API request/response data against defined contracts
  • Data Integration: Standardize data formats across systems and services
  • Compliance & Governance: Track ownership, enforce data policies, and maintain audit trails
  • Schema Registry: Centralized schema management for microservices and data platforms
  • Data Quality Assurance: Catch data quality issues early in the pipeline through automated validation

Concepts

A short mental model so you know what to reach for.

Concept | What it is | When you use it
Schema | The shape of the data (JSON Schema): types, required fields, nested objects. | When you only need structure (e.g. "this field is string, that one is integer").
Data contract | Schema + coercion rules (e.g. string → int) + validation rules (e.g. min/max, allowed values) + optional metadata (ownership, governance). | When you want one artifact that defines structure, transforms, and business rules.
Metadata store | A database (SQLite, PostgreSQL, etc.) that stores contracts (and their versions) so many apps can reuse them. | When you have multiple services or pipelines and want a single source of truth.
ETL validation | Validating data after extract (source) and before load (target) in a pipeline, using a schema or contract. | When you run ETL and want to reject or quarantine bad rows at stage boundaries.

Flow from "just validate" to "contracts in a store and ETL":

  ┌─────────────────────────────────────────────────────────────────────────┐
  │  Option A: No database                                                  │
  │  Schema/contract in code or YAML file  →  Validator  →  validate(data)  │
  └─────────────────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────────────────┐
  │  Option B: With contract store                                          │
  │  Contract in DB  →  Validator(store=..., schema_id=...)  →  validate()  │
  └─────────────────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────────────────┐
  │  Option C: ETL pipeline                                                 │
  │  Extract  →  [optional: validate with contract]  →  Transform  →        │
  │  [optional: validate with contract]  →  Load                            │
  └─────────────────────────────────────────────────────────────────────────┘

Start with Option A (Quick start above); add a contract store when you need versioned, shared contracts; add ETL validation when you run pipelines and want contract checks at extract/load.
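
To make the "reject or quarantine bad rows at stage boundaries" idea concrete, here is a minimal stdlib-only sketch. It does not use PyCharter; `split_rows` and the simple type map are hypothetical stand-ins for a real contract check.

```python
# Illustrative sketch only: partition rows at a stage boundary.
# split_rows is a hypothetical helper, not part of PyCharter.

def split_rows(rows, required_types):
    """Return (valid, quarantined) using a required-field and type check."""
    valid, quarantined = [], []
    for row in rows:
        problems = [
            name
            for name, expected in required_types.items()
            if not isinstance(row.get(name), expected)
        ]
        if problems:
            # Keep the original row plus the reason, so it can be inspected later.
            quarantined.append({"row": row, "problems": problems})
        else:
            valid.append(row)
    return valid, quarantined


rows = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": "n/a"},   # wrong type for age
    {"age": 41},                      # missing name
]
valid, quarantined = split_rows(rows, {"name": str, "age": int})
```

Only `valid` would continue to the next stage; `quarantined` can be written to a side table or file for review.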


Features

  • Dynamic Model Generation - Convert JSON schemas to Pydantic models at runtime
  • JSON Schema Compliant - Full support for JSON Schema Draft 2020-12 standard
  • Type Coercion - Automatic type conversion before validation (e.g., string → integer)
  • Custom Validators - Built-in and extensible validation rules
  • Nested Structures - Full support for nested objects and arrays
  • Multiple Input Formats - Load schemas from dicts, JSON strings, files, or URLs
  • Type Safe - Full type hints and Pydantic v2 compatibility
  • Extensible - Register custom coercion and validation functions
  • Data-Driven - All validation logic stored as JSON data, not Python code
  • Data Contract Management - Complete lifecycle management for data contracts with versioning
  • Schema Registry - Centralized schema storage with support for PostgreSQL, SQLite, MongoDB, Redis, and InMemory
  • Data Governance - Track ownership, stewardship, and enforce governance policies
  • Metadata Management - Store and query metadata about your data assets
  • Schema Evolution - Version schemas and track changes over time
  • Quality Assurance - Monitor data quality, calculate metrics, track violations, and generate reports
  • Quality Metrics - Calculate quality scores, accuracy, completeness, and violation rates
  • Threshold Alerting - Set quality thresholds and get alerts when quality degrades
  • Ontology & Semantic Layer - Concept schemes, concept types, relationships, field bindings, and a diagrammatic ontology workspace in the UI
  • Pipeline Diagram Editor - Figma Jam-style visual editor for ETL pipelines (extract/transform/load nodes with labels and annotations) in the Pipelines section of the UI

Installation

Core Library

pip install pycharter

With API Support

pip install "pycharter[api]"    # quotes keep shells such as zsh from expanding the brackets

This installs FastAPI and Uvicorn for running the REST API server.

Full install (all extras)

pip install "pycharter[all]"

Installs api, worker, ui, pipeline, postgres, docs, ontology, lineage, streaming, and messaging (Kafka + RabbitMQ). Use this when you want the whole toolkit; for smaller installs, pick individual extras below.

With UI Support

pip install "pycharter[ui]"

This installs the Python dependencies and pre-built UI static files, so no separate front-end build is needed (the same approach Airflow takes).

After installation, you can immediately start the UI:

pycharter ui serve    # Production mode (uses pre-built static files)

For development (if you have the source code):

cd src/pycharter/ui
npm install          # Install Node.js dependencies
cd ../../../       # back to repository root (src/pycharter/ui → repo root)
pycharter ui dev     # Development mode with hot reload

Note: When installed from pip, the UI works immediately without Node.js. For development with hot reload, Node.js is required; see the UI README for setup.

Database setup (for contract store, API, and seed data)

If you use the contract store (SQLite or PostgreSQL), the REST API, or seed data, initialize the database once:

# Initialize database schema (SQLite default: sqlite:///pycharter.db)
pycharter db init

# Seed reference data (owners, domains, systems, environments, data feeds, compliance frameworks, tags)
pycharter db seed

Default seed directory (when you omit the path) is the bundled data/seed inside the pycharter package (src/pycharter/data/seed in the repo). It loads reference data, contracts, pipelines, and semantic seed YAML in one step. To use a custom seed directory: pycharter db seed /path/to/seed [database_url].

The data/aviation_examples/ tree at the repository root holds sample contract artifacts (files on disk) for tutorials and demos. It is not loaded automatically by pycharter db seed; import those contracts through the Web UI or the REST API as needed. See Seed data and sample files.

Use pycharter db init --force to drop and recreate all tables (SQLite only; destroys existing data). See Configuration Guide for connection options and migrations.

Quick Start

Quick Start: ETL Pipelines

Build and run ETL pipelines programmatically (with the | operator) or from YAML configs. Pipeline run() is async; use asyncio.run() from scripts or await in async code.

import asyncio
from pycharter import Pipeline, HTTPExtractor, PostgresLoader, Rename, AddField

# Programmatic pipeline
pipeline = (
    Pipeline(HTTPExtractor(url="https://api.example.com/data"))
    | Rename({"old": "new"})
    | AddField("processed_at", "now()")
    | PostgresLoader(connection_string="...", table="users")
)
result = asyncio.run(pipeline.run())

# Config-driven: explicit files
pipeline = Pipeline.from_config_files(
    extract="configs/extract.yaml",
    load="configs/load.yaml",
    variables={"API_KEY": "secret"}
)

# Config-driven: directory (extract.yaml, transform.yaml, load.yaml)
pipeline = Pipeline.from_config_dir("pipelines/users/")

# Config-driven: single file
pipeline = Pipeline.from_config_file("pipelines/users/pipeline.yaml")

result = asyncio.run(pipeline.run())

See ETL Pipelines under Core Services for error handling (error_context, ErrorMode) and variable resolution (PipelineContext(variables={...})).
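
For readers curious how `|` chaining and an async `run()` can fit together, the following is a minimal sketch in plain Python. `MiniPipeline`, `fake_extract`, `rename`, and `add_field` are hypothetical stand-ins; nothing here is taken from PyCharter's internals, and the real `Pipeline` class may work differently.

```python
# Illustrative sketch only: operator-based chaining with an async run().
# All names below are hypothetical and unrelated to PyCharter's own classes.
import asyncio


class MiniPipeline:
    def __init__(self, extract, steps=()):
        self.extract = extract      # async callable returning a list of dict rows
        self.steps = tuple(steps)   # callables that take one row and return one row

    def __or__(self, step):
        # `pipeline | step` returns a new pipeline with the step appended,
        # leaving the original pipeline unchanged.
        return MiniPipeline(self.extract, self.steps + (step,))

    async def run(self):
        rows = await self.extract()
        for step in self.steps:
            rows = [step(row) for row in rows]
        return rows


async def fake_extract():
    # Stand-in for an HTTP or database extractor.
    return [{"old": 1}, {"old": 2}]


def rename(mapping):
    return lambda row: {mapping.get(key, key): value for key, value in row.items()}


def add_field(name, value):
    return lambda row: {**row, name: value}


pipeline = MiniPipeline(fake_extract) | rename({"old": "new"}) | add_field("source", "demo")
rows = asyncio.run(pipeline.run())
```

Returning a new object from `__or__` keeps partially built pipelines reusable, which is one common reason to prefer this design over mutating the pipeline in place.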

Quick Start: Convenience Functions (One-off Use)

from pycharter import from_dict, validate

# Define your JSON schema
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"}
    },
    "required": ["name", "age"]
}

# Generate a Pydantic model (convenience function)
Person = from_dict(schema, "Person")

# Validate data
result = validate(Person, {"name": "Alice", "age": 30, "email": "alice@example.com"})
if result.is_valid:
    print(f"Valid: {result.data.name}")  # Output: Valid: Alice

Production Use: Validator Class (Recommended)

For production code with multiple validations, use the Validator class for better performance. Create validators via factory methods or from a contract store:

from pycharter import Validator

# From directory (expects schema.yaml, coercion_rules.yaml, validation_rules.yaml)
validator = Validator.from_dir("data/contracts/user")

# From explicit files (any filenames)
validator = Validator.from_files(
    schema="schemas/user.yaml",
    coercion_rules="rules/coercion.yaml",
    validation_rules="rules/validation.yaml"
)

# From a single contract file
validator = Validator.from_file("user_contract.yaml")

# From dictionaries
validator = Validator.from_dict(schema={...}, coercion_rules={...}, validation_rules={...})

# From contract store (assumes `store` is a connected contract store; see "With Metadata Store" below)
validator = Validator(store=store, schema_id="user_schema_v1")

# Validate multiple records efficiently (model is cached)
result1 = validator.validate({"name": "Alice", "age": 30})
result2 = validator.validate({"name": "Bob", "age": 25})

# Batch validation
results = validator.validate_batch([data1, data2, data3])

With Metadata Store

from pycharter import Validator, SQLiteContractStore

# Connect to contract store
store = SQLiteContractStore("metadata.db")
store.connect()

# Create validator from store
validator = Validator(store=store, schema_id="user_schema_v1")

# Validate data
result = validator.validate({"name": "Alice", "age": 30})

Using ValidatorBuilder (Fluent API)

ValidatorBuilder provides a chainable interface for configuring validators with quality checks, state-specific rules, and metrics tracking:

from pycharter import ValidatorBuilder

validator = (
    ValidatorBuilder()
    .from_dir("contracts/order")
    .with_quality_checks(thresholds={"completeness": 0.95})
    .with_state_rules({
        "DRAFT":  {"optional": ["filled_qty", "fill_price"]},
        "FILLED": {"required": ["filled_qty", "fill_price"]},
    })
    .strict()
    .build()
)

result = validator.validate({"symbol": "AAPL", "order_qty": 100})

# State-aware validation: different required fields per FSM state
result_draft  = validator.validate_for_state({"symbol": "AAPL", "order_qty": 100}, "DRAFT")
result_filled = validator.validate_for_state(
    {"symbol": "AAPL", "order_qty": 100, "filled_qty": 100, "fill_price": 150.0}, "FILLED"
)

validate_for_state(data, state) applies per-state overrides (fields become optional in DRAFT, required in FILLED) on top of the base schema, which is useful when the same entity has different validation rules at different lifecycle stages.
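
One way to picture per-state overrides is as a small transformation of the schema's `required` list. The sketch below is an assumption about the general idea, not PyCharter's implementation; `schema_for_state` and `missing_fields` are hypothetical helpers.

```python
# Illustrative sketch only: derive a per-state schema from a base schema.
# schema_for_state and missing_fields are hypothetical, not PyCharter APIs.
import copy


def schema_for_state(base_schema, state_rules, state):
    """Return a copy of base_schema with the state's required/optional overrides applied."""
    schema = copy.deepcopy(base_schema)
    required = list(schema.get("required", []))
    overrides = state_rules.get(state, {})

    for field in overrides.get("required", []):
        if field not in required:
            required.append(field)

    optional = set(overrides.get("optional", []))
    schema["required"] = [field for field in required if field not in optional]
    return schema


def missing_fields(data, schema):
    return [field for field in schema["required"] if field not in data]


base = {
    "type": "object",
    "properties": {
        "symbol": {"type": "string"},
        "order_qty": {"type": "integer"},
        "filled_qty": {"type": "integer"},
        "fill_price": {"type": "number"},
    },
    "required": ["symbol", "order_qty"],
}
state_rules = {
    "DRAFT": {"optional": ["filled_qty", "fill_price"]},
    "FILLED": {"required": ["filled_qty", "fill_price"]},
}

order = {"symbol": "AAPL", "order_qty": 100}
draft_missing = missing_fields(order, schema_for_state(base, state_rules, "DRAFT"))
filled_missing = missing_fields(order, schema_for_state(base, state_rules, "FILLED"))
```

The base schema is copied rather than mutated, so the same contract can be reused for every state.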

Domain Lifecycle & FSM Integration

PyCharter supports linking data contracts to finite state machines (e.g. PyStator) via a lifecycle binding convention in contract metadata:

# In your contract YAML
metadata:
  governance_rules:
    lifecycle:
      state_machine_name: order_management
      machine_version: "1.0.0"
      state_field: status          # field in your data that holds the FSM state
      entity_id_field: order_id

# In your Python code
from pycharter import check_state_alignment, validate_lifecycle_binding, get_lifecycle_binding

# Validate that contract enum values match your FSM states exactly
fsm_states = {"PENDING", "OPEN", "FILLED", "CANCELLED"}
result = check_state_alignment(contract_dict, fsm_states, state_field="status")
# result.aligned: bool, result.missing_from_contract: set, result.missing_from_fsm: set

# Validate the lifecycle binding structure
errors = validate_lifecycle_binding(contract_dict.get("metadata", {}))
# Returns [] if valid, or a list of error strings

# Read the binding
binding = get_lifecycle_binding(contract_dict.get("metadata", {}))
# binding.state_machine_name, binding.machine_version, binding.state_field, ...
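
Conceptually, a state alignment check is a pair of set differences. The sketch below shows that idea in plain Python. The field names follow the comment above (`aligned`, `missing_from_contract`, `missing_from_fsm`), but their exact meaning here is an assumption, and `Alignment` and `align_states` are hypothetical names rather than PyCharter APIs.

```python
# Illustrative sketch only: compare contract enum values with FSM states.
# Alignment and align_states are hypothetical; the direction of each
# "missing_from_*" set is an assumption made for this example.
from dataclasses import dataclass


@dataclass
class Alignment:
    aligned: bool
    missing_from_contract: set  # states the FSM has but the contract enum lacks
    missing_from_fsm: set       # enum values the contract has but the FSM lacks


def align_states(contract_enum, fsm_states):
    contract_states = set(contract_enum)
    fsm = set(fsm_states)
    missing_from_contract = fsm - contract_states
    missing_from_fsm = contract_states - fsm
    aligned = not missing_from_contract and not missing_from_fsm
    return Alignment(aligned, missing_from_contract, missing_from_fsm)


alignment = align_states(
    ["PENDING", "OPEN", "FILLED", "EXPIRED"],
    {"PENDING", "OPEN", "FILLED", "CANCELLED"},
)
```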

API Organization

PyCharter's API is organized into three tiers to help you choose the right approach for your use case:

Tier 1: Primary Interfaces (Recommended for Production)

Classes that provide the best performance and most features:

  • Validator - Primary validation interface (use for multiple validations); create via from_dir(), from_files(), from_file(), from_dict() or from store
  • Pipeline - ETL pipeline (programmatic or config-driven); create via from_config_files(), from_config_dir(), from_config_file() or constructor
  • QualityCheck - Primary quality assurance interface
  • ContractStoreClient - Base class for contract stores

When to use: Production code, batch processing, when you need to validate multiple records or run ETL pipelines.

Tier 2: Convenience Functions (Quick Start)

Functions that make common tasks easy and discoverable:

  • Input helpers: from_dict(), from_file(), from_json(), from_url()
  • Output helpers: to_dict(), to_file(), to_json()
  • Validation helpers: validate_with_store(), validate_with_contract()
  • Contract helpers: parse_contract_file(), build_contract()

When to use: Quick scripts, one-off validations, exploratory work, learning the library.

Tier 3: Low-Level Utilities

Functions for when you already have models or need fine-grained control:

  • validate() - Validate with existing Pydantic model
  • validate_batch() - Batch validate with existing model
  • model_to_schema() - Core conversion function

When to use: Advanced use cases, when you've already generated models, custom workflows.

Choosing the Right Approach

Use Case | Recommended Approach | Example
Production pipeline with multiple validations | Validator class | validator = Validator(store=store, schema_id="schema"); validator.validate(data)
Quick one-off validation | Convenience function | validate_with_contract("contract.yaml", data)
You already have a model | Low-level function | validate(UserModel, data)
Batch processing | Validator.validate_batch() | validator.validate_batch([data1, data2, data3])

Core Services & Data Production Journey

PyCharter provides eight core services that work together to support a complete data production journey, from contract specification to quality assurance. Each service plays a critical role in managing data contracts and ensuring data quality throughout your pipeline.

The Data Production Journey

The typical data production workflow follows this path:

1. Data Contract Specification
   ↓
2. Contract Parsing
   ↓
3. Metadata Storage
   ↓
4. Pydantic Model Generation
   ↓
5. Runtime Validation
   ↓
6. Quality Assurance & Monitoring

1. Contract Parser (pycharter.contract_parser)

Purpose: Reads and decomposes data contract files into structured metadata components.

When to Use: At the beginning of your data production journey, when you have data contract files (YAML or JSON) that need to be processed and understood.

How It Works:

  • Accepts data contract files containing schema definitions, governance rules, ownership information, and metadata
  • Decomposes the contract into distinct components: schema, governance_rules, ownership, and metadata
  • Returns a ContractMetadata object that separates concerns and makes each component accessible
  • Extracts and tracks versions of all components

Example:

from pycharter import parse_contract_file, ContractMetadata

# Parse a contract file (YAML or JSON)
metadata = parse_contract_file("data_contract.yaml")

# Access decomposed components
schema = metadata.schema              # JSON Schema definition
governance = metadata.governance_rules # Governance policies
ownership = metadata.ownership         # Owner/team information
metadata_info = metadata.metadata      # Additional metadata
versions = metadata.versions          # Component versions

Contribution to Journey: The contract parser is the entry point that takes raw contract specifications and prepares them for downstream processing. It ensures that contracts are properly structured and that all components (schema, governance, ownership) are separated for independent handling.
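
The decomposition step can be sketched without PyCharter. The contract layout below (top-level `schema`, `governance_rules`, `ownership`, and `metadata` keys, each optionally carrying a `version`) is an assumption made for illustration, and `ParsedContract` and `decompose` are hypothetical names.

```python
# Illustrative sketch only: split a contract dict into separate components.
# ParsedContract and decompose are hypothetical; the contract layout is assumed.
from dataclasses import dataclass, field

COMPONENTS = ("schema", "governance_rules", "ownership", "metadata")


@dataclass
class ParsedContract:
    schema: dict = field(default_factory=dict)
    governance_rules: dict = field(default_factory=dict)
    ownership: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)


def decompose(contract):
    """Copy each known component out of the contract and collect its version, if any."""
    parts = {key: dict(contract.get(key) or {}) for key in COMPONENTS}
    versions = {key: part["version"] for key, part in parts.items() if "version" in part}
    return ParsedContract(versions=versions, **parts)


contract = {
    "schema": {
        "type": "object",
        "version": "1.0.0",
        "properties": {"name": {"type": "string"}},
    },
    "ownership": {"owner": "data-team"},
    "metadata": {"version": "1.0.0", "description": "User contract"},
}
parsed = decompose(contract)
```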


1b. Contract Builder (pycharter.contract_builder)

Purpose: Constructs consolidated data contracts from separate artifacts (schema, coercion rules, validation rules, metadata).

When to Use: When you have separate artifacts stored independently and need to combine them into a single consolidated contract for runtime validation or distribution.

How It Works:

  • Takes separate artifacts (schema, coercion rules, validation rules, metadata, ownership, governance rules)
  • Keeps the schema raw and carries coercion and validation rules alongside it (the Validator merges them at validation time)
  • Tracks versions of all components
  • Produces a consolidated contract suitable for runtime validation
  • Can build from artifacts directly or retrieve from contract store

Example:

from pycharter import build_contract, build_contract_from_store, ContractArtifacts

# Build from separate artifacts
artifacts = ContractArtifacts(
    schema={"type": "object", "version": "1.0.0", "properties": {...}},
    coercion_rules={"version": "1.0.0", "rules": {"age": "coerce_to_integer"}},
    validation_rules={"version": "1.0.0", "rules": {"age": {"is_positive": {...}}}},
    metadata={"version": "1.0.0", "description": "User contract"},
    ownership={"owner": "data-team", "team": "engineering"},
)

contract = build_contract(artifacts)
# Contract now has:
# - schema (RAW - rules NOT merged into it)
# - coercion_rules, validation_rules (separate)
# - metadata, ownership, governance_rules
# - versions tracking all components

# Or build from contract store
contract = build_contract_from_store(store, "user_schema_v1")

# Use for validation - Validator merges rules internally
from pycharter import validate_with_contract
result = validate_with_contract(contract, {"name": "Alice", "age": "30"})

Contribution to Journey: The contract builder is the consolidation layer that combines separate artifacts (stored independently in the database) into a single contract artifact. The contract contains raw schema + separate rules (not merged). The Validator class handles merging internally during validation, keeping the contract structure clear and editable.


2. Contract Store Client (pycharter.contract_store)

Purpose: Manages persistent storage and retrieval of decomposed metadata in databases.

When to Use: After parsing contracts, when you need to store metadata components (schemas, governance rules, ownership) in a database for versioning, querying, and governance.

How It Works:

  • Provides methods to store and retrieve schemas, governance rules, ownership information, and metadata
  • Supports versioning and querying of stored metadata
  • Multiple implementations available: PostgreSQL, SQLite, MongoDB, Redis, and In-Memory (for testing)

Available Implementations:

  • PostgresContractStore - For PostgreSQL databases (recommended for production)
  • SQLiteContractStore - For SQLite databases (great for development and small deployments)
  • MongoDBContractStore - For MongoDB databases
  • RedisContractStore - For Redis databases
  • InMemoryContractStore - For testing and development (no persistence)

Example:

from pycharter import SQLiteContractStore, parse_contract_file

# Parse contract
metadata = parse_contract_file("contract.yaml")

# Use SQLite contract store (or PostgresContractStore, MongoDBContractStore, RedisContractStore, etc.)
store = SQLiteContractStore("metadata.db")
store.connect()

# Store decomposed components
schema_id = store.store_schema("user_schema", metadata.schema, version="1.0")

# Merge ownership and governance into metadata before storing
# Ownership and governance are part of metadata, not separate entities
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    owner = metadata.ownership.get("owner")
    metadata_dict["business_owners"] = [owner] if owner else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules

# Store metadata once with all information (ownership and governance included)
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)

# Store coercion and validation rules
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")

# Retrieve later
stored_schema = store.get_schema(schema_id)
coercion_rules = store.get_coercion_rules(schema_id)
validation_rules = store.get_validation_rules(schema_id)

Contribution to Journey: The contract store is the persistence layer that ensures contracts and their components are versioned, searchable, and accessible across your organization. It enables governance, audit trails, and schema evolution tracking.
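
As a rough mental model of versioned storage, here is a tiny in-memory store in plain Python. `TinyStore` is hypothetical and its method signatures are simplified for the example; they are not meant to match PyCharter's store classes.

```python
# Illustrative sketch only: keep every version of a schema and serve the latest by default.
# TinyStore is a hypothetical class, not one of PyCharter's contract stores.

def version_key(version):
    # "1.10.0" -> (1, 10, 0), so versions compare numerically rather than as strings.
    return tuple(int(part) for part in version.split("."))


class TinyStore:
    def __init__(self):
        self._schemas = {}  # name -> {version: schema}

    def store_schema(self, name, schema, version):
        versions = self._schemas.setdefault(name, {})
        if version in versions:
            raise ValueError(f"{name} {version} already exists; publish a new version instead")
        versions[version] = dict(schema)
        return name

    def get_schema(self, name, version=None):
        versions = self._schemas[name]
        if version is None:
            version = max(versions, key=version_key)
        return versions[version]


store = TinyStore()
store.store_schema("user", {"type": "object", "required": ["name"]}, "1.0.0")
store.store_schema("user", {"type": "object", "required": ["name", "age"]}, "1.10.0")
store.store_schema("user", {"type": "object", "required": ["name", "email"]}, "1.2.0")

latest = store.get_schema("user")
pinned = store.get_schema("user", "1.0.0")
```

Refusing to overwrite an existing version keeps earlier versions reproducible, which is what makes audit trails and schema evolution tracking possible.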

See Configuration Guide for database setup and initialization instructions.


3. Pydantic Generator (pycharter.pydantic_generator)

Purpose: Dynamically generates fully-functional Pydantic models from JSON Schema definitions.

When to Use: After storing schemas (or directly from parsed contracts), when you need to generate Python models for type-safe data validation and processing.

How It Works:

  • Takes JSON Schema definitions (from contracts or contract store)
  • Programmatically generates Pydantic model classes at runtime
  • Supports all JSON Schema Draft 2020-12 features plus custom coercions and validations
  • Can generate models from dictionaries, JSON strings, files, or URLs
  • Optionally generates Python files with model definitions

Example:

from pycharter import from_dict, generate_model_file, parse_contract_file, ContractStoreClient

# Option 1: Generate from parsed contract
metadata = parse_contract_file("contract.yaml")
UserModel = from_dict(metadata.schema, "User")

# Option 2: Generate from stored schema
client = ContractStoreClient(...)
schema = client.get_schema("user_schema_v1")
UserModel = from_dict(schema, "User")

# Option 3: Generate and save to file
generate_model_file(schema, "user_model.py", "User")

Contribution to Journey: The Pydantic generator is the transformation engine that converts declarative JSON Schema definitions into executable Python models. It bridges the gap between contract specifications (data) and runtime validation (code), enabling type-safe data processing.
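
The core idea, building a class at runtime from a schema, can be illustrated with the standard library alone. The sketch below deliberately swaps in `dataclasses.make_dataclass` for Pydantic, so it creates the class shape but does no type validation; `dataclass_from_schema` and `JSON_TO_PY` are hypothetical names and this is not how PyCharter generates models.

```python
# Illustrative sketch only: generate a class at runtime from a flat JSON Schema.
# Uses stdlib dataclasses as a stand-in for Pydantic; no type validation happens here.
from dataclasses import field, make_dataclass
from typing import Optional

JSON_TO_PY = {"string": str, "integer": int, "number": float, "boolean": bool}


def dataclass_from_schema(schema, name):
    required = set(schema.get("required", []))
    props = schema.get("properties", {})
    # Required fields go first: dataclasses reject a field without a default
    # after a field that has one.
    ordered = sorted(props, key=lambda prop: prop not in required)

    specs = []
    for prop in ordered:
        py_type = JSON_TO_PY[props[prop]["type"]]
        if prop in required:
            specs.append((prop, py_type))
        else:
            specs.append((prop, Optional[py_type], field(default=None)))
    return make_dataclass(name, specs)


schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string"},
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

Person = dataclass_from_schema(schema, "Person")
alice = Person(name="Alice", age=30)
```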


4. JSON Schema Converter (pycharter.json_schema_converter)

Purpose: Converts existing Pydantic models back into JSON Schema format (reverse conversion).

When to Use: When you have existing Pydantic models and need to generate JSON Schema definitions, or when you want to round-trip between schemas and models.

How It Works:

  • Takes Pydantic model classes as input
  • Generates JSON Schema dictionaries that represent the model structure
  • Preserves validation rules, types, and constraints
  • Can output to dictionaries, JSON strings, or files

Example:

from pycharter import to_dict, to_file, to_json
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool = True

# Convert to JSON Schema
schema = to_dict(Product)
json_string = to_json(Product)
to_file(Product, "product_schema.json")

# Now you can use the schema with other services
ProductModel = from_dict(schema, "Product")  # Round-trip

Contribution to Journey: The JSON Schema converter enables bidirectional conversion between models and schemas. It's useful for:

  • Generating schemas from existing code
  • Round-trip validation (schema → model → schema)
  • Integrating with systems that require JSON Schema format
  • Documenting existing models as schemas
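
The reverse direction, from a class back to a schema, can be sketched the same way. This example uses a stdlib dataclass instead of a Pydantic model and handles only four primitive types; `schema_from_dataclass` and `PY_TO_JSON` are hypothetical names, not PyCharter functions.

```python
# Illustrative sketch only: derive a JSON Schema dict from a class definition.
# Uses a stdlib dataclass as a stand-in for a Pydantic model.
from dataclasses import MISSING, dataclass, fields

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Product:
    name: str
    price: float
    in_stock: bool = True


def schema_from_dataclass(cls):
    properties, required = {}, []
    for f in fields(cls):
        prop = {"type": PY_TO_JSON[f.type]}
        if f.default is MISSING and f.default_factory is MISSING:
            required.append(f.name)   # no default means the field is required
        elif f.default is not MISSING:
            prop["default"] = f.default
        properties[f.name] = prop
    return {
        "type": "object",
        "title": cls.__name__,
        "properties": properties,
        "required": required,
    }


product_schema = schema_from_dataclass(Product)
```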

5. Runtime Validator (pycharter.runtime_validator)

Purpose: A lightweight utility for validating data against generated Pydantic models in production data pipelines.

When to Use: In your data processing scripts, ETL pipelines, API endpoints, or any place where you need to validate incoming data against contract specifications.

API Organization:

PyCharter provides validation through three tiers:

  1. Tier 1: Validator Class (PRIMARY INTERFACE - Recommended for production)

    • Best performance for multiple validations (model is cached)
    • Supports all data sources (contract files, directories, stores, dictionaries)
    • Reusable instance for batch processing
  2. Tier 2: Convenience Functions (Quick start - one-off validations)

    • validate_with_store() - Quick validation with contract store
    • validate_with_contract() - Quick validation with contract file/dict
    • get_model_from_store() / get_model_from_contract() - Get model for reuse
  3. Tier 3: Low-Level Functions (When you already have a model)

    • validate() - Validate single record with existing model
    • validate_batch() - Batch validate with existing model

How It Works:

  • Takes a Pydantic model (generated from a schema) and raw data
  • Validates data against the model's constraints
  • Returns a ValidationResult with validation status, validated data, and errors
  • Supports single record and batch validation
  • Can be used in strict mode (raises exceptions) or lenient mode (returns results)
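
The difference between strict and lenient modes can be shown with a small stdlib sketch. `Result`, `RecordInvalid`, and `check` are hypothetical names; the result fields echo the `is_valid`, `data`, and `errors` attributes mentioned in this README, but the code is not PyCharter's implementation.

```python
# Illustrative sketch only: one check function, two failure styles.
# Result, RecordInvalid, and check are hypothetical, not PyCharter APIs.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Result:
    is_valid: bool
    data: Optional[dict] = None
    errors: list = field(default_factory=list)


class RecordInvalid(ValueError):
    pass


def check(record, required_types, strict=False):
    errors = [
        f"{name}: expected {expected.__name__}"
        for name, expected in required_types.items()
        if not isinstance(record.get(name), expected)
    ]
    if errors and strict:
        # Strict mode: fail fast by raising.
        raise RecordInvalid("; ".join(errors))
    # Lenient mode: always return a result and let the caller decide.
    return Result(is_valid=not errors, data=None if errors else dict(record), errors=errors)


types = {"name": str, "age": int}
lenient = check({"name": "Alice", "age": "30"}, types)

try:
    check({"name": "Alice", "age": "30"}, types, strict=True)
    strict_raised = False
except RecordInvalid:
    strict_raised = True
```

Lenient results suit batch jobs that should keep going and report failures at the end; strict mode suits request handlers where an invalid payload should stop processing immediately.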

Example - Validator Class (Recommended):

from pycharter import Validator, SQLiteContractStore

# Option 1: From directory (schema.yaml, coercion_rules.yaml, validation_rules.yaml)
validator = Validator.from_dir("data/contracts/user")
result = validator.validate({"name": "Alice", "age": 30})

# Option 2: From explicit files
validator = Validator.from_files(schema="schemas/user.yaml", coercion_rules="rules/coercion.yaml")
result = validator.validate({"name": "Alice", "age": 30})

# Option 3: From single contract file
validator = Validator.from_file("user_contract.yaml")
result = validator.validate({"name": "Alice", "age": 30})

# Option 4: From contract store (with database)
store = SQLiteContractStore("metadata.db")
store.connect()
validator = Validator(store=store, schema_id="user_schema_v1")
result = validator.validate({"name": "Alice", "age": 30})

# Batch validation (efficient - model cached)
results = validator.validate_batch([data1, data2, data3])

Example - Convenience Functions (Quick Start):

from pycharter import validate_with_store, validate_with_contract, SQLiteContractStore

# Quick validation with store
store = SQLiteContractStore("metadata.db")
store.connect()
result = validate_with_store(store, "user_schema_v1", {"name": "Alice", "age": 30})

# Quick validation with contract file (no database)
result = validate_with_contract("user_contract.yaml", {"name": "Alice", "age": 30})

Example - Low-Level (When You Have a Model):

from pycharter import from_dict, validate, validate_batch

# Generate model
UserModel = from_dict(schema, "User")

# Validate single record
result = validate(UserModel, {"name": "Alice", "age": 30})

# Batch validate
results = validate_batch(UserModel, [data1, data2, data3])

Performance Tips:

  • For multiple validations: Use Validator class (model is cached)
  • For one-off validations: Convenience functions are fine
  • For batch processing: Use Validator.validate_batch() or validate_batch()

Contribution to Journey: The runtime validator is the enforcement layer that ensures data quality in production. It validates actual data against contract specifications, catching violations early and preventing bad data from propagating through your systems. It supports both database-backed workflows (for production systems with contract stores) and contract-based workflows (for simpler use cases without database dependencies).


5b. 🔄 Pipelines (pycharter.pipeline_generator)

Purpose: Build and run ETL pipelines programmatically (with the | operator) or from YAML configs. PyCharter makes no assumptions about project layout: you specify file paths or use a directory with standard filenames.

When to Use: When you need to extract, transform, and load data from config-driven or code-defined pipelines (HTTP, files, databases, cloud storage → transforms → Postgres, files, cloud).

How It Works:

  • Programmatic: Pipeline(extractor) | transformer | loader; chain with |; call await pipeline.run().
  • Config-driven: Load from explicit files (from_config_files), from a directory with extract.yaml, transform.yaml, load.yaml (from_config_dir), or from a single pipeline.yaml (from_config_file).
  • Variables: Pass PipelineContext(variables={"API_KEY": "x"}) or variables={...} in factory methods; ${VAR} and ${VAR:-default} in configs are resolved from these (no built-in CONTRACT_DIR).
  • Async: run() is async; use asyncio.run(pipeline.run()) in scripts or await pipeline.run() in async code.
  • Error handling: Optional error_context with ErrorMode (STRICT, LENIENT, COLLECT) controls whether extraction/load failures raise or are collected in result.errors.
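The `${VAR}` and `${VAR:-default}` substitution described above can be sketched in a few lines of standard-library Python. This is an illustration of the idea only, not PyCharter's resolver: the `resolve` function is hypothetical, and raising `KeyError` for an unresolved variable is an assumption.

```python
import re

# Matches ${NAME} and ${NAME:-default}
_VAR = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")


def resolve(text: str, variables: dict[str, str]) -> str:
    """Replace ${VAR} placeholders using `variables`, falling back to inline defaults."""
    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in variables:
            return variables[name]
        if default is not None:
            return default
        # Assumption: an unresolved variable without a default is an error.
        raise KeyError(f"unresolved variable: {name}")
    return _VAR.sub(_substitute, text)


resolved = resolve("https://${HOST:-localhost}/users?key=${API_KEY}", {"API_KEY": "x"})
print(resolved)  # https://localhost/users?key=x
```

Because values come only from the mapping you pass in, the same config file can be reused across environments by changing the variables.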

Example:

import asyncio
from pycharter import Pipeline, PipelineContext, HTTPExtractor, PostgresLoader, Rename, AddField

# Programmatic
pipeline = (
    Pipeline(HTTPExtractor(url="https://api.example.com/users"))
    | Rename({"userName": "name"})
    | AddField("processed_at", "now()")
    | PostgresLoader(connection_string="...", table="users")
)
result = asyncio.run(pipeline.run())

# Config-driven (explicit files)
pipeline = Pipeline.from_config_files(
    extract="configs/extract.yaml",
    load="configs/load.yaml",
    variables={"API_KEY": "secret"}
)

# Config-driven (directory: extract.yaml, transform.yaml, load.yaml)
pipeline = Pipeline.from_config_dir("pipelines/users/")

# Config-driven (single file)
pipeline = Pipeline.from_config_file("pipelines/users/pipeline.yaml")

result = asyncio.run(pipeline.run())
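The `|` chaining used in the example above relies on Python's `__or__` operator hook. The sketch below shows the general pattern with a toy class; `MiniPipeline`, `rename`, and `add_field` are hypothetical stand-ins and are not PyCharter's `Pipeline`, `Rename`, or `AddField`.

```python
import asyncio
from typing import Any, Callable, Iterable

Record = dict[str, Any]
Step = Callable[[Record], Record]


class MiniPipeline:
    """Toy pipeline used only to illustrate `|` chaining and an async run()."""

    def __init__(self, extract: Callable[[], Iterable[Record]],
                 steps: tuple[Step, ...] = ()):
        self.extract = extract
        self.steps = steps

    def __or__(self, step: Step) -> "MiniPipeline":
        # Return a new pipeline so a partially built chain can be reused.
        return MiniPipeline(self.extract, self.steps + (step,))

    async def run(self) -> list[Record]:
        results = []
        for record in self.extract():
            for step in self.steps:
                record = step(record)
            results.append(record)
        return results


def rename(mapping: dict[str, str]) -> Step:
    """Build a step that renames keys according to `mapping`."""
    def _step(record: Record) -> Record:
        return {mapping.get(key, key): value for key, value in record.items()}
    return _step


def add_field(name: str, value: Any) -> Step:
    """Build a step that adds a constant field to every record."""
    def _step(record: Record) -> Record:
        return {**record, name: value}
    return _step


pipeline = (
    MiniPipeline(lambda: [{"userName": "alice"}])
    | rename({"userName": "name"})
    | add_field("source", "demo")
)
rows = asyncio.run(pipeline.run())
print(rows)
```

Each `|` adds one step to the chain, so the pipeline reads left to right in the order the data flows.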

Exceptions: Pipeline and config loading use PyCharter's exception hierarchy: PyCharterError (base), ConfigError, ConfigValidationError, ExpressionError. See Exceptions under API Reference.

See pycharter/pipeline_generator/ASYNC_AND_EXECUTION.md for async usage and error modes.


6. 🔍 Quality Assurance (pycharter.quality)

Purpose: Data quality assurance pipeline that checks data against data contracts, calculates quality metrics, tracks violations, and generates quality reports.

When to Use: When you need to:

  • Monitor data quality over time
  • Calculate quality scores and metrics
  • Track and manage data quality violations
  • Set quality thresholds and get alerts
  • Generate quality reports for governance

How It Works:

  • Validates data against contracts (using Runtime Validator)
  • Calculates quality metrics (accuracy, completeness, violation rates)
  • Tracks violations for audit and remediation
  • Checks quality thresholds and generates alerts
  • Produces comprehensive quality reports
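As a rough picture of the metric and threshold steps listed above, the sketch below derives a violation rate and an overall score from per-record validation outcomes. The formulas and the `Thresholds` and `summarize` names are assumptions made for illustration; PyCharter's actual scoring may weigh accuracy and completeness differently.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical thresholds mirroring the names used in the example below."""
    min_overall_score: float = 95.0
    max_violation_rate: float = 0.05


def summarize(outcomes: list[bool], thresholds: Thresholds) -> dict:
    """Turn per-record pass/fail outcomes into simple quality metrics."""
    total = len(outcomes)
    valid = sum(outcomes)
    violation_rate = (total - valid) / total if total else 0.0
    # Assumed score: percentage of records that passed validation.
    overall_score = 100.0 * valid / total if total else 100.0
    passed = (overall_score >= thresholds.min_overall_score
              and violation_rate <= thresholds.max_violation_rate)
    return {"overall_score": overall_score,
            "violation_rate": violation_rate,
            "passed": passed}


good = summarize([True] * 98 + [False] * 2, Thresholds())
bad = summarize([True] * 8 + [False] * 2, Thresholds())
print(good["passed"], bad["passed"])
```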

Example:

from pycharter import QualityCheck, QualityCheckOptions, QualityThresholds

# Define quality thresholds
thresholds = QualityThresholds(
    min_overall_score=95.0,
    max_violation_rate=0.05
)

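# Run quality check
# (store is a connected contract store, e.g. SQLiteContractStore("metadata.db") after store.connect())
check = QualityCheck(store=store)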
report = check.run(
    schema_id="user_schema_v1",
    data="data/users.json",
    options=QualityCheckOptions(
        calculate_metrics=True,
        record_violations=True,
        check_thresholds=True,
        thresholds=thresholds
    )
)

print(f"Quality Score: {report.quality_score.overall_score:.2f}/100")
print(f"Passed: {report.passed}")

Contribution to Journey: The quality assurance module is the policing layer that ensures data quality is maintained throughout the pipeline. It provides metrics, tracking, and alerting capabilities that transform PyCharter from a contract management tool into a complete data quality assurance platform.

See Quality Module README for detailed documentation.


Complete Workflow Example

Here's how all services work together in a complete data production journey:

from pycharter import (
    parse_contract_file,
    SQLiteContractStore,
    from_dict,
    Validator,
    to_dict
)

# Step 1: Parse contract specification
metadata = parse_contract_file("user_contract.yaml")

# Step 2: Store metadata in database
store = SQLiteContractStore("metadata.db")
store.connect()
schema_id = store.store_schema("user", metadata.schema, version="1.0")

# Merge ownership and governance into metadata before storing
# Ownership and governance are part of metadata, not separate entities
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    owner = metadata.ownership.get("owner")
    metadata_dict["business_owners"] = [owner] if owner else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules

# Store metadata once with all information (ownership and governance included)
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)

# Store coercion and validation rules
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")

# Step 3: Generate Pydantic model from stored schema
schema = store.get_schema(schema_id)
UserModel = from_dict(schema, "User")

# Step 4: (Optional) Convert model back to schema for documentation
schema_doc = to_dict(UserModel)

# Step 5: Validate data in production pipeline
# Option A: Using Validator class (recommended for production)
validator = Validator(store=store, schema_id=schema_id)

def process_user_data(raw_data):
    result = validator.validate(raw_data)
    if result.is_valid:
        # Process validated data
        return result.data
    else:
        # Handle validation errors
        raise ValueError(f"Invalid data: {result.errors}")

# Option B: Using convenience function (quick start)
from pycharter import validate_with_store

def process_user_data_quick(raw_data):
    result = validate_with_store(store, schema_id, raw_data)
    if result.is_valid:
        return result.data
    else:
        raise ValueError(f"Invalid data: {result.errors}")

7. 🌐 REST API (pycharter.api)

Purpose: Expose all PyCharter services as REST API endpoints.

When to Use: When you need to use PyCharter from non-Python applications or microservices, or when you want to provide a web-based interface.

How It Works:

  • Provides HTTP endpoints for all core services
  • Uses FastAPI for automatic OpenAPI/Swagger documentation
  • Supports both store-based and contract-based operations
  • Handles request/response validation with Pydantic models
  • Located at the root level (api/) as a separate application
  • All endpoints are async-ready for better performance

Example:

# Start the API server (uses PYCHARTER_DATABASE_URL or sqlite:///pycharter.db)
pycharter api

# With host/port
pycharter api --host 0.0.0.0 --port 8080

Endpoints (see Swagger for full list):

  • Contracts: POST /api/v1/contracts/parse, POST /api/v1/contracts/build
  • Metadata: POST /api/v1/metadata/schemas, GET /api/v1/metadata/schemas/{schema_id}
  • Validation: POST /api/v1/validation/validate, POST /api/v1/validation/validate-batch
  • Quality: POST /api/v1/quality/check, GET /api/v1/quality/metrics
  • ETL: POST /api/v1/etl/run (extract/transform/load YAML; optional pipeline quality checks)
  • Pipeline runs: GET /api/v1/runs, GET /api/v1/runs/stats, GET /api/v1/pipelines
  • Ontology: GET /api/v1/semantic/schemes, concepts, relationships, semantic health, and related ontology endpoints

Documentation:

See src/pycharter/api/README.md for complete API documentation.

8. 🌐 Web UI (pycharter.ui)

Purpose: Browser-based interface for contracts, pipelines, quality, and ontology. Served by pycharter ui serve (production) or pycharter ui dev (development with hot reload).

Main sections:

  • Contracts - Registry, diagram/form editor, and validation workspace
  • Pipelines - Overview dashboard, pipeline runs, ETL generator (YAML panels), and Pipeline diagram (Figma Jam-style visual editor for extract/transform/load nodes)
  • Quality - Quality metrics, thresholds, and violation tracking
  • Ontology - Concepts, concept schemes, diagrammatic workspace (ReactFlow), proposals, and semantic health
  • Documentation - In-app API playground and docs

Configure the API base URL in the UI Settings page. Authentication uses PYCHARTER_AUTH_USERS (see Configuration Guide).

Service Integration Summary

| Service | Input | Output | Journey Stage |
|---|---|---|---|
| Contract Parser | Contract files (YAML/JSON) | ContractMetadata | Contract Specification → Parsing |
| Contract Builder | Separate artifacts or Store | Consolidated contract | Storage → Consolidation |
| Metadata Store | ContractMetadata | Stored metadata (DB) | Parsing → Storage |
| Pydantic Generator | JSON Schema | Pydantic models | Storage → Model Generation |
| JSON Schema Converter | Pydantic models | JSON Schema | (Bidirectional) |
| Runtime Validator | Pydantic models + Data | ValidationResult | Model Generation → Validation |
| ETL Pipelines | Config files or code | PipelineResult | Extract → Transform → Load |
| Quality Assurance | Contract + Data | QualityReport | Validation → Quality Monitoring |

Each service is designed to be independent yet composable, allowing you to use them individually or together as part of a complete data contract management system.

📖 Documentation

  • Full documentation (Python API, tutorials, guides): https://optophi.github.io/pycharter/
  • Serve docs locally: pycharter docs serve (default: http://127.0.0.1:5002). Build static site: pycharter docs build. Requires pip install pycharter[docs].
  • Configuration Guide - Database connection, pycharter db init / upgrade / seed, migrations, and variable injection
  • Data Journey Guide - Data production journey: contract specification → storage → validation → quality
  • Database ERD - Database schema and entity relationship diagrams
  • Examples & Notebooks - Tutorials and guides (ETL, contracts, validation, quality, contract store, schema conversion); optional Marimo for interactive .py notebooks
  • REST API - API endpoints and usage (install with pip install pycharter[api])

📚 Usage Examples

Basic Usage

Using Convenience Functions (Quick Start):

from pycharter import from_dict, from_json, from_file

# From dictionary
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "title": {"type": "string"},
        "published": {"type": "boolean", "default": False}
    }
}
Article = from_dict(schema, "Article")

# From JSON string
schema_json = '{"type": "object", "version": "1.0.0", "properties": {"name": {"type": "string"}}}'
User = from_json(schema_json, "User")

# From file
Product = from_file("product_schema.json", "Product")

Using Validator Class (Production):

from pycharter import Validator

# From directory or single file
validator = Validator.from_dir("data/contracts/article")
# or: validator = Validator.from_file("article_contract.yaml")
result = validator.validate({"title": "My Article", "published": True})

Nested Objects

from pycharter import from_dict

schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "zipcode": {"type": "string"}
            }
        }
    }
}

Person = from_dict(schema, "Person")
person = Person(
    name="Alice",
    address={
        "street": "123 Main St",
        "city": "New York",
        "zipcode": "10001"
    }
)

print(person.address.city)  # Output: New York

Arrays and Collections

from pycharter import from_dict

schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "tags": {
            "type": "array",
            "items": {"type": "string"}
        },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"}
                }
            }
        }
    }
}

Cart = from_dict(schema, "Cart")
cart = Cart(
    tags=["python", "pydantic"],
    items=[
        {"name": "Apple", "price": 1.50},
        {"name": "Banana", "price": 0.75}
    ]
)

print(cart.items[0].name)  # Output: Apple

Coercion and Validation

PyCharter supports coercion (pre-validation transformation) and validation (post-validation checks):

from pycharter import from_dict

schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "flight_number": {
            "type": "integer",
            "coercion": "coerce_to_integer"  # Convert string/float to int
        },
        "destination": {
            "type": "string",
            "coercion": "coerce_to_string",
            "validations": {
                "min_length": {"threshold": 3},
                "max_length": {"threshold": 3},
                "no_capital_characters": None,
                "only_allow": {"allowed_values": ["abc", "def", "ghi"]}
            }
        },
        "distance": {
            "type": "number",
            "coercion": "coerce_to_float",
            "validations": {
                "greater_than_or_equal_to": {"threshold": 0}
            }
        }
    }
}

Flight = from_dict(schema, "Flight")

# Coercion happens automatically
flight = Flight(
    flight_number="123",    # Coerced to int: 123
    destination="abc",      # Passes all validations
    distance="100.5"        # Coerced to float: 100.5
)

📋 Standard JSON Schema Support

PyCharter supports all standard JSON Schema Draft 2020-12 validation keywords:

| Keyword | Type | Description | Example |
|---|---|---|---|
| minLength | string | Minimum string length | {"minLength": 3} |
| maxLength | string | Maximum string length | {"maxLength": 10} |
| pattern | string | Regular expression pattern | {"pattern": "^[a-z]+$"} |
| enum | any | Allowed values | {"enum": ["a", "b", "c"]} |
| const | any | Single allowed value | {"const": "fixed"} |
| minimum | number | Minimum value (inclusive) | {"minimum": 0} |
| maximum | number | Maximum value (inclusive) | {"maximum": 100} |
| exclusiveMinimum | number | Minimum value (exclusive) | {"exclusiveMinimum": 0} |
| exclusiveMaximum | number | Maximum value (exclusive) | {"exclusiveMaximum": 100} |
| multipleOf | number | Must be multiple of | {"multipleOf": 2} |
| minItems | array | Minimum array length | {"minItems": 1} |
| maxItems | array | Maximum array length | {"maxItems": 10} |
| uniqueItems | array | Array items must be unique | {"uniqueItems": true} |

All schemas are validated against the JSON Schema standard before processing, ensuring compliance.
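To make the keyword semantics concrete, here is a small standalone checker for a handful of the keywords in the table. It is a teaching sketch only: `check_keywords` is a hypothetical helper, it covers a subset of keywords, and real validation should rely on PyCharter or the jsonschema library.

```python
import re
from typing import Any


def check_keywords(value: Any, rules: dict[str, Any]) -> list[str]:
    """Return human-readable violations of a few JSON Schema keywords."""
    errors: list[str] = []
    if "enum" in rules and value not in rules["enum"]:
        errors.append("not in enum")
    if isinstance(value, str):
        if "minLength" in rules and len(value) < rules["minLength"]:
            errors.append("shorter than minLength")
        if "maxLength" in rules and len(value) > rules["maxLength"]:
            errors.append("longer than maxLength")
        # JSON Schema patterns are not implicitly anchored, hence re.search.
        if "pattern" in rules and re.search(rules["pattern"], value) is None:
            errors.append("does not match pattern")
    # bool is a subclass of int in Python, so exclude it from numeric checks.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in rules and value < rules["minimum"]:
            errors.append("less than minimum")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append("greater than maximum")
        if "exclusiveMinimum" in rules and value <= rules["exclusiveMinimum"]:
            errors.append("not greater than exclusiveMinimum")
    return errors


string_rules = {"minLength": 3, "maxLength": 10, "pattern": "^[a-z]+$"}
print(check_keywords("abc", string_rules))
print(check_keywords("Ab", string_rules))
```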

🔧 Built-in Coercions (PyCharter Extensions)

| Coercion | Description |
|---|---|
| coerce_to_string | Convert int, float, bool, datetime, dict, list to string |
| coerce_to_integer | Convert float, string (numeric), bool, datetime to int |
| coerce_to_float | Convert int, string (numeric), bool to float |
| coerce_to_boolean | Convert int, string to bool |
| coerce_to_datetime | Convert string (ISO format), timestamp to datetime |
| coerce_to_date | Convert string (date format), datetime to date (date only, no time) |
| coerce_to_uuid | Convert string to UUID |
| coerce_to_lowercase | Convert string to lowercase |
| coerce_to_uppercase | Convert string to uppercase |
| coerce_to_stripped_string | Strip leading and trailing whitespace from string |
| coerce_to_list | Convert single value to list [value] (preserves None) |
| coerce_empty_to_null | Convert empty strings/lists/dicts to None (useful for nullable fields) |
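The descriptions in the coercion table can be read as small pure functions. The sketch below re-implements three of them from the table text alone; these are illustrative versions, not PyCharter's source, and their handling of edge cases (for example, passing non-strings through unchanged) is an assumption.

```python
from typing import Any


def coerce_to_stripped_string(value: Any) -> Any:
    """Strip leading and trailing whitespace; leave non-strings unchanged."""
    return value.strip() if isinstance(value, str) else value


def coerce_empty_to_null(value: Any) -> Any:
    """Map empty strings, lists, and dicts to None; leave everything else unchanged."""
    if isinstance(value, (str, list, dict)) and len(value) == 0:
        return None
    return value


def coerce_to_list(value: Any) -> Any:
    """Wrap a single value in a list; keep None and existing lists as they are."""
    if value is None or isinstance(value, list):
        return value
    return [value]


# Coercions compose: whitespace-only input becomes None after stripping.
print(coerce_empty_to_null(coerce_to_stripped_string("   ")))  # None
print(coerce_to_list("tag"))  # ['tag']
```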

✅ Built-in Validations (PyCharter Extensions)

| Validation | Description | Configuration |
|---|---|---|
| min_length | Minimum length for strings/arrays | {"threshold": N} |
| max_length | Maximum length for strings/arrays | {"threshold": N} |
| only_allow | Only allow specific values | {"allowed_values": [...]} |
| greater_than_or_equal_to | Numeric minimum | {"threshold": N} |
| less_than_or_equal_to | Numeric maximum | {"threshold": N} |
| is_positive | Value must be positive | {"threshold": 0} |
| no_capital_characters | No uppercase letters | null |
| no_special_characters | Only alphanumeric and spaces | null |
| non_empty_string | String must not be empty | null |
| matches_regex | String must match regex pattern | {"pattern": "..."} |
| is_email | String must be valid email address | null |
| is_url | String must be valid URL | null |
| is_alphanumeric | Only alphanumeric characters (no spaces/special) | null |
| is_numeric_string | String must be numeric (digits, optional decimal) | null |
| is_unique | All items in array must be unique | null |

Note: PyCharter extensions (coercion and validations) are optional and can be used alongside standard JSON Schema keywords. All validation logic is stored as data in the JSON schema, making it fully data-driven.

🎨 Custom Coercions and Validations

Extend PyCharter with your own coercion and validation functions:

from pycharter.shared.coercions import register_coercion
from pycharter.shared.validations import register_validation

# Register custom coercion
def coerce_to_uppercase(data):
    if isinstance(data, str):
        return data.upper()
    return data

register_coercion("coerce_to_uppercase", coerce_to_uppercase)

# Register custom validation
def must_be_positive(threshold=0):
    def _validate(value, info):
        if value <= threshold:
            raise ValueError(f"Value must be > {threshold}")
        return value
    return _validate

register_validation("must_be_positive", must_be_positive)
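To see why validations are written as factories, it helps to follow one call from configuration to check. The sketch below uses a plain dictionary as a stand-in registry; `_REGISTRY`, `register`, and `apply_rules` are hypothetical names, and PyCharter's own registry and the `info` argument it passes may differ.

```python
from typing import Any, Callable, Optional

# Hypothetical registry: validation name -> factory that builds a validator
_REGISTRY: dict[str, Callable[..., Callable[[Any, Any], Any]]] = {}


def register(name: str, factory: Callable[..., Callable[[Any, Any], Any]]) -> None:
    _REGISTRY[name] = factory


def must_be_positive(threshold: float = 0):
    """Factory: configuration goes in, a validator function comes out."""
    def _validate(value, info):
        if value <= threshold:
            raise ValueError(f"Value must be > {threshold}")
        return value
    return _validate


register("must_be_positive", must_be_positive)


def apply_rules(value: Any, rules: dict[str, Optional[dict]]) -> Any:
    """Look up each named rule, configure it from data, and run it on the value."""
    for name, config in rules.items():
        validator = _REGISTRY[name](**(config or {}))
        value = validator(value, None)  # `info` is unused in this sketch
    return value


print(apply_rules(5, {"must_be_positive": {"threshold": 0}}))  # 5
```

Because the rule is referenced by name and configured by a dictionary, the same `{"must_be_positive": {"threshold": 0}}` entry can live in a JSON or YAML contract.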

📖 API Reference

PyCharter's API is organized into three tiers to help you choose the right approach:

Tier 1: Primary Interfaces (Classes - Best Performance)

Validator - Primary validation interface (recommended for production)

from pycharter import Validator

# Create validator via factory methods or store
validator = Validator.from_dir("data/contracts/user")
validator = Validator.from_files(schema="schema.yaml", coercion_rules="coercion.yaml")
validator = Validator.from_file("contract.yaml")
validator = Validator.from_dict(schema={...}, coercion_rules={...})
validator = Validator(store=store, schema_id="user_schema")  # from contract store

# Validate data
result = validator.validate(data)
results = validator.validate_batch([data1, data2])
model = validator.get_model()  # Get the generated Pydantic model

# State-aware validation (per FSM lifecycle state)
result = validator.validate_for_state(data, state="DRAFT")
result = validator.validate_for_state(data, state="FILLED",
    state_rules={"FILLED": {"required": ["filled_qty"]}})
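One way to read the `state_rules` argument above is as a per-state list of fields that must be present. The sketch below illustrates that reading with plain dictionaries; `missing_for_state` is a hypothetical helper, and the exact semantics of `validate_for_state` (for example, how "optional" overrides interact with the base schema) are not specified here.

```python
from typing import Any


def missing_for_state(record: dict[str, Any], state: str,
                      state_rules: dict[str, dict[str, list[str]]]) -> list[str]:
    """Return the fields required in `state` that are absent or None in `record`."""
    required = state_rules.get(state, {}).get("required", [])
    return [name for name in required if record.get(name) is None]


rules = {
    "DRAFT": {"optional": ["filled_qty"]},
    "FILLED": {"required": ["filled_qty"]},
}
order = {"order_id": "A-1", "status": "FILLED"}

print(missing_for_state(order, "DRAFT", rules))   # []
print(missing_for_state(order, "FILLED", rules))  # ['filled_qty']
```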

ValidatorBuilder - Fluent API for building validators

from pycharter import ValidatorBuilder

validator = (
    ValidatorBuilder()
    .from_dir("contracts/order")          # or .from_file(), .from_files(), .from_dict(), .from_store()
    .with_state_rules({                   # optional: per-state required/optional overrides
        "DRAFT":  {"optional": ["filled_qty"]},
        "FILLED": {"required": ["filled_qty"]},
    })
    .with_quality_checks(thresholds={"completeness": 0.95})  # optional: quality metrics
    .strict()                             # optional: raise on validation failure
    .build()
)

QualityCheck - Primary quality assurance interface

from pycharter import QualityCheck, QualityCheckOptions

check = QualityCheck(store=store)
report = check.run(schema_id="user_schema", data=data, options=QualityCheckOptions(...))

ContractStoreClient - Base class for contract stores

from pycharter import ContractStoreClient, SQLiteContractStore, PostgresContractStore

store = SQLiteContractStore("metadata.db")
store.connect()

Tier 2: Convenience Functions (Quick Start)

Pydantic Generator - Input type helpers

  • from_dict(schema: dict, model_name: str = "DynamicModel") - Create model from dictionary
  • from_json(json_string: str, model_name: str = "DynamicModel") - Create model from JSON string
  • from_file(file_path: str, model_name: str = None) - Create model from file (JSON/YAML)
  • from_url(url: str, model_name: str = "DynamicModel") - Create model from URL
  • generate_model(schema: dict, model_name: str = "DynamicModel") - Advanced: more control
  • generate_model_file(schema: dict, output_path: str, model_name: str = "DynamicModel") - Generate and save to file

JSON Schema Converter - Output type helpers

  • to_dict(model: Type[BaseModel], ...) - Convert model to JSON Schema dictionary
  • to_file(model: Type[BaseModel], file_path: str, ...) - Convert model to file
  • to_json(model: Type[BaseModel], ...) - Convert model to JSON string
  • model_to_schema(model: Type[BaseModel], ...) - Advanced: core conversion function

Runtime Validator - Data source helpers

  • validate_with_store(store, schema_id, data, ...) - Quick validation with contract store
  • validate_batch_with_store(store, schema_id, data_list, ...) - Batch validation with store
  • validate_with_contract(contract, data, ...) - Quick validation with contract file/dict
  • validate_batch_with_contract(contract, data_list, ...) - Batch validation with contract
  • get_model_from_store(store, schema_id, ...) - Get model from contract store
  • get_model_from_contract(contract, ...) - Get model from contract
  • validate_input(contract, ...) - Decorator for function input validation
  • validate_output(contract, ...) - Decorator for function output validation
  • validate_with_contract_decorator(contract, ...) - Decorator for contract-based validation

Contract Management

  • parse_contract(contract_dict: dict) - Parse contract dictionary
  • parse_contract_file(file_path: str) - Parse contract file (YAML/JSON)
  • build_contract(artifacts: ContractArtifacts) - Build contract from artifacts
  • build_contract_from_store(store, schema_id, ...) - Build contract from contract store

Tier 3: Low-Level Utilities (When You Have Models)

  • validate(model: Type[BaseModel], data: dict, strict: bool = False) - Validate single record
  • validate_batch(model: Type[BaseModel], data_list: List[dict], strict: bool = False) - Batch validate
  • ValidationResult - Result class with is_valid, data, and errors attributes

Metadata Store Implementations

  • InMemoryContractStore() - In-memory store (testing/development)
  • SQLiteContractStore(database_path: str) - SQLite database
  • PostgresContractStore(connection_string: str) - PostgreSQL database
  • MongoDBContractStore(connection_string: str) - MongoDB database
  • RedisContractStore(connection_string: str) - Redis database

Domain Lifecycle (FSM Integration)

from pycharter import (
    check_state_alignment,
    validate_lifecycle_binding,
    get_lifecycle_binding,
    get_domain_entity_info,
    DEFAULT_STATE_FIELD,
)

# Check that contract enum values match FSM states
result = check_state_alignment(contract_dict, fsm_states={"PENDING","OPEN","FILLED"},
                               state_field="status")
# result.aligned, result.missing_from_contract, result.missing_from_fsm

# Validate lifecycle binding structure in contract metadata
errors = validate_lifecycle_binding(metadata_dict)

# Read binding (state_machine_name, machine_version, state_field, entity_id_field)
binding = get_lifecycle_binding(metadata_dict)

DEFAULT_STATE_FIELD  # "status": the conventional field name for FSM state
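The alignment check above amounts to comparing two sets of state names. A minimal sketch of that comparison follows; `Alignment` and `compare_states` are hypothetical names, and the direction of each "missing" field is an assumption based on the attribute names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Alignment:
    """Hypothetical result mirroring the attribute names mentioned above."""
    aligned: bool
    missing_from_contract: set[str]  # states the FSM has but the contract enum lacks
    missing_from_fsm: set[str]       # enum values the contract has but the FSM lacks


def compare_states(contract_enum: Iterable[str], fsm_states: Iterable[str]) -> Alignment:
    contract, fsm = set(contract_enum), set(fsm_states)
    return Alignment(
        aligned=contract == fsm,
        missing_from_contract=fsm - contract,
        missing_from_fsm=contract - fsm,
    )


alignment = compare_states(["PENDING", "OPEN", "CANCELLED"], {"PENDING", "OPEN", "FILLED"})
print(alignment.aligned)
print(alignment.missing_from_contract, alignment.missing_from_fsm)
```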

Exceptions

PyCharter uses a small exception hierarchy for config and pipeline errors. Catch PyCharterError to handle any PyCharter failure:

  • PyCharterError - Base for all PyCharter exceptions
  • ConfigError - Config loading/parsing failures (missing file, invalid YAML)
  • ConfigValidationError - Schema validation failures (e.g. missing required type field)
  • ConfigLoadError - Config file load errors
  • ExpressionError - Expression evaluation failures (e.g. invalid syntax in AddField)

Pipeline run(error_context=...) supports ErrorMode: STRICT (raise on failure), LENIENT (log and continue), COLLECT (append to result.errors). Import from pycharter.shared.errors.
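The three error modes can be illustrated with a small loop that loads records one at a time. This is a standalone sketch: `Mode` and `load_all` are hypothetical and do not reproduce PyCharter's `ErrorMode` or its `error_context` object.

```python
import logging
from enum import Enum
from typing import Any, Callable

logger = logging.getLogger("pipeline_sketch")


class Mode(Enum):
    STRICT = "strict"    # raise on the first failure
    LENIENT = "lenient"  # log and continue
    COLLECT = "collect"  # record the error and continue


def load_all(records: list[Any], load: Callable[[Any], None],
             mode: Mode) -> tuple[int, list[str]]:
    """Load each record; return (number loaded, collected error messages)."""
    loaded, errors = 0, []
    for record in records:
        try:
            load(record)
            loaded += 1
        except Exception as exc:
            if mode is Mode.STRICT:
                raise
            if mode is Mode.COLLECT:
                errors.append(str(exc))
            else:
                logger.warning("skipping record: %s", exc)
    return loaded, errors


def reject_negative(value: int) -> None:
    """Toy loader (assumption): fails on negative numbers."""
    if value < 0:
        raise ValueError(f"negative value: {value}")


print(load_all([1, -2, 3], reject_negative, Mode.COLLECT))
```

COLLECT is useful when a run should finish and report all failures at once, while STRICT keeps a bad batch from being partially loaded without notice.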

🎯 Design Principles & Requirements

PyCharter is designed to meet the following core requirements:

✅ JSON Schema Standard Compliance

All schemas must abide by conventional JSON Schema syntax and qualify as valid JSON Schema:

  • Validation: All schemas are validated against the JSON Schema Draft 2020-12 standard before processing
  • Standard Keywords: Full support for all standard validation keywords (minLength, pattern, enum, minimum, maximum, etc.)
  • Compliance: Uses jsonschema library for validation with graceful fallback

✅ Data-Driven Validation Logic

All schema information and complex field validation logic is stored as data, not Python code:

  • Coercion: Referenced by name (string) in JSON: "coercion": "coerce_to_integer"
  • Validations: Referenced by name with configuration (dict) in JSON: "validations": {"min_length": {"threshold": 3}}
  • No Code Required: Validation rules are defined entirely in JSON schema files
  • Example: {"coercion": "coerce_to_string", "validations": {"min_length": {"threshold": 3}}}

✅ Dynamic Pydantic Model Generation

Models are created dynamically at runtime from JSON schemas:

  • Runtime Generation: Uses pydantic.create_model() to generate models on-the-fly
  • Dynamic Validators: Field validators are dynamically attached using field_validator decorators
  • Multiple Sources: Models can be created from dicts, JSON strings, files, or URLs
  • No Static Code: All models are generated from data, not pre-defined classes

✅ Nested Schema Support

Full support for nested object schemas and complex structures:

  • Recursive Processing: Nested objects are recursively processed into their own Pydantic models
  • Arrays of Objects: Arrays containing nested objects are fully supported
  • Deep Nesting: Deeply nested structures work correctly with full type safety
  • Type Safety: Each nested object becomes its own typed Pydantic model
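The recursive processing described above can be illustrated without Pydantic by using the standard library's `dataclasses.make_dataclass` as a stand-in for `pydantic.create_model()`. This is an analogy only: `build_class` and `instantiate` are hypothetical helpers, no type validation is performed, and arrays of objects are not handled.

```python
import dataclasses
from typing import Any

_TYPES = {"string": str, "integer": int, "number": float, "boolean": bool}


def build_class(name: str, schema: dict) -> type:
    """Create a dataclass for `schema`, recursing into nested object properties."""
    fields = []
    for prop, spec in schema.get("properties", {}).items():
        if spec.get("type") == "object":
            py_type = build_class(prop.capitalize(), spec)  # nested class
        else:
            py_type = _TYPES.get(spec.get("type"), Any)
        fields.append((prop, py_type, dataclasses.field(default=None)))
    return dataclasses.make_dataclass(name, fields)


def instantiate(cls: type, data: dict) -> Any:
    """Build an instance of `cls`, converting nested dicts into nested instances."""
    kwargs = {}
    for f in dataclasses.fields(cls):
        value = data.get(f.name)
        if isinstance(value, dict) and dataclasses.is_dataclass(f.type):
            value = instantiate(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}
Person = build_class("Person", schema)
person = instantiate(Person, {"name": "Alice", "address": {"city": "New York"}})
print(person.address.city)  # New York
```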

✅ Extension Fields

Custom fields can be added to JSON Schema to extend functionality:

  • coercion: Pre-validation type conversion (e.g., string → integer)
  • validations: Post-validation custom rules
  • Optional: Extensions work alongside standard JSON Schema keywords
  • Separated: Extensions are clearly distinguished from standard JSON Schema

✅ Complex Field Validation

Support for both standard and custom field validators:

  • Standard Validators: minLength, pattern, enum, minimum, maximum, etc. (JSON Schema standard)
  • Custom Validators: Extensible validation rules via validations field
  • Validation Order: Coercion → Standard Validation → Pydantic Validation → Custom Validations
  • Factory Pattern: Validators are factory functions that return validation functions

🚀 Development Setup

Quick Setup

# Run setup script
./scripts/setup.sh

# Activate environment
source venv/bin/activate

# Run tests
pytest

Using Make

make install-dev    # Install package and dev dependencies
make test          # Run tests
make format        # Format with Ruff (ruff format + ruff check --fix)
make lint          # Check formatting and lint with Ruff (no writes)
make type-check    # Run mypy on src/pycharter
make check         # Run format, lint, type-check, and test

Building the package: Run make clean && make build for a reliable build (clears stale egg-info; see Publishing).

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=pycharter --cov-report=html

# Run specific test file
pytest tests/test_converter.py

# Run tests matching a pattern
pytest -k "coercion"

📦 Publishing to PyPI

Automatic publishing via GitHub Releases (Trusted Publishing - no tokens needed!):

# 1. Update version in pyproject.toml
# version = "0.0.21"

# 2. Commit and push
git add pyproject.toml
git commit -m "Bump version to 0.0.21"
git push

# 3. Create GitHub Release (automatically publishes to PyPI)
gh release create v0.0.21 --title "v0.0.21" --notes "Release notes"

The workflow automatically:

  • ✅ Builds UI
  • ✅ Builds Python package
  • ✅ Publishes to PyPI (using Trusted Publishing)

Local build (reliable): clean first to avoid stale build artifacts, then build:

make clean && make build

Core package builds without Node.js; the UI is included when built (see Publishing guide).

📋 JSON Schema Compliance

PyCharter is fully compliant with the JSON Schema Draft 2020-12 standard:

  • All schemas are validated against the standard before processing
  • Full support for all standard keywords (minLength, maxLength, pattern, enum, minimum, maximum, etc.)
  • Optional extensions (coercion and validations) work alongside standard keywords
  • Strict mode available to enforce standard-only schemas

🔗 Requirements

  • Python 3.11+
  • Pydantic >= 2.0.0
  • jsonschema >= 4.0.0 (optional, for enhanced validation)

See pyproject.toml for full dependencies and optional extras (api, ui, dev, etl, etc.).

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

See CONTRIBUTING.md. Report security issues per SECURITY.md. Community expectations: CODE_OF_CONDUCT.md.

Made with ❤️ for the Python community
