A Python package for data contract management with five core services: contract parsing, metadata storage, Pydantic generation, JSON Schema conversion, and runtime validation
PyCharter
Data contract management and validation for Python: define schemas, enforce quality, and run pipelines with contracts.
Quick start (2 minutes)
Install, define a tiny contract, and validate one record. Copy-paste into a new terminal:
pip install pycharter
from pycharter import from_dict, validate
# 1. Define a minimal contract (schema)
schema = {
"type": "object",
"version": "1.0.0",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
},
"required": ["name", "age"],
}
# 2. Build a validator and validate one record
Person = from_dict(schema, "Person")
result = validate(Person, {"name": "Alice", "age": 30})
print(result.is_valid) # True
print(result.data.name) # Alice
Invalid data returns result.is_valid == False and result.errors with details. Next: use a contract file with Validator.from_file("contract.yaml"), or add coercion/validation rules and store contracts in a contract store (see Concepts and Installation).
What is PyCharter?
PyCharter is a data contract-as-code platform that brings structure, validation, and governance to data pipelines and applications. At its core, PyCharter automatically converts JSON schemas into fully functional Pydantic models, fully supporting the JSON Schema Draft 2020-12 standard, including all standard validation keywords (minLength, maxLength, pattern, enum, minimum, maximum, etc.), while also providing extensions for pre-validation coercion and post-validation checks. All validation logic is stored as data (not Python code). PyCharter also includes a comprehensive quality assurance module for monitoring data quality, tracking violations, and generating quality reports.
It provides:
- Data Contract Definition & Management: Define formal agreements that specify data structure, quality rules, and governance policies
- Schema Registry: Centralized storage and versioning of data schemas with support for schema evolution
- Data Quality Enforcement: Coercion rules (data transformation) and validation rules (business constraints) to ensure data integrity
- Data Governance: Track ownership, stewardship, and enforce data policies across your organization
- Metadata Management: Store and retrieve data about data (schemas, ownership, rules, lineage)
- Runtime Validation: Validate data against contracts in production pipelines, APIs, and data processes
Data Terminology
PyCharter implements key data management concepts:
- Data Contract: Formal specification of data structure, quality rules, and governance policies that define the "contract" between data producers and consumers
- Schema Registry: Centralized repository for storing and versioning data schemas, enabling schema discovery and evolution tracking
- Data Quality: Coercion (pre-validation transformation) and validation (post-validation checks) rules that ensure data integrity
- Data Governance: Ownership tracking, stewardship management, and policy enforcement for data assets
- Schema Evolution: Versioning and migration capabilities that allow data structures to evolve over time while maintaining backward compatibility
- Metadata Management: Storage and retrieval of data about data, including schemas, ownership information, governance rules, and lineage
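To make the coercion/validation split concrete, here is a minimal standard-library sketch of the ordering described above: coerce first, then check structure, then apply business rules. The rule tables and the `process` function are hypothetical names chosen for illustration; they are not PyCharter's API or its internal implementation.

```python
from typing import Any, Callable

# Hypothetical rule tables (illustrative only, not PyCharter's format).
COERCIONS: dict[str, Callable[[Any], Any]] = {
    "age": int,                          # "30" -> 30
    "name": lambda v: str(v).strip(),    # " Alice " -> "Alice"
}
EXPECTED_TYPES: dict[str, type] = {"name": str, "age": int}
VALIDATIONS: dict[str, Callable[[Any], bool]] = {
    "age": lambda v: v >= 0,
    "name": lambda v: len(v) > 0,
}


def process(record: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Coerce, type-check, then apply business rules; return (data, errors)."""
    errors: list[str] = []
    data = dict(record)
    # 1. Coercion: transform raw values before any checking.
    for field, coerce in COERCIONS.items():
        if field in data:
            try:
                data[field] = coerce(data[field])
            except (TypeError, ValueError):
                errors.append(f"{field}: cannot coerce {data[field]!r}")
    # 2. Structural check: required fields and their types.
    for field, expected in EXPECTED_TYPES.items():
        if field not in data:
            errors.append(f"{field}: missing")
        elif not isinstance(data[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    # 3. Business rules: only run when the structure is sound.
    if not errors:
        for field, rule in VALIDATIONS.items():
            if not rule(data[field]):
                errors.append(f"{field}: rule failed")
    return data, errors


clean, problems = process({"name": " Alice ", "age": "30"})
print(clean, problems)
```

Running the rules only after coercion and type checks succeed keeps each rule simple, since it can assume it receives a value of the expected type.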
Use Cases
- Data Pipeline Validation: Ensure data conforms to contracts before processing in pipelines
- API Contract Enforcement: Validate API request/response data against defined contracts
- Data Integration: Standardize data formats across systems and services
- Compliance & Governance: Track ownership, enforce data policies, and maintain audit trails
- Schema Registry: Centralized schema management for microservices and data platforms
- Data Quality Assurance: Catch data quality issues early in the pipeline through automated validation
Concepts
A short mental model so you know what to reach for.
| Concept | What it is | When you use it |
|---|---|---|
| Schema | The shape of the data (JSON Schema): types, required fields, nested objects. | When you only need structure (e.g. "this field is string, that one is integer"). |
| Data contract | Schema + coercion rules (e.g. string → int) + validation rules (e.g. min/max, allowed values) + optional metadata (ownership, governance). | When you want one artifact that defines structure, transforms, and business rules. |
| Metadata store | A database (SQLite, PostgreSQL, etc.) that stores contracts (and their versions) so many apps can reuse them. | When you have multiple services or pipelines and want a single source of truth. |
| ETL validation | Validating data after extract (source) and before load (target) in a pipeline, using a schema or contract. | When you run ETL and want to reject or quarantine bad rows at stage boundaries. |
Flow from "just validate" to "contracts in a store and ETL":
Option A: No database
Schema/contract in code or YAML file → Validator → validate(data)
Option B: With contract store
Contract in DB → Validator(store=..., schema_id=...) → validate()
Option C: ETL pipeline
Extract → [optional: validate with contract] → Transform → [optional: validate with contract] → Load
Start with Option A (Quick start above); add contract store when you need versioned, shared contracts; add ETL validation when you run pipelines and want contract checks at extract/load.
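As a rough illustration of Option C, the sketch below checks rows at the boundary between extract and transform and sets failing rows aside instead of loading them. `split_valid` and `has_name_and_age` are hypothetical helpers written with the standard library only; in a real pipeline the check would be a contract-based validator.

```python
from typing import Any, Callable, Iterable

Record = dict[str, Any]


def split_valid(
    rows: Iterable[Record], is_valid: Callable[[Record], bool]
) -> tuple[list[Record], list[Record]]:
    """Partition rows into (accepted, quarantined) at a stage boundary."""
    accepted: list[Record] = []
    quarantined: list[Record] = []
    for row in rows:
        (accepted if is_valid(row) else quarantined).append(row)
    return accepted, quarantined


def has_name_and_age(row: Record) -> bool:
    # Stand-in for a contract check: a string name and an integer age.
    return isinstance(row.get("name"), str) and isinstance(row.get("age"), int)


extracted = [
    {"name": "Alice", "age": 30},
    {"name": "Bob"},                  # missing age
    {"name": "Carol", "age": "n/a"},  # wrong type
]
good, bad = split_valid(extracted, has_name_and_age)
# Only accepted rows continue to the transform step.
transformed = [{**row, "name": row["name"].upper()} for row in good]
print(transformed, len(bad))
```

Keeping the quarantined rows, rather than dropping them, leaves a record of what was rejected and why it never reached the target.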
Features
- Dynamic Model Generation - Convert JSON schemas to Pydantic models at runtime
- JSON Schema Compliant - Full support for JSON Schema Draft 2020-12 standard
- Type Coercion - Automatic type conversion before validation (e.g., string → integer)
- Custom Validators - Built-in and extensible validation rules
- Nested Structures - Full support for nested objects and arrays
- Multiple Input Formats - Load schemas from dicts, JSON strings, files, or URLs
- Type Safe - Full type hints and Pydantic v2 compatibility
- Extensible - Register custom coercion and validation functions
- Data-Driven - All validation logic stored as JSON data, not Python code
- Data Contract Management - Complete lifecycle management for data contracts with versioning
- Schema Registry - Centralized schema storage with support for PostgreSQL, SQLite, MongoDB, Redis, and in-memory stores
- Data Governance - Track ownership, stewardship, and enforce governance policies
- Metadata Management - Store and query metadata about your data assets
- Schema Evolution - Version schemas and track changes over time
- Quality Assurance - Monitor data quality, calculate metrics, track violations, and generate reports
- Quality Metrics - Calculate quality scores, accuracy, completeness, and violation rates
- Threshold Alerting - Set quality thresholds and get alerts when quality degrades
- Ontology & Semantic Layer - Concept schemes, concept types, relationships, field bindings, and a diagrammatic ontology workspace in the UI
- Pipeline Diagram Editor - Figma Jam-style visual editor for ETL pipelines (extract/transform/load nodes with labels and annotations) in the Pipelines section of the UI
Installation
Core Library
pip install pycharter
With API Support
pip install pycharter[api]
This installs FastAPI and Uvicorn for running the REST API server.
Full install (all extras)
pip install pycharter[all]
Installs api, worker, ui, pipeline, postgres, docs, ontology, lineage, streaming, and messaging (Kafka + RabbitMQ). Use this when you want the whole toolkit; for smaller installs, pick individual extras below.
With UI Support
pip install pycharter[ui]
This installs the Python dependencies and pre-built UI static files (like Airflow).
After installation, you can immediately start the UI:
pycharter ui serve # Production mode (uses pre-built static files)
For development (if you have the source code):
cd src/pycharter/ui
npm install # Install Node.js dependencies
cd ../../../ # back to repository root (src/pycharter/ui → repo root)
pycharter ui dev # Development mode with hot reload
Note: When installed from pip, the UI works immediately without Node.js. For development with hot reload, Node.js is required; see the UI README for setup.
Database setup (for contract store, API, and seed data)
If you use the contract store (SQLite or PostgreSQL), the REST API, or seed data, initialize the database once:
# Initialize database schema (SQLite default: sqlite:///pycharter.db)
pycharter db init
# Seed reference data (owners, domains, systems, environments, data feeds, compliance frameworks, tags)
pycharter db seed
Default seed directory (when you omit the path) is the bundled data/seed inside the pycharter package (src/pycharter/data/seed in the repo). It loads reference data, contracts, pipelines, and semantic seed YAML in one step. To use a custom seed directory: pycharter db seed /path/to/seed [database_url].
The data/aviation_examples/ tree at the repository root holds sample contract artifacts (files on disk) for tutorials and demos. It is not loaded automatically by pycharter db seed; import those contracts through the Web UI or the REST API as needed. See Seed data and sample files.
Use pycharter db init --force to drop and recreate all tables (SQLite only; destroys existing data). See Configuration Guide for connection options and migrations.
Quick Start
Quick Start: ETL Pipelines
Build and run ETL pipelines programmatically (with the | operator) or from YAML configs. Pipeline run() is async; use asyncio.run() from scripts or await in async code.
import asyncio
from pycharter import Pipeline, HTTPExtractor, PostgresLoader, Rename, AddField
# Programmatic pipeline
pipeline = (
Pipeline(HTTPExtractor(url="https://api.example.com/data"))
| Rename({"old": "new"})
| AddField("processed_at", "now()")
| PostgresLoader(connection_string="...", table="users")
)
result = asyncio.run(pipeline.run())
# Config-driven: explicit files
pipeline = Pipeline.from_config_files(
extract="configs/extract.yaml",
load="configs/load.yaml",
variables={"API_KEY": "secret"}
)
# Config-driven: directory (extract.yaml, transform.yaml, load.yaml)
pipeline = Pipeline.from_config_dir("pipelines/users/")
# Config-driven: single file
pipeline = Pipeline.from_config_file("pipelines/users/pipeline.yaml")
result = asyncio.run(pipeline.run())
See ETL Pipelines under Core Services for error handling (error_context, ErrorMode) and variable resolution (PipelineContext(variables={...})).
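For readers curious how `|` chaining and an async `run()` can fit together, here is a small standard-library sketch. `MiniPipeline`, `rename`, `add_field`, and `fake_extract` are hypothetical stand-ins written for this illustration; they do not reproduce PyCharter's `Pipeline` internals.

```python
import asyncio
from typing import Any, Awaitable, Callable

Row = dict[str, Any]
Step = Callable[[list[Row]], list[Row]]


class MiniPipeline:
    """Hypothetical stand-in showing how `|` chaining can be built."""

    def __init__(
        self,
        extract: Callable[[], Awaitable[list[Row]]],
        steps: tuple[Step, ...] = (),
    ) -> None:
        self._extract = extract
        self._steps = steps

    def __or__(self, step: Step) -> "MiniPipeline":
        # Return a new pipeline so the original stays reusable.
        return MiniPipeline(self._extract, self._steps + (step,))

    async def run(self) -> list[Row]:
        rows = await self._extract()
        for step in self._steps:
            rows = step(rows)
        return rows


def rename(mapping: dict[str, str]) -> Step:
    """Build a step that renames keys according to `mapping`."""
    return lambda rows: [
        {mapping.get(key, key): value for key, value in row.items()} for row in rows
    ]


def add_field(name: str, value: Any) -> Step:
    """Build a step that adds a constant field to every row."""
    return lambda rows: [{**row, name: value} for row in rows]


async def fake_extract() -> list[Row]:
    await asyncio.sleep(0)  # stands in for network or disk I/O
    return [{"old": 1}, {"old": 2}]


pipeline = MiniPipeline(fake_extract) | rename({"old": "new"}) | add_field("source", "demo")
result = asyncio.run(pipeline.run())
print(result)
```

Because `__or__` returns a new object, a partially built pipeline can be shared and extended in different directions without one variant affecting another.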
Quick Start: Convenience Functions (One-off Use)
from pycharter import from_dict, validate
# Define your JSON schema
schema = {
"type": "object",
"version": "1.0.0",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"email": {"type": "string"}
},
"required": ["name", "age"]
}
# Generate a Pydantic model (convenience function)
Person = from_dict(schema, "Person")
# Validate data
result = validate(Person, {"name": "Alice", "age": 30, "email": "alice@example.com"})
if result.is_valid:
    print(f"Valid: {result.data.name}") # Output: Valid: Alice
Production Use: Validator Class (Recommended)
For production code with multiple validations, use the Validator class for better performance. Create validators via factory methods or from a contract store:
from pycharter import Validator
# From directory (expects schema.yaml, coercion_rules.yaml, validation_rules.yaml)
validator = Validator.from_dir("data/contracts/user")
# From explicit files (any filenames)
validator = Validator.from_files(
schema="schemas/user.yaml",
coercion_rules="rules/coercion.yaml",
validation_rules="rules/validation.yaml"
)
# From a single contract file
validator = Validator.from_file("user_contract.yaml")
# From dictionaries
validator = Validator.from_dict(schema={...}, coercion_rules={...}, validation_rules={...})
# From contract store (with database)
validator = Validator(store=store, schema_id="user_schema_v1")
# Validate multiple records efficiently (model is cached)
result1 = validator.validate({"name": "Alice", "age": 30})
result2 = validator.validate({"name": "Bob", "age": 25})
# Batch validation
results = validator.validate_batch([data1, data2, data3])
With Metadata Store
from pycharter import Validator, SQLiteContractStore
# Connect to contract store
store = SQLiteContractStore("metadata.db")
store.connect()
# Create validator from store
validator = Validator(store=store, schema_id="user_schema_v1")
# Validate data
result = validator.validate({"name": "Alice", "age": 30})
Using ValidatorBuilder (Fluent API)
ValidatorBuilder provides a chainable interface for configuring validators with quality checks, state-specific rules, and metrics tracking:
from pycharter import ValidatorBuilder
validator = (
ValidatorBuilder()
.from_dir("contracts/order")
.with_quality_checks(thresholds={"completeness": 0.95})
.with_state_rules({
"DRAFT": {"optional": ["filled_qty", "fill_price"]},
"FILLED": {"required": ["filled_qty", "fill_price"]},
})
.strict()
.build()
)
result = validator.validate({"symbol": "AAPL", "order_qty": 100})
# State-aware validation: different required fields per FSM state
result_draft = validator.validate_for_state({"symbol": "AAPL", "order_qty": 100}, "DRAFT")
result_filled = validator.validate_for_state(
{"symbol": "AAPL", "order_qty": 100, "filled_qty": 100, "fill_price": 150.0}, "FILLED"
)
validate_for_state(data, state) applies per-state overrides (fields become optional in DRAFT, required in FILLED) on top of the base schema. This is useful when the same entity has different validation rules at different lifecycle stages.
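The idea of per-state overrides can be sketched with plain sets: start from the base required fields, add the state's `required` entries, and remove its `optional` ones. The helper names and the assumed base requirement (`symbol`, `order_qty`) are illustrative only and are not taken from PyCharter's implementation.

```python
from typing import Any

# Assumed base schema requirement for this illustration.
BASE_REQUIRED = {"symbol", "order_qty"}

# Same shape as the state rules passed to with_state_rules() above.
STATE_RULES: dict[str, dict[str, list[str]]] = {
    "DRAFT": {"optional": ["filled_qty", "fill_price"]},
    "FILLED": {"required": ["filled_qty", "fill_price"]},
}


def required_for_state(state: str) -> set[str]:
    """Base required fields plus per-state additions, minus per-state relaxations."""
    rules = STATE_RULES.get(state, {})
    required = set(BASE_REQUIRED)
    required |= set(rules.get("required", []))
    required -= set(rules.get("optional", []))
    return required


def missing_fields(data: dict[str, Any], state: str) -> list[str]:
    """Return the required fields that are absent or None, sorted by name."""
    return sorted(f for f in required_for_state(state) if data.get(f) is None)


order = {"symbol": "AAPL", "order_qty": 100}
print(missing_fields(order, "DRAFT"))
print(missing_fields(order, "FILLED"))
```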
Domain Lifecycle & FSM Integration
PyCharter supports linking data contracts to finite state machines (e.g. PyStator) via a lifecycle binding convention in contract metadata:
# In your contract YAML
metadata:
  governance_rules:
    lifecycle:
      state_machine_name: order_management
      machine_version: "1.0.0"
      state_field: status # field in your data that holds the FSM state
      entity_id_field: order_id
from pycharter import check_state_alignment, validate_lifecycle_binding, get_lifecycle_binding
# Validate that contract enum values match your FSM states exactly
fsm_states = {"PENDING", "OPEN", "FILLED", "CANCELLED"}
result = check_state_alignment(contract_dict, fsm_states, state_field="status")
# result.aligned: bool, result.missing_from_contract: set, result.missing_from_fsm: set
# Validate the lifecycle binding structure
errors = validate_lifecycle_binding(contract_dict.get("metadata", {}))
# Returns [] if valid, or a list of error strings
# Read the binding
binding = get_lifecycle_binding(contract_dict.get("metadata", {}))
# binding.state_machine_name, binding.machine_version, binding.state_field, ...
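A state-alignment check of this kind boils down to two set differences. The sketch below is a guess at the semantics, not PyCharter's code: it assumes the contract's state field is declared as an `enum` under the schema's `properties`, that `missing_from_contract` means FSM states absent from the enum, and that `missing_from_fsm` means enum values the FSM does not know. `Alignment` and `align_states` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Alignment:
    aligned: bool
    missing_from_contract: set[str]
    missing_from_fsm: set[str]


def align_states(
    schema: dict[str, Any], fsm_states: set[str], state_field: str = "status"
) -> Alignment:
    """Compare a schema's enum for `state_field` with the FSM's state names."""
    field_spec = schema.get("properties", {}).get(state_field, {})
    enum_values = set(field_spec.get("enum", []))
    missing_from_contract = set(fsm_states) - enum_values
    missing_from_fsm = enum_values - set(fsm_states)
    return Alignment(
        aligned=not missing_from_contract and not missing_from_fsm,
        missing_from_contract=missing_from_contract,
        missing_from_fsm=missing_from_fsm,
    )


order_schema = {
    "type": "object",
    "properties": {"status": {"type": "string", "enum": ["PENDING", "OPEN", "FILLED", "REJECTED"]}},
}
report = align_states(order_schema, {"PENDING", "OPEN", "FILLED", "CANCELLED"})
print(report)
```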
API Organization
PyCharter's API is organized into three tiers to help you choose the right approach for your use case:
Tier 1: Primary Interfaces (Recommended for Production)
Classes that provide the best performance and most features:
- Validator - Primary validation interface (use for multiple validations); create via from_dir(), from_files(), from_file(), from_dict() or from store
- Pipeline - ETL pipeline (programmatic or config-driven); create via from_config_files(), from_config_dir(), from_config_file() or constructor
- QualityCheck - Primary quality assurance interface
- ContractStoreClient - Base class for contract stores
When to use: Production code, batch processing, when you need to validate multiple records or run ETL pipelines.
Tier 2: Convenience Functions (Quick Start)
Functions that make common tasks easy and discoverable:
- Input helpers: from_dict(), from_file(), from_json(), from_url()
- Output helpers: to_dict(), to_file(), to_json()
- Validation helpers: validate_with_store(), validate_with_contract()
- Contract helpers: parse_contract_file(), build_contract()
When to use: Quick scripts, one-off validations, exploratory work, learning the library.
Tier 3: Low-Level Utilities
Functions for when you already have models or need fine-grained control:
- validate() - Validate with existing Pydantic model
- validate_batch() - Batch validate with existing model
- model_to_schema() - Core conversion function
When to use: Advanced use cases, when you've already generated models, custom workflows.
Choosing the Right Approach
| Use Case | Recommended Approach | Example |
|---|---|---|
| Production pipeline with multiple validations | Validator class | validator = Validator(store=store, schema_id="schema"); validator.validate(data) |
| Quick one-off validation | Convenience function | validate_with_contract("contract.yaml", data) |
| You already have a model | Low-level function | validate(UserModel, data) |
| Batch processing | Validator.validate_batch() | validator.validate_batch([data1, data2, data3]) |
Core Services & Data Production Journey
PyCharter provides eight core services that work together to support a complete data production journey, from contract specification to quality assurance. Each service plays a critical role in managing data contracts and ensuring data quality throughout your pipeline.
The Data Production Journey
The typical data production workflow follows this path:
1. Data Contract Specification
↓
2. Contract Parsing
↓
3. Metadata Storage
↓
4. Pydantic Model Generation
↓
5. Runtime Validation
↓
6. Quality Assurance & Monitoring
1. Contract Parser (pycharter.contract_parser)
Purpose: Reads and decomposes data contract files into structured metadata components.
When to Use: At the beginning of your data production journey, when you have data contract files (YAML or JSON) that need to be processed and understood.
How It Works:
- Accepts data contract files containing schema definitions, governance rules, ownership information, and metadata
- Decomposes the contract into distinct components: schema, governance_rules, ownership, and metadata
- Returns a ContractMetadata object that separates concerns and makes each component accessible
- Extracts and tracks versions of all components
Example:
from pycharter import parse_contract_file, ContractMetadata
# Parse a contract file (YAML or JSON)
metadata = parse_contract_file("data_contract.yaml")
# Access decomposed components
schema = metadata.schema # JSON Schema definition
governance = metadata.governance_rules # Governance policies
ownership = metadata.ownership # Owner/team information
metadata_info = metadata.metadata # Additional metadata
versions = metadata.versions # Component versions
Contribution to Journey: The contract parser is the entry point that takes raw contract specifications and prepares them for downstream processing. It ensures that contracts are properly structured and that all components (schema, governance, ownership) are separated for independent handling.
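The decomposition step can be pictured as splitting one dictionary into named parts and collecting their versions. The sketch below assumes the contract has top-level keys `schema`, `governance_rules`, `ownership`, and `metadata`, each optionally carrying a `version`; the source does not specify the file layout, so this structure, `ParsedContract`, and `decompose` are all illustrative assumptions rather than the real `ContractMetadata`.

```python
from dataclasses import dataclass, field
from typing import Any

COMPONENTS = ("schema", "governance_rules", "ownership", "metadata")


@dataclass
class ParsedContract:
    """Hypothetical container mirroring the components listed above."""

    schema: dict[str, Any] = field(default_factory=dict)
    governance_rules: dict[str, Any] = field(default_factory=dict)
    ownership: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)
    versions: dict[str, str] = field(default_factory=dict)


def decompose(contract: dict[str, Any]) -> ParsedContract:
    """Split a contract dict into components and collect their versions."""
    parts = {name: dict(contract.get(name) or {}) for name in COMPONENTS}
    versions = {
        name: part["version"] for name, part in parts.items() if "version" in part
    }
    return ParsedContract(versions=versions, **parts)


raw_contract = {
    "schema": {"type": "object", "version": "1.0.0", "properties": {"name": {"type": "string"}}},
    "ownership": {"owner": "data-team"},
    "metadata": {"version": "1.0.0", "description": "User contract"},
}
parsed = decompose(raw_contract)
print(parsed.versions)
```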
1b. Contract Builder (pycharter.contract_builder)
Purpose: Constructs consolidated data contracts from separate artifacts (schema, coercion rules, validation rules, metadata).
When to Use: When you have separate artifacts stored independently and need to combine them into a single consolidated contract for runtime validation or distribution.
How It Works:
- Takes separate artifacts (schema, coercion rules, validation rules, metadata, ownership, governance rules)
- Keeps coercion and validation rules alongside the raw schema rather than merging them into it (the Validator merges them at validation time)
- Tracks versions of all components
- Produces a consolidated contract suitable for runtime validation
- Can build from artifacts directly or retrieve from contract store
Example:
from pycharter import build_contract, build_contract_from_store, ContractArtifacts
# Build from separate artifacts
artifacts = ContractArtifacts(
schema={"type": "object", "version": "1.0.0", "properties": {...}},
coercion_rules={"version": "1.0.0", "rules": {"age": "coerce_to_integer"}},
validation_rules={"version": "1.0.0", "rules": {"age": {"is_positive": {...}}}},
metadata={"version": "1.0.0", "description": "User contract"},
ownership={"owner": "data-team", "team": "engineering"},
)
contract = build_contract(artifacts)
# Contract now has:
# - schema (RAW - rules NOT merged into it)
# - coercion_rules, validation_rules (separate)
# - metadata, ownership, governance_rules
# - versions tracking all components
# Or build from contract store
contract = build_contract_from_store(store, "user_schema_v1")
# Use for validation - Validator merges rules internally
from pycharter import validate_with_contract
result = validate_with_contract(contract, {"name": "Alice", "age": "30"})
Contribution to Journey: The contract builder is the consolidation layer that combines separate artifacts (stored independently in the database) into a single contract artifact. The contract contains raw schema + separate rules (not merged). The Validator class handles merging internally during validation, keeping the contract structure clear and editable.
2. Contract Store Client (pycharter.contract_store)
Purpose: Manages persistent storage and retrieval of decomposed metadata in databases.
When to Use: After parsing contracts, when you need to store metadata components (schemas, governance rules, ownership) in a database for versioning, querying, and governance.
How It Works:
- Provides methods to store and retrieve schemas, governance rules, ownership information, and metadata
- Supports versioning and querying of stored metadata
- Multiple implementations available: PostgreSQL, SQLite, MongoDB, Redis, and In-Memory (for testing)
Available Implementations:
- PostgresContractStore - For PostgreSQL databases (recommended for production)
- SQLiteContractStore - For SQLite databases (great for development and small deployments)
- MongoDBContractStore - For MongoDB databases
- RedisContractStore - For Redis databases
- InMemoryContractStore - For testing and development (no persistence)
Example:
from pycharter import SQLiteContractStore, parse_contract_file
# Parse contract
metadata = parse_contract_file("contract.yaml")
# Use SQLite contract store (or PostgresContractStore, MongoDBContractStore, RedisContractStore, etc.)
store = SQLiteContractStore("metadata.db")
store.connect()
# Store decomposed components
schema_id = store.store_schema("user_schema", metadata.schema, version="1.0")
# Merge ownership and governance into metadata before storing
# Ownership and governance are part of metadata, not separate entities
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    owner = metadata.ownership.get("owner")
    metadata_dict["business_owners"] = [owner] if owner else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules
# Store metadata once with all information (ownership and governance included)
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)
# Store coercion and validation rules
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")
# Retrieve later
stored_schema = store.get_schema(schema_id)
coercion_rules = store.get_coercion_rules(schema_id)
validation_rules = store.get_validation_rules(schema_id)
Contribution to Journey: The contract store is the persistence layer that ensures contracts and their components are versioned, searchable, and accessible across your organization. It enables governance, audit trails, and schema evolution tracking.
See Configuration Guide for database setup and initialization instructions.
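To show what "versioned storage" means in the simplest terms, here is an in-memory sketch that keeps every stored version of a schema and returns the newest one by default. `TinyContractStore` is a hypothetical class for illustration; it borrows the `store_schema` and `get_schema` method names from the example above, but its signatures and behavior are assumptions, not the real store API.

```python
import copy
from typing import Any, Optional


def _version_key(version: str) -> tuple[int, ...]:
    # Compare versions numerically so that "1.10" sorts after "1.2".
    return tuple(int(part) for part in version.split("."))


class TinyContractStore:
    """Hypothetical in-memory store keeping every version of each schema."""

    def __init__(self) -> None:
        self._schemas: dict[str, dict[str, dict[str, Any]]] = {}

    def store_schema(self, name: str, schema: dict[str, Any], version: str) -> str:
        versions = self._schemas.setdefault(name, {})
        if version in versions:
            raise ValueError(f"{name} version {version} is already stored")
        versions[version] = copy.deepcopy(schema)  # protect against later mutation
        return name

    def get_schema(self, name: str, version: Optional[str] = None) -> dict[str, Any]:
        versions = self._schemas[name]
        chosen = version if version is not None else max(versions, key=_version_key)
        return copy.deepcopy(versions[chosen])

    def list_versions(self, name: str) -> list[str]:
        return sorted(self._schemas[name], key=_version_key)


demo_store = TinyContractStore()
demo_store.store_schema("user_schema", {"type": "object", "required": ["name"]}, "1.2")
demo_store.store_schema("user_schema", {"type": "object", "required": ["name", "age"]}, "1.10")
print(demo_store.list_versions("user_schema"))
print(demo_store.get_schema("user_schema")["required"])
```

Refusing to overwrite an existing version keeps stored contracts immutable, which is what makes an audit trail meaningful.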
3. Pydantic Generator (pycharter.pydantic_generator)
Purpose: Dynamically generates fully-functional Pydantic models from JSON Schema definitions.
When to Use: After storing schemas (or directly from parsed contracts), when you need to generate Python models for type-safe data validation and processing.
How It Works:
- Takes JSON Schema definitions (from contracts or contract store)
- Programmatically generates Pydantic model classes at runtime
- Supports all JSON Schema Draft 2020-12 features plus custom coercions and validations
- Can generate models from dictionaries, JSON strings, files, or URLs
- Optionally generates Python files with model definitions
Example:
from pycharter import from_dict, generate_model_file, parse_contract_file, ContractStoreClient
# Option 1: Generate from parsed contract
metadata = parse_contract_file("contract.yaml")
UserModel = from_dict(metadata.schema, "User")
# Option 2: Generate from stored schema
client = ContractStoreClient(...)
schema = client.get_schema("user_schema_v1")
UserModel = from_dict(schema, "User")
# Option 3: Generate and save to file
generate_model_file(schema, "user_model.py", "User")
Contribution to Journey: The Pydantic generator is the transformation engine that converts declarative JSON Schema definitions into executable Python models. It bridges the gap between contract specifications (data) and runtime validation (code), enabling type-safe data processing.
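PyCharter generates Pydantic models; to show the general idea of building a class from a schema at runtime without extra dependencies, the sketch below swaps in the standard library's `dataclasses.make_dataclass`. Unlike Pydantic, a dataclass does not validate types, so this only illustrates the schema-to-class step. `dataclass_from_schema` and the type mapping are hypothetical and cover only flat objects with scalar fields.

```python
from dataclasses import field, make_dataclass
from typing import Any, Optional

# Minimal mapping from JSON Schema type names to Python types.
JSON_TYPES: dict[str, type] = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
}


def dataclass_from_schema(schema: dict[str, Any], name: str) -> type:
    """Build a dataclass whose fields mirror a flat JSON Schema object."""
    required = set(schema.get("required", []))
    properties = schema.get("properties", {})
    specs: list[tuple] = []
    # Required fields first: dataclasses need non-default fields before defaults.
    for prop, spec in properties.items():
        if prop in required:
            specs.append((prop, JSON_TYPES.get(spec.get("type"), Any)))
    for prop, spec in properties.items():
        if prop not in required:
            py_type = JSON_TYPES.get(spec.get("type"), Any)
            specs.append((prop, Optional[py_type], field(default=None)))
    return make_dataclass(name, specs)


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name", "age"],
}
PersonSketch = dataclass_from_schema(person_schema, "Person")
alice = PersonSketch(name="Alice", age=30)
print(alice)
```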
4. JSON Schema Converter (pycharter.json_schema_converter)
Purpose: Converts existing Pydantic models back into JSON Schema format (reverse conversion).
When to Use: When you have existing Pydantic models and need to generate JSON Schema definitions, or when you want to round-trip between schemas and models.
How It Works:
- Takes Pydantic model classes as input
- Generates JSON Schema dictionaries that represent the model structure
- Preserves validation rules, types, and constraints
- Can output to dictionaries, JSON strings, or files
Example:
from pycharter import to_dict, to_file, to_json
from pydantic import BaseModel
class Product(BaseModel):
    name: str
    price: float
    in_stock: bool = True
# Convert to JSON Schema
schema = to_dict(Product)
json_string = to_json(Product)
to_file(Product, "product_schema.json")
# Now you can use the schema with other services
ProductModel = from_dict(schema, "Product") # Round-trip
Contribution to Journey: The JSON Schema converter enables bidirectional conversion between models and schemas. It's useful for:
- Generating schemas from existing code
- Round-trip validation (schema → model → schema)
- Integrating with systems that require JSON Schema format
- Documenting existing models as schemas
5. Runtime Validator (pycharter.runtime_validator)
Purpose: Lightweight validation utility for validating data against generated Pydantic models in production data pipelines.
When to Use: In your data processing scripts, ETL pipelines, API endpoints, or any place where you need to validate incoming data against contract specifications.
API Organization:
PyCharter provides validation through three tiers:
- Tier 1: Validator Class (PRIMARY INTERFACE - Recommended for production)
  - Best performance for multiple validations (model is cached)
  - Supports all data sources (contract files, directories, stores, dictionaries)
  - Reusable instance for batch processing
- Tier 2: Convenience Functions (Quick start - one-off validations)
  - validate_with_store() - Quick validation with contract store
  - validate_with_contract() - Quick validation with contract file/dict
  - get_model_from_store() / get_model_from_contract() - Get model for reuse
- Tier 3: Low-Level Functions (When you already have a model)
  - validate() - Validate single record with existing model
  - validate_batch() - Batch validate with existing model
How It Works:
- Takes a Pydantic model (generated from a schema) and raw data
- Validates data against the model's constraints
- Returns a ValidationResult with validation status, validated data, and errors
- Supports single record and batch validation
- Can be used in strict mode (raises exceptions) or lenient mode (returns results)
Example - Validator Class (Recommended):
from pycharter import Validator, SQLiteContractStore
# Option 1: From directory (schema.yaml, coercion_rules.yaml, validation_rules.yaml)
validator = Validator.from_dir("data/contracts/user")
result = validator.validate({"name": "Alice", "age": 30})
# Option 2: From explicit files
validator = Validator.from_files(schema="schemas/user.yaml", coercion_rules="rules/coercion.yaml")
result = validator.validate({"name": "Alice", "age": 30})
# Option 3: From single contract file
validator = Validator.from_file("user_contract.yaml")
result = validator.validate({"name": "Alice", "age": 30})
# Option 4: From contract store (with database)
store = SQLiteContractStore("metadata.db")
store.connect()
validator = Validator(store=store, schema_id="user_schema_v1")
result = validator.validate({"name": "Alice", "age": 30})
# Batch validation (efficient - model cached)
results = validator.validate_batch([data1, data2, data3])
Example - Convenience Functions (Quick Start):
from pycharter import validate_with_store, validate_with_contract, SQLiteContractStore
# Quick validation with store
store = SQLiteContractStore("metadata.db")
store.connect()
result = validate_with_store(store, "user_schema_v1", {"name": "Alice", "age": 30})
# Quick validation with contract file (no database)
result = validate_with_contract("user_contract.yaml", {"name": "Alice", "age": 30})
Example - Low-Level (When You Have a Model):
from pycharter import from_dict, validate, validate_batch
# Generate model
UserModel = from_dict(schema, "User")
# Validate single record
result = validate(UserModel, {"name": "Alice", "age": 30})
# Batch validate
results = validate_batch(UserModel, [data1, data2, data3])
Performance Tips:
- For multiple validations: Use Validator class (model is cached)
- For one-off validations: Convenience functions are fine
- For batch processing: Use Validator.validate_batch() or validate_batch()
Contribution to Journey: The runtime validator is the enforcement layer that ensures data quality in production. It validates actual data against contract specifications, catching violations early and preventing bad data from propagating through your systems. It supports both database-backed workflows (for production systems with contract stores) and contract-based workflows (for simpler use cases without database dependencies).
5b. Pipelines (pycharter.pipeline_generator)
Purpose: Build and run ETL pipelines programmatically (with the | operator) or from YAML configs. No assumptions are made about project layout: you specify file paths or use a directory with standard filenames.
When to Use: When you need to extract, transform, and load data from config-driven or code-defined pipelines (HTTP, files, databases, cloud storage → transforms → Postgres, files, cloud).
How It Works:
- Programmatic: Pipeline(extractor) | transformer | loader; chain with |; call await pipeline.run().
- Config-driven: Load from explicit files (from_config_files), from a directory with extract.yaml, transform.yaml, load.yaml (from_config_dir), or from a single pipeline.yaml (from_config_file).
- Variables: Pass PipelineContext(variables={"API_KEY": "x"}) or variables={...} in factory methods; ${VAR} and ${VAR:-default} in configs are resolved from these (no built-in CONTRACT_DIR).
- Async: run() is async; use asyncio.run(pipeline.run()) in scripts or await pipeline.run() in async code.
- Error handling: Optional error_context with ErrorMode (STRICT, LENIENT, COLLECT) controls whether extraction/load failures raise or are collected in result.errors.
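The `${VAR}` and `${VAR:-default}` placeholders mentioned above can be resolved with a single regular expression. The sketch below is one possible approach, not PyCharter's resolver: `resolve` is a hypothetical function, it applies the default only when the variable is absent, and raising `KeyError` for an unresolved variable is an assumption about the failure behavior.

```python
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}; the default may be empty.
_PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def resolve(text: str, variables: Mapping[str, object]) -> str:
    """Substitute placeholders in `text` using `variables`, falling back to defaults."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group("name")
        if name in variables:
            return str(variables[name])
        default = match.group("default")
        if default is not None:
            return default
        raise KeyError(f"unresolved variable: {name}")

    return _PLACEHOLDER.sub(replace, text)


url_template = "https://api.example.com/data?key=${API_KEY}&limit=${LIMIT:-100}"
resolved_url = resolve(url_template, {"API_KEY": "x"})
print(resolved_url)
```

Failing loudly on an unresolved variable avoids sending a literal `${API_KEY}` to a remote service, which tends to be harder to diagnose than an early error.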
Example:
import asyncio
from pycharter import Pipeline, PipelineContext, HTTPExtractor, PostgresLoader, Rename, AddField
# Programmatic
pipeline = (
    Pipeline(HTTPExtractor(url="https://api.example.com/users"))
    | Rename({"userName": "name"})
    | AddField("processed_at", "now()")
    | PostgresLoader(connection_string="...", table="users")
)
result = asyncio.run(pipeline.run())
# Config-driven (explicit files)
pipeline = Pipeline.from_config_files(
    extract="configs/extract.yaml",
    load="configs/load.yaml",
    variables={"API_KEY": "secret"},
)
# Config-driven (directory: extract.yaml, transform.yaml, load.yaml)
pipeline = Pipeline.from_config_dir("pipelines/users/")
# Config-driven (single file)
pipeline = Pipeline.from_config_file("pipelines/users/pipeline.yaml")
result = asyncio.run(pipeline.run())
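For readers curious how chaining with | can work in Python, the following toy sketch shows the general technique: overloading __or__ so that each | returns a new pipeline with one more step, plus an async run(). The classes MiniPipeline and Step and the helpers rename and add_field are hypothetical stand-ins, not PyCharter's Pipeline, Rename, or AddField.

```python
import asyncio
from typing import Any, Callable

Record = dict[str, Any]


class Step:
    """A transform step: a callable applied to every record."""

    def __init__(self, func: Callable[[Record], Record]) -> None:
        self.func = func


class MiniPipeline:
    """Toy pipeline holding source records and an ordered tuple of steps."""

    def __init__(self, records: list[Record], steps: tuple[Step, ...] = ()) -> None:
        self.records = records
        self.steps = steps

    def __or__(self, step: Step) -> "MiniPipeline":
        # `pipeline | step` returns a new pipeline with the step appended.
        return MiniPipeline(self.records, self.steps + (step,))

    async def run(self) -> list[Record]:
        output = []
        for record in self.records:
            for step in self.steps:
                record = step.func(record)
            output.append(record)
        return output


def rename(mapping: dict[str, str]) -> Step:
    """Build a step that renames keys according to `mapping`."""
    return Step(lambda rec: {mapping.get(k, k): v for k, v in rec.items()})


def add_field(name: str, value: Any) -> Step:
    """Build a step that adds a constant field."""
    return Step(lambda rec: {**rec, name: value})


pipeline = (
    MiniPipeline([{"userName": "alice"}])
    | rename({"userName": "name"})
    | add_field("source", "api")
)
result = asyncio.run(pipeline.run())
print(result)  # [{'name': 'alice', 'source': 'api'}]
```

Returning a new object from __or__ instead of mutating in place keeps a partially built pipeline reusable as a base for several variants.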
Exceptions: Pipeline and config loading use PyCharter's exception hierarchy: PyCharterError (base), ConfigError, ConfigValidationError, ExpressionError. See Exceptions under API Reference.
See pycharter/pipeline_generator/ASYNC_AND_EXECUTION.md for async usage and error modes.
6. Quality Assurance (pycharter.quality)
Purpose: Data quality assurance pipeline that polices data according to data contracts, calculates quality metrics, tracks violations, and generates quality reports.
When to Use: When you need to:
- Monitor data quality over time
- Calculate quality scores and metrics
- Track and manage data quality violations
- Set quality thresholds and get alerts
- Generate quality reports for governance
How It Works:
- Validates data against contracts (using Runtime Validator)
- Calculates quality metrics (accuracy, completeness, violation rates)
- Tracks violations for audit and remediation
- Checks quality thresholds and generates alerts
- Produces comprehensive quality reports
Example:
from pycharter import QualityCheck, QualityCheckOptions, QualityThresholds
# Define quality thresholds
thresholds = QualityThresholds(
    min_overall_score=95.0,
    max_violation_rate=0.05,
)
# Run quality check
# `store` is an already connected contract store (for example, a SQLiteContractStore)
check = QualityCheck(store=store)
report = check.run(
    schema_id="user_schema_v1",
    data="data/users.json",
    options=QualityCheckOptions(
        calculate_metrics=True,
        record_violations=True,
        check_thresholds=True,
        thresholds=thresholds,
    ),
)
print(f"Quality Score: {report.quality_score.overall_score:.2f}/100")
print(f"Passed: {report.passed}")
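As a rough illustration of the threshold step, the sketch below derives a score and a violation rate from per-record pass/fail flags and compares them with thresholds named after the ones in the example. How PyCharter actually computes overall_score is not stated here, so the formula (percentage of valid records), the Thresholds dataclass, and the summarize function are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical stand-in for the thresholds used in the example above."""

    min_overall_score: float = 95.0   # score on a 0-100 scale
    max_violation_rate: float = 0.05  # fraction of records allowed to fail


def summarize(validity_flags: list[bool], thresholds: Thresholds) -> dict:
    """Compute a simple score and violation rate from per-record flags."""
    total = len(validity_flags)
    if total == 0:
        raise ValueError("no records to score")
    violations = validity_flags.count(False)
    violation_rate = violations / total
    # Assumption: the score is simply the percentage of valid records.
    overall_score = 100.0 * (total - violations) / total
    passed = (
        overall_score >= thresholds.min_overall_score
        and violation_rate <= thresholds.max_violation_rate
    )
    return {
        "overall_score": overall_score,
        "violation_rate": violation_rate,
        "passed": passed,
    }


# 97 valid records and 3 invalid ones
flags = [True] * 97 + [False] * 3
report_sketch = summarize(flags, Thresholds())
print(report_sketch["overall_score"], report_sketch["passed"])  # 97.0 True
```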
Contribution to Journey: The quality assurance module is the policing layer that ensures data quality is maintained throughout the pipeline. It provides metrics, tracking, and alerting capabilities that transform PyCharter from a contract management tool into a complete data quality assurance platform.
See Quality Module README for detailed documentation.
Complete Workflow Example
Here's how all services work together in a complete data production journey:
from pycharter import (
    parse_contract_file,
    SQLiteContractStore,
    from_dict,
    Validator,
    to_dict,
)
# Step 1: Parse contract specification
metadata = parse_contract_file("user_contract.yaml")
# Step 2: Store metadata in database
store = SQLiteContractStore("metadata.db")
store.connect()
schema_id = store.store_schema("user", metadata.schema, version="1.0")
# Merge ownership and governance into metadata before storing
# Ownership and governance are part of metadata, not separate entities
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    owner = metadata.ownership.get("owner")
    metadata_dict["business_owners"] = [owner] if owner else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules
# Store metadata once with all information (ownership and governance included)
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)
# Store coercion and validation rules
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")
# Step 3: Generate Pydantic model from stored schema
schema = store.get_schema(schema_id)
UserModel = from_dict(schema, "User")
# Step 4: (Optional) Convert model back to schema for documentation
schema_doc = to_dict(UserModel)
# Step 5: Validate data in production pipeline
# Option A: Using Validator class (recommended for production)
validator = Validator(store=store, schema_id=schema_id)
def process_user_data(raw_data):
    result = validator.validate(raw_data)
    if result.is_valid:
        # Process validated data
        return result.data
    else:
        # Handle validation errors
        raise ValueError(f"Invalid data: {result.errors}")
# Option B: Using convenience function (quick start)
from pycharter import validate_with_store
def process_user_data_quick(raw_data):
    result = validate_with_store(store, schema_id, raw_data)
    if result.is_valid:
        return result.data
    else:
        raise ValueError(f"Invalid data: {result.errors}")
7. REST API (pycharter.api)
Purpose: Expose all PyCharter services as REST API endpoints.
When to Use: When you need to use PyCharter from non-Python applications, microservices, or want to provide a web-based interface.
How It Works:
- Provides HTTP endpoints for all core services
- Uses FastAPI for automatic OpenAPI/Swagger documentation
- Supports both store-based and contract-based operations
- Handles request/response validation with Pydantic models
- Located at the root level (api/) as a separate application
- All endpoints are async-ready for better performance
Example:
# Start the API server (uses PYCHARTER_DATABASE_URL or sqlite:///pycharter.db)
pycharter api
# With host/port
pycharter api --host 0.0.0.0 --port 8080
Endpoints (see Swagger for full list):
- Contracts: POST /api/v1/contracts/parse, POST /api/v1/contracts/build
- Metadata: POST /api/v1/metadata/schemas, GET /api/v1/metadata/schemas/{schema_id}
- Validation: POST /api/v1/validation/validate, POST /api/v1/validation/validate-batch
- Quality: POST /api/v1/quality/check, GET /api/v1/quality/metrics
- ETL: POST /api/v1/etl/run (extract/transform/load YAML; optional pipeline quality checks)
- Pipeline runs: GET /api/v1/runs, GET /api/v1/runs/stats, GET /api/v1/pipelines
- Ontology: GET /api/v1/semantic/schemes, concepts, relationships, semantic health, and related ontology endpoints
Documentation:
- Swagger UI: http://localhost:8002/docs (default port 8002)
- ReDoc: http://localhost:8002/redoc
See src/pycharter/api/README.md for complete API documentation.
8. Web UI (pycharter.ui)
Purpose: Browser-based interface for contracts, pipelines, quality, and ontology. Served by pycharter ui serve (production) or pycharter ui dev (development with hot reload).
Main sections:
- Contracts: Registry, diagram/form editor, and validation workspace
- Pipelines: Overview dashboard, pipeline runs, ETL generator (YAML panels), and Pipeline diagram (Figma Jam-style visual editor for extract/transform/load nodes)
- Quality: Quality metrics, thresholds, and violation tracking
- Ontology: Concepts, concept schemes, diagrammatic workspace (ReactFlow), proposals, and semantic health
- Documentation: In-app API playground and docs
Configure the API base URL in the UI Settings page. Authentication uses PYCHARTER_AUTH_USERS (see Configuration Guide).
Service Integration Summary
| Service | Input | Output | Journey Stage |
|---|---|---|---|
| Contract Parser | Contract files (YAML/JSON) | ContractMetadata | Contract Specification → Parsing |
| Contract Builder | Separate artifacts or Store | Consolidated contract | Storage → Consolidation |
| Metadata Store | ContractMetadata | Stored metadata (DB) | Parsing → Storage |
| Pydantic Generator | JSON Schema | Pydantic models | Storage → Model Generation |
| JSON Schema Converter | Pydantic models | JSON Schema | (Bidirectional) |
| Runtime Validator | Pydantic models + Data | ValidationResult | Model Generation → Validation |
| ETL Pipelines | Config files or code | PipelineResult | Extract → Transform → Load |
| Quality Assurance | Contract + Data | QualityReport | Validation → Quality Monitoring |
Each service is designed to be independent yet composable, allowing you to use them individually or together as part of a complete data contract management system.
Documentation
- Full documentation (Python API, tutorials, guides): https://optophi.github.io/pycharter/
- Serve docs locally: pycharter docs serve (default: http://127.0.0.1:5002). Build static site: pycharter docs build. Requires pip install pycharter[docs].
- Configuration Guide - Database connection, pycharter db init/upgrade/seed, migrations, and variable injection
- Data Journey Guide - Data production journey: contract specification → storage → validation → quality
- Database ERD - Database schema and entity relationship diagrams
- Examples & Notebooks - Tutorials and guides (ETL, contracts, validation, quality, contract store, schema conversion); optional Marimo for interactive .py notebooks
- REST API - API endpoints and usage (install with pip install pycharter[api])
Usage Examples
Basic Usage
Using Convenience Functions (Quick Start):
from pycharter import from_dict, from_json, from_file
# From dictionary
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "title": {"type": "string"},
        "published": {"type": "boolean", "default": False},
    },
}
Article = from_dict(schema, "Article")
# From JSON string
schema_json = '{"type": "object", "version": "1.0.0", "properties": {"name": {"type": "string"}}}'
User = from_json(schema_json, "User")
# From file
Product = from_file("product_schema.json", "Product")
Using Validator Class (Production):
from pycharter import Validator
# From directory or single file
validator = Validator.from_dir("data/contracts/article")
# or: validator = Validator.from_file("article_contract.yaml")
result = validator.validate({"title": "My Article", "published": True})
Nested Objects
from pycharter import from_dict
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "zipcode": {"type": "string"},
            },
        },
    },
}
Person = from_dict(schema, "Person")
person = Person(
    name="Alice",
    address={
        "street": "123 Main St",
        "city": "New York",
        "zipcode": "10001",
    },
)
print(person.address.city) # Output: New York
Arrays and Collections
from pycharter import from_dict
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "tags": {
            "type": "array",
            "items": {"type": "string"},
        },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                },
            },
        },
    },
}
Cart = from_dict(schema, "Cart")
cart = Cart(
    tags=["python", "pydantic"],
    items=[
        {"name": "Apple", "price": 1.50},
        {"name": "Banana", "price": 0.75},
    ],
)
print(cart.items[0].name) # Output: Apple
Coercion and Validation
PyCharter supports coercion (pre-validation transformation) and validation (post-validation checks):
from pycharter import from_dict
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "flight_number": {
            "type": "integer",
            "coercion": "coerce_to_integer",  # Convert string/float to int
        },
        "destination": {
            "type": "string",
            "coercion": "coerce_to_string",
            "validations": {
                "min_length": {"threshold": 3},
                "max_length": {"threshold": 3},
                "no_capital_characters": None,
                "only_allow": {"allowed_values": ["abc", "def", "ghi"]},
            },
        },
        "distance": {
            "type": "number",
            "coercion": "coerce_to_float",
            "validations": {
                "greater_than_or_equal_to": {"threshold": 0},
            },
        },
    },
}
Flight = from_dict(schema, "Flight")
# Coercion happens automatically
flight = Flight(
    flight_number="123",  # Coerced to int: 123
    destination="abc",    # Passes all validations
    distance="100.5",     # Coerced to float: 100.5
)
Standard JSON Schema Support
PyCharter supports all standard JSON Schema Draft 2020-12 validation keywords:
| Keyword | Type | Description | Example |
|---|---|---|---|
| minLength | string | Minimum string length | {"minLength": 3} |
| maxLength | string | Maximum string length | {"maxLength": 10} |
| pattern | string | Regular expression pattern | {"pattern": "^[a-z]+$"} |
| enum | any | Allowed values | {"enum": ["a", "b", "c"]} |
| const | any | Single allowed value | {"const": "fixed"} |
| minimum | number | Minimum value (inclusive) | {"minimum": 0} |
| maximum | number | Maximum value (inclusive) | {"maximum": 100} |
| exclusiveMinimum | number | Minimum value (exclusive) | {"exclusiveMinimum": 0} |
| exclusiveMaximum | number | Maximum value (exclusive) | {"exclusiveMaximum": 100} |
| multipleOf | number | Must be multiple of | {"multipleOf": 2} |
| minItems | array | Minimum array length | {"minItems": 1} |
| maxItems | array | Maximum array length | {"maxItems": 10} |
| uniqueItems | array | Array items must be unique | {"uniqueItems": true} |
All schemas are validated against JSON Schema standard before processing, ensuring compliance.
Built-in Coercions (PyCharter Extensions)
| Coercion | Description |
|---|---|
| coerce_to_string | Convert int, float, bool, datetime, dict, list to string |
| coerce_to_integer | Convert float, string (numeric), bool, datetime to int |
| coerce_to_float | Convert int, string (numeric), bool to float |
| coerce_to_boolean | Convert int, string to bool |
| coerce_to_datetime | Convert string (ISO format), timestamp to datetime |
| coerce_to_date | Convert string (date format), datetime to date (date only, no time) |
| coerce_to_uuid | Convert string to UUID |
| coerce_to_lowercase | Convert string to lowercase |
| coerce_to_uppercase | Convert string to uppercase |
| coerce_to_stripped_string | Strip leading and trailing whitespace from string |
| coerce_to_list | Convert single value to list [value] (preserves None) |
| coerce_empty_to_null | Convert empty strings/lists/dicts to None (useful for nullable fields) |
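The last two rows are the least self-explanatory, so here is a small sketch of the behavior their descriptions suggest. These are standalone re-implementations written for illustration, not PyCharter's own functions; in particular, leaving an existing list unchanged in coerce_to_list is an assumption that the table does not confirm.

```python
from typing import Any


def coerce_to_list(value: Any) -> Any:
    """Wrap a single value in a list; keep None as None.

    Assumption: a value that is already a list is returned unchanged.
    """
    if value is None or isinstance(value, list):
        return value
    return [value]


def coerce_empty_to_null(value: Any) -> Any:
    """Map empty strings, lists, and dicts to None; pass other values through."""
    if isinstance(value, (str, list, dict)) and len(value) == 0:
        return None
    return value


print(coerce_to_list("a"))       # ['a']
print(coerce_to_list(None))      # None
print(coerce_empty_to_null(""))  # None
print(coerce_empty_to_null(0))   # 0
```

Note that 0 and False are not treated as empty in this sketch, since the description only mentions strings, lists, and dicts.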
Built-in Validations (PyCharter Extensions)
| Validation | Description | Configuration |
|---|---|---|
| min_length | Minimum length for strings/arrays | {"threshold": N} |
| max_length | Maximum length for strings/arrays | {"threshold": N} |
| only_allow | Only allow specific values | {"allowed_values": [...]} |
| greater_than_or_equal_to | Numeric minimum | {"threshold": N} |
| less_than_or_equal_to | Numeric maximum | {"threshold": N} |
| is_positive | Value must be positive | {"threshold": 0} |
| no_capital_characters | No uppercase letters | null |
| no_special_characters | Only alphanumeric and spaces | null |
| non_empty_string | String must not be empty | null |
| matches_regex | String must match regex pattern | {"pattern": "..."} |
| is_email | String must be valid email address | null |
| is_url | String must be valid URL | null |
| is_alphanumeric | Only alphanumeric characters (no spaces/special) | null |
| is_numeric_string | String must be numeric (digits, optional decimal) | null |
| is_unique | All items in array must be unique | null |
Note: PyCharter extensions (coercion and validations) are optional and can be used alongside standard JSON Schema keywords. All validation logic is stored as data in the JSON schema, making it fully data-driven.
Custom Coercions and Validations
Extend PyCharter with your own coercion and validation functions:
from pycharter.shared.coercions import register_coercion
from pycharter.shared.validations import register_validation
# Register custom coercion
def coerce_to_uppercase(data):
    if isinstance(data, str):
        return data.upper()
    return data
register_coercion("coerce_to_uppercase", coerce_to_uppercase)
# Register custom validation
def must_be_positive(threshold=0):
    def _validate(value, info):
        if value <= threshold:
            raise ValueError(f"Value must be > {threshold}")
        return value
    return _validate
register_validation("must_be_positive", must_be_positive)
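Registration by name is what allows rules to live as plain data. The sketch below shows the general idea with a dictionary-backed registry; the names _COERCIONS, register, apply_coercions, and coerce_to_trimmed_title are hypothetical and do not describe how pycharter.shared.coercions is implemented.

```python
from typing import Any, Callable

# Hypothetical registry: maps a coercion name to the function implementing it.
_COERCIONS: dict[str, Callable[[Any], Any]] = {}


def register(name: str, func: Callable[[Any], Any]) -> None:
    """Make `func` available under `name` for data-driven rules."""
    _COERCIONS[name] = func


def apply_coercions(record: dict[str, Any], rules: dict[str, str]) -> dict[str, Any]:
    """Apply the coercion named in `rules` to each matching field of `record`."""
    coerced = dict(record)
    for field, coercion_name in rules.items():
        if field in coerced:
            coerced[field] = _COERCIONS[coercion_name](coerced[field])
    return coerced


def coerce_to_trimmed_title(data: Any) -> Any:
    """Example custom coercion: strip whitespace and title-case strings."""
    return data.strip().title() if isinstance(data, str) else data


register("coerce_to_trimmed_title", coerce_to_trimmed_title)

# The rules are plain data (they could be loaded from JSON or YAML).
rules = {"city": "coerce_to_trimmed_title"}
coerced_record = apply_coercions({"city": "  new york ", "zip": "10001"}, rules)
print(coerced_record)  # {'city': 'New York', 'zip': '10001'}
```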
API Reference
PyCharter's API is organized into three tiers to help you choose the right approach:
Tier 1: Primary Interfaces (Classes - Best Performance)
Validator - Primary validation interface (recommended for production)
from pycharter import Validator
# Create validator via factory methods or store
validator = Validator.from_dir("data/contracts/user")
validator = Validator.from_files(schema="schema.yaml", coercion_rules="coercion.yaml")
validator = Validator.from_file("contract.yaml")
validator = Validator.from_dict(schema={...}, coercion_rules={...})
validator = Validator(store=store, schema_id="user_schema") # from contract store
# Validate data
result = validator.validate(data)
results = validator.validate_batch([data1, data2])
model = validator.get_model() # Get the generated Pydantic model
# State-aware validation (per FSM lifecycle state)
result = validator.validate_for_state(data, state="DRAFT")
result = validator.validate_for_state(data, state="FILLED",
state_rules={"FILLED": {"required": ["filled_qty"]}})
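One way to think about per-state rules is as adjustments to the set of required fields. The sketch below illustrates that idea only; it is an assumption about the semantics of "required" and "optional" overrides, and the helper names required_fields_for_state and missing_fields are hypothetical rather than part of PyCharter.

```python
def required_fields_for_state(
    base_required: list[str],
    state_rules: dict[str, dict[str, list[str]]],
    state: str,
) -> set[str]:
    """Combine the schema-level required list with overrides for one state."""
    rules = state_rules.get(state, {})
    required = set(base_required) | set(rules.get("required", []))
    return required - set(rules.get("optional", []))


def missing_fields(record: dict, required: set[str]) -> list[str]:
    """Return the required field names that are absent or None in `record`."""
    return sorted(name for name in required if record.get(name) is None)


state_rules = {
    "DRAFT": {"optional": ["filled_qty"]},
    "FILLED": {"required": ["filled_qty"]},
}
order = {"order_id": "A1", "status": "FILLED"}

filled_required = required_fields_for_state(["order_id"], state_rules, "FILLED")
draft_required = required_fields_for_state(["order_id"], state_rules, "DRAFT")
print(missing_fields(order, filled_required))  # ['filled_qty']
print(missing_fields(order, draft_required))   # []
```

The same record is acceptable as a DRAFT but incomplete as FILLED, which is the situation state-aware validation is meant to catch.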
ValidatorBuilder - Fluent API for building validators
from pycharter import ValidatorBuilder
validator = (
    ValidatorBuilder()
    .from_dir("contracts/order")  # or .from_file(), .from_files(), .from_dict(), .from_store()
    .with_state_rules({  # optional: per-state required/optional overrides
        "DRAFT": {"optional": ["filled_qty"]},
        "FILLED": {"required": ["filled_qty"]},
    })
    .with_quality_checks(thresholds={"completeness": 0.95})  # optional: quality metrics
    .strict()  # optional: raise on validation failure
    .build()
)
QualityCheck - Primary quality assurance interface
from pycharter import QualityCheck, QualityCheckOptions
check = QualityCheck(store=store)
report = check.run(schema_id="user_schema", data=data, options=QualityCheckOptions(...))
ContractStoreClient - Base class for contract stores
from pycharter import ContractStoreClient, SQLiteContractStore, PostgresContractStore
store = SQLiteContractStore("metadata.db")
store.connect()
Tier 2: Convenience Functions (Quick Start)
Pydantic Generator - Input type helpers
- from_dict(schema: dict, model_name: str = "DynamicModel") - Create model from dictionary
- from_json(json_string: str, model_name: str = "DynamicModel") - Create model from JSON string
- from_file(file_path: str, model_name: str = None) - Create model from file (JSON/YAML)
- from_url(url: str, model_name: str = "DynamicModel") - Create model from URL
- generate_model(schema: dict, model_name: str = "DynamicModel") - Advanced: more control
- generate_model_file(schema: dict, output_path: str, model_name: str = "DynamicModel") - Generate and save to file
JSON Schema Converter - Output type helpers
- to_dict(model: Type[BaseModel], ...) - Convert model to JSON Schema dictionary
- to_file(model: Type[BaseModel], file_path: str, ...) - Convert model to file
- to_json(model: Type[BaseModel], ...) - Convert model to JSON string
- model_to_schema(model: Type[BaseModel], ...) - Advanced: core conversion function
Runtime Validator - Data source helpers
- validate_with_store(store, schema_id, data, ...) - Quick validation with contract store
- validate_batch_with_store(store, schema_id, data_list, ...) - Batch validation with store
- validate_with_contract(contract, data, ...) - Quick validation with contract file/dict
- validate_batch_with_contract(contract, data_list, ...) - Batch validation with contract
- get_model_from_store(store, schema_id, ...) - Get model from contract store
- get_model_from_contract(contract, ...) - Get model from contract
- validate_input(contract, ...) - Decorator for function input validation
- validate_output(contract, ...) - Decorator for function output validation
- validate_with_contract_decorator(contract, ...) - Decorator for contract-based validation
Contract Management
- parse_contract(contract_dict: dict) - Parse contract dictionary
- parse_contract_file(file_path: str) - Parse contract file (YAML/JSON)
- build_contract(artifacts: ContractArtifacts) - Build contract from artifacts
- build_contract_from_store(store, schema_id, ...) - Build contract from contract store
Tier 3: Low-Level Utilities (When You Have Models)
- validate(model: Type[BaseModel], data: dict, strict: bool = False) - Validate single record
- validate_batch(model: Type[BaseModel], data_list: List[dict], strict: bool = False) - Batch validate
- ValidationResult - Result class with is_valid, data, and errors attributes
Metadata Store Implementations
- InMemoryContractStore() - In-memory store (testing/development)
- SQLiteContractStore(database_path: str) - SQLite database
- PostgresContractStore(connection_string: str) - PostgreSQL database
- MongoDBContractStore(connection_string: str) - MongoDB database
- RedisContractStore(connection_string: str) - Redis database
Domain Lifecycle (FSM Integration)
from pycharter import (
check_state_alignment,
validate_lifecycle_binding,
get_lifecycle_binding,
get_domain_entity_info,
DEFAULT_STATE_FIELD,
)
# Check that contract enum values match FSM states
result = check_state_alignment(contract_dict, fsm_states={"PENDING","OPEN","FILLED"},
state_field="status")
# result.aligned, result.missing_from_contract, result.missing_from_fsm
# Validate lifecycle binding structure in contract metadata
errors = validate_lifecycle_binding(metadata_dict)
# Read binding (state_machine_name, machine_version, state_field, entity_id_field)
binding = get_lifecycle_binding(metadata_dict)
DEFAULT_STATE_FIELD  # "status": the conventional field name for FSM state
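Conceptually, a state alignment check is a comparison of two sets: the enum values declared for the state field in the contract and the states known to the FSM. The sketch below illustrates that comparison with plain sets. The function compare_states is hypothetical, and reading missing_from_contract as "in the FSM but not in the contract enum" (and the reverse for missing_from_fsm) is an assumption based on the attribute names.

```python
def compare_states(contract_enum: set[str], fsm_states: set[str]) -> dict:
    """Compare contract enum values with FSM states using set differences."""
    missing_from_contract = fsm_states - contract_enum
    missing_from_fsm = contract_enum - fsm_states
    return {
        "aligned": not missing_from_contract and not missing_from_fsm,
        "missing_from_contract": sorted(missing_from_contract),
        "missing_from_fsm": sorted(missing_from_fsm),
    }


# Illustrative contract fragment: the state field declares its allowed values.
example_schema = {
    "properties": {
        "status": {"type": "string", "enum": ["PENDING", "OPEN", "CANCELLED"]},
    }
}
contract_enum = set(example_schema["properties"]["status"]["enum"])
alignment = compare_states(contract_enum, {"PENDING", "OPEN", "FILLED"})
print(alignment)
# {'aligned': False, 'missing_from_contract': ['FILLED'], 'missing_from_fsm': ['CANCELLED']}
```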
Exceptions
PyCharter uses a small exception hierarchy for config and pipeline errors. Catch PyCharterError to handle any PyCharter failure:
- PyCharterError - Base for all PyCharter exceptions
- ConfigError - Config loading/parsing failures (missing file, invalid YAML)
- ConfigValidationError - Schema validation failures (e.g. missing required type field)
- ConfigLoadError - Config file load errors
- ExpressionError - Expression evaluation failures (e.g. invalid syntax in AddField)
Pipeline run(error_context=...) supports ErrorMode: STRICT (raise on failure), LENIENT (log and continue), COLLECT (append to result.errors). Import from pycharter.shared.errors.
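The three error modes differ only in what happens when a single record fails. The sketch below shows that difference with a local enum; SketchErrorMode and load_all are hypothetical names that mirror the described behavior (raise, log and continue, collect) and are not PyCharter's ErrorMode or its pipeline internals.

```python
import logging
from enum import Enum
from typing import Any, Callable

logger = logging.getLogger("error_mode_sketch")


class SketchErrorMode(Enum):
    """Local stand-in for the three modes described above."""

    STRICT = "strict"    # raise on the first failure
    LENIENT = "lenient"  # log the failure and continue
    COLLECT = "collect"  # record the failure and continue


def load_all(
    records: list[Any],
    loader: Callable[[Any], Any],
    mode: SketchErrorMode,
) -> tuple[list[Any], list[str]]:
    """Apply `loader` to each record, handling failures according to `mode`."""
    loaded: list[Any] = []
    errors: list[str] = []
    for record in records:
        try:
            loaded.append(loader(record))
        except Exception as exc:
            if mode is SketchErrorMode.STRICT:
                raise
            if mode is SketchErrorMode.COLLECT:
                errors.append(f"{record!r}: {exc}")
            else:
                logger.warning("Skipping %r: %s", record, exc)
    return loaded, errors


loaded, errors = load_all(["1", "x", "3"], int, SketchErrorMode.COLLECT)
print(loaded)       # [1, 3]
print(len(errors))  # 1
```

With STRICT the same input raises ValueError at "x"; with LENIENT it returns [1, 3] and an empty error list, leaving only a log entry.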
Design Principles & Requirements
PyCharter is designed to meet the following core requirements:
JSON Schema Standard Compliance
All schemas must abide by conventional JSON Schema syntax and qualify as valid JSON Schema:
- Validation: All schemas are validated against JSON Schema Draft 2020-12 standard before processing
- Standard Keywords: Full support for all standard validation keywords (minLength, pattern, enum, minimum, maximum, etc.)
- Compliance: Uses jsonschema library for validation with graceful fallback
Data-Driven Validation Logic
All schema information and complex field validation logic is stored as data, not Python code:
- Coercion: Referenced by name (string) in JSON: "coercion": "coerce_to_integer"
- Validations: Referenced by name with configuration (dict) in JSON: "validations": {"min_length": {"threshold": 3}}
- No Code Required: Validation rules are defined entirely in JSON schema files
- Example: {"coercion": "coerce_to_string", "validations": {"min_length": {"threshold": 3}}}
Dynamic Pydantic Model Generation
Models are created dynamically at runtime from JSON schemas:
- Runtime Generation: Uses pydantic.create_model() to generate models on-the-fly
- Dynamic Validators: Field validators are dynamically attached using field_validator decorators
- Multiple Sources: Models can be created from dicts, JSON strings, files, or URLs
- No Static Code: All models are generated from data, not pre-defined classes
Nested Schema Support
Full support for nested object schemas and complex structures:
- Recursive Processing: Nested objects are recursively processed into their own Pydantic models
- Arrays of Objects: Arrays containing nested objects are fully supported
- Deep Nesting: Deeply nested structures work correctly with full type safety
- Type Safety: Each nested object becomes its own typed Pydantic model
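To illustrate runtime model generation and recursive handling of nested objects without requiring Pydantic, the sketch below uses the standard library's dataclasses.make_dataclass as a stand-in for pydantic.create_model(). It performs no validation and is not PyCharter's generator; build_model, instantiate, and the type map are hypothetical.

```python
from dataclasses import fields, is_dataclass, make_dataclass
from typing import Any

# Minimal mapping from JSON Schema type names to Python types (illustrative).
_TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}


def build_model(name: str, schema: dict[str, Any]) -> type:
    """Recursively turn an object schema into a dataclass."""
    model_fields = []
    for prop, spec in schema.get("properties", {}).items():
        if spec.get("type") == "object":
            # Nested object: build its own class, named after the property.
            field_type = build_model(f"{name}_{prop.title()}", spec)
        else:
            field_type = _TYPE_MAP.get(spec.get("type"), Any)
        model_fields.append((prop, field_type))
    return make_dataclass(name, model_fields)


def instantiate(model: type, data: dict[str, Any]) -> Any:
    """Create an instance, converting nested dicts into nested instances."""
    kwargs = {}
    for f in fields(model):
        value = data[f.name]
        kwargs[f.name] = instantiate(f.type, value) if is_dataclass(f.type) else value
    return model(**kwargs)


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
}
PersonSketch = build_model("Person", person_schema)
person_sketch = instantiate(PersonSketch, {"name": "Alice", "address": {"city": "New York"}})
print(person_sketch.address.city)  # New York
```

Each nested object becomes its own class, so attribute access works at every level, which mirrors the "own typed model per nested object" idea described above.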
Extension Fields
Custom fields can be added to JSON Schema to extend functionality:
- coercion: Pre-validation type conversion (e.g., string → integer)
- validations: Post-validation custom rules
- Optional: Extensions work alongside standard JSON Schema keywords
- Separated: Extensions are clearly distinguished from standard JSON Schema
Complex Field Validation
Support for both standard and custom field validators:
- Standard Validators: minLength, pattern, enum, minimum, maximum, etc. (JSON Schema standard)
- Custom Validators: Extensible validation rules via validations field
- Validation Order: Coercion → Standard Validation → Pydantic Validation → Custom Validations
- Factory Pattern: Validators are factory functions that return validation functions
Development Setup
Quick Setup
# Run setup script
./scripts/setup.sh
# Activate environment
source venv/bin/activate
# Run tests
pytest
Using Make
make install-dev # Install package and dev dependencies
make test # Run tests
make format # Format with Ruff (ruff format + ruff check --fix)
make lint # Check formatting and lint with Ruff (no writes)
make type-check # Run mypy on src/pycharter
make check # Run format, lint, type-check, and test
Building the package: Run make clean && make build for a reliable build (clears stale egg-info; see Publishing).
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=pycharter --cov-report=html
# Run specific test file
pytest tests/test_converter.py
# Run tests matching a pattern
pytest -k "coercion"
Publishing to PyPI
Automatic publishing via GitHub Releases (Trusted Publishing - no tokens needed!):
# 1. Update version in pyproject.toml
# version = "0.0.21"
# 2. Commit and push
git add pyproject.toml
git commit -m "Bump version to 0.0.21"
git push
# 3. Create GitHub Release (automatically publishes to PyPI)
gh release create v0.0.21 --title "v0.0.21" --notes "Release notes"
The workflow automatically:
- Builds UI
- Builds Python package
- Publishes to PyPI (using Trusted Publishing)
Local build (reliable): clean first to avoid stale build artifacts, then build:
make clean && make build
Core package builds without Node.js; the UI is included when built (see Publishing guide).
JSON Schema Compliance
PyCharter is fully compliant with JSON Schema Draft 2020-12 standard:
- All schemas are validated against the standard before processing
- Full support for all standard keywords (minLength, maxLength, pattern, enum, minimum, maximum, etc.)
- Optional extensions (coercion and validations) work alongside standard keywords
- Strict mode available to enforce standard-only schemas
Requirements
- Python 3.11+
- Pydantic >= 2.0.0
- jsonschema >= 4.0.0 (optional, for enhanced validation)
See pyproject.toml for full dependencies and optional extras (api, ui, dev, etl, etc.).
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
See CONTRIBUTING.md. Report security issues per SECURITY.md. Community expectations: CODE_OF_CONDUCT.md.
Links
- Homepage: https://github.com/optophi/pycharter
- Repository: GitHub
- Issues: GitHub Issues
- Documentation: Configuration & guides · API docs
Made with ❤️ for the Python community