A Python package for data contract management with five core services: contract parsing, metadata storage, Pydantic generation, JSON Schema conversion, and runtime validation
PyCharter
Dynamically generate Pydantic models from JSON schemas with coercion and validation support
PyCharter is a Data Contract Management and Quality Assurance platform for Python that lets you define, store, version, enforce, and monitor data contracts throughout your data pipelines. At its core, PyCharter converts JSON schemas into fully functional Pydantic models. It supports the JSON Schema Draft 2020-12 standard, including all standard validation keywords (minLength, maxLength, pattern, enum, minimum, maximum, etc.), and adds extensions for pre-validation coercion and post-validation checks. It handles nested objects, arrays, and custom validators, with all validation logic stored as data (not Python code). PyCharter also includes a quality assurance module for monitoring data quality, tracking violations, and generating quality reports.
What is PyCharter?
PyCharter is a data contract-as-code platform that brings structure, validation, and governance to data pipelines and applications. It provides:
- Data Contract Definition & Management: Define formal agreements that specify data structure, quality rules, and governance policies
- Schema Registry: Centralized storage and versioning of data schemas with support for schema evolution
- Data Quality Enforcement: Coercion rules (data transformation) and validation rules (business constraints) to ensure data integrity
- Data Governance: Track ownership, stewardship, and enforce data policies across your organization
- Metadata Management: Store and retrieve data about data (schemas, ownership, rules, lineage)
- Runtime Validation: Validate data against contracts in production pipelines, APIs, and ETL processes
Data Terminology
PyCharter implements key data management concepts:
- Data Contract: Formal specification of data structure, quality rules, and governance policies that define the "contract" between data producers and consumers
- Schema Registry: Centralized repository for storing and versioning data schemas, enabling schema discovery and evolution tracking
- Data Quality: Coercion rules (transformations applied before validation) and validation rules (checks applied after schema validation) that ensure data integrity
- Data Governance: Ownership tracking, stewardship management, and policy enforcement for data assets
- Schema Evolution: Versioning and migration capabilities that allow data structures to evolve over time while maintaining backward compatibility
- Metadata Management: Storage and retrieval of data about data, including schemas, ownership information, governance rules, and lineage
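The coercion-then-validation order described above can be sketched with the standard library alone. This is a minimal illustration, not PyCharter's implementation: the registries `COERCIONS` and `VALIDATIONS` and the helper `apply_rules` are hypothetical names, and only the rule names `coerce_to_integer` and `is_positive` are taken from the examples later in this document. The point is that the rule sets are plain data (strings and dicts), while the functions they refer to live in a registry.

```python
# Rule names are plain strings, so the rule sets could be stored as JSON or YAML.
# COERCIONS, VALIDATIONS and apply_rules are hypothetical names for this sketch.
COERCIONS = {"coerce_to_integer": int}
VALIDATIONS = {"is_positive": lambda value, **params: value > 0}


def apply_rules(record, coercion_rules, validation_rules):
    """Coerce fields first, then run the checks; return (data, errors)."""
    data = dict(record)
    errors = []
    failed = set()  # fields whose coercion failed are not validated further

    for name, rule in coercion_rules.items():
        if name not in data:
            continue
        try:
            data[name] = COERCIONS[rule](data[name])
        except (TypeError, ValueError):
            errors.append(f"{name}: could not apply {rule}")
            failed.add(name)

    for name, checks in validation_rules.items():
        if name not in data or name in failed:
            continue
        for check, params in checks.items():
            if not VALIDATIONS[check](data[name], **params):
                errors.append(f"{name}: failed {check}")

    return data, errors


data, errors = apply_rules(
    {"name": "Alice", "age": "30"},
    {"age": "coerce_to_integer"},
    {"age": {"is_positive": {}}},
)
```

Here the string "30" is coerced to the integer 30 before the positivity check runs, so the record passes with no errors.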
Use Cases
- Data Pipeline Validation: Ensure data conforms to contracts before processing in ETL pipelines
- API Contract Enforcement: Validate API request/response data against defined contracts
- Data Integration: Standardize data formats across systems and services
- Compliance & Governance: Track ownership, enforce data policies, and maintain audit trails
- Schema Registry: Centralized schema management for microservices and data platforms
- Data Quality Assurance: Catch data quality issues early in the pipeline through automated validation
Features
- Dynamic Model Generation - Convert JSON schemas to Pydantic models at runtime
- JSON Schema Compliant - Full support for JSON Schema Draft 2020-12 standard
- Type Coercion - Automatic type conversion before validation (e.g., string → integer)
- Custom Validators - Built-in and extensible validation rules
- Nested Structures - Full support for nested objects and arrays
- Multiple Input Formats - Load schemas from dicts, JSON strings, files, or URLs
- Type Safe - Full type hints and Pydantic v2 compatibility
- Extensible - Register custom coercion and validation functions
- Data-Driven - All validation logic stored as JSON data, not Python code
- Data Contract Management - Complete lifecycle management for data contracts with versioning
- Schema Registry - Centralized schema storage with support for PostgreSQL, SQLite, MongoDB, Redis, and in-memory backends
- Data Governance - Track ownership, stewardship, and enforce governance policies
- Metadata Management - Store and query metadata about your data assets
- Schema Evolution - Version schemas and track changes over time
- Quality Assurance - Monitor data quality, calculate metrics, track violations, and generate reports
- Quality Metrics - Calculate quality scores, accuracy, completeness, and violation rates
- Threshold Alerting - Set quality thresholds and get alerts when quality degrades
Installation
Core Library
pip install pycharter
With API Support
pip install pycharter[api]
This installs FastAPI and Uvicorn for running the REST API server.
With UI Support
pip install pycharter[ui]
This installs the Python dependencies and pre-built UI static files (like Airflow).
After installation, you can immediately start the UI:
pycharter ui serve # Production mode (uses pre-built static files)
For development (if you have the source code):
cd ui
npm install # Install Node.js dependencies
pycharter ui dev # Development mode with hot reload
Note: When installed from pip, the UI works immediately without Node.js. For development, Node.js is required. See ui/INSTALLATION.md for detailed instructions.
Quick Start
Quick Start: ETL Pipelines
Build and run ETL pipelines programmatically (with the | operator) or from YAML configs. Pipeline run() is async; use asyncio.run() from scripts or await in async code.
import asyncio
from pycharter import Pipeline, HTTPExtractor, PostgresLoader, Rename, AddField
# Programmatic pipeline
pipeline = (
    Pipeline(HTTPExtractor(url="https://api.example.com/data"))
    | Rename({"old": "new"})
    | AddField("processed_at", "now()")
    | PostgresLoader(connection_string="...", table="users")
)
result = asyncio.run(pipeline.run())
# Config-driven: explicit files
pipeline = Pipeline.from_config_files(
    extract="configs/extract.yaml",
    load="configs/load.yaml",
    variables={"API_KEY": "secret"}
)
# Config-driven: directory (extract.yaml, transform.yaml, load.yaml)
pipeline = Pipeline.from_config_dir("pipelines/users/")
# Config-driven: single file
pipeline = Pipeline.from_config_file("pipelines/users/pipeline.yaml")
result = asyncio.run(pipeline.run())
See ETL Pipelines under Core Services for error handling (error_context, ErrorMode) and variable resolution (PipelineContext(variables={...})).
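To make the `|` chaining and the async `run()` less abstract, here is a minimal standard-library sketch of how such an interface could be built with `__or__`. It is not PyCharter's implementation: `MiniPipeline`, `fake_extract`, `rename`, and `add_field` are hypothetical stand-ins, and each step is assumed to take and return a list of records.

```python
import asyncio


class MiniPipeline:
    """Hypothetical stand-in: an extractor plus an ordered tuple of steps."""

    def __init__(self, extractor, steps=()):
        self.extractor = extractor
        self.steps = tuple(steps)

    def __or__(self, step):
        # `pipeline | step` returns a new pipeline with the step appended.
        return MiniPipeline(self.extractor, self.steps + (step,))

    async def run(self):
        records = await self.extractor()
        for step in self.steps:
            records = step(records)
        return records


async def fake_extract():
    # Stands in for an HTTP or database extractor.
    await asyncio.sleep(0)
    return [{"old": 1}, {"old": 2}]


def rename(mapping):
    def step(records):
        return [{mapping.get(k, k): v for k, v in r.items()} for r in records]
    return step


def add_field(name, value):
    def step(records):
        return [{**r, name: value} for r in records]
    return step


pipeline = MiniPipeline(fake_extract) | rename({"old": "new"}) | add_field("source", "demo")
result = asyncio.run(pipeline.run())
```

Returning a new object from `__or__` keeps partially built pipelines reusable, which is one reasonable design choice for an operator-based builder.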
Quick Start: Convenience Functions (One-off Use)
from pycharter import from_dict, validate
# Define your JSON schema
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"}
    },
    "required": ["name", "age"]
}
# Generate a Pydantic model (convenience function)
Person = from_dict(schema, "Person")
# Validate data
result = validate(Person, {"name": "Alice", "age": 30, "email": "alice@example.com"})
if result.is_valid:
    print(f"Valid: {result.data.name}")  # Output: Valid: Alice
Production Use: Validator Class (Recommended)
For production code with multiple validations, use the Validator class for better performance. Create validators via factory methods or from a metadata store:
from pycharter import Validator
# From directory (expects schema.yaml, coercion_rules.yaml, validation_rules.yaml)
validator = Validator.from_dir("data/contracts/user")
# From explicit files (any filenames)
validator = Validator.from_files(
    schema="schemas/user.yaml",
    coercion_rules="rules/coercion.yaml",
    validation_rules="rules/validation.yaml"
)
# From a single contract file
validator = Validator.from_file("user_contract.yaml")
# From dictionaries
validator = Validator.from_dict(schema={...}, coercion_rules={...}, validation_rules={...})
# From metadata store (with database)
validator = Validator(store=store, schema_id="user_schema_v1")
# Validate multiple records efficiently (model is cached)
result1 = validator.validate({"name": "Alice", "age": 30})
result2 = validator.validate({"name": "Bob", "age": 25})
# Batch validation
results = validator.validate_batch([data1, data2, data3])
With Metadata Store
from pycharter import Validator, SQLiteMetadataStore
# Connect to metadata store
store = SQLiteMetadataStore("metadata.db")
store.connect()
# Create validator from store
validator = Validator(store=store, schema_id="user_schema_v1")
# Validate data
result = validator.validate({"name": "Alice", "age": 30})
API Organization
PyCharter's API is organized into three tiers to help you choose the right approach for your use case:
Tier 1: Primary Interfaces (Recommended for Production)
Classes that provide the best performance and most features:
- Validator - Primary validation interface (use for multiple validations); create via from_dir(), from_files(), from_file(), from_dict(), or from a store
- Pipeline - ETL pipeline (programmatic or config-driven); create via from_config_files(), from_config_dir(), from_config_file(), or the constructor
- QualityCheck - Primary quality assurance interface
- MetadataStoreClient - Base class for metadata stores
When to use: Production code, batch processing, when you need to validate multiple records or run ETL pipelines.
Tier 2: Convenience Functions (Quick Start)
Functions that make common tasks easy and discoverable:
- Input helpers: from_dict(), from_file(), from_json(), from_url()
- Output helpers: to_dict(), to_file(), to_json()
- Validation helpers: validate_with_store(), validate_with_contract()
- Contract helpers: parse_contract_file(), build_contract()
When to use: Quick scripts, one-off validations, exploratory work, learning the library.
Tier 3: Low-Level Utilities
Functions for when you already have models or need fine-grained control:
- validate() - Validate with existing Pydantic model
- validate_batch() - Batch validate with existing model
- model_to_schema() - Core conversion function
When to use: Advanced use cases, when you've already generated models, custom workflows.
Choosing the Right Approach
| Use Case | Recommended Approach | Example |
|---|---|---|
| Production pipeline with multiple validations | Validator class | validator = Validator(store=store, schema_id="schema"); validator.validate(data) |
| Quick one-off validation | Convenience function | validate_with_contract("contract.yaml", data) |
| You already have a model | Low-level function | validate(UserModel, data) |
| Batch processing | Validator.validate_batch() | validator.validate_batch([data1, data2, data3]) |
Core Services & Data Production Journey
PyCharter provides eight core services that work together to support a complete data production journey, from contract specification to quality assurance. Each service plays a critical role in managing data contracts and ensuring data quality throughout your pipeline.
The Data Production Journey
The typical data production workflow follows this path:
1. Data Contract Specification
↓
2. Contract Parsing
↓
3. Metadata Storage
↓
4. Pydantic Model Generation
↓
5. Runtime Validation
↓
6. Quality Assurance & Monitoring
1. Contract Parser (pycharter.contract_parser)
Purpose: Reads and decomposes data contract files into structured metadata components.
When to Use: At the beginning of your data production journey, when you have data contract files (YAML or JSON) that need to be processed and understood.
How It Works:
- Accepts data contract files containing schema definitions, governance rules, ownership information, and metadata
- Decomposes the contract into distinct components: schema, governance_rules, ownership, and metadata
- Returns a ContractMetadata object that separates concerns and makes each component accessible
- Extracts and tracks versions of all components
Example:
from pycharter import parse_contract_file, ContractMetadata
# Parse a contract file (YAML or JSON)
metadata = parse_contract_file("data_contract.yaml")
# Access decomposed components
schema = metadata.schema # JSON Schema definition
governance = metadata.governance_rules # Governance policies
ownership = metadata.ownership # Owner/team information
metadata_info = metadata.metadata # Additional metadata
versions = metadata.versions # Component versions
Contribution to Journey: The contract parser is the entry point that takes raw contract specifications and prepares them for downstream processing. It ensures that contracts are properly structured and that all components (schema, governance, ownership) are separated for independent handling.
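As a rough picture of what "decomposing" a contract means, the sketch below splits an already loaded contract dictionary into its components and collects their versions. It rests on assumptions: the top-level key layout of the contract, the `ParsedContract` dataclass, and the `decompose` helper are all hypothetical and only mirror the attribute names used in the example above (schema, governance_rules, ownership, metadata, versions).

```python
from dataclasses import dataclass, field


@dataclass
class ParsedContract:
    """Hypothetical container mirroring the attribute names used above."""
    schema: dict
    governance_rules: dict = field(default_factory=dict)
    ownership: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)


def decompose(contract):
    """Split a contract dict into components and record each component's version."""
    names = ("schema", "governance_rules", "ownership", "metadata")
    parts = {name: contract.get(name, {}) for name in names}
    versions = {
        name: part["version"]
        for name, part in parts.items()
        if isinstance(part, dict) and "version" in part
    }
    return ParsedContract(versions=versions, **parts)


parsed = decompose({
    "schema": {"type": "object", "version": "1.0.0", "properties": {}},
    "ownership": {"owner": "data-team"},
    "metadata": {"version": "1.0.0", "description": "User contract"},
})
```

Components missing from the contract come back as empty dictionaries, so downstream code can treat every component uniformly.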
1b. Contract Builder (pycharter.contract_builder)
Purpose: Constructs consolidated data contracts from separate artifacts (schema, coercion rules, validation rules, metadata).
When to Use: When you have separate artifacts stored independently and need to combine them into a single consolidated contract for runtime validation or distribution.
How It Works:
- Takes separate artifacts (schema, coercion rules, validation rules, metadata, ownership, governance rules)
- Merges coercion and validation rules into the schema
- Tracks versions of all components
- Produces a consolidated contract suitable for runtime validation
- Can build from artifacts directly or retrieve from metadata store
Example:
from pycharter import build_contract, build_contract_from_store, ContractArtifacts
# Build from separate artifacts
artifacts = ContractArtifacts(
    schema={"type": "object", "version": "1.0.0", "properties": {...}},
    coercion_rules={"version": "1.0.0", "rules": {"age": "coerce_to_integer"}},
    validation_rules={"version": "1.0.0", "rules": {"age": {"is_positive": {...}}}},
    metadata={"version": "1.0.0", "description": "User contract"},
    ownership={"owner": "data-team", "team": "engineering"},
)
contract = build_contract(artifacts)
# Contract now has:
# - schema with rules merged
# - metadata, ownership, governance_rules
# - versions tracking all components
# Or build from metadata store
contract = build_contract_from_store(store, "user_schema_v1")
# Use for validation
from pycharter import validate_with_contract
result = validate_with_contract(contract, {"name": "Alice", "age": "30"})
Contribution to Journey: The contract builder is the consolidation layer that combines separate artifacts (stored independently in the database) into a single contract artifact. This consolidated contract tracks all component versions and can be used for runtime validation, distribution, or archival purposes.
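The consolidation step can be pictured as folding per-field rules into the schema's properties while recording where each piece came from. The sketch below is an assumption-laden illustration, not the library's merge logic: the helper `merge_rules` and the property keys `coercion` and `validations` are hypothetical, chosen only to show the idea. The input shapes follow the ContractArtifacts example above.

```python
import copy


def merge_rules(schema, coercion_rules, validation_rules):
    """Return a consolidated contract; the input schema is left untouched."""
    merged = copy.deepcopy(schema)
    props = merged.setdefault("properties", {})

    # "coercion" and "validations" are hypothetical key names for this sketch.
    for name, rule in coercion_rules.get("rules", {}).items():
        props.setdefault(name, {})["coercion"] = rule
    for name, checks in validation_rules.get("rules", {}).items():
        props.setdefault(name, {})["validations"] = checks

    versions = {
        "schema": schema.get("version"),
        "coercion_rules": coercion_rules.get("version"),
        "validation_rules": validation_rules.get("version"),
    }
    return {"schema": merged, "versions": versions}


schema = {"type": "object", "version": "1.0.0", "properties": {"age": {"type": "integer"}}}
contract = merge_rules(
    schema,
    {"version": "1.0.0", "rules": {"age": "coerce_to_integer"}},
    {"version": "1.1.0", "rules": {"age": {"is_positive": {}}}},
)
```

Working on a deep copy means the separately stored artifacts stay unchanged, so the same schema can be combined with different rule versions.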
2. Metadata Store Client (pycharter.metadata_store)
Purpose: Manages persistent storage and retrieval of decomposed metadata in databases.
When to Use: After parsing contracts, when you need to store metadata components (schemas, governance rules, ownership) in a database for versioning, querying, and governance.
How It Works:
- Provides methods to store and retrieve schemas, governance rules, ownership information, and metadata
- Supports versioning and querying of stored metadata
- Multiple implementations available: PostgreSQL, SQLite, MongoDB, Redis, and In-Memory (for testing)
Available Implementations:
- PostgresMetadataStore - For PostgreSQL databases (recommended for production)
- SQLiteMetadataStore - For SQLite databases (great for development and small deployments)
- MongoDBMetadataStore - For MongoDB databases
- RedisMetadataStore - For Redis databases
- InMemoryMetadataStore - For testing and development (no persistence)
Example:
from pycharter import SQLiteMetadataStore, parse_contract_file
# Parse contract
metadata = parse_contract_file("contract.yaml")
# Use SQLite metadata store (or PostgresMetadataStore, MongoDBMetadataStore, RedisMetadataStore, etc.)
store = SQLiteMetadataStore("metadata.db")
store.connect()
# Store decomposed components
schema_id = store.store_schema("user_schema", metadata.schema, version="1.0")
# Merge ownership and governance into metadata before storing
# Ownership and governance are part of metadata, not separate entities
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    metadata_dict["business_owners"] = [metadata.ownership.get("owner", "unknown")] if metadata.ownership.get("owner") else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules
# Store metadata once with all information (ownership and governance included)
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)
# Store coercion and validation rules
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")
# Retrieve later
stored_schema = store.get_schema(schema_id)
coercion_rules = store.get_coercion_rules(schema_id)
validation_rules = store.get_validation_rules(schema_id)
Contribution to Journey: The metadata store is the persistence layer that ensures contracts and their components are versioned, searchable, and accessible across your organization. It enables governance, audit trails, and schema evolution tracking.
See Configuration Guide for database setup and initialization instructions.
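For readers who want a mental model of versioned schema storage, here is a tiny in-memory sketch. `TinyStore` and `list_versions` are hypothetical names; the `store_schema` and `get_schema` method names are borrowed from the example above, but the signatures, the returned identifier, and the rule that an existing version cannot be overwritten are assumptions of this sketch rather than documented behavior.

```python
class TinyStore:
    """Hypothetical in-memory store keeping every version of each schema."""

    def __init__(self):
        self._schemas = {}  # name -> {version -> schema}, in insertion order

    def store_schema(self, name, schema, version):
        versions = self._schemas.setdefault(name, {})
        if version in versions:
            # Assumption: published versions are immutable.
            raise ValueError(f"{name} version {version} already exists")
        versions[version] = dict(schema)
        return name

    def get_schema(self, name, version=None):
        versions = self._schemas[name]
        if version is None:
            version = list(versions)[-1]  # most recently stored version
        return versions[version]

    def list_versions(self, name):
        return list(self._schemas[name])


store = TinyStore()
store.store_schema("user", {"type": "object", "properties": {"name": {"type": "string"}}}, version="1.0")
store.store_schema("user", {"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}}, version="1.1")
latest = store.get_schema("user")
first = store.get_schema("user", version="1.0")
```

Keeping older versions retrievable is what makes it possible to compare schemas over time or to validate historical data against the contract that was in force when it was produced.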
3. Pydantic Generator (pycharter.pydantic_generator)
Purpose: Dynamically generates fully-functional Pydantic models from JSON Schema definitions.
When to Use: After storing schemas (or directly from parsed contracts), when you need to generate Python models for type-safe data validation and processing.
How It Works:
- Takes JSON Schema definitions (from contracts or metadata store)
- Programmatically generates Pydantic model classes at runtime
- Supports all JSON Schema Draft 2020-12 features plus custom coercions and validations
- Can generate models from dictionaries, JSON strings, files, or URLs
- Optionally generates Python files with model definitions
Example:
from pycharter import from_dict, generate_model_file, parse_contract_file, MetadataStoreClient
# Option 1: Generate from parsed contract
metadata = parse_contract_file("contract.yaml")
UserModel = from_dict(metadata.schema, "User")
# Option 2: Generate from stored schema
client = MetadataStoreClient(...)
schema = client.get_schema("user_schema_v1")
UserModel = from_dict(schema, "User")
# Option 3: Generate and save to file
generate_model_file(schema, "user_model.py", "User")
Contribution to Journey: The Pydantic generator is the transformation engine that converts declarative JSON Schema definitions into executable Python models. It bridges the gap between contract specifications (data) and runtime validation (code), enabling type-safe data processing.
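The idea of building a model class at runtime from a schema can be shown with the standard library. PyCharter generates Pydantic models; the sketch below deliberately uses `dataclasses.make_dataclass` as a dependency-free analogue, so it illustrates the technique, not the library's code. `model_from_schema` and `JSON_TO_PY` are hypothetical names, and the type check is intentionally simplistic (flat objects only, no coercion, and Python's `bool` passes as `int`).

```python
from dataclasses import field, make_dataclass

# Simplified mapping from JSON Schema types to Python types.
JSON_TO_PY = {"string": str, "integer": int, "number": float, "boolean": bool}


def model_from_schema(schema, name):
    """Build a dataclass at runtime whose fields follow the schema's properties."""
    properties = schema["properties"]
    required = set(schema.get("required", []))
    types = {prop: JSON_TO_PY[spec["type"]] for prop, spec in properties.items()}

    def check_types(self):
        # Runs after __init__; optional fields may be left as None.
        for prop, py_type in types.items():
            value = getattr(self, prop)
            if value is None and prop not in required:
                continue
            if not isinstance(value, py_type):
                raise TypeError(f"{prop}: expected {py_type.__name__}, got {type(value).__name__}")

    # Fields without defaults must come first in a dataclass.
    required_fields = [(p, t) for p, t in types.items() if p in required]
    optional_fields = [
        (p, t, field(default=properties[p].get("default")))
        for p, t in types.items()
        if p not in required
    ]
    return make_dataclass(name, required_fields + optional_fields,
                          namespace={"__post_init__": check_types})


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name", "age"],
}
Person = model_from_schema(schema, "Person")
alice = Person(name="Alice", age=30)
```

The schema stays the single source of truth: changing the dictionary changes the generated class, with no hand-written model code to keep in sync.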
4. JSON Schema Converter (pycharter.json_schema_converter)
Purpose: Converts existing Pydantic models back into JSON Schema format (reverse conversion).
When to Use: When you have existing Pydantic models and need to generate JSON Schema definitions, or when you want to round-trip between schemas and models.
How It Works:
- Takes Pydantic model classes as input
- Generates JSON Schema dictionaries that represent the model structure
- Preserves validation rules, types, and constraints
- Can output to dictionaries, JSON strings, or files
Example:
from pycharter import from_dict, to_dict, to_file, to_json
from pydantic import BaseModel
class Product(BaseModel):
    name: str
    price: float
    in_stock: bool = True
# Convert to JSON Schema
schema = to_dict(Product)
json_string = to_json(Product)
to_file(Product, "product_schema.json")
# Now you can use the schema with other services
ProductModel = from_dict(schema, "Product") # Round-trip
Contribution to Journey: The JSON Schema converter enables bidirectional conversion between models and schemas. It's useful for:
- Generating schemas from existing code
- Round-trip validation (schema → model → schema)
- Integrating with systems that require JSON Schema format
- Documenting existing models as schemas
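The reverse direction (class to schema) can be sketched the same way. The example below walks the fields of a standard-library dataclass and emits a JSON Schema dictionary; it is an analogue of the idea, not the converter's code, and it handles only flat models with the four scalar types in the hypothetical `PY_TO_JSON` mapping. `schema_from_dataclass` is likewise a hypothetical name.

```python
from dataclasses import MISSING, dataclass, fields
from typing import get_type_hints

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_dataclass(cls):
    """Describe a flat dataclass as a JSON Schema object."""
    hints = get_type_hints(cls)
    properties, required = {}, []
    for f in fields(cls):
        prop = {"type": PY_TO_JSON[hints[f.name]]}
        if f.default is MISSING and f.default_factory is MISSING:
            required.append(f.name)  # no default means the field is required
        elif f.default is not MISSING:
            prop["default"] = f.default
        properties[f.name] = prop
    return {"type": "object", "properties": properties, "required": required}


@dataclass
class Product:
    name: str
    price: float
    in_stock: bool = True


product_schema = schema_from_dataclass(Product)
```

For the `Product` class this yields `name` and `price` as required properties and `in_stock` as an optional boolean with default `True`.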
5. Runtime Validator (pycharter.runtime_validator)
Purpose: Lightweight validation utility for validating data against generated Pydantic models in production data pipelines.
When to Use: In your data processing scripts, ETL pipelines, API endpoints, or any place where you need to validate incoming data against contract specifications.
API Organization:
PyCharter provides validation through three tiers:
- Tier 1: Validator Class (PRIMARY INTERFACE - Recommended for production)
  - Best performance for multiple validations (model is cached)
  - Supports all data sources (contract files, directories, stores, dictionaries)
  - Reusable instance for batch processing
- Tier 2: Convenience Functions (Quick start - one-off validations)
  - validate_with_store() - Quick validation with metadata store
  - validate_with_contract() - Quick validation with contract file/dict
  - get_model_from_store() / get_model_from_contract() - Get model for reuse
- Tier 3: Low-Level Functions (When you already have a model)
  - validate() - Validate single record with existing model
  - validate_batch() - Batch validate with existing model
How It Works:
- Takes a Pydantic model (generated from a schema) and raw data
- Validates data against the model's constraints
- Returns a ValidationResult with validation status, validated data, and errors
- Supports single record and batch validation
- Can be used in strict mode (raises exceptions) or lenient mode (returns results)
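The difference between strict and lenient mode can be sketched as follows. The `Result` dataclass and the `check` helper are hypothetical; the attribute names `is_valid`, `data`, and `errors` follow the examples in this document, while the `strict` parameter name and the choice of `ValueError` are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Result:
    """Hypothetical result object with the attributes used in this README."""
    is_valid: bool
    data: Optional[dict] = None
    errors: list = field(default_factory=list)


def check(record, expected_types, strict=False):
    """Lenient mode returns a Result; strict mode raises on the first bad record."""
    errors = []
    for name, py_type in expected_types.items():
        if name not in record:
            errors.append(f"{name}: missing")
        elif not isinstance(record[name], py_type):
            errors.append(f"{name}: expected {py_type.__name__}")
    if errors and strict:
        raise ValueError("; ".join(errors))
    return Result(is_valid=not errors, data=dict(record) if not errors else None, errors=errors)


expected = {"name": str, "age": int}
ok = check({"name": "Alice", "age": 30}, expected)
bad = check({"name": "Bob"}, expected)
```

Lenient mode suits batch jobs that want to collect every failure; strict mode suits code paths where a bad record should stop processing immediately.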
Example - Validator Class (Recommended):
from pycharter import Validator, SQLiteMetadataStore
# Option 1: From directory (schema.yaml, coercion_rules.yaml, validation_rules.yaml)
validator = Validator.from_dir("data/contracts/user")
result = validator.validate({"name": "Alice", "age": 30})
# Option 2: From explicit files
validator = Validator.from_files(schema="schemas/user.yaml", coercion_rules="rules/coercion.yaml")
result = validator.validate({"name": "Alice", "age": 30})
# Option 3: From single contract file
validator = Validator.from_file("user_contract.yaml")
result = validator.validate({"name": "Alice", "age": 30})
# Option 4: From metadata store (with database)
store = SQLiteMetadataStore("metadata.db")
store.connect()
validator = Validator(store=store, schema_id="user_schema_v1")
result = validator.validate({"name": "Alice", "age": 30})
# Batch validation (efficient - model cached)
results = validator.validate_batch([data1, data2, data3])
Example - Convenience Functions (Quick Start):
from pycharter import validate_with_store, validate_with_contract, SQLiteMetadataStore
# Quick validation with store
store = SQLiteMetadataStore("metadata.db")
store.connect()
result = validate_with_store(store, "user_schema_v1", {"name": "Alice", "age": 30})
# Quick validation with contract file (no database)
result = validate_with_contract("user_contract.yaml", {"name": "Alice", "age": 30})
Example - Low-Level (When You Have a Model):
from pycharter import from_dict, validate, validate_batch
# Generate model
UserModel = from_dict(schema, "User")
# Validate single record
result = validate(UserModel, {"name": "Alice", "age": 30})
# Batch validate
results = validate_batch(UserModel, [data1, data2, data3])
Performance Tips:
- For multiple validations: Use Validator class (model is cached)
- For one-off validations: Convenience functions are fine
- For batch processing: Use Validator.validate_batch() or validate_batch()
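Why reuse helps can be seen in a small counting experiment. This is a toy model under the assumption that building the validation model is the expensive step; `ReusableValidator` and `validate_one_off` are hypothetical names, and the "build" here is just a counter increment standing in for model generation.

```python
class ReusableValidator:
    """Toy validator that counts how often its 'model' gets built."""
    builds = 0

    def __init__(self, schema):
        ReusableValidator.builds += 1  # stands in for expensive model generation
        self._required = tuple(schema.get("required", []))

    def validate(self, record):
        return all(key in record for key in self._required)


def validate_one_off(schema, record):
    # Convenience style: a fresh validator (and model build) per call.
    return ReusableValidator(schema).validate(record)


schema = {"required": ["name", "age"]}
records = [{"name": "Alice", "age": 30}, {"name": "Bob"}, {"name": "Cy", "age": 41}]

validator = ReusableValidator(schema)
reused = [validator.validate(r) for r in records]
builds_when_reused = ReusableValidator.builds

ReusableValidator.builds = 0
one_off = [validate_one_off(schema, r) for r in records]
builds_when_one_off = ReusableValidator.builds
```

Both styles give the same answers, but the reusable instance builds its model once while the one-off helper builds it once per record.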
Contribution to Journey: The runtime validator is the enforcement layer that ensures data quality in production. It validates actual data against contract specifications, catching violations early and preventing bad data from propagating through your systems. It supports both database-backed workflows (for production systems with metadata stores) and contract-based workflows (for simpler use cases without database dependencies).
5b. ETL Pipelines (pycharter.etl_generator)
Purpose: Build and run ETL pipelines programmatically (with the | operator) or from YAML configs. No assumptions about project layout: you specify file paths or use a directory with standard filenames.
When to Use: When you need to extract, transform, and load data from config-driven or code-defined pipelines (HTTP, files, databases, cloud storage → transforms → Postgres, files, cloud).
How It Works:
- Programmatic: Pipeline(extractor) | transformer | loader; chain with |; call await pipeline.run().
- Config-driven: Load from explicit files (from_config_files), from a directory with extract.yaml, transform.yaml, load.yaml (from_config_dir), or from a single pipeline.yaml (from_config_file).
- Variables: Pass PipelineContext(variables={"API_KEY": "x"}) or variables={...} in factory methods; ${VAR} and ${VAR:-default} in configs are resolved from these (no built-in CONTRACT_DIR).
- Async: run() is async; use asyncio.run(pipeline.run()) in scripts or await pipeline.run() in async code.
- Error handling: Optional error_context with ErrorMode (STRICT, LENIENT, COLLECT) controls whether extraction/load failures raise or are collected in result.errors.
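The `${VAR}` and `${VAR:-default}` placeholders mentioned in the Variables item follow a familiar shell-like convention. The sketch below shows one way such substitution could work using a regular expression; `resolve` is a hypothetical helper, and raising `KeyError` for a variable with neither a value nor a default is an assumption, not documented PyCharter behavior.

```python
import re

# Matches ${NAME} and ${NAME:-default}; group 2 is None when no default is given.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def resolve(text, variables):
    """Replace placeholders in a config string using the supplied variables."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in variables:
            return str(variables[name])
        if default is not None:
            return default
        raise KeyError(f"unresolved variable: {name}")

    return _VAR.sub(replace, text)


url = resolve(
    "https://api.example.com/data?key=${API_KEY}&limit=${LIMIT:-100}",
    {"API_KEY": "secret"},
)
```

Here `API_KEY` comes from the supplied variables and `LIMIT` falls back to its inline default, giving `https://api.example.com/data?key=secret&limit=100`.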
Example:
import asyncio
from pycharter import Pipeline, PipelineContext, HTTPExtractor, PostgresLoader, Rename, AddField
# Programmatic
pipeline = (
    Pipeline(HTTPExtractor(url="https://api.example.com/users"))
    | Rename({"userName": "name"})
    | AddField("processed_at", "now()")
    | PostgresLoader(connection_string="...", table="users")
)
result = asyncio.run(pipeline.run())
# Config-driven (explicit files)
pipeline = Pipeline.from_config_files(
    extract="configs/extract.yaml",
    load="configs/load.yaml",
    variables={"API_KEY": "secret"}
)
# Config-driven (directory: extract.yaml, transform.yaml, load.yaml)
pipeline = Pipeline.from_config_dir("pipelines/users/")
# Config-driven (single file)
pipeline = Pipeline.from_config_file("pipelines/users/pipeline.yaml")
result = asyncio.run(pipeline.run())
Exceptions: Pipeline and config loading use PyCharter's exception hierarchy: PyCharterError (base), ConfigError, ConfigValidationError, ExpressionError. See Exceptions under API Reference.
See pycharter/etl_generator/ASYNC_AND_EXECUTION.md for async usage and error modes.
6. Quality Assurance (pycharter.quality)
Purpose: Data quality assurance pipeline that polices data according to data contracts, calculates quality metrics, tracks violations, and generates quality reports.
When to Use: When you need to:
- Monitor data quality over time
- Calculate quality scores and metrics
- Track and manage data quality violations
- Set quality thresholds and get alerts
- Generate quality reports for governance
How It Works:
- Validates data against contracts (using Runtime Validator)
- Calculates quality metrics (accuracy, completeness, violation rates)
- Tracks violations for audit and remediation
- Checks quality thresholds and generates alerts
- Produces comprehensive quality reports
Example:
from pycharter import QualityCheck, QualityCheckOptions, QualityThresholds
# Define quality thresholds
thresholds = QualityThresholds(
    min_overall_score=95.0,
    max_violation_rate=0.05
)
# Run quality check
check = QualityCheck(store=store)
report = check.run(
    schema_id="user_schema_v1",
    data="data/users.json",
    options=QualityCheckOptions(
        calculate_metrics=True,
        record_violations=True,
        check_thresholds=True,
        thresholds=thresholds
    )
)
print(f"Quality Score: {report.quality_score.overall_score:.2f}/100")
print(f"Passed: {report.passed}")
Contribution to Journey: The quality assurance module is the policing layer that ensures data quality is maintained throughout the pipeline. It provides metrics, tracking, and alerting capabilities that transform PyCharter from a contract management tool into a complete data quality assurance platform.
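To give the metrics some concrete shape, the sketch below computes a violation rate, a completeness ratio, and an overall score for a small batch, then compares them against thresholds. The formulas are assumptions made for illustration (in particular, scoring as 100 times the share of valid records); the quality module may define its metrics differently. `quality_report` is a hypothetical helper, and only the threshold names and values are taken from the example above.

```python
def quality_report(records, required, is_valid,
                   min_overall_score=95.0, max_violation_rate=0.05):
    """Compute simple batch metrics and compare them against thresholds."""
    total = len(records)
    violations = [r for r in records if not is_valid(r)]
    violation_rate = len(violations) / total if total else 0.0

    # Completeness: share of required fields that are present and not None.
    filled = sum(1 for r in records for f in required if r.get(f) is not None)
    completeness = filled / (total * len(required)) if total and required else 1.0

    # Assumption: the overall score is the percentage of valid records.
    overall_score = 100.0 * (1.0 - violation_rate)
    passed = overall_score >= min_overall_score and violation_rate <= max_violation_rate
    return {
        "violation_rate": violation_rate,
        "completeness": completeness,
        "overall_score": overall_score,
        "passed": passed,
    }


required = ["name", "age"]
records = [
    {"name": "A", "age": 1},
    {"name": "B", "age": 2},
    {"name": "C", "age": None},
    {"name": "D", "age": 4},
]
report = quality_report(records, required,
                        is_valid=lambda r: all(r.get(f) is not None for f in required))
```

With one invalid record out of four, the violation rate is 0.25 and the score is 75.0, so the batch fails the 95.0 and 0.05 thresholds.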
See Quality Module README for detailed documentation.
Complete Workflow Example
Here's how all services work together in a complete data production journey:
from pycharter import (
    parse_contract_file,
    SQLiteMetadataStore,
    from_dict,
    Validator,
    to_dict
)
# Step 1: Parse contract specification
metadata = parse_contract_file("user_contract.yaml")
# Step 2: Store metadata in database
store = SQLiteMetadataStore("metadata.db")
store.connect()
schema_id = store.store_schema("user", metadata.schema, version="1.0")
# Merge ownership and governance into metadata before storing
# Ownership and governance are part of metadata, not separate entities
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    metadata_dict["business_owners"] = [metadata.ownership.get("owner", "unknown")] if metadata.ownership.get("owner") else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules
# Store metadata once with all information (ownership and governance included)
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)
# Store coercion and validation rules
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")
# Step 3: Generate Pydantic model from stored schema
schema = store.get_schema(schema_id)
UserModel = from_dict(schema, "User")
# Step 4: (Optional) Convert model back to schema for documentation
schema_doc = to_dict(UserModel)
# Step 5: Validate data in production pipeline
# Option A: Using Validator class (recommended for production)
validator = Validator(store=store, schema_id=schema_id)
def process_user_data(raw_data):
    result = validator.validate(raw_data)
    if result.is_valid:
        # Process validated data
        return result.data
    else:
        # Handle validation errors
        raise ValueError(f"Invalid data: {result.errors}")
# Option B: Using convenience function (quick start)
from pycharter import validate_with_store
def process_user_data_quick(raw_data):
    result = validate_with_store(store, schema_id, raw_data)
    if result.is_valid:
        return result.data
    else:
        raise ValueError(f"Invalid data: {result.errors}")
7. REST API (api/)
Purpose: Expose all PyCharter services as REST API endpoints.
When to Use: When you need to use PyCharter from non-Python applications, microservices, or want to provide a web-based interface.
How It Works:
- Provides HTTP endpoints for all core services
- Uses FastAPI for automatic OpenAPI/Swagger documentation
- Supports both store-based and contract-based operations
- Handles request/response validation with Pydantic models
- Located at the root level (api/) as a separate application
- All endpoints are async-ready for better performance
Example:
# Start the API server
pycharter api
# Or with uvicorn
uvicorn api.main:app --reload
Endpoints:
- POST /api/v1/contracts/parse - Parse a data contract
- POST /api/v1/contracts/build - Build contract from store
- POST /api/v1/metadata/schemas - Store a schema
- GET /api/v1/metadata/schemas/{schema_id} - Get a schema
- POST /api/v1/schemas/generate - Generate Pydantic model
- POST /api/v1/validation/validate - Validate data
- POST /api/v1/validation/validate-batch - Batch validation
Documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
See api/README.md for complete API documentation.
Service Integration Summary
| Service | Input | Output | Journey Stage |
|---|---|---|---|
| Contract Parser | Contract files (YAML/JSON) | ContractMetadata | Contract Specification → Parsing |
| Contract Builder | Separate artifacts or Store | Consolidated contract | Storage → Consolidation |
| Metadata Store | ContractMetadata | Stored metadata (DB) | Parsing → Storage |
| Pydantic Generator | JSON Schema | Pydantic models | Storage → Model Generation |
| JSON Schema Converter | Pydantic models | JSON Schema | (Bidirectional) |
| Runtime Validator | Pydantic models + Data | ValidationResult | Model Generation → Validation |
| ETL Pipelines | Config files or code | PipelineResult | Extract → Transform → Load |
| Quality Assurance | Contract + Data | QualityReport | Validation → Quality Monitoring |
Each service is designed to be independent yet composable, allowing you to use them individually or together as part of a complete data contract management system.
Documentation
- Data Journey Guide - Complete guide to the data production journey, including both combined and separated workflows
- Configuration Guide - Database setup, connection configuration, initialization, and migration commands
- Database ERD - Database schema documentation and entity relationship diagrams
- Examples - Complete working examples for all PyCharter services
- API Documentation - REST API endpoints and usage
Usage Examples
Basic Usage
Using Convenience Functions (Quick Start):
from pycharter import from_dict, from_json, from_file
# From dictionary
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "title": {"type": "string"},
        "published": {"type": "boolean", "default": False}
    }
}
Article = from_dict(schema, "Article")
# From JSON string
schema_json = '{"type": "object", "version": "1.0.0", "properties": {"name": {"type": "string"}}}'
User = from_json(schema_json, "User")
# From file
Product = from_file("product_schema.json", "Product")
Using Validator Class (Production):
from pycharter import Validator
# From directory or single file
validator = Validator.from_dir("data/contracts/article")
# or: validator = Validator.from_file("article_contract.yaml")
result = validator.validate({"title": "My Article", "published": True})
Nested Objects
from pycharter import from_dict
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "zipcode": {"type": "string"}
            }
        }
    }
}
Person = from_dict(schema, "Person")
person = Person(
    name="Alice",
    address={
        "street": "123 Main St",
        "city": "New York",
        "zipcode": "10001"
    }
)
print(person.address.city) # Output: New York
Arrays and Collections
from pycharter import from_dict
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "tags": {
            "type": "array",
            "items": {"type": "string"}
        },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"}
                }
            }
        }
    }
}
Cart = from_dict(schema, "Cart")
cart = Cart(
    tags=["python", "pydantic"],
    items=[
        {"name": "Apple", "price": 1.50},
        {"name": "Banana", "price": 0.75}
    ]
)
print(cart.items[0].name) # Output: Apple
Coercion and Validation
PyCharter supports coercion (pre-validation transformation) and validation (post-validation checks):
from pycharter import from_dict
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "flight_number": {
            "type": "integer",
            "coercion": "coerce_to_integer"  # Convert string/float to int
        },
        "destination": {
            "type": "string",
            "coercion": "coerce_to_string",
            "validations": {
                "min_length": {"threshold": 3},
                "max_length": {"threshold": 3},
                "no_capital_characters": None,
                "only_allow": {"allowed_values": ["abc", "def", "ghi"]}
            }
        },
        "distance": {
            "type": "number",
            "coercion": "coerce_to_float",
            "validations": {
                "greater_than_or_equal_to": {"threshold": 0}
            }
        }
    }
}
Flight = from_dict(schema, "Flight")
# Coercion happens automatically
flight = Flight(
    flight_number="123",  # Coerced to int: 123
    destination="abc",    # Passes all validations
    distance="100.5"      # Coerced to float: 100.5
)
Standard JSON Schema Support
PyCharter supports all standard JSON Schema Draft 2020-12 validation keywords:
| Keyword | Type | Description | Example |
|---|---|---|---|
| minLength | string | Minimum string length | {"minLength": 3} |
| maxLength | string | Maximum string length | {"maxLength": 10} |
| pattern | string | Regular expression pattern | {"pattern": "^[a-z]+$"} |
| enum | any | Allowed values | {"enum": ["a", "b", "c"]} |
| const | any | Single allowed value | {"const": "fixed"} |
| minimum | number | Minimum value (inclusive) | {"minimum": 0} |
| maximum | number | Maximum value (inclusive) | {"maximum": 100} |
| exclusiveMinimum | number | Minimum value (exclusive) | {"exclusiveMinimum": 0} |
| exclusiveMaximum | number | Maximum value (exclusive) | {"exclusiveMaximum": 100} |
| multipleOf | number | Must be multiple of | {"multipleOf": 2} |
| minItems | array | Minimum array length | {"minItems": 1} |
| maxItems | array | Maximum array length | {"maxItems": 10} |
| uniqueItems | array | Array items must be unique | {"uniqueItems": true} |
All schemas are validated against JSON Schema standard before processing, ensuring compliance.
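To make the table concrete, here is a minimal, standard-library-only sketch of what a few of these keywords mean for a single value. The `check_keywords` helper is hypothetical and exists only for illustration: it is not PyCharter's implementation, and it covers just a subset of the keywords listed above.

```python
import re
from typing import Any, Dict, List


def check_keywords(value: Any, spec: Dict[str, Any]) -> List[str]:
    """Return the names of the violated keywords (hypothetical helper, subset only)."""
    errors: List[str] = []
    if isinstance(value, str):
        if "minLength" in spec and len(value) < spec["minLength"]:
            errors.append("minLength")
        if "maxLength" in spec and len(value) > spec["maxLength"]:
            errors.append("maxLength")
        # JSON Schema patterns are not implicitly anchored, hence search().
        if "pattern" in spec and re.search(spec["pattern"], value) is None:
            errors.append("pattern")
    # bool is a subclass of int in Python, so exclude it from numeric checks.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in spec and value < spec["minimum"]:
            errors.append("minimum")
        if "maximum" in spec and value > spec["maximum"]:
            errors.append("maximum")
        if "exclusiveMinimum" in spec and value <= spec["exclusiveMinimum"]:
            errors.append("exclusiveMinimum")
        if "exclusiveMaximum" in spec and value >= spec["exclusiveMaximum"]:
            errors.append("exclusiveMaximum")
    if isinstance(value, list):
        if "minItems" in spec and len(value) < spec["minItems"]:
            errors.append("minItems")
        if "maxItems" in spec and len(value) > spec["maxItems"]:
            errors.append("maxItems")
    if "enum" in spec and value not in spec["enum"]:
        errors.append("enum")
    if "const" in spec and value != spec["const"]:
        errors.append("const")
    return errors


print(check_keywords("ab", {"minLength": 3, "pattern": "^[a-z]+$"}))  # ['minLength']
print(check_keywords(150, {"minimum": 0, "maximum": 100}))            # ['maximum']
```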
Built-in Coercions (PyCharter Extensions)
| Coercion | Description |
|---|---|
| coerce_to_string | Convert int, float, bool, datetime, dict, list to string |
| coerce_to_integer | Convert float, string (numeric), bool, datetime to int |
| coerce_to_float | Convert int, string (numeric), bool to float |
| coerce_to_boolean | Convert int, string to bool |
| coerce_to_datetime | Convert string (ISO format), timestamp to datetime |
| coerce_to_date | Convert string (date format), datetime to date (date only, no time) |
| coerce_to_uuid | Convert string to UUID |
| coerce_to_lowercase | Convert string to lowercase |
| coerce_to_uppercase | Convert string to uppercase |
| coerce_to_stripped_string | Strip leading and trailing whitespace from string |
| coerce_to_list | Convert single value to list [value] (preserves None) |
| coerce_empty_to_null | Convert empty strings/lists/dicts to None (useful for nullable fields) |
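The sketch below shows roughly what three of these coercions do, using only the standard library. The functions `to_integer`, `to_list`, and `empty_to_null` are hypothetical stand-ins written for this illustration, not PyCharter's built-ins. Returning the input unchanged when it cannot be converted is an assumption, modeled on the custom coercion example in this README, so that a later validation step can reject the value.

```python
from typing import Any


def to_integer(value: Any) -> Any:
    """Stand-in for an integer coercion: bools, floats, and numeric strings become int."""
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, str):
        try:
            return int(value)  # exact for integer strings such as "123"
        except ValueError:
            pass  # fall through and try a float-like string such as "12.0"
    if isinstance(value, (float, str)):
        try:
            return int(float(value))
        except (ValueError, OverflowError):
            return value  # not numeric: leave it for validation to reject
    return value


def to_list(value: Any) -> Any:
    """Stand-in for a list coercion: wrap a single value, keep lists and None as-is."""
    if value is None or isinstance(value, list):
        return value
    return [value]


def empty_to_null(value: Any) -> Any:
    """Stand-in for an empty-to-null coercion: empty str/list/dict becomes None."""
    if isinstance(value, (str, list, dict)) and len(value) == 0:
        return None
    return value


print(to_integer("123"), to_integer(12.9), to_integer("abc"))  # 123 12 abc
print(to_list("a"), to_list(None))                             # ['a'] None
print(empty_to_null(""), empty_to_null("x"))                   # None x
```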
Built-in Validations (PyCharter Extensions)
| Validation | Description | Configuration |
|---|---|---|
| min_length | Minimum length for strings/arrays | {"threshold": N} |
| max_length | Maximum length for strings/arrays | {"threshold": N} |
| only_allow | Only allow specific values | {"allowed_values": [...]} |
| greater_than_or_equal_to | Numeric minimum | {"threshold": N} |
| less_than_or_equal_to | Numeric maximum | {"threshold": N} |
| is_positive | Value must be positive | {"threshold": 0} |
| no_capital_characters | No uppercase letters | null |
| no_special_characters | Only alphanumeric and spaces | null |
| non_empty_string | String must not be empty | null |
| matches_regex | String must match regex pattern | {"pattern": "..."} |
| is_email | String must be valid email address | null |
| is_url | String must be valid URL | null |
| is_alphanumeric | Only alphanumeric characters (no spaces/special) | null |
| is_numeric_string | String must be numeric (digits, optional decimal) | null |
| is_unique | All items in array must be unique | null |
Note: PyCharter extensions (coercion and validations) are optional and can be used alongside standard JSON Schema keywords. All validation logic is stored as data in the JSON schema, making it fully data-driven.
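The Configuration column maps naturally onto factory arguments: a dict becomes keyword arguments and null means "no arguments". The sketch below illustrates that mapping with the standard library only. The factories, the `FACTORIES` table, and `build_checks` are hypothetical names for this example; they mirror the factory shape of the custom validation example in this README but are not PyCharter's own code.

```python
from typing import Any, Callable, Dict, List, Optional

CheckFn = Callable[..., Any]


def min_length(threshold: int) -> CheckFn:
    def _validate(value: Any, info: Any = None) -> Any:
        if len(value) < threshold:
            raise ValueError(f"length must be >= {threshold}")
        return value
    return _validate


def only_allow(allowed_values: List[Any]) -> CheckFn:
    def _validate(value: Any, info: Any = None) -> Any:
        if value not in allowed_values:
            raise ValueError(f"{value!r} is not one of {allowed_values}")
        return value
    return _validate


def no_capital_characters() -> CheckFn:
    def _validate(value: Any, info: Any = None) -> Any:
        if any(ch.isupper() for ch in value):
            raise ValueError("uppercase letters are not allowed")
        return value
    return _validate


# Hypothetical name-to-factory table, keyed like the Validation column above.
FACTORIES: Dict[str, Callable[..., CheckFn]] = {
    "min_length": min_length,
    "only_allow": only_allow,
    "no_capital_characters": no_capital_characters,
}


def build_checks(validations: Dict[str, Optional[dict]]) -> List[CheckFn]:
    """Turn a "validations" mapping into callables; None means no factory arguments."""
    return [FACTORIES[name](**(config or {})) for name, config in validations.items()]


checks = build_checks({
    "min_length": {"threshold": 3},
    "no_capital_characters": None,
    "only_allow": {"allowed_values": ["abc", "def", "ghi"]},
})
value = "abc"
for check in checks:
    value = check(value)
print(value)  # abc
```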
Custom Coercions and Validations
Extend PyCharter with your own coercion and validation functions:
from pycharter.shared.coercions import register_coercion
from pycharter.shared.validations import register_validation
# Register custom coercion
def coerce_to_uppercase(data):
    # Non-string values pass through unchanged
    if isinstance(data, str):
        return data.upper()
    return data
register_coercion("coerce_to_uppercase", coerce_to_uppercase)
# Register custom validation
def must_be_positive(threshold=0):
    # Factory: returns the function that performs the actual check
    def _validate(value, info):
        if value <= threshold:
            raise ValueError(f"Value must be > {threshold}")
        return value
    return _validate
register_validation("must_be_positive", must_be_positive)
API Reference
PyCharter's API is organized into three tiers to help you choose the right approach:
Tier 1: Primary Interfaces (Classes - Best Performance)
Validator - Primary validation interface (recommended for production)
from pycharter import Validator
# Create validator via factory methods or store
validator = Validator.from_dir("data/contracts/user")
validator = Validator.from_files(schema="schema.yaml", coercion_rules="coercion.yaml")
validator = Validator.from_file("contract.yaml")
validator = Validator.from_dict(schema={...}, coercion_rules={...})
validator = Validator(store=store, schema_id="user_schema") # from metadata store
# Validate data
result = validator.validate(data)
results = validator.validate_batch([data1, data2])
model = validator.get_model() # Get the generated Pydantic model
QualityCheck - Primary quality assurance interface
from pycharter import QualityCheck, QualityCheckOptions
check = QualityCheck(store=store)
report = check.run(schema_id="user_schema", data=data, options=QualityCheckOptions(...))
MetadataStoreClient - Base class for metadata stores
from pycharter import MetadataStoreClient, SQLiteMetadataStore, PostgresMetadataStore
store = SQLiteMetadataStore("metadata.db")
store.connect()
Tier 2: Convenience Functions (Quick Start)
Pydantic Generator - Input type helpers
- from_dict(schema: dict, model_name: str = "DynamicModel") - Create model from dictionary
- from_json(json_string: str, model_name: str = "DynamicModel") - Create model from JSON string
- from_file(file_path: str, model_name: str = None) - Create model from file (JSON/YAML)
- from_url(url: str, model_name: str = "DynamicModel") - Create model from URL
- generate_model(schema: dict, model_name: str = "DynamicModel") - Advanced: more control
- generate_model_file(schema: dict, output_path: str, model_name: str = "DynamicModel") - Generate and save to file
JSON Schema Converter - Output type helpers
- to_dict(model: Type[BaseModel], ...) - Convert model to JSON Schema dictionary
- to_file(model: Type[BaseModel], file_path: str, ...) - Convert model to file
- to_json(model: Type[BaseModel], ...) - Convert model to JSON string
- model_to_schema(model: Type[BaseModel], ...) - Advanced: core conversion function
Runtime Validator - Data source helpers
- validate_with_store(store, schema_id, data, ...) - Quick validation with metadata store
- validate_batch_with_store(store, schema_id, data_list, ...) - Batch validation with store
- validate_with_contract(contract, data, ...) - Quick validation with contract file/dict
- validate_batch_with_contract(contract, data_list, ...) - Batch validation with contract
- get_model_from_store(store, schema_id, ...) - Get model from metadata store
- get_model_from_contract(contract, ...) - Get model from contract
- validate_input(contract, ...) - Decorator for function input validation
- validate_output(contract, ...) - Decorator for function output validation
- validate_with_contract_decorator(contract, ...) - Decorator for contract-based validation
Contract Management
- parse_contract(contract_dict: dict) - Parse contract dictionary
- parse_contract_file(file_path: str) - Parse contract file (YAML/JSON)
- build_contract(artifacts: ContractArtifacts) - Build contract from artifacts
- build_contract_from_store(store, schema_id, ...) - Build contract from metadata store
Tier 3: Low-Level Utilities (When You Have Models)
- validate(model: Type[BaseModel], data: dict, strict: bool = False) - Validate single record
- validate_batch(model: Type[BaseModel], data_list: List[dict], strict: bool = False) - Batch validate
- ValidationResult - Result class with is_valid, data, and errors attributes
Metadata Store Implementations
- InMemoryMetadataStore() - In-memory store (testing/development)
- SQLiteMetadataStore(database_path: str) - SQLite database
- PostgresMetadataStore(connection_string: str) - PostgreSQL database
- MongoDBMetadataStore(connection_string: str) - MongoDB database
- RedisMetadataStore(connection_string: str) - Redis database
Exceptions
PyCharter uses a small exception hierarchy for config and pipeline errors. Catch PyCharterError to handle any PyCharter failure:
- PyCharterError - Base for all PyCharter exceptions
- ConfigError - Config loading/parsing failures (missing file, invalid YAML)
- ConfigValidationError - Schema validation failures (e.g. missing required type field)
- ConfigLoadError - Config file load errors
- ExpressionError - Expression evaluation failures (e.g. invalid syntax in AddField)
Pipeline run(error_context=...) supports ErrorMode: STRICT (raise on failure), LENIENT (log and continue), COLLECT (append to result.errors). Import from pycharter.shared.errors.
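The three modes differ only in what happens when a record fails. The sketch below illustrates that behavior with the standard library. The `ErrorMode` enum here is a local stand-in (its string values are invented), and `RunResult` and `run` are hypothetical names for this example; none of it is imported from pycharter.shared.errors, and it does not reproduce the real `run(error_context=...)` signature.

```python
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Iterable, List

logger = logging.getLogger(__name__)


class ErrorMode(Enum):
    """Local stand-in for the three modes described above."""
    STRICT = "strict"    # raise on the first failure
    LENIENT = "lenient"  # log the failure and continue
    COLLECT = "collect"  # record the failure in result.errors and continue


@dataclass
class RunResult:
    rows: List[Any] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)


def run(records: Iterable[Any], step: Callable[[Any], Any], mode: ErrorMode) -> RunResult:
    """Apply `step` to each record, handling failures according to `mode`."""
    result = RunResult()
    for record in records:
        try:
            result.rows.append(step(record))
        except Exception as exc:
            if mode is ErrorMode.STRICT:
                raise
            if mode is ErrorMode.LENIENT:
                logger.warning("skipping %r: %s", record, exc)
            else:
                result.errors.append(f"{record!r}: {exc}")
    return result


result = run(["1", "x", "3"], int, ErrorMode.COLLECT)
print(result.rows)         # [1, 3]
print(len(result.errors))  # 1
```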
Design Principles & Requirements
PyCharter is designed to meet the following core requirements:
JSON Schema Standard Compliance
All schemas must abide by conventional JSON Schema syntax and qualify as valid JSON Schema:
- Validation: All schemas are validated against JSON Schema Draft 2020-12 standard before processing
- Standard Keywords: Full support for all standard validation keywords (minLength, pattern, enum, minimum, maximum, etc.)
- Compliance: Uses the jsonschema library for validation with graceful fallback
Data-Driven Validation Logic
All schema information and complex field validation logic is stored as data, not Python code:
- Coercion: Referenced by name (string) in JSON: "coercion": "coerce_to_integer"
- Validations: Referenced by name with configuration (dict) in JSON: "validations": {"min_length": {"threshold": 3}}
- No Code Required: Validation rules are defined entirely in JSON schema files
- Example: {"coercion": "coerce_to_string", "validations": {"min_length": {"threshold": 3}}}
Dynamic Pydantic Model Generation
Models are created dynamically at runtime from JSON schemas:
- Runtime Generation: Uses pydantic.create_model() to generate models on-the-fly
- Dynamic Validators: Field validators are dynamically attached using field_validator decorators
- Multiple Sources: Models can be created from dicts, JSON strings, files, or URLs
- No Static Code: All models are generated from data, not pre-defined classes
Nested Schema Support
Full support for nested object schemas and complex structures:
- Recursive Processing: Nested objects are recursively processed into their own Pydantic models
- Arrays of Objects: Arrays containing nested objects are fully supported
- Deep Nesting: Deeply nested structures work correctly with full type safety
- Type Safety: Each nested object becomes its own typed Pydantic model
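The recursive idea can be sketched in a few lines of Pydantic v2. The helpers `resolve_type` and `build_model`, the `SCALARS` mapping, and the naming scheme for nested models are assumptions made for this illustration; they handle only a small subset of JSON Schema and are not PyCharter's generator.

```python
from typing import Any, Dict, List, Optional, Type

from pydantic import BaseModel, create_model

# Mapping from JSON Schema scalar types to Python types (illustrative subset).
SCALARS = {"string": str, "integer": int, "number": float, "boolean": bool}


def resolve_type(spec: Dict[str, Any], name: str) -> Any:
    """Map one schema node to a Python type, recursing into objects and arrays."""
    kind = spec.get("type")
    if kind == "object":
        return build_model(spec, name)
    if kind == "array":
        return List[resolve_type(spec.get("items", {}), name + "Item")]
    return SCALARS.get(kind, Any)


def build_model(schema: Dict[str, Any], name: str) -> Type[BaseModel]:
    """Create a model for an object schema; each nested object gets its own model."""
    required = set(schema.get("required", []))
    fields: Dict[str, Any] = {}
    for prop, spec in schema.get("properties", {}).items():
        py_type = resolve_type(spec, name + prop.title())
        if prop in required:
            fields[prop] = (py_type, ...)
        else:
            fields[prop] = (Optional[py_type], spec.get("default"))
    return create_model(name, **fields)


Person = build_model(
    {
        "type": "object",
        "required": ["name"],
        "properties": {
            "name": {"type": "string"},
            "address": {"type": "object", "properties": {"city": {"type": "string"}}},
            "phones": {"type": "array", "items": {"type": "string"}},
        },
    },
    "Person",
)
person = Person(name="Alice", address={"city": "New York"}, phones=["555-0100"])
print(person.address.city)            # New York
print(type(person.address).__name__)  # PersonAddress
```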
Extension Fields
Custom fields can be added to JSON Schema to extend functionality:
- coercion: Pre-validation type conversion (e.g., string → integer)
- validations: Post-validation custom rules
- Optional: Extensions work alongside standard JSON Schema keywords
- Separated: Extensions are clearly distinguished from standard JSON Schema
Complex Field Validation
Support for both standard and custom field validators:
- Standard Validators: minLength, pattern, enum, minimum, maximum, etc. (JSON Schema standard)
- Custom Validators: Extensible validation rules via the validations field
- Validation Order: Coercion → Standard Validation → Pydantic Validation → Custom Validations
- Factory Pattern: Validators are factory functions that return validation functions
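The ordering above can be sketched for a single field using only the standard library. In this simplified illustration the two middle stages (standard and Pydantic validation) are collapsed into one type check, and `COERCIONS`, `VALIDATIONS`, `TYPES`, and `process_field` are hypothetical names, not PyCharter internals. The field definition is loaded from a JSON string to stress that the rules are plain data.

```python
import json
from typing import Any, Dict


def coerce_to_float(value: Any) -> Any:
    """Stand-in coercion: try float(), otherwise pass the value through."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def greater_than_or_equal_to(threshold: float):
    """Factory returning the actual check, as in the Factory Pattern bullet."""
    def _validate(value: Any, info: Any = None) -> Any:
        if value < threshold:
            raise ValueError(f"must be >= {threshold}")
        return value
    return _validate


COERCIONS = {"coerce_to_float": coerce_to_float}
VALIDATIONS = {"greater_than_or_equal_to": greater_than_or_equal_to}
TYPES = {"number": (int, float), "integer": int, "string": str}


def process_field(value: Any, spec: Dict[str, Any]) -> Any:
    # Stage 1: coercion, looked up by name.
    name = spec.get("coercion")
    if name:
        value = COERCIONS[name](value)
    # Stage 2: type check (stands in for standard + Pydantic validation).
    if isinstance(value, bool) or not isinstance(value, TYPES[spec["type"]]):
        raise TypeError(f"expected {spec['type']}, got {type(value).__name__}")
    # Stage 3: custom validations, each built from its configuration.
    for rule, config in spec.get("validations", {}).items():
        value = VALIDATIONS[rule](**(config or {}))(value)
    return value


spec = json.loads(
    '{"type": "number", "coercion": "coerce_to_float",'
    ' "validations": {"greater_than_or_equal_to": {"threshold": 0}}}'
)
print(process_field("100.5", spec))  # 100.5
```

Because coercion runs first, the string "100.5" passes the type check; a value such as "-1" is coerced successfully and then rejected by the custom validation in stage 3.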
Development Setup
Quick Setup
# Run setup script
./setup.sh
# Activate environment
source venv/bin/activate
# Run tests
pytest
Using Make
make install-dev # Install package and dev dependencies
make test # Run tests
make format # Format code with black and isort
make lint # Run type checking with mypy
make check # Run all checks (format, lint, test)
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=pycharter --cov-report=html
# Run specific test file
pytest tests/test_converter.py
# Run tests matching a pattern
pytest -k "coercion"
Publishing to PyPI
Automatic publishing via GitHub Releases (Trusted Publishing - no tokens needed!):
# 1. Update version in pyproject.toml
# version = "0.0.21"
# 2. Commit and push
git add pyproject.toml
git commit -m "Bump version to 0.0.21"
git push
# 3. Create GitHub Release (automatically publishes to PyPI)
gh release create v0.0.21 --title "v0.0.21" --notes "Release notes"
The workflow automatically:
- Builds UI
- Builds Python package
- Publishes to PyPI (using Trusted Publishing)
Local build:
make build # Builds package (UI built automatically via setup.py)
See PUBLISHING.md for complete documentation.
JSON Schema Compliance
PyCharter is fully compliant with the JSON Schema Draft 2020-12 standard:
- All schemas are validated against the standard before processing
- Full support for all standard keywords (minLength, maxLength, pattern, enum, minimum, maximum, etc.)
- Optional extensions (coercion and validations) work alongside standard keywords
- Strict mode available to enforce standard-only schemas
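How strict mode is switched on is not described here. One plausible reading is that it rejects schemas that use the extension keywords. Under that assumption, the sketch below walks a schema and reports where `coercion` or `validations` appear; the helper `find_extensions` is hypothetical and only follows `properties` and `items`.

```python
from typing import Any, Dict, List

EXTENSION_KEYWORDS = ("coercion", "validations")


def find_extensions(schema: Dict[str, Any], path: str = "#") -> List[str]:
    """Return JSON-pointer-like paths of extension keywords found in the schema."""
    found = [f"{path}/{key}" for key in EXTENSION_KEYWORDS if key in schema]
    # Property names are data, not keywords, so recurse into the values only.
    for name, sub in schema.get("properties", {}).items():
        found += find_extensions(sub, f"{path}/properties/{name}")
    items = schema.get("items")
    if isinstance(items, dict):
        found += find_extensions(items, f"{path}/items")
    return found


schema = {
    "type": "object",
    "properties": {
        "flight_number": {"type": "integer", "coercion": "coerce_to_integer"},
        "tags": {
            "type": "array",
            "items": {"type": "string", "validations": {"non_empty_string": None}},
        },
    },
}
for hit in find_extensions(schema):
    print(hit)
# #/properties/flight_number/coercion
# #/properties/tags/items/validations
```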
Requirements
- Python 3.10+
- Pydantic >= 2.0.0
- jsonschema >= 4.0.0 (optional, for enhanced validation)
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Links
- Repository: GitHub
- Issues: GitHub Issues
- Documentation: GitHub README
Made with ❤️ for the Python community