A Python library for declarative data mapping and transformation
Schematix
A Python library for declarative data mapping and transformation that emphasizes reusability and composability. Define your target schemas once and bind them to different data sources with intuitive operator overloading.
✨ Key Features
- 🎯 Reusable Schema Definitions - Define once, use across multiple data sources
- 🔧 Intuitive Operators - >>, |, &, @, + for elegant data transformations
- 🏗️ Type Agnostic - Works with dicts, dataclasses, Pydantic models, and any object with attributes
- 🧩 Composable Architecture - Mix and match field types and transformations
- 🛡️ Comprehensive Validation - Built-in error handling and validation
- 📊 Batch Processing - Transform lists of data efficiently
- 🎨 Clean API - Readable, maintainable transformation code
🚀 Quick Start
Installation
pip install schematix
Basic Usage
from schematix import Schema, Field

# Define your target schema
class UserSchema(Schema):
    id = Field(source='user_id', target='id')
    email = Field(source='email_address', target='email', required=True)
    name = Field(source='first_name') + Field(source='last_name')

# Transform data
data = {
    'user_id': 123,
    'email_address': 'john@example.com',
    'first_name': 'John',
    'last_name': 'Doe'
}
user = UserSchema().transform(data)
# Result: {'id': 123, 'email': 'john@example.com', 'name': 'John Doe'}
🎭 Operator Magic
Schematix provides intuitive operators for common transformation patterns:
Pipeline (>>) - Connect source to target
source_field >> target_field # Extract from source, assign to target
Fallback (|) - Try alternatives
Field(source='email') | Field(source='contact_email') # Try email, fallback to contact_email
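To make the fallback idea concrete, here is a minimal plain-Python sketch of how a `|` operator could chain alternative sources. `SketchField` is a hypothetical stand-in written for this illustration, not the Schematix class, and treating `None` as "missing" is an assumption.

```python
from typing import Any, Mapping, Tuple


class SketchField:
    """Hypothetical stand-in for a field; not the Schematix implementation."""

    def __init__(self, source: str, alternatives: Tuple[str, ...] = ()):
        self.source = source
        self.alternatives = alternatives

    def __or__(self, other: "SketchField") -> "SketchField":
        # Keep our own source first and remember the other field's sources as fallbacks.
        return SketchField(
            self.source,
            self.alternatives + (other.source,) + other.alternatives,
        )

    def extract(self, data: Mapping[str, Any]) -> Any:
        # Try each candidate key in order; a missing key or a None value
        # (assumption) moves on to the next candidate.
        for key in (self.source, *self.alternatives):
            value = data.get(key)
            if value is not None:
                return value
        return None


contact = SketchField("email") | SketchField("contact_email")
print(contact.extract({"contact_email": "a@example.com"}))  # a@example.com
```

The first source that yields a value wins, so the order of the operands sets the priority.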
Combine (&) - Merge multiple fields
user_fields = Field(source='name') & Field(source='email') & Field(source='age')
Nested (@) - Apply to nested data
Field(source='name') @ 'user.profile' # Extract name from data.user.profile.name
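The nested lookup can be pictured as walking a dotted path one key at a time. The helper below is a hedged sketch in plain Python; `resolve_path` is a hypothetical name, and returning a default when a segment is missing is an assumption about the desired behavior rather than documented Schematix semantics.

```python
from collections.abc import Mapping
from typing import Any


def resolve_path(data: Any, dotted_path: str, default: Any = None) -> Any:
    """Follow 'a.b.c' through nested mappings, returning default if a step is missing."""
    current = data
    for key in dotted_path.split("."):
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


payload = {"user": {"profile": {"name": "John"}}}
print(resolve_path(payload, "user.profile.name"))  # John
```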
Accumulate (+) - Smart value combination
Field(source='first') + Field(source='last') # "John" + "Doe" = "John Doe"
Field(source='price') + Field(source='tax') # 100 + 15 = 115
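The two results above suggest that strings are joined with a space while numbers are added. A minimal sketch of such a rule follows; `accumulate` is a hypothetical helper, and the handling of `None` is an assumption that the examples above do not cover.

```python
from typing import Any


def accumulate(left: Any, right: Any, separator: str = " ") -> Any:
    """Combine two extracted values: join strings, otherwise use the + operator."""
    # Assumption: a missing side simply yields the other side.
    if left is None:
        return right
    if right is None:
        return left
    if isinstance(left, str) and isinstance(right, str):
        return f"{left}{separator}{right}"
    return left + right


print(accumulate("John", "Doe"))  # John Doe
print(accumulate(100, 15))        # 115
```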
🏗️ Advanced Usage
Schema Binding for Multiple Data Sources
class UserSchema(Schema):
    id = Field()
    email = Field(required=True)
    name = Field()

# Bind to different data sources
reddit_users = UserSchema().bind({
    'id': 'user_id',
    'email': 'email_addr',
    'name': ('username', str.title)  # Extract username and titlecase it
})
api_users = UserSchema().bind({
    'id': 'uid',
    'email': 'contact.email',
    'name': lambda data: f"{data['first']} {data['last']}"
})

# Transform from different sources
# (reddit_data and api_data are source records loaded elsewhere)
reddit_user = reddit_users.transform(reddit_data)
api_user = api_users.transform(api_data)
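The bindings above use three spec shapes: a plain source key, a `(source, transform)` tuple, and a callable that receives the whole record. The sketch below shows one way such specs could be interpreted. `apply_binding` and `apply_bindings` are hypothetical helpers, the sample records are made up, and the sketch handles flat keys only (no dotted paths such as 'contact.email').

```python
from typing import Any, Dict, Mapping


def apply_binding(spec: Any, data: Mapping[str, Any]) -> Any:
    """Resolve one binding spec against a source record."""
    if callable(spec):
        # Callable spec: receives the whole record.
        return spec(data)
    if isinstance(spec, tuple):
        # (source, transform) spec: extract, then transform.
        source, transform = spec
        return transform(data[source])
    # Plain string spec: direct key lookup.
    return data.get(spec)


def apply_bindings(bindings: Mapping[str, Any], data: Mapping[str, Any]) -> Dict[str, Any]:
    return {target: apply_binding(spec, data) for target, spec in bindings.items()}


reddit_bindings = {
    "id": "user_id",
    "email": "email_addr",
    "name": ("username", str.title),
}
api_bindings = {
    "id": "uid",
    "name": lambda data: f"{data['first']} {data['last']}",
}

reddit_user = apply_bindings(
    reddit_bindings,
    {"user_id": 7, "email_addr": "r@example.com", "username": "john doe"},
)
api_user = apply_bindings(api_bindings, {"uid": 9, "first": "Jane", "last": "Roe"})
print(reddit_user)  # {'id': 7, 'email': 'r@example.com', 'name': 'John Doe'}
print(api_user)     # {'id': 9, 'name': 'Jane Roe'}
```

Both sources end up in the same target shape, which is the point of defining the schema once and binding it per source.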
Enhanced Field Types
from schematix import SourceField, TargetField

# SourceField with fallbacks and conditions
email = SourceField(
    source='primary_email',
    fallbacks=['secondary_email', 'contact.email'],
    condition=lambda data: data.get('active', True)
)

# TargetField with formatting and multiple targets
name = TargetField(
    target='display_name',
    formatter=str.title,
    additionaltargets=['full_name', 'user_name']
)
Target Type Conversion
from dataclasses import dataclass

@dataclass
class User:
    id: int
    email: str
    name: str

# Convert directly to dataclass
user_obj = UserSchema().transform(data, typetarget=User)
print(type(user_obj))  # <class '__main__.User'>
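Conceptually, converting to a target type means feeding the mapped dictionary into the type's constructor. The following is a plain-Python sketch of that step for dataclasses; `to_dataclass` is a hypothetical helper, and silently dropping keys the dataclass does not declare is an assumption, not documented Schematix behavior.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass
class User:
    id: int
    email: str
    name: str


def to_dataclass(cls: type, mapped: Mapping[str, Any]) -> Any:
    """Build a dataclass instance from a mapped dict, ignoring undeclared keys."""
    declared = {f.name for f in fields(cls)}
    return cls(**{key: value for key, value in mapped.items() if key in declared})


mapped = {"id": 123, "email": "john@example.com", "name": "John Doe", "extra": True}
user_obj = to_dataclass(User, mapped)
print(user_obj)  # User(id=123, email='john@example.com', name='John Doe')
```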
Schema Composition
# Merge schemas
BaseUserSchema = Schema.merge(ContactSchema, ProfileSchema)

# Copy with modifications
ExtendedUserSchema = BaseUserSchema.copy(
    created_at=Field(source='registration_date'),
    is_premium=Field(source='account_type', transform=lambda x: x == 'premium')
)

# Create subsets
PublicUserSchema = ExtendedUserSchema.subset('id', 'name', 'email')
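If a schema is pictured as a mapping from target names to source names, merge, copy and subset reduce to familiar dictionary operations. The sketch below uses plain dicts and hypothetical helper names; the rule that later schemas win on a name clash is an assumption.

```python
from typing import Dict


def merge_fields(*schemas: Dict[str, str]) -> Dict[str, str]:
    merged: Dict[str, str] = {}
    for schema in schemas:
        merged.update(schema)  # assumption: later schemas win on name clashes
    return merged


def copy_with(schema: Dict[str, str], **extra: str) -> Dict[str, str]:
    # Returns a new mapping; the original schema is left untouched.
    return {**schema, **extra}


def subset(schema: Dict[str, str], *names: str) -> Dict[str, str]:
    return {name: schema[name] for name in names}


contact = {"email": "email_address", "phone": "phone_number"}
profile = {"id": "user_id", "name": "display_name"}

base = merge_fields(contact, profile)
extended = copy_with(base, created_at="registration_date")
public = subset(extended, "id", "name", "email")
print(public)  # {'id': 'user_id', 'name': 'display_name', 'email': 'email_address'}
```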
🔧 Real-World Examples
API Response Transformation
# GitHub API to internal user format
class GitHubUserSchema(Schema):
    id = Field(source='id')
    username = Field(source='login')
    name = Field(source='name') | Field(source='login')  # Fallback to login
    email = Field(source='email')
    repos = Field(source='public_repos', default=0)
    profile = Field(source='html_url')

github_user = GitHubUserSchema().transform(github_api_response)
Web Scraping Normalization
# Normalize product data from different e-commerce sites
class ProductSchema(Schema):
    name = Field()
    price = Field(transform=lambda x: float(x.replace('$', '')))
    rating = Field(default=0.0)

# Site-specific bindings
amazon_products = ProductSchema().bind({
    'name': 'title',
    'price': 'price.amount',
    'rating': 'averageRating'
})
ebay_products = ProductSchema().bind({
    'name': 'itemTitle',
    'price': 'currentPrice.value',
    'rating': ('feedbackScore', lambda x: x / 100)  # Convert to 0-5 scale
})
ETL Pipeline
# Database to data warehouse transformation
# (parse_date and calculate_ltv are user-supplied callables, not defined here)
class AnalyticsUserSchema(Schema):
    user_id = Field(source='id', required=True)
    signup_date = Field(source='created_at', transform=parse_date)
    lifetime_value = Field(source='orders', transform=calculate_ltv)
    segment = (
        Field(source='total_spent', transform=lambda x: 'premium' if x > 1000 else 'standard') |
        Field(default='unknown')
    )

# Batch processing
users = AnalyticsUserSchema().transformplural(user_records)
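The ETL example references `parse_date` and `calculate_ltv` without defining them. One possible shape for these helpers is sketched below; the ISO 8601 timestamp format and the `total` key on each order are assumptions made for illustration only.

```python
from datetime import date, datetime
from typing import Any, Iterable, Mapping


def parse_date(value: str) -> date:
    """Parse an ISO 8601 timestamp such as '2024-03-01T12:30:00' into a date."""
    return datetime.fromisoformat(value).date()


def calculate_ltv(orders: Iterable[Mapping[str, Any]]) -> float:
    """Sum order totals (assumed to live under a 'total' key), rounded to cents."""
    return round(sum(order["total"] for order in orders), 2)


print(parse_date("2024-03-01T12:30:00"))                    # 2024-03-01
print(calculate_ltv([{"total": 19.99}, {"total": 5.01}]))  # 25.0
```

Any callable that takes the extracted value and returns the transformed one can be passed as `transform`, so helpers like these can be tested on their own before being wired into a schema.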
📊 Error Handling & Validation
# Comprehensive validation
errors = UserSchema().validate(data)
if errors:
    print(f"Validation errors: {errors}")

# Field-level error handling with fallbacks
safe_extraction = (
    Field(source='primary_source', required=True) |
    Field(source='backup_source') |
    Field(default='fallback_value')
)
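The shape of the value returned by `validate` is not shown above. As a rough mental model, a required-field check could collect one message per failing field, as in the sketch below. `validate_required` is a hypothetical helper, and both the dict-of-messages result and the rule that an empty string counts as missing are assumptions.

```python
from typing import Any, Dict, Mapping


def validate_required(data: Mapping[str, Any], required: Mapping[str, str]) -> Dict[str, str]:
    """Return {target_name: message} for every required source that is absent or empty."""
    errors: Dict[str, str] = {}
    for target, source in required.items():
        value = data.get(source)
        if value is None or value == "":
            errors[target] = f"required source '{source}' is missing or empty"
    return errors


errors = validate_required({"user_id": 123}, {"id": "user_id", "email": "email_address"})
if errors:
    print(f"Validation errors: {errors}")
```

An empty result means every required source was present, which matches the `if errors:` check used above.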
Decorator Style (Alternative Syntax)
import schematix as sx

# Define fields using decorators
@sx.field
class UserID:
    source = 'user_id'
    required = True

@sx.field.accumulated
class FullName:
    fields = [
        sx.Field(source='first_name'),
        sx.Field(source='last_name')
    ]

# Define schema using decorator
@sx.schema
class UserSchema:
    id = UserID
    email = sx.Field(source='email_address', required=True)
    name = FullName

# Same transformation capability
user = UserSchema().transform(data)
🔄 Transform System
Schematix includes a transform system for building data-processing pipelines:
Intuitive Transform Composition
from schematix.transforms import text, numbers, common
# Pipeline composition with >> operator
name_cleaner = text.strip >> text.title >> text.normalizewhitespace
# Fallback logic with | operator
safe_number = numbers.to.int | numbers.constant(0)
# Parallel processing with & operator
multi_format = numbers.format.currency & numbers.format.percent
# Real-world cleaning pipeline
email_processor = common.clean.email >> common.validate.email
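The three composition operators can be understood through a small wrapper around ordinary callables. The `Step` class below is a hypothetical sketch, not the Schematix transform type. It assumes that `|` falls back when the first step raises `TypeError` or `ValueError`, and that `&` returns both results as a tuple.

```python
from typing import Any, Callable


class Step:
    """Hypothetical composable transform wrapping a single-argument callable."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __call__(self, value: Any) -> Any:
        return self.func(value)

    def __rshift__(self, other: Callable[[Any], Any]) -> "Step":
        # Pipeline: feed our output into the next step.
        return Step(lambda value: other(self(value)))

    def __or__(self, other: Callable[[Any], Any]) -> "Step":
        # Fallback: use the other step if ours raises (assumed error types).
        def attempt(value: Any) -> Any:
            try:
                return self(value)
            except (TypeError, ValueError):
                return other(value)

        return Step(attempt)

    def __and__(self, other: Callable[[Any], Any]) -> "Step":
        # Parallel: apply both steps to the same input (assumed tuple result).
        return Step(lambda value: (self(value), other(value)))


strip = Step(str.strip)
title = Step(str.title)
collapse = Step(lambda s: " ".join(s.split()))

name_cleaner = strip >> title >> collapse
safe_number = Step(int) | Step(lambda _: 0)
upper_and_length = Step(str.upper) & Step(len)

print(name_cleaner("  john   doe "))  # John Doe
print(safe_number("n/a"))             # 0
print(upper_and_length("abc"))        # ('ABC', 3)
```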
Comprehensive Transform Library
- Text: String manipulation, regex, encoding, formatting (35+ transforms)
- Numbers: Math operations, formatting, validation (30+ transforms)
- Dates: Parsing, formatting, timezone handling (40+ transforms)
- Collections: List/dict operations, filtering, aggregation (25+ transforms)
- Validation: Format checking, cleaning, requirements (20+ transforms)
- Common: Pre-built patterns for real-world use cases (25+ transforms)
Advanced Features
from schematix import transforms
from schematix.transforms import numbers, common

# Context-aware transforms
full_name = transforms.multifield(
    ['first_name', 'last_name'],
    lambda f, l: f"{f} {l}"
)

# Conditional transforms
format_price = transforms.conditional(
    lambda x: x > 100,
    numbers.format.currency(),
    numbers.format.commas()
)

# Safe operations with fallbacks
safe_clean = common.clean.safe.email(default="unknown@example.com")
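A conditional transform picks one of two branches based on a predicate over the input value. The sketch below shows the idea in plain Python; `conditional` here is a hypothetical helper with the same argument order as the example above, and the two formatters are simple stand-ins rather than the library's currency and comma formatters.

```python
from typing import Any, Callable


def conditional(
    predicate: Callable[[Any], bool],
    when_true: Callable[[Any], Any],
    when_false: Callable[[Any], Any],
) -> Callable[[Any], Any]:
    """Return a transform that applies when_true if predicate(value), else when_false."""

    def run(value: Any) -> Any:
        return when_true(value) if predicate(value) else when_false(value)

    return run


format_price = conditional(
    lambda x: x > 100,
    lambda x: f"${x:,.2f}",  # stand-in currency formatter
    lambda x: f"{x:,}",      # stand-in thousands-separator formatter
)

print(format_price(1234.5))  # $1,234.50
print(format_price(50))      # 50
```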
Transform + Schema Integration
from schematix import Schema, Field
from schematix.transforms import text, numbers, common

# Use transforms in field definitions
class UserSchema(Schema):
    name = Field(source='full_name', transform=text.strip >> text.title)
    email = Field(source='email_addr', transform=common.clean.email)
    price = Field(source='amount', transform=numbers.to.float >> numbers.format.currency)

# Or use the short aliases
import schematix as sx

class ProductSchema(sx.Schema):
    title = sx.Field(source='name', transform=sx.x.txt.title)
    cost = sx.Field(source='price', transform=sx.x.num.format.currency())
🛠️ Development Status
Schematix is actively developed and production-ready:
- ✅ 173 passing tests with comprehensive coverage
- ✅ Type hints throughout for excellent IDE support
- ✅ Detailed documentation and examples
- ✅ Semantic versioning and changelog
- ✅ MIT License - use freely in commercial projects
🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
📄 License
MIT License - see LICENSE for details.