A Python library for declarative data mapping and transformation

Schematix

A Python library for declarative data mapping and transformation that emphasizes reusability and composability. Define your target schemas once and bind them to different data sources with intuitive operator overloading.

✨ Key Features

🎯 Reusable Schema Definitions - Define once, use across multiple data sources
🔧 Intuitive Operators - >>, |, &, @, + for elegant data transformations
🏗️ Type Agnostic - Works with dicts, dataclasses, Pydantic models, and any object that exposes attributes
🧩 Composable Architecture - Mix and match field types and transformations
🛡️ Comprehensive Validation - Built-in error handling and validation
📊 Batch Processing - Transform lists of data efficiently
🎨 Clean API - Readable, maintainable transformation code

🚀 Quick Start

Installation

pip install schematix

Basic Usage

from schematix import Schema, Field

# Define your target schema
class UserSchema(Schema):
    id = Field(source='user_id')
    email = Field(source='email_address', required=True)
    name = Field(source='first_name') + Field(source='last_name')

# Transform data
data = {
    'user_id': 123,
    'email_address': 'john@example.com',
    'first_name': 'John',
    'last_name': 'Doe'
}

user = UserSchema().transform(data)
# Result: {'id': 123, 'email': 'john@example.com', 'name': 'John Doe'}
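
To make the result above concrete, here is a hand-written plain-Python equivalent of what the schema declares. It does not use Schematix. The function name `transform_user` is hypothetical, and raising `ValueError` for a missing required field is an assumption, since the behavior of `required=True` is not spelled out here.

```python
def transform_user(data: dict) -> dict:
    """Hand-written equivalent of UserSchema (illustration only, not Schematix code)."""
    if data.get('email_address') is None:
        # Mirrors required=True on the email field (error type is an assumption).
        raise ValueError("required field 'email_address' is missing")
    # Mirrors Field(source='first_name') + Field(source='last_name').
    parts = [data.get('first_name'), data.get('last_name')]
    return {
        'id': data.get('user_id'),
        'email': data['email_address'],
        'name': ' '.join(p for p in parts if p),
    }


data = {
    'user_id': 123,
    'email_address': 'john@example.com',
    'first_name': 'John',
    'last_name': 'Doe',
}
user = transform_user(data)
print(user)
```

The schema version states the same mapping declaratively, which is what makes it reusable across sources.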

🎭 Operator Magic

Schematix provides intuitive operators for common transformation patterns:

Pipeline (`>>`) - Connect source to target

source_field >> target_field  # Extract from source, assign to target

Fallback (`|`) - Try alternatives

Field(source='email') | Field(source='contact_email')  # Try email, fallback to contact_email
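
The fallback idea can be sketched in plain Python as "take the first source that yields a value". This is not Schematix's implementation. The helper name `first_available` is hypothetical, and treating `None` as missing is an assumption.

```python
_MISSING = object()  # sentinel so that falsy values such as 0 or '' still count as present


def first_available(data: dict, *keys: str, default=None):
    """Return the value of the first key that is present and not None."""
    for key in keys:
        value = data.get(key, _MISSING)
        if value is not _MISSING and value is not None:
            return value
    return default


contact = {'email': None, 'contact_email': 'jane@example.com'}
print(first_available(contact, 'email', 'contact_email'))
```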

Combine (`&`) - Merge multiple fields

user_fields = Field(source='name') & Field(source='email') & Field(source='age')

Nested (`@`) - Apply to nested data

Field(source='name') @ 'user.profile'  # Extract name from data.user.profile.name
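
A dotted-path lookup like the one described above can be sketched as follows. This is a plain-Python illustration under the assumption that the data is nested dicts. `get_path` is a hypothetical helper, not part of Schematix.

```python
def get_path(data, path: str, default=None):
    """Walk nested dicts along a dotted path, returning default if any step is missing."""
    current = data
    for part in path.split('.'):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


payload = {'user': {'profile': {'name': 'John'}}}
# Equivalent in spirit to extracting 'name' under the 'user.profile' prefix.
print(get_path(payload, 'user.profile' + '.' + 'name'))
```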

Accumulate (`+`) - Smart value combination

Field(source='first') + Field(source='last')  # "John" + "Doe" = "John Doe"
Field(source='price') + Field(source='tax')   # 100 + 15 = 115
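
The two examples above suggest that the combination rule depends on the value types. A minimal sketch of such a rule, inferred only from those two examples (the real rules may cover more cases), with `accumulate` as a hypothetical name:

```python
def accumulate(left, right):
    """Join two strings with a space; otherwise fall back to the native + operator."""
    if isinstance(left, str) and isinstance(right, str):
        return f"{left} {right}"
    return left + right


print(accumulate('John', 'Doe'))
print(accumulate(100, 15))
```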

🏗️ Advanced Usage

Schema Binding for Multiple Data Sources

class UserSchema(Schema):
    id = Field()
    email = Field(required=True)
    name = Field()

# Bind to different data sources
reddit_users = UserSchema().bind({
    'id': 'user_id',
    'email': 'email_addr',
    'name': ('username', str.title)  # Extract username and titlecase it
})

api_users = UserSchema().bind({
    'id': 'uid',
    'email': 'contact.email',
    'name': lambda data: f"{data['first']} {data['last']}"
})

# Transform from different sources (reddit_data and api_data are raw input records, not shown)
reddit_user = reddit_users.transform(reddit_data)
api_user = api_users.transform(api_data)
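
The bindings above use three shapes of spec: a source key (optionally dotted), a `(source, transform)` tuple, and a callable that receives the whole record. The sketch below shows one way such specs could be resolved in plain Python. It is not how Schematix implements `bind`. The names `resolve` and `apply_bindings` and the sample records are hypothetical.

```python
def resolve(spec, data):
    """Resolve one binding spec against a record."""
    if callable(spec):
        # A callable receives the whole record.
        return spec(data)
    if isinstance(spec, tuple):
        # (source, transform): extract first, then transform the value.
        source, transform = spec
        return transform(resolve(source, data))
    # A string is treated as a (possibly dotted) key path.
    current = data
    for part in spec.split('.'):
        current = current[part]
    return current


def apply_bindings(bindings: dict, data: dict) -> dict:
    return {target: resolve(spec, data) for target, spec in bindings.items()}


reddit_data = {'user_id': 42, 'email_addr': 'r@example.com', 'username': 'jdoe'}
api_data = {'uid': 7, 'contact': {'email': 'a@example.com'}, 'first': 'Jane', 'last': 'Doe'}

reddit_user = apply_bindings(
    {'id': 'user_id', 'email': 'email_addr', 'name': ('username', str.title)},
    reddit_data,
)
api_user = apply_bindings(
    {'id': 'uid', 'email': 'contact.email',
     'name': lambda d: f"{d['first']} {d['last']}"},
    api_data,
)
print(reddit_user)
print(api_user)
```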

Enhanced Field Types

from schematix import SourceField, TargetField

# SourceField with fallbacks and conditions
email = SourceField(
    source='primary_email',
    fallbacks=['secondary_email', 'contact.email'],
    condition=lambda data: data.get('active', True)
)

# TargetField with formatting and multiple targets
name = TargetField(
    target='display_name',
    formatter=str.title,
    additionaltargets=['full_name', 'user_name']
)

Target Type Conversion

from dataclasses import dataclass

@dataclass
class User:
    id: int
    email: str
    name: str

# Convert directly to dataclass
user_obj = UserSchema().transform(data, typetarget=User)
print(type(user_obj))  # <class '__main__.User'>
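
Converting a transformed dict into a dataclass amounts to passing the matching keys to the constructor. A plain-Python sketch of that step, where `to_dataclass` is a hypothetical helper and dropping unknown keys is an assumption about the desired behavior:

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    id: int
    email: str
    name: str


def to_dataclass(cls, values: dict):
    """Build cls from a dict, ignoring keys that are not dataclass fields."""
    names = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in values.items() if k in names})


result = {'id': 123, 'email': 'john@example.com', 'name': 'John Doe', 'extra': True}
user_obj = to_dataclass(User, result)
print(user_obj)
```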

Schema Composition

# Merge schemas
BaseUserSchema = Schema.merge(ContactSchema, ProfileSchema)

# Copy with modifications
ExtendedUserSchema = BaseUserSchema.copy(
    created_at=Field(source='registration_date'),
    is_premium=Field(source='account_type', transform=lambda x: x == 'premium')
)

# Create subsets
PublicUserSchema = ExtendedUserSchema.subset('id', 'name', 'email')
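
`ContactSchema` and `ProfileSchema` are not defined above, so the following sketch models merge, copy and subset with plain dicts of target-to-source names. It only illustrates the set operations involved. The dict contents are hypothetical and this is not Schematix code.

```python
# Stand-ins for two schemas: target field name -> source key.
contact = {'email': 'email_address', 'phone': 'phone_number'}
profile = {'id': 'user_id', 'name': 'full_name'}

# Merge: union of both field sets (later entries win on name clashes).
base = {**contact, **profile}

# Copy with modifications: a new mapping, leaving the original untouched.
extended = {**base, 'created_at': 'registration_date'}

# Subset: keep only the named fields, in the requested order.
public = {name: extended[name] for name in ('id', 'name', 'email')}

print(sorted(base))
print(list(public))
```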

🔧 Real-World Examples

API Response Transformation

# GitHub API to internal user format
class GitHubUserSchema(Schema):
    id = Field(source='id')
    username = Field(source='login')
    name = Field(source='name') | Field(source='login')  # Fallback to login
    email = Field(source='email')
    repos = Field(source='public_repos', default=0)
    profile = Field(source='html_url')

github_user = GitHubUserSchema().transform(github_api_response)

Web Scraping Normalization

# Normalize product data from different e-commerce sites
class ProductSchema(Schema):
    name = Field()
    price = Field(transform=lambda x: float(x.replace('$', '')))
    rating = Field(default=0.0)

# Site-specific bindings
amazon_products = ProductSchema().bind({
    'name': 'title',
    'price': 'price.amount',
    'rating': 'averageRating'
})

ebay_products = ProductSchema().bind({
    'name': 'itemTitle',
    'price': 'currentPrice.value',
    'rating': ('feedbackScore', lambda x: x / 100)  # Convert to 0-5 scale
})

ETL Pipeline

# Database to data warehouse transformation
# parse_date and calculate_ltv are assumed to be user-defined helpers (not shown here)
class AnalyticsUserSchema(Schema):
    user_id = Field(source='id', required=True)
    signup_date = Field(source='created_at', transform=parse_date)
    lifetime_value = Field(source='orders', transform=calculate_ltv)
    segment = (
        Field(source='total_spent', transform=lambda x: 'premium' if x > 1000 else 'standard') |
        Field(default='unknown')
    )

# Batch processing
users = AnalyticsUserSchema().transformplural(user_records)
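
Whether batch transformation stops at the first bad record or collects failures is not stated above. The sketch below shows the collecting variant in plain Python, which is often useful in ETL jobs. `transform_many` and `to_row` are hypothetical names and this is not Schematix's implementation.

```python
def transform_many(transform, records):
    """Apply transform to each record, collecting (index, exception) for failures."""
    results, errors = [], []
    for index, record in enumerate(records):
        try:
            results.append(transform(record))
        except (KeyError, TypeError, ValueError) as exc:
            errors.append((index, exc))
    return results, errors


def to_row(record: dict) -> dict:
    # Raises KeyError when the required 'id' key is absent.
    return {'user_id': record['id']}


results, errors = transform_many(to_row, [{'id': 1}, {}, {'id': 3}])
print(results)
print([index for index, _ in errors])
```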

📊 Error Handling & Validation

# Comprehensive validation
errors = UserSchema().validate(data)
if errors:
    print(f"Validation errors: {errors}")

# Field-level error handling with fallbacks
safe_extraction = (
    Field(source='primary_source', required=True) |
    Field(source='backup_source') |
    Field(default='fallback_value')
)
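
The shape of the value returned by `validate` is not documented above beyond being truthy when something is wrong. As a rough illustration of the idea, a required-field check that returns a list of messages could look like this in plain Python. `validate_required` is a hypothetical helper, not the library's API.

```python
def validate_required(data: dict, required: list) -> list:
    """Return one message per required key that is missing or None."""
    errors = []
    for name in required:
        if data.get(name) is None:
            errors.append(f"missing required field: {name}")
    return errors


print(validate_required({'user_id': 1}, ['email_address']))
```

An empty list means the record passed, so the `if errors:` pattern shown above works unchanged.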

🛠️ Development Status

Schematix is actively developed and production-ready:

✅ 93 passing tests with comprehensive coverage
✅ Type hints throughout for excellent IDE support
✅ Detailed documentation and examples
✅ Semantic versioning and changelog
✅ MIT License - use freely in commercial projects

📚 Documentation

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

📄 License

MIT License - see LICENSE for details.

🔗 Links

PyPI: https://pypi.org/project/schematix/
Repository: https://github.com/schizoprada/schematix
Documentation: https://schematix.readthedocs.io
Issues: https://github.com/schizoprada/schematix/issues

Release history

0.4.71: Jul 7, 2025
0.4.70: Jul 7, 2025
0.4.69: Jul 4, 2025
0.4.68: Jul 3, 2025
0.4.67: Jul 3, 2025
0.4.66: Jul 2, 2025
0.4.65: Jul 2, 2025
0.4.63: Jun 25, 2025
0.4.62: Jun 25, 2025
0.4.61: Jun 25, 2025
0.4.6: Jun 17, 2025
0.4.5: Jun 15, 2025
0.4.0: May 29, 2025
0.3.6: May 29, 2025
0.3.0: May 27, 2025 (this version)

Download files

Source Distribution

schematix-0.3.0.tar.gz (27.2 kB view details)

Uploaded May 27, 2025 Source

Built Distribution

schematix-0.3.0-py3-none-any.whl (21.3 kB view details)

Uploaded May 27, 2025 Python 3

File details

Details for the file schematix-0.3.0.tar.gz.

File metadata

Download URL: schematix-0.3.0.tar.gz
Upload date: May 27, 2025
Size: 27.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for schematix-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`21970fcff2fecbe11894c94f578e53fe783ea7328d94a384c02ebac7b6053932`
MD5	`35a2a3af043b2784460ff03e9b411402`
BLAKE2b-256	`9ae4c144983263f9cf644d829dfbdd97fb8925c406e7af633d2a0fd3f84543f8`

File details

Details for the file schematix-0.3.0-py3-none-any.whl.

File metadata

Download URL: schematix-0.3.0-py3-none-any.whl
Upload date: May 27, 2025
Size: 21.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for schematix-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e25d573f5def978a80f0096b672fd137eebcee7aac980af3f0fdcf448bc99dbe`
MD5	`cdc4d6bbc210db1ade73bee19e6bba5a`
BLAKE2b-256	`526c8d568e4b9a60f4bfc3af437a2f572a61cd63eb31e5d776abd82af31cd071`
