Tukuy
A flexible data transformation library with a plugin system for Python.
Overview
Tukuy (meaning "to transform" or "to become" in Quechua) is an extensible data transformation library for manipulating, validating, and extracting data from various formats. Its plugin architecture provides a unified interface for working with text, HTML, JSON, dates, numbers, and more.
Features
- Plugin System: Easily extend functionality with custom plugins
- Chainable Transformers: Compose multiple transformations in sequence
- Type-safe Transformations: Transformations with built-in validation
- Pattern-based Data Extraction: Extract structured data from HTML and JSON
- Error Handling: Comprehensive error handling with detailed messages
Installation
pip install tukuy
Basic Usage
from tukuy import TukuyTransformer

# Create transformer
TUKUY = TukuyTransformer()

# Basic text transformation
text = " Hello World! "
result = TUKUY.transform(text, [
    "strip",
    "lowercase",
    {"function": "truncate", "length": 5}
])
print(result)  # "hello..."

# HTML transformation
html = "<div>Hello <b>World</b>!</div>"
result = TUKUY.transform(html, [
    "strip_html_tags",
    "lowercase"
])
print(result)  # "hello world!"

# Date transformation
date_str = "2023-01-01"
age = TUKUY.transform(date_str, [
    {"function": "age_calc"}
])
print(age)  # e.g. 1; the result depends on the current date

# Validation
email = "test@example.com"
valid = TUKUY.transform(email, ["email_validator"])
print(valid)  # "test@example.com" or None if invalid
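The step lists above mix bare names with dictionaries that carry parameters. As a rough illustration of how such a list could be interpreted, here is a minimal standalone sketch. It does not use Tukuy's internals: the `REGISTRY` table, the `apply_steps` helper, and the truncation behavior (cut to `length` characters and append `...`) are assumptions made for this example.

```python
from typing import Any, Callable, Dict, List, Union

Step = Union[str, Dict[str, Any]]

# Hypothetical registry for this sketch: step name -> function(value, **params).
REGISTRY: Dict[str, Callable[..., Any]] = {
    "strip": lambda value: value.strip(),
    "lowercase": lambda value: value.lower(),
    # Assumed behavior: keep `length` characters and mark the cut with "...".
    "truncate": lambda value, length: (
        value if len(value) <= length else value[:length] + "..."
    ),
}


def apply_steps(value: Any, steps: List[Step]) -> Any:
    """Apply each step in order, feeding every result into the next step."""
    for step in steps:
        if isinstance(step, str):
            name, params = step, {}
        else:
            # Copy so the caller's dictionary is not mutated.
            params = dict(step)
            name = params.pop("function")
        value = REGISTRY[name](value, **params)
    return value


result = apply_steps(
    " Hello World! ",
    ["strip", "lowercase", {"function": "truncate", "length": 5}],
)
print(result)  # hello...
```

Keeping steps as plain data (strings and dictionaries) means a pipeline can be stored in a configuration file and reused without code changes.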
Pattern-based Extraction
Tukuy can extract structured data from both HTML and JSON using declarative patterns.
HTML Extraction
pattern = {
    "properties": [
        {
            "name": "title",
            "selector": "h1",
            "transform": ["strip", "lowercase"]
        },
        {
            "name": "links",
            "selector": "a",
            "attribute": "href",
            "type": "array"
        }
    ]
}
data = TUKUY.extract_html_with_pattern(html, pattern)
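To make the pattern format more concrete, the following sketch resolves a pattern of the same shape using only the standard library. It is not how Tukuy implements extraction: it supports bare tag names as selectors only (no real CSS selectors), and `TagCollector`, `extract_html`, `TRANSFORMS`, and the sample markup are all hypothetical names and data introduced for this example.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List, Tuple

# Hypothetical transform table used only by this sketch.
TRANSFORMS = {"strip": str.strip, "lowercase": str.lower}


class TagCollector(HTMLParser):
    """Record attributes and text for each element, grouped by tag name."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, List[Dict[str, Any]]] = {}
        self._open: List[Tuple[str, Dict[str, Any]]] = []

    def handle_starttag(self, tag, attrs):
        record = {"attrs": dict(attrs), "text": ""}
        self.found.setdefault(tag, []).append(record)
        self._open.append((tag, record))

    def handle_endtag(self, tag):
        # Close the most recent matching element and anything left open inside it.
        for index in range(len(self._open) - 1, -1, -1):
            if self._open[index][0] == tag:
                del self._open[index:]
                break

    def handle_data(self, data):
        # Text counts toward every element that is currently open.
        for _, record in self._open:
            record["text"] += data


def extract_html(html: str, properties: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Resolve each property; selectors are plain tag names in this sketch."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    result: Dict[str, Any] = {}
    for prop in properties:
        matches = collector.found.get(prop["selector"], [])
        if "attribute" in prop:
            values = [m["attrs"].get(prop["attribute"]) for m in matches]
        else:
            values = [m["text"] for m in matches]
        for name in prop.get("transform", []):
            values = [TRANSFORMS[name](v) for v in values]
        if prop.get("type") == "array":
            result[prop["name"]] = values
        else:
            result[prop["name"]] = values[0] if values else None
    return result


sample_html = '<h1> Release Notes </h1><p><a href="/v1">v1</a> <a href="/v2">v2</a></p>'
sample_pattern = {
    "properties": [
        {"name": "title", "selector": "h1", "transform": ["strip", "lowercase"]},
        {"name": "links", "selector": "a", "attribute": "href", "type": "array"},
    ]
}
data = extract_html(sample_html, sample_pattern["properties"])
print(data)  # {'title': 'release notes', 'links': ['/v1', '/v2']}
```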
JSON Extraction
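# Example input; any JSON string with a matching structure works
json_str = '{"data": {"user": {"fullName": " Ada Lovelace "}}}'
pattern = {
    "properties": [
        {
            "name": "user",
            "selector": "data.user",
            "properties": [
                {
                    "name": "name",
                    "selector": "fullName",
                    "transform": ["strip"]
                }
            ]
        }
    ]
}
data = TUKUY.extract_json_with_pattern(json_str, pattern)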
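The nested pattern above can be read as "select a node by dotted path, then resolve child properties relative to it". The sketch below shows one way that reading could work in plain Python. It is an illustration only, not Tukuy's implementation: `select`, `extract`, `TRANSFORMS`, and the sample document are hypothetical, and only dotted-key selectors are handled.

```python
import json
from typing import Any, Dict, List

# Hypothetical transform table used only by this sketch.
TRANSFORMS = {"strip": str.strip, "lowercase": str.lower}


def select(node: Any, path: str) -> Any:
    """Follow a dotted path such as "data.user"; return None if a key is missing."""
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract(node: Any, properties: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a result dict by resolving each property against `node`."""
    result: Dict[str, Any] = {}
    for prop in properties:
        value = select(node, prop["selector"])
        if value is not None and "properties" in prop:
            # Nested properties are resolved relative to the selected node.
            value = extract(value, prop["properties"])
        elif value is not None:
            for name in prop.get("transform", []):
                value = TRANSFORMS[name](value)
        result[prop["name"]] = value
    return result


sample_json = '{"data": {"user": {"fullName": "  Ada Lovelace  "}}}'
sample_pattern = {
    "properties": [
        {
            "name": "user",
            "selector": "data.user",
            "properties": [
                {"name": "name", "selector": "fullName", "transform": ["strip"]}
            ],
        }
    ]
}
data = extract(json.loads(sample_json), sample_pattern["properties"])
print(data)  # {'user': {'name': 'Ada Lovelace'}}
```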
Use Cases
Tukuy is designed to handle a wide range of data transformation scenarios:
- Web Scraping: Extract structured data from HTML pages
- Data Cleaning: Normalize and validate data from various sources
- Format Conversion: Transform data between different formats
- Text Processing: Apply complex text transformations
- Data Extraction: Extract specific information from complex structures
- Validation: Ensure data meets specific criteria
Performance Tips
- Chain Transformations: Use chained transformations to avoid intermediate objects
- Use Built-in Transformers: Built-in transformers are optimized for performance
- Be Specific with Selectors: More specific selectors are faster to process
- Custom Transformers: For performance-critical operations, create custom transformers
- Batch Processing: Process data in batches for better performance
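The batch processing tip can be sketched generically. The helper below splits any iterable into fixed-size chunks and applies one reusable function to each chunk. The names `batched` and `process_in_batches` are hypothetical, the `clean` function stands in for a transformation pipeline, and whether batching actually helps depends on the workload, so measure before relying on it.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def process_in_batches(
    items: Iterable[T], size: int, transform: Callable[[T], R]
) -> Iterator[List[R]]:
    """Apply `transform` to every item, one batch at a time."""
    for batch in batched(items, size):
        yield [transform(item) for item in batch]


def clean(text: str) -> str:
    """Stand-in for a transformation pipeline."""
    return text.strip().lower()


results = list(process_in_batches([" A ", "B ", " c", "D"], 3, clean))
print(results)  # [['a', 'b', 'c'], ['d']]
```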
Error Handling
Tukuy provides comprehensive error handling with detailed error messages:
from tukuy.exceptions import ValidationError, TransformationError, ParseError

try:
    result = TUKUY.transform(data, transformations)
except ValidationError as e:
    print(f"Validation failed: {e}")
except ParseError as e:
    print(f"Parsing failed: {e}")
except TransformationError as e:
    print(f"Transformation failed: {e}")
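The order of the `except` clauses above matters only if the more specific exceptions derive from a broader one. The sketch below assumes, without confirmation from this README, that `ValidationError` and `ParseError` subclass `TransformationError`. It defines local stand-in classes with those names rather than importing Tukuy's, and shows why the broad handler should come last.

```python
class TransformationError(Exception):
    """Stand-in base class for this sketch (assumed hierarchy)."""


class ValidationError(TransformationError):
    """Stand-in for a validation failure."""


class ParseError(TransformationError):
    """Stand-in for a parsing failure."""


def classify(exc: Exception) -> str:
    """Raise `exc` and report which handler caught it."""
    try:
        raise exc
    except ValidationError:
        return "validation"
    except ParseError:
        return "parse"
    except TransformationError:
        # Reached only when no more specific handler matched.
        return "transformation"


print(classify(ValidationError("bad email")))      # validation
print(classify(ParseError("bad JSON")))            # parse
print(classify(TransformationError("other")))      # transformation
```

If the broad handler were listed first under this assumed hierarchy, it would catch every case and the specific handlers would never run.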
Contributing
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests with pytest
- Update documentation if needed
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Plugin System Documentation
Tukuy's plugin system is the core of its extensibility. The built-in plugins and their key transformers are listed below.
Built-in Plugins
Text Plugin (text)
- Description: Handles text manipulation and string operations
- Key Transformers:
  - strip: Remove leading/trailing whitespace
  - lowercase: Convert text to lowercase
  - uppercase: Convert text to uppercase
  - truncate: Truncate text to specified length
  - replace: Replace text patterns
  - regex_replace: Replace using regular expressions
  - split: Split text into array
  - join: Join array into text
  - normalize: Normalize text (remove diacritics)
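As an illustration of what "remove diacritics" typically means, here is a standard-library sketch based on Unicode decomposition. It is not taken from Tukuy's source; `remove_diacritics` is a hypothetical helper, and the real `normalize` transformer may behave differently.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Drop combining marks after decomposing characters (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


print(remove_diacritics("Crème brûlée à São Paulo"))  # Creme brulee a Sao Paulo
```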
HTML Plugin (html)
- Description: Process and extract data from HTML content
- Key Transformers:
  - strip_html_tags: Remove HTML tags
  - extract_text: Extract text content
  - select: Extract content using CSS selectors
  - extract_links: Get all links from HTML
  - extract_tables: Extract tables to structured data
  - clean_html: Sanitize HTML content
Date Plugin (date)
- Description: Handle date parsing, formatting, and calculations
- Key Transformers:
  - parse_date: Convert string to date object
  - format_date: Format date to string
  - age_calc: Calculate age from date
  - add_days: Add days to date
  - diff_days: Calculate days between dates
  - is_weekend: Check if date is weekend
  - to_timezone: Convert between timezones
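An age calculation of the kind `age_calc` describes can be sketched with the standard `datetime` module. The helper name `age_in_years` and its optional `today` parameter are assumptions for this example; Tukuy's own transformer may round or parse differently.

```python
from datetime import date
from typing import Optional


def age_in_years(born: date, today: Optional[date] = None) -> int:
    """Return completed years between `born` and `today` (defaults to now)."""
    today = today or date.today()
    # Subtract one if this year's birthday has not happened yet.
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - int(before_birthday)


# A fixed reference date keeps the result reproducible.
print(age_in_years(date.fromisoformat("2023-01-01"), today=date(2024, 6, 15)))  # 1
```

Passing the reference date explicitly makes the function testable; relying on the current date gives results that change over time.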
Numerical Plugin (numerical)
- Description: Mathematical operations and number formatting
- Key Transformers:
  - round: Round number to decimals
  - format_number: Format with thousand separators
  - to_currency: Format as currency
  - percentage: Convert to percentage
  - math_eval: Evaluate mathematical expressions
  - scale: Scale number to range
  - statistics: Calculate basic statistics
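Two of these operations are easy to picture with plain Python. The sketch below formats a number with thousand separators and rescales a value linearly from one range to another. The function names mirror the transformer names for readability, but the signatures and defaults are assumptions, not Tukuy's API.

```python
def format_number(value: float, decimals: int = 2) -> str:
    """Format with comma thousand separators and a fixed number of decimals."""
    return f"{value:,.{decimals}f}"


def scale(
    value: float,
    src_min: float,
    src_max: float,
    dst_min: float = 0.0,
    dst_max: float = 1.0,
) -> float:
    """Map `value` linearly from [src_min, src_max] to [dst_min, dst_max]."""
    if src_max == src_min:
        raise ValueError("source range must not be empty")
    ratio = (value - src_min) / (src_max - src_min)
    return dst_min + ratio * (dst_max - dst_min)


print(format_number(1234567.891))  # 1,234,567.89
print(scale(5, 0, 10, 0, 100))     # 50.0
```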
Validation Plugin (validation)
- Description: Data validation and verification
- Key Transformers:
  - email_validator: Validate email addresses
  - url_validator: Validate URLs
  - phone_validator: Validate phone numbers
  - length_validator: Validate string length
  - range_validator: Validate number ranges
  - regex_validator: Validate against regex pattern
  - type_validator: Validate data types
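The usage example earlier shows a validator returning the input when it is valid and `None` otherwise. A minimal sketch of that contract follows. The pattern is deliberately simple and far from RFC-complete, and `EMAIL_RE` and `validate_email` are hypothetical names; Tukuy's `email_validator` may apply different rules.

```python
import re
from typing import Optional

# Deliberately simple: something@something.tld with no spaces or extra "@".
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_email(value: str) -> Optional[str]:
    """Return the value unchanged when it looks like an email, else None."""
    return value if EMAIL_RE.fullmatch(value) else None


print(validate_email("test@example.com"))  # test@example.com
print(validate_email("not-an-email"))      # None
```

Returning the value rather than a boolean lets a validator sit in the middle of a chain and pass valid input on to the next step.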
JSON Plugin (json)
- Description: JSON manipulation and extraction
- Key Transformers:
  - parse_json: Parse JSON string
  - stringify: Convert to JSON string
  - extract: Extract values using JSON path
  - flatten: Flatten nested JSON
  - merge: Merge multiple JSON objects
  - validate_schema: Validate against JSON schema
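Flattening nested JSON usually means turning nested keys into a single level of path-like keys. The sketch below uses dotted keys and list indexes. The key format is an assumption (Tukuy's `flatten` may choose a different separator or treat lists differently), and in this simple version empty containers produce no entries.

```python
from typing import Any, Dict


def flatten(value: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one dict with dotted keys."""
    if isinstance(value, dict):
        pairs = value.items()
    elif isinstance(value, list):
        pairs = enumerate(value)
    else:
        # Leaf value: store it under the path built so far.
        return {prefix: value}
    flat: Dict[str, Any] = {}
    for key, child in pairs:
        child_prefix = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, child_prefix, sep))
    return flat


nested = {"user": {"name": "Ada", "tags": ["a", "b"]}, "active": True}
print(flatten(nested))
# {'user.name': 'Ada', 'user.tags.0': 'a', 'user.tags.1': 'b', 'active': True}
```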
Creating Custom Plugins
You can create custom plugins by extending the TransformerPlugin class:
from tukuy import TukuyTransformer
from tukuy.plugins import TransformerPlugin
from tukuy.base import ChainableTransformer


class ReverseTransformer(ChainableTransformer[str, str]):
    def validate(self, value: str) -> bool:
        # Only strings can be reversed
        return isinstance(value, str)

    def _transform(self, value: str, context=None) -> str:
        return value[::-1]


class MyPlugin(TransformerPlugin):
    def __init__(self):
        super().__init__("my_plugin")

    @property
    def transformers(self):
        # Map each transformer name to a factory that builds it
        return {
            'reverse': lambda _: ReverseTransformer('reverse')
        }


# Usage
TUKUY = TukuyTransformer()
TUKUY.register_plugin(MyPlugin())
result = TUKUY.transform("hello", ["reverse"])  # "olleh"
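The `transformers` property above returns a mapping from names to factory callables. To show why a factory (rather than an instance) is a useful shape, here is a standalone sketch of a registry that merges such mappings and builds a transformer per step. Everything in it is hypothetical (`SketchPlugin`, `SketchRegistry`, `Reverse`, `Repeat`), and it assumes the factory's single argument carries the step's parameters, which this README does not confirm.

```python
from typing import Any, Callable, Dict, List, Union

Step = Union[str, Dict[str, Any]]


class Reverse:
    """Stand-in transformer: reverses a string."""

    def transform(self, value: str) -> str:
        return value[::-1]


class Repeat:
    """Stand-in transformer: repeats a string `times` times."""

    def __init__(self, times: int = 2) -> None:
        self.times = times

    def transform(self, value: str) -> str:
        return value * self.times


class SketchPlugin:
    """Stand-in plugin exposing a name -> factory mapping."""

    @property
    def transformers(self) -> Dict[str, Callable[[Dict[str, Any]], Any]]:
        return {
            "reverse": lambda params: Reverse(),
            "repeat": lambda params: Repeat(times=params.get("times", 2)),
        }


class SketchRegistry:
    """Collects factories from plugins and runs step lists against them."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[Dict[str, Any]], Any]] = {}

    def register_plugin(self, plugin: SketchPlugin) -> None:
        for name, factory in plugin.transformers.items():
            if name in self._factories:
                raise ValueError(f"transformer {name!r} is already registered")
            self._factories[name] = factory

    def transform(self, value: Any, steps: List[Step]) -> Any:
        for step in steps:
            params = {} if isinstance(step, str) else dict(step)
            name = step if isinstance(step, str) else params.pop("function")
            # Build a fresh transformer for this step, then apply it.
            value = self._factories[name](params).transform(value)
        return value


registry = SketchRegistry()
registry.register_plugin(SketchPlugin())
output = registry.transform("ab", ["reverse", {"function": "repeat", "times": 3}])
print(output)  # bababa
```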
Plugin Lifecycle
Plugins can implement initialize() and cleanup() methods for setup and teardown:
class MyPlugin(TransformerPlugin):
    def initialize(self) -> None:
        super().initialize()
        # Load resources, connect to databases, etc.

    def cleanup(self) -> None:
        super().cleanup()
        # Close connections, free resources, etc.
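One way to make sure setup and teardown stay paired is to wrap them in a context manager. The sketch below uses a stand-in plugin class and a hypothetical `registered` helper. It does not reflect when Tukuy itself calls `initialize()` or `cleanup()`, only a pattern for guaranteeing that cleanup runs even when work fails.

```python
from contextlib import contextmanager
from typing import Iterator, List


class LifecyclePlugin:
    """Stand-in plugin that records its lifecycle events."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: List[str] = []

    def initialize(self) -> None:
        self.events.append("initialize")

    def cleanup(self) -> None:
        self.events.append("cleanup")


@contextmanager
def registered(plugin: LifecyclePlugin) -> Iterator[LifecyclePlugin]:
    """Initialize the plugin on entry and always clean it up on exit."""
    plugin.initialize()
    try:
        yield plugin
    finally:
        plugin.cleanup()


plugin = LifecyclePlugin("demo")
try:
    with registered(plugin):
        plugin.events.append("work")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(plugin.events)  # ['initialize', 'work', 'cleanup']
```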