Tukuy

A flexible data transformation library with a plugin system for Python.

Overview

Tukuy (meaning "to transform" or "to become" in Quechua) is an extensible data transformation library for manipulating, validating, and extracting data from various formats. Its plugin architecture provides a unified interface for working with text, HTML, JSON, dates, numbers, and more.

Features

  • Plugin System: Easily extend functionality with custom plugins
  • Chainable Transformers: Compose multiple transformations in sequence
  • Type-safe Transformations: With built-in validation
  • Pattern-based Data Extraction: Extract structured data from HTML and JSON
  • Error Handling: Comprehensive error handling with detailed messages

Installation

pip install tukuy

Basic Usage

from tukuy import TukuyTransformer

# Create transformer
TUKUY = TukuyTransformer()

# Basic text transformation
text = " Hello World! "
result = TUKUY.transform(text, [
    "strip",
    "lowercase",
    {"function": "truncate", "length": 5}
])
print(result)  # "hello..."

# HTML transformation
html = "<div>Hello <b>World</b>!</div>"
result = TUKUY.transform(html, [
    "strip_html_tags",
    "lowercase"
])
print(result)  # "hello world!"

# Date transformation
date_str = "2023-01-01"
age = TUKUY.transform(date_str, [
    {"function": "age_calc"}
])
print(age)  # e.g. 1 when run during 2024; the result depends on the current date

# Validation
email = "test@example.com"
valid = TUKUY.transform(email, ["email_validator"])
print(valid)  # "test@example.com" or None if invalid
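
The step list passed to transform mixes bare names with dictionaries that carry parameters. The standalone sketch below shows one way such a list could be dispatched. It uses only the standard library; apply_steps and the STEPS registry are hypothetical names chosen for illustration, not Tukuy's implementation, and the "..." truncation suffix is an assumption that mirrors the comment in the example above.

```python
from typing import Any, Callable, Dict, List, Union

# Hypothetical registry of step functions; Tukuy's real registry is plugin-based.
STEPS: Dict[str, Callable[..., Any]] = {
    "strip": lambda value: value.strip(),
    "lowercase": lambda value: value.lower(),
    # Assumed behavior: cut to `length` characters and append "..." if shortened.
    "truncate": lambda value, length=50: (
        value if len(value) <= length else value[:length] + "..."
    ),
}

Step = Union[str, Dict[str, Any]]


def apply_steps(value: Any, steps: List[Step]) -> Any:
    """Apply each step in order, feeding the previous result into the next."""
    for step in steps:
        if isinstance(step, str):
            name, params = step, {}
        else:
            params = dict(step)  # copy so the caller's dict is left untouched
            name = params.pop("function")
        value = STEPS[name](value, **params)
    return value


result = apply_steps(
    " Hello World! ",
    ["strip", "lowercase", {"function": "truncate", "length": 5}],
)
print(result)  # hello...
```

Keeping the parameters in the same dictionary as the function name is what lets a whole pipeline be stored as plain JSON-like data.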

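Pattern-based Extraction

Tukuy extracts structured data from HTML and JSON using declarative patterns. A pattern lists named properties, each with a selector and, optionally, an attribute, a type, nested properties, or a list of transforms.

HTML Extraction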

pattern = {
    "properties": [
        {
            "name": "title",
            "selector": "h1",
            "transform": ["strip", "lowercase"]
        },
        {
            "name": "links",
            "selector": "a",
            "attribute": "href",
            "type": "array"
        }
    ]
}

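# Sample input for illustration; substitute your own page source.
html = '<h1> Release Notes </h1><a href="/docs">Docs</a>'

data = TUKUY.extract_html_with_pattern(html, pattern)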
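
To make the pattern semantics concrete, here is a standard-library sketch that interprets the same kind of pattern for the simplest case, where each selector is a bare tag name. The result shape (a dict keyed by property name, with a list for "type": "array") is an assumption about what such an extractor would return, and TagCollector and extract are hypothetical names. The sketch assumes well-formed markup without void elements such as br or img.

```python
from html.parser import HTMLParser

TEXT_STEPS = {"strip": str.strip, "lowercase": str.lower}


class TagCollector(HTMLParser):
    """Collect text and attributes for each element, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.found = {}   # tag name -> list of {"attrs": ..., "text": ...}
        self._open = []   # records for the elements that are currently open

    def handle_starttag(self, tag, attrs):
        record = {"attrs": dict(attrs), "text": ""}
        self.found.setdefault(tag, []).append(record)
        self._open.append(record)

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every element that is still open.
        for record in self._open:
            record["text"] += data


def extract(html, pattern):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    out = {}
    for prop in pattern["properties"]:
        records = collector.found.get(prop["selector"], [])
        if "attribute" in prop:
            values = [r["attrs"].get(prop["attribute"]) for r in records]
        else:
            values = [r["text"] for r in records]
        for step in prop.get("transform", []):
            values = [TEXT_STEPS[step](v) for v in values]
        if prop.get("type") == "array":
            out[prop["name"]] = values
        else:
            out[prop["name"]] = values[0] if values else None
    return out


sample_pattern = {
    "properties": [
        {"name": "title", "selector": "h1", "transform": ["strip", "lowercase"]},
        {"name": "links", "selector": "a", "attribute": "href", "type": "array"},
    ]
}
sample_html = '<h1> Release Notes </h1><p><a href="/a">A</a> <a href="/b">B</a></p>'
extracted = extract(sample_html, sample_pattern)
print(extracted)  # {'title': 'release notes', 'links': ['/a', '/b']}
```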

JSON Extraction

pattern = {
    "properties": [
        {
            "name": "user",
            "selector": "data.user",
            "properties": [
                {
                    "name": "name",
                    "selector": "fullName",
                    "transform": ["strip"]
                }
            ]
        }
    ]
}

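# Sample input for illustration; substitute your own JSON string.
json_str = '{"data": {"user": {"fullName": " Ada Lovelace "}}}'

data = TUKUY.extract_json_with_pattern(json_str, pattern)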
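
The nested pattern above can be read as "resolve data.user, then resolve fullName relative to that match". The sketch below implements that reading with the standard library only. It is an interpretation under stated assumptions, not Tukuy's code: resolve and extract are hypothetical helpers, selectors are treated as plain dotted paths, and missing keys yield None.

```python
import json

TEXT_STEPS = {"strip": str.strip}


def resolve(node, dotted_path):
    """Walk a dotted path such as "data.user" through nested dicts."""
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract(node, pattern):
    out = {}
    for prop in pattern["properties"]:
        value = resolve(node, prop["selector"])
        if value is not None:
            if "properties" in prop:
                # Nested selectors are resolved relative to the parent match.
                value = extract(value, prop)
            else:
                for step in prop.get("transform", []):
                    value = TEXT_STEPS[step](value)
        out[prop["name"]] = value
    return out


sample_pattern = {
    "properties": [
        {
            "name": "user",
            "selector": "data.user",
            "properties": [
                {"name": "name", "selector": "fullName", "transform": ["strip"]}
            ],
        }
    ]
}
sample_json = '{"data": {"user": {"fullName": "  Ada Lovelace "}}}'
extracted = extract(json.loads(sample_json), sample_pattern)
print(extracted)  # {'user': {'name': 'Ada Lovelace'}}
```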

Use Cases

Tukuy is designed to handle a wide range of data transformation scenarios:

  • Web Scraping: Extract structured data from HTML pages
  • Data Cleaning: Normalize and validate data from various sources
  • Format Conversion: Transform data between different formats
  • Text Processing: Apply complex text transformations
  • Data Extraction: Extract specific information from complex structures
  • Validation: Ensure data meets specific criteria

Performance Tips

  • Chain Transformations: Use chained transformations to avoid intermediate objects
  • Use Built-in Transformers: Built-in transformers are optimized for performance
  • Be Specific with Selectors: More specific selectors are faster to process
  • Custom Transformers: For performance-critical operations, create custom transformers
  • Batch Processing: Process data in batches for better performance
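
As an illustration of the batch-processing tip, the generic helper below splits any iterable into fixed-size chunks and transforms one chunk at a time. It is a sketch under assumptions: batches and transform_in_batches are hypothetical names, and the clean function is a stand-in for a call such as TUKUY.transform(item, ["strip", "lowercase"]). Whether batching helps in practice depends on the workload.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def transform_in_batches(
    items: Iterable[T], transform_one: Callable[[T], R], size: int = 1000
) -> Iterator[List[R]]:
    """Apply `transform_one` chunk by chunk, holding one chunk in memory at a time."""
    for chunk in batches(items, size):
        yield [transform_one(item) for item in chunk]


def clean(text: str) -> str:
    # Stand-in for a real transformation pipeline.
    return text.strip().lower()


results = list(transform_in_batches([" A ", "B ", " c"], clean, size=2))
print(results)  # [['a', 'b'], ['c']]
```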

Error Handling

Tukuy provides comprehensive error handling with detailed error messages:

from tukuy.exceptions import ValidationError, TransformationError, ParseError

try:
    result = TUKUY.transform(data, transformations)
except ValidationError as e:
    print(f"Validation failed: {e}")
except ParseError as e:
    print(f"Parsing failed: {e}")
except TransformationError as e:
    print(f"Transformation failed: {e}")
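
The order of the except clauses above matters if the specific errors derive from a more general one, because Python runs the first matching clause. The sketch below demonstrates this with stand-in classes. The hierarchy shown is an ASSUMPTION made for illustration; check tukuy.exceptions for the real class relationships.

```python
# Stand-in exception classes. ASSUMPTION: the specific errors derive from a
# common base. These are not imported from tukuy.
class TransformationError(Exception):
    pass


class ValidationError(TransformationError):
    pass


class ParseError(TransformationError):
    pass


def classify(error: Exception) -> str:
    """Mirror the clause order used above: specific handlers first, general last."""
    try:
        raise error
    except ValidationError:
        return "validation"
    except ParseError:
        return "parse"
    except TransformationError:
        return "transformation"


print(classify(ValidationError("bad email")))   # validation
print(classify(ParseError("bad JSON")))         # parse
print(classify(TransformationError("failed")))  # transformation
```

If the general handler came first under this hierarchy, it would also catch the two specific errors and the later clauses would never run.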

Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests with pytest
  5. Update documentation if needed
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

Plugin System Documentation

Tukuy's plugin system is the core of its extensibility. The built-in plugins and their key transformers are listed below.

Built-in Plugins

Text Plugin (text)

  • Description: Handles text manipulation and string operations
  • Key Transformers:
    • strip: Remove leading/trailing whitespace
    • lowercase: Convert text to lowercase
    • uppercase: Convert text to uppercase
    • truncate: Truncate text to specified length
    • replace: Replace text patterns
    • regex_replace: Replace using regular expressions
    • split: Split text into array
    • join: Join array into text
    • normalize: Normalize text (remove diacritics)
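
For readers unfamiliar with diacritic removal, the following standard-library sketch shows the usual technique: decompose each character and drop the combining marks. It is an illustration only; remove_diacritics is a hypothetical name, and how the normalize transformer is implemented internally is not stated here.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_diacritics("Crème Brûlée"))  # Creme Brulee
```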

HTML Plugin (html)

  • Description: Process and extract data from HTML content
  • Key Transformers:
    • strip_html_tags: Remove HTML tags
    • extract_text: Extract text content
    • select: Extract content using CSS selectors
    • extract_links: Get all links from HTML
    • extract_tables: Extract tables to structured data
    • clean_html: Sanitize HTML content

Date Plugin (date)

  • Description: Handle date parsing, formatting, and calculations
  • Key Transformers:
    • parse_date: Convert string to date object
    • format_date: Format date to string
    • age_calc: Calculate age from date
    • add_days: Add days to date
    • diff_days: Calculate days between dates
    • is_weekend: Check if date is weekend
    • to_timezone: Convert between timezones
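
An age calculation of the kind age_calc describes is commonly done by subtracting years and correcting for a birthday that has not yet occurred. The sketch below shows that arithmetic with the standard library. age_in_years is a hypothetical helper, and the optional today parameter is an addition for reproducible results, not a documented Tukuy option.

```python
from datetime import date
from typing import Optional


def age_in_years(born: date, today: Optional[date] = None) -> int:
    """Whole years elapsed between `born` and `today` (defaults to the current date)."""
    today = today or date.today()
    # Subtract one year if this year's birthday has not happened yet.
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - int(before_birthday)


print(age_in_years(date(2023, 1, 1), today=date(2024, 6, 15)))  # 1
```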

Numerical Plugin (numerical)

  • Description: Mathematical operations and number formatting
  • Key Transformers:
    • round: Round number to decimals
    • format_number: Format with thousand separators
    • to_currency: Format as currency
    • percentage: Convert to percentage
    • math_eval: Evaluate mathematical expressions
    • scale: Scale number to range
    • statistics: Calculate basic statistics
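
The formatting and scaling operations listed above correspond to short pieces of standard Python. The sketch below shows one plausible reading of three of them. The function names reuse the transformer names for readability, but the signatures and defaults are assumptions, not Tukuy's API.

```python
def format_number(value: float, decimals: int = 2) -> str:
    """Format with thousands separators and a fixed number of decimals."""
    return f"{value:,.{decimals}f}"


def percentage(value: float, decimals: int = 1) -> str:
    """Render a fraction such as 0.256 as a percentage string."""
    return f"{value * 100:.{decimals}f}%"


def scale(value: float, src_min: float, src_max: float,
          dst_min: float = 0.0, dst_max: float = 1.0) -> float:
    """Map `value` linearly from [src_min, src_max] to [dst_min, dst_max]."""
    if src_max == src_min:
        raise ValueError("source range must not be empty")
    ratio = (value - src_min) / (src_max - src_min)
    return dst_min + ratio * (dst_max - dst_min)


print(format_number(1234567.891))  # 1,234,567.89
print(percentage(0.256))           # 25.6%
print(scale(5, 0, 10))             # 0.5
```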

Validation Plugin (validation)

  • Description: Data validation and verification
  • Key Transformers:
    • email_validator: Validate email addresses
    • url_validator: Validate URLs
    • phone_validator: Validate phone numbers
    • length_validator: Validate string length
    • range_validator: Validate number ranges
    • regex_validator: Validate against regex pattern
    • type_validator: Validate data types
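
The Basic Usage example notes that a validator returns the value itself, or None when validation fails. The sketch below shows that contract with a deliberately simple regular expression. validate_email is a hypothetical stand-in, and the pattern is far looser than full address syntax; it is not the rule set email_validator actually applies.

```python
import re
from typing import Optional

# Deliberately simple: one "@", no whitespace, and a dot in the domain part.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_email(value: str) -> Optional[str]:
    """Return the value unchanged if it looks like an email address, else None."""
    return value if _EMAIL_RE.fullmatch(value) else None


print(validate_email("test@example.com"))  # test@example.com
print(validate_email("not-an-email"))      # None
```

Returning the value instead of a boolean lets a validator sit in the middle of a transformation chain.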

JSON Plugin (json)

  • Description: JSON manipulation and extraction
  • Key Transformers:
    • parse_json: Parse JSON string
    • stringify: Convert to JSON string
    • extract: Extract values using JSON path
    • flatten: Flatten nested JSON
    • merge: Merge multiple JSON objects
    • validate_schema: Validate against JSON schema
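
Flattening nested JSON usually means joining the keys along each path into a single key. The sketch below shows the idea for nested dictionaries. The dot separator and the choice to leave lists untouched are assumptions made for illustration, and this flatten is a standalone function, not the plugin's transformer.

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Collapse nested dicts into one level, joining keys with `sep`."""
    items: Dict[str, Any] = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict) and value:
            items.update(flatten(value, full_key, sep))
        else:
            # Scalars, lists, and empty dicts are kept as-is.
            items[full_key] = value
    return items


nested = {"user": {"name": "Ada", "address": {"city": "London"}}, "active": True}
flat = flatten(nested)
print(flat)  # {'user.name': 'Ada', 'user.address.city': 'London', 'active': True}
```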

Creating Custom Plugins

You can create custom plugins by extending the TransformerPlugin class:

from tukuy import TukuyTransformer
from tukuy.plugins import TransformerPlugin
from tukuy.base import ChainableTransformer

class ReverseTransformer(ChainableTransformer[str, str]):
    def validate(self, value: str) -> bool:
        return isinstance(value, str)
    
    def _transform(self, value: str, context=None) -> str:
        return value[::-1]

class MyPlugin(TransformerPlugin):
    def __init__(self):
        super().__init__("my_plugin")
    
    @property
    def transformers(self):
        return {
            'reverse': lambda _: ReverseTransformer('reverse')
        }

# Usage
TUKUY = TukuyTransformer()
TUKUY.register_plugin(MyPlugin())

result = TUKUY.transform("hello", ["reverse"])  # "olleh"
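
The plugin above exposes a mapping from transformer names to factory functions. The stand-in below sketches how a registry might consume such a mapping. MiniRegistry and ReversePlugin are hypothetical classes written for this example; they do not subclass or reproduce Tukuy's TukuyTransformer or TransformerPlugin, and the duplicate-name check is an assumed design choice.

```python
from typing import Any, Callable, Dict, List

Transformer = Callable[[Any], Any]
Factory = Callable[[Dict[str, Any]], Transformer]


class ReversePlugin:
    """Stand-in plugin: exposes a name -> factory mapping, as MyPlugin does above."""

    name = "my_plugin"

    @property
    def transformers(self) -> Dict[str, Factory]:
        return {"reverse": lambda _params: (lambda value: value[::-1])}


class MiniRegistry:
    """Hypothetical registry that merges the factories of registered plugins."""

    def __init__(self) -> None:
        self._factories: Dict[str, Factory] = {}

    def register_plugin(self, plugin) -> None:
        for name, factory in plugin.transformers.items():
            if name in self._factories:
                raise ValueError(f"transformer {name!r} is already registered")
            self._factories[name] = factory

    def transform(self, value: Any, steps: List[str]) -> Any:
        for name in steps:
            transformer = self._factories[name]({})  # build with empty params
            value = transformer(value)
        return value


registry = MiniRegistry()
registry.register_plugin(ReversePlugin())
print(registry.transform("hello", ["reverse"]))  # olleh
```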

Plugin Lifecycle

Plugins can implement initialize() and cleanup() methods for setup and teardown:

class MyPlugin(TransformerPlugin):
    def initialize(self) -> None:
        super().initialize()
        # Load resources, connect to databases, etc.
    
    def cleanup(self) -> None:
        super().cleanup()
        # Close connections, free resources, etc.
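
When calling these hooks by hand, for example in a test, pairing them with try/finally guarantees that cleanup() runs even if the work in between fails. The sketch below wraps that pairing in a context manager. DemoPlugin is a stand-in with the same two methods, and plugin_session is a hypothetical helper; when Tukuy itself invokes the hooks is not covered here.

```python
from contextlib import contextmanager


class DemoPlugin:
    """Stand-in with the same two lifecycle hooks; not a TransformerPlugin subclass."""

    def __init__(self) -> None:
        self.events = []

    def initialize(self) -> None:
        self.events.append("initialize")

    def cleanup(self) -> None:
        self.events.append("cleanup")


@contextmanager
def plugin_session(plugin):
    """Run initialize() on entry and always run cleanup() on exit."""
    plugin.initialize()
    try:
        yield plugin
    finally:
        plugin.cleanup()


plugin = DemoPlugin()
with plugin_session(plugin):
    plugin.events.append("work")
print(plugin.events)  # ['initialize', 'work', 'cleanup']
```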
