A data transformation library for flattening complex nested structures into tabular formats while preserving hierarchical relationships

Transmog

A Python library for transforming complex nested data structures into flat, tabular formats while preserving hierarchical relationships.

Features

  • Multiple Input Formats: JSON, JSONL, CSV
  • Nested Structure Handling: Flattens deeply nested objects with customizable separators
  • Array Processing: Extracts arrays as child tables with parent-child relationships maintained
  • Output Options: Python dictionaries, PyArrow tables, JSON, CSV, Parquet
  • Performance Features: Chunked processing, streaming output, memory optimization
  • Data Integrity: Deterministic ID generation, consistent parent-child linking
  • Error Recovery: Configurable strategies for handling malformed data
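This page does not show how Transmog flattens records internally. As a rough illustration of what "flattens deeply nested objects with customizable separators" means, here is a minimal stdlib-only sketch. The `flatten` function is hypothetical and not part of Transmog's API. Arrays are deliberately left untouched, because the feature list says Transmog extracts them into child tables instead.

```python
from typing import Any


def flatten(obj: dict[str, Any], separator: str = "_", prefix: str = "") -> dict[str, Any]:
    """Hypothetical helper: collapse nested dicts into one level.

    Key paths are joined with `separator`. Lists are kept as-is in this
    sketch rather than being split out into child tables.
    """
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key path.
            flat.update(flatten(value, separator, path))
        else:
            flat[path] = value
    return flat


record = {"user": {"id": 1, "name": "John Doe", "contact": {"email": "john@example.com"}}}
print(flatten(record))
# {'user_id': 1, 'user_name': 'John Doe', 'user_contact_email': 'john@example.com'}
print(flatten(record, separator="."))
# {'user.id': 1, 'user.name': 'John Doe', 'user.contact.email': 'john@example.com'}
```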

Installation

pip install transmog

Optional dependencies:

pip install "transmog[dev]"  # Development tools (quoted so shells such as zsh don't expand the brackets)

Quick Example

import transmog as tm

# Sample nested data
data = {
    "user": {
        "id": 1,
        "name": "John Doe",
        "contact": {
            "email": "john@example.com"
        },
        "orders": [
            {"id": 101, "amount": 99.99},
            {"id": 102, "amount": 45.50}
        ]
    }
}

# Process the data
processor = tm.Processor()
result = processor.process(data)

# Access the data
tables = result.to_dict()
main_table = tables["main"]
orders = tables["user_orders"]

# Export to different formats
result.write_all_json("output/json")
result.write_all_csv("output/csv")
result.write_all_parquet("output/parquet")
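The table names in the example above (`main` for the top-level record, `user_orders` for the array) suggest that each array becomes a child table named after its path, with rows linked back to their parent. The stdlib-only sketch below illustrates that idea together with deterministic IDs. It is an assumption-laden illustration, not Transmog's implementation. `extract`, `record_id`, the `_id`/`_parent_id` field names, and the `uuid5`-over-canonical-JSON ID scheme are all hypothetical, and Transmog's real output schema may differ.

```python
from __future__ import annotations

import json
import uuid
from typing import Any

# Sample record from the quick example.
data = {
    "user": {
        "id": 1,
        "name": "John Doe",
        "contact": {"email": "john@example.com"},
        "orders": [{"id": 101, "amount": 99.99}, {"id": 102, "amount": 45.50}],
    }
}


def record_id(record: Any, parent_id: str | None, position: int) -> str:
    """Hypothetical deterministic ID: same content at the same tree position gives the same ID."""
    key = json.dumps([parent_id, position, record], sort_keys=True, default=str)
    return str(uuid.uuid5(uuid.NAMESPACE_OID, key))


def extract(
    record: dict[str, Any],
    tables: dict[str, list[dict[str, Any]]],
    table: str = "main",
    parent_id: str | None = None,
    position: int = 0,
    sep: str = "_",
) -> dict[str, list[dict[str, Any]]]:
    """Hypothetical: flatten `record` into one row and split its arrays into child tables."""
    row: dict[str, Any] = {}
    arrays: list[tuple[str, list[Any]]] = []

    def walk(obj: dict[str, Any], path: str) -> None:
        for key, value in obj.items():
            name = f"{path}{sep}{key}" if path else key
            if isinstance(value, dict):
                walk(value, name)  # nested object: flatten into this row
            elif isinstance(value, list):
                arrays.append((name, value))  # array: becomes a child table
            else:
                row[name] = value

    walk(record, "")
    row["_id"] = record_id(record, parent_id, position)
    if parent_id is not None:
        row["_parent_id"] = parent_id  # link each child row to its parent row
    tables.setdefault(table, []).append(row)

    for name, items in arrays:
        # Child tables are named by their full path from the root, e.g. "user_orders".
        child_table = name if table == "main" else f"{table}{sep}{name}"
        for i, item in enumerate(items):
            # Scalars inside arrays are wrapped so every child row is a dict.
            child = item if isinstance(item, dict) else {"value": item}
            extract(child, tables, child_table, row["_id"], i, sep)
    return tables


tables = extract(data, {})
print(sorted(tables))  # ['main', 'user_orders']
```

Mixing the parent ID and array position into the hash keeps two identical orders from colliding while still producing the same IDs on every run over the same input.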

Configuration

# Use pre-configured modes
config = tm.TransmogConfig.memory_optimized()
# or
config = tm.TransmogConfig.performance_optimized()

# Custom configuration
config = (
    tm.TransmogConfig.default()
    .with_naming(separator=".")
    .with_processing(cast_to_string=True)
    .with_metadata(id_field="custom_id")
    .with_error_handling(max_retries=3)
)

processor = tm.Processor(config=config)
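The `with_*` calls above chain, so each one evidently returns a configuration object. This page does not say whether `TransmogConfig` modifies itself in place or returns a copy. The sketch below shows one common way to build such a fluent API: a frozen dataclass whose methods return modified copies via `dataclasses.replace`. `SketchConfig`, its field names, and its defaults are hypothetical stand-ins chosen to mirror the example, not Transmog's actual class.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical immutable config illustrating fluent with_* chaining."""

    separator: str = "_"
    cast_to_string: bool = False
    id_field: str = "_id"
    max_retries: int = 0

    @classmethod
    def default(cls) -> "SketchConfig":
        return cls()

    def with_naming(self, *, separator: str) -> "SketchConfig":
        # Each with_* call returns a modified copy; the receiver is never changed.
        return replace(self, separator=separator)

    def with_processing(self, *, cast_to_string: bool) -> "SketchConfig":
        return replace(self, cast_to_string=cast_to_string)

    def with_metadata(self, *, id_field: str) -> "SketchConfig":
        return replace(self, id_field=id_field)

    def with_error_handling(self, *, max_retries: int) -> "SketchConfig":
        return replace(self, max_retries=max_retries)


base = SketchConfig.default()
custom = (
    base.with_naming(separator=".")
    .with_processing(cast_to_string=True)
    .with_metadata(id_field="custom_id")
    .with_error_handling(max_retries=3)
)
print(base.separator, custom.separator)  # _ .
```

The copy-on-write design lets a shared base configuration be specialized for different jobs without one job's settings leaking into another.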

Large Dataset Processing

# Memory-optimized processing
processor = tm.Processor.memory_optimized()

# Chunked processing
result = processor.process_chunked(
    "large_data.jsonl",
    entity_name="records",
    chunk_size=1000
)

# Streaming output
processor.stream_process_file(
    "large_data.jsonl",
    entity_name="records",
    output_format="parquet",
    output_destination="output_dir"
)
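How `process_chunked` reads its input internally is not documented here. The general technique behind a `chunk_size` parameter can be sketched with `itertools.islice`: read at most `chunk_size` lines at a time so that only one batch of records is in memory at once. `iter_chunks` is a hypothetical helper, not part of Transmog.

```python
import io
import json
from itertools import islice
from typing import Any, Iterable, Iterator


def iter_chunks(lines: Iterable[str], chunk_size: int = 1000) -> Iterator[list[dict[str, Any]]]:
    """Hypothetical helper: yield lists of at most chunk_size parsed JSONL records."""
    it = iter(lines)
    while True:
        batch = list(islice(it, chunk_size))  # pull the next chunk of raw lines
        if not batch:
            return
        # Blank lines (e.g. a trailing newline) are skipped, so a chunk may be smaller.
        yield [json.loads(line) for line in batch if line.strip()]


# In-memory stand-in for a large JSONL file with 2,500 records.
source = io.StringIO("".join(json.dumps({"id": i}) + "\n" for i in range(2500)))
sizes = [len(chunk) for chunk in iter_chunks(source, chunk_size=1000)]
print(sizes)  # [1000, 1000, 500]
```

With a real file, pass the open file object instead, e.g. `with open("large_data.jsonl", encoding="utf-8") as f: for chunk in iter_chunks(f): ...`. Writing each chunk out before reading the next is what keeps memory usage flat in a streaming setup.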

Error Handling

# Skip and log errors
processor = tm.Processor().with_error_handling(recovery_strategy="skip")

# Partial recovery (preserves valid portions)
processor = tm.Processor.with_partial_recovery()
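The difference between failing fast and a "skip and log" strategy can be illustrated on malformed JSONL input. This is a minimal sketch: `parse_records` and the `"strict"` strategy name are hypothetical, only `"skip"` mirrors the option shown above, and Transmog's partial-recovery mode (which preserves the valid portions of a record) is not modeled here.

```python
import json
import logging
from typing import Any, Iterable

logger = logging.getLogger("recovery_sketch")


def parse_records(lines: Iterable[str], recovery_strategy: str = "strict") -> list[Any]:
    """Hypothetical: parse JSONL, failing fast ("strict") or skipping bad lines ("skip")."""
    if recovery_strategy not in ("strict", "skip"):
        raise ValueError(f"unknown recovery_strategy: {recovery_strategy!r}")
    records: list[Any] = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            if recovery_strategy == "strict":
                # Fail fast: surface the first malformed line to the caller.
                raise ValueError(f"line {lineno}: {exc}") from exc
            # "skip": log the problem and continue with the remaining lines.
            logger.warning("skipping line %d: %s", lineno, exc)
    return records


lines = ['{"id": 1}', '{"id": 2', '{"id": 3}']
print(parse_records(lines, recovery_strategy="skip"))  # [{'id': 1}, {'id': 3}]
```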

Documentation

License

MIT License

Download files

Download the file for your platform.

Source Distribution

transmog-1.0.6.tar.gz (88.8 kB)

Uploaded Source

Built Distribution

transmog-1.0.6-py3-none-any.whl (112.9 kB)

Uploaded Python 3

File details

Details for the file transmog-1.0.6.tar.gz.

File metadata

  • Download URL: transmog-1.0.6.tar.gz
  • Upload date:
  • Size: 88.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for transmog-1.0.6.tar.gz
Algorithm Hash digest
SHA256 ffde0788e96553b2d2d215a35365119e6a2c909ca2841ac2cf2544b4d603a2a1
MD5 d579191382734c656284c379f663e164
BLAKE2b-256 02451bda770165b91e583bad1113c59668c588659137c20868246ebf03e97378

File details

Details for the file transmog-1.0.6-py3-none-any.whl.

File metadata

  • Download URL: transmog-1.0.6-py3-none-any.whl
  • Upload date:
  • Size: 112.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for transmog-1.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 c1ee0abd99d4e2bf65351836ab949e89b5fd7b315e9b6d5b48c5814a6962417e
MD5 0d08dccdad721be62a19a3a894ed3bc4
BLAKE2b-256 52e269f361299e0d842a69d52ad09b767e709e5ea458dada38d405188590ccfc
