A data transformation library for flattening complex nested structures into tabular formats while preserving hierarchical relationships

Transmog

Transform nested data into flat tables with a simple, intuitive API.

Overview

Transmog transforms nested JSON data into flat, tabular formats while preserving relationships between parent and child records.

Key Features:

  • Simple one-function API with smart defaults
  • Multiple output formats (CSV, Parquet)
  • Automatic relationship preservation
  • Memory-efficient streaming for large datasets

Installation

Standard install (includes Parquet support):

pip install transmog

Minimal install (CSV only):

pip install "transmog[minimal]"

Quick Start

import transmog as tm

# Transform nested data into flat tables
data = {"product_id": "PROD-123", "name": "Gaming Laptop", "specs": {"cpu": "i7", "ram": "16GB"}}
result = tm.flatten(data, name="products")

# Access flattened data in memory (list of dicts)
print(result.main)
# [{'product_id': 'PROD-123', 'name': 'Gaming Laptop', 'specs_cpu': 'i7', 'specs_ram': '16GB', '_id': '...', '_timestamp': '...'}]

# Save to files in different formats
result.save("products.csv")        # Single CSV file
result.save("products.parquet")    # Single Parquet file
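The column names in the output above (`specs_cpu`, `specs_ram`) come from joining the nested key path with an underscore. The following is a minimal pure-Python sketch of that naming idea only; `flatten_keys` is a hypothetical helper, not Transmog's implementation, and it ignores arrays, IDs, and timestamps.

```python
def flatten_keys(record, separator="_", prefix=""):
    """Collapse nested dicts into one level, joining key paths with `separator`."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far as the prefix
            flat.update(flatten_keys(value, separator, path))
        else:
            flat[path] = value
    return flat


data = {"product_id": "PROD-123", "name": "Gaming Laptop", "specs": {"cpu": "i7", "ram": "16GB"}}
flat = flatten_keys(data)
print(flat)
# {'product_id': 'PROD-123', 'name': 'Gaming Laptop', 'specs_cpu': 'i7', 'specs_ram': '16GB'}
```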

Example: Nested JSON to Multiple Tables

Smart mode (the default) keeps simple arrays as native arrays and extracts arrays of objects into child tables:

data = {
    "user": {"name": "Alice", "email": "alice@example.com"},
    "tags": ["premium", "verified"],  # Simple array - kept as native array
    "orders": [  # Complex array - exploded to child table
        {"id": 101, "amount": 99.99, "items": ["laptop", "mouse"]},
        {"id": 102, "amount": 45.50, "items": ["keyboard"]}
    ]
}

result = tm.flatten(data, name="customer")

# Main table - flattened user data with native arrays
print(result.main)
# [
#   {
#     'user_name': 'Alice',
#     'user_email': 'alice@example.com',
#     'tags': ['premium', 'verified'],  # Native array!
#     '_id': '...',
#     '_timestamp': '...'
#   }
# ]

# Complex arrays become separate tables with parent references
print(result.tables["customer_orders"])
# [
#   {'id': 101, 'amount': 99.99, 'items': ['laptop', 'mouse'], '_parent_id': '...', '_id': '...', '_timestamp': '...'},
#   {'id': 102, 'amount': 45.50, 'items': ['keyboard'], '_parent_id': '...', '_id': '...', '_timestamp': '...'}
# ]

# Access all tables in memory
print(f"Created {len(result.all_tables)} tables:")
print(list(result.all_tables.keys()))
# ['customer', 'customer_orders']

# Save to different formats for analysis
result.save("analytics/", "csv")       # CSV files for database import
result.save("warehouse/", "parquet")   # Parquet files for data warehouse
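Because each child row carries a `_parent_id` that matches the `_id` of its parent, the tables can be joined back together after flattening. The sketch below uses hand-written rows shaped like the output above. The ID values `c-1`, `o-1`, and `o-2` are placeholders invented for illustration, and `group_children` is a hypothetical helper rather than part of Transmog.

```python
from collections import defaultdict

# Hand-written rows shaped like the tables above; ID values are placeholders.
main = [{"user_name": "Alice", "_id": "c-1"}]
orders = [
    {"id": 101, "amount": 99.99, "_parent_id": "c-1", "_id": "o-1"},
    {"id": 102, "amount": 45.50, "_parent_id": "c-1", "_id": "o-2"},
]


def group_children(children, parent_field="_parent_id"):
    """Index child rows by the ID of the parent they reference."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[parent_field]].append(row)
    return grouped


orders_by_parent = group_children(orders)
for customer in main:
    customer_orders = orders_by_parent.get(customer["_id"], [])
    print(customer["user_name"], [order["id"] for order in customer_orders])
# Alice [101, 102]
```

The same lookup works in SQL or a dataframe library as an ordinary join on `_parent_id = _id`.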

Configuration

Customize processing behavior with TransmogConfig:

# Default configuration
result = tm.flatten(data)

# Include nulls for CSV export (consistent columns)
result = tm.flatten(data, config=tm.TransmogConfig(include_nulls=True))

# Memory-efficient processing (smaller batches)
result = tm.flatten(data, config=tm.TransmogConfig(batch_size=100))

# High-performance processing (larger batches)
result = tm.flatten(data, config=tm.TransmogConfig(batch_size=10000))
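The `include_nulls=True` option above matters for CSV because rows that omit empty fields end up with different key sets. The following standard-library sketch shows the problem and one way to pad missing columns by hand. The rows are hypothetical, and this is not how Transmog writes files.

```python
import csv
import io

# Hypothetical flattened rows: the second record has no 'specs_ram' value.
rows = [
    {"name": "Gaming Laptop", "specs_cpu": "i7", "specs_ram": "16GB"},
    {"name": "Office Laptop", "specs_cpu": "i5"},
]

# Collect every column seen across all rows, preserving first-seen order.
columns = list(dict.fromkeys(key for row in rows for key in row))

buffer = io.StringIO()
# restval fills in columns that a row is missing
writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
# name,specs_cpu,specs_ram
# Gaming Laptop,i7,16GB
# Office Laptop,i5,
```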

File Processing:

result = tm.flatten("data.json")

Advanced Configuration

For more control over the flattening process:

# Create custom configuration
config = tm.TransmogConfig(
    # Array handling
    array_mode=tm.ArrayMode.SEPARATE,  # Extract all arrays to child tables
    # Options: SMART (default), SEPARATE, INLINE, SKIP

    # ID management
    id_generation="natural",           # Use existing ID field (options: "random", "natural", "hash", or a list of field names)
    id_field="sku",                    # Name of ID field to use/create
    parent_field="_parent",            # Customize parent reference field name
    time_field="_timestamp",           # Add processing timestamp to records


    # Data processing
    include_nulls=False,               # Skip null and empty values (default: False)
    max_depth=100,                     # Maximum nesting depth

    # Performance tuning
    batch_size=5000,                   # Process more records per batch
)

result = tm.flatten(data, name="products", config=config)

# ID generation options
config = tm.TransmogConfig(id_generation="random")              # Always generate new UUIDs (default)
config = tm.TransmogConfig(id_generation="natural")             # Use existing ID field (fail if missing)
config = tm.TransmogConfig(id_generation="hash")                # Hash entire record (deterministic)
config = tm.TransmogConfig(id_generation=["user_id", "date"])   # Composite key (deterministic)

# Customize configuration as needed
config = tm.TransmogConfig(include_nulls=True)  # For consistent CSV columns
config.id_field = "product_id"
result = tm.flatten(data, config=config)
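The "hash" and composite-key options above are described as deterministic: the same input yields the same ID on every run. The sketch below illustrates that property with the standard library. SHA-256 over sorted-key JSON is an assumption chosen for illustration, `hash_id` is a hypothetical helper, and Transmog's actual hashing scheme may differ.

```python
import hashlib
import json


def hash_id(record, fields=None):
    """Return a stable hex digest of `record`, or of only the named `fields`."""
    subset = record if fields is None else {name: record[name] for name in fields}
    # sort_keys makes the serialization independent of dict insertion order
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"user_id": 7, "date": "2024-01-01", "note": "first"}
b = {"note": "edited", "date": "2024-01-01", "user_id": 7}

# Hashing the whole record: any changed field produces a different ID
print(hash_id(a) == hash_id(b))  # False

# Hashing only the key fields: the ID survives edits to other fields
print(hash_id(a, ["user_id", "date"]) == hash_id(b, ["user_id", "date"]))  # True
```

A whole-record hash suits deduplicating identical records, while a composite key suits records that are updated over time but keep the same identity.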

Documentation

Complete documentation is available at scottdraper8.github.io/transmog.

Contributing

For contribution guidelines, development setup, and coding standards, see the Contributing Guide in the documentation.

License

MIT License
