A data transformation library for flattening complex nested structures into tabular formats while preserving hierarchical relationships

Transmog

Transform nested data into flat tables with a simple, intuitive API.

Overview

Transmog transforms nested JSON data into flat, tabular formats while preserving relationships between parent and child records.

Key Features:

  • Simple one-function API with smart defaults
  • Multiple output formats (JSON, CSV, Parquet)
  • Automatic relationship preservation
  • Memory-efficient streaming for large datasets

Quick Start

pip install transmog

import transmog as tm

# Transform nested data into flat tables
data = {"product_id": "PROD-123", "name": "Gaming Laptop", "specs": {"cpu": "i7", "ram": "16GB"}}
result = tm.flatten(data, name="products")

# Access flattened data in memory (list of dicts)
print(result.main)
# [{'product_id': 'PROD-123', 'name': 'Gaming Laptop', 'specs_cpu': 'i7', 'specs_ram': '16GB'}]

# Save to files in different formats
result.save("products.csv")        # Single CSV file
result.save("products.parquet")    # Single Parquet file
result.save("products.json")       # Single JSON file (only main table)
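
The field names in the output above (specs_cpu, specs_ram) come from joining each nested key path with a separator. As a rough illustration of that naming scheme only, here is a minimal pure-Python sketch. The helper flatten_keys is hypothetical and is not part of Transmog; the library's own implementation may differ and handles arrays, IDs, and type conversion that this sketch ignores.

```python
def flatten_keys(record, separator="_", prefix=""):
    """Collapse nested dicts into one flat dict with joined key names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the accumulated key path as the new prefix
            flat.update(flatten_keys(value, separator, name))
        else:
            flat[name] = value
    return flat


data = {"product_id": "PROD-123", "name": "Gaming Laptop", "specs": {"cpu": "i7", "ram": "16GB"}}
flat = flatten_keys(data)
print(flat)
# {'product_id': 'PROD-123', 'name': 'Gaming Laptop', 'specs_cpu': 'i7', 'specs_ram': '16GB'}
```

Passing separator="." to this sketch would produce specs.cpu and specs.ram instead.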

Example: Nested JSON to Multiple Tables

In smart mode (the default), arrays of simple values stay in place as native arrays, while arrays of objects are extracted into child tables:

data = {
    "user": {"name": "Alice", "email": "alice@example.com"},
    "tags": ["premium", "verified"],  # Simple array - kept as native array
    "orders": [  # Complex array - exploded to child table
        {"id": 101, "amount": 99.99, "items": ["laptop", "mouse"]},
        {"id": 102, "amount": 45.50, "items": ["keyboard"]}
    ]
}

result = tm.flatten(data, name="customer")

# Main table - flattened user data with native arrays
print(result.main)
# [
#   {
#     'user_name': 'Alice',
#     'user_email': 'alice@example.com',
#     'tags': ['premium', 'verified'],  # Native array!
#     '_id': 'a1b2c3d4-e5f6-4789-abc1-23456789def0'
#   }
# ]

# Complex arrays become separate tables with parent references
print(result.tables["customer_orders"])
# [
#   {'id': '101', 'amount': '99.99', 'items': ['laptop', 'mouse'], '_parent_id': 'a1b2c3d4...', '_id': 'b2c3d4...'},
#   {'id': '102', 'amount': '45.50', 'items': ['keyboard'], '_parent_id': 'a1b2c3d4...', '_id': 'c3d4...'}
# ]

# Access all tables in memory
print(f"Created {len(result.all_tables)} tables:")
print(list(result.all_tables.keys()))
# ['customer', 'customer_orders']

# Save to different formats for analysis
result.save("analytics/", "csv")       # CSV files for database import
result.save("warehouse/", "parquet")   # Parquet files for data warehouse
result.save("api/", "json")           # JSON files for web applications
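
Because each child row carries a _parent_id that matches its parent's _id, the tables can be joined back together after flattening. The sketch below shows one way to do that in plain Python. It works on hand-written rows shaped like the example output rather than on a real Transmog result, the ID values are placeholders, and join_children is a hypothetical helper, not a Transmog function.

```python
# Rows shaped like the example output; the ID values are made-up placeholders
main = [{"user_name": "Alice", "_id": "p1"}]
orders = [
    {"id": "101", "amount": "99.99", "_parent_id": "p1", "_id": "c1"},
    {"id": "102", "amount": "45.50", "_parent_id": "p1", "_id": "c2"},
]


def join_children(parents, children, id_key="_id", parent_key="_parent_id"):
    """Pair each child row with the parent row it references."""
    parents_by_id = {row[id_key]: row for row in parents}
    joined = []
    for child in children:
        parent = parents_by_id.get(child[parent_key])
        if parent is not None:  # ignore children whose parent is missing
            joined.append((parent, child))
    return joined


pairs = join_children(main, orders)
for parent, child in pairs:
    print(parent["user_name"], child["id"], child["amount"])
# Alice 101 99.99
# Alice 102 45.50
```

The same key pair works as a join condition once the tables are loaded into a database or a dataframe library.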

Key Options:

  • Custom field separators: separator="."
  • Use existing IDs: id_field="customer_id"
  • Error handling: errors="skip"
  • File processing: tm.flatten_file("data.json")

Advanced Options

For more control over the flattening process:

result = tm.flatten(
    data,
    name="products",
    # Naming options
    separator=".",              # Use dots: user.name instead of user_name
    nested_threshold=3,         # Simplify deeply nested field names
    # ID management
    id_field="sku",            # Use existing field as primary ID
    parent_id_field="_parent",  # Customize parent reference field name
    add_timestamp=True,         # Add processing timestamp to records
    # Array handling (default is "smart")
    arrays="separate",         # Extract all arrays to child tables (vs "smart", "inline", "skip")
    # Data processing
    preserve_types=True,       # Keep original data types (not strings)
    skip_null=False,           # Include null values in output
    skip_empty=False,          # Include empty strings/lists
    # Performance tuning
    batch_size=5000,           # Process more records per batch
    low_memory=True,           # Optimize for memory usage over speed
)
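
The arrays option defaults to "smart", which the earlier example describes as keeping simple arrays as native arrays and moving complex arrays to child tables. To make that distinction concrete, here is a small sketch that classifies a record's list fields. It assumes "simple" means an array containing only scalar values; that rule is an assumption for illustration, the helpers is_simple_array and split_arrays are hypothetical, and Transmog's actual criteria may differ.

```python
def is_simple_array(values):
    """Return True when no element is a dict or list (scalars only)."""
    return all(not isinstance(v, (dict, list)) for v in values)


def split_arrays(record):
    """Separate a record's list fields into simple and complex groups."""
    simple, complex_ = {}, {}
    for key, value in record.items():
        if isinstance(value, list):
            target = simple if is_simple_array(value) else complex_
            target[key] = value
    return simple, complex_


record = {
    "tags": ["premium", "verified"],
    "orders": [{"id": 101}, {"id": 102}],
}
simple, complex_ = split_arrays(record)
print(sorted(simple))    # ['tags']
print(sorted(complex_))  # ['orders']
```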

Documentation

Complete documentation is available at scottdraper8.github.io/transmog.

Contributing

For contribution guidelines, development setup, and coding standards, see the Contributing Guide in the documentation.

License

MIT License
