A data transformation library for flattening complex nested structures into tabular formats while preserving hierarchical relationships

Transmog - Flatten Nested JSON to Tabular Formats

Python 3.10+ | License: MIT


A configurable data flattening tool that transforms nested JSON data into flat, tabular formats while preserving parent-child relationships.


Installation

# Full install (CSV, Parquet, ORC, Avro output)
pip install transmog

# CSV only (no pyarrow, fastavro, or cramjam)
pip install "transmog[minimal]"

Quick Start

import transmog as tm

data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
result = tm.flatten(data, name="users")

result.main                    # Main table
result.tables["users_orders"]  # Child tables
result.save("output.csv")      # Save to file
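
To make the shape of that result concrete, here is a minimal pure-Python sketch of the idea. It is not Transmog's implementation: the helper `flatten_sketch` is hypothetical, handles only one level of nesting, and assumes that a list of objects becomes a child table named `<name>_<field>` (matching `users_orders` above). The `_id` and `_parent_id` field names are borrowed from the defaults listed under Configuration.

```python
import uuid


def flatten_sketch(record, name, id_field="_id", parent_field="_parent_id"):
    """Illustrative only: split one record into a main row and child-table rows."""
    main_row = {id_field: str(uuid.uuid4())}
    tables = {}
    for key, value in record.items():
        is_object_array = (
            isinstance(value, list)
            and value
            and all(isinstance(item, dict) for item in value)
        )
        if is_object_array:
            # Each element becomes a row in a child table and points back
            # at its parent through parent_field.
            tables[f"{name}_{key}"] = [
                {**item, id_field: str(uuid.uuid4()), parent_field: main_row[id_field]}
                for item in value
            ]
        else:
            main_row[key] = value
    return [main_row], tables


data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
main, tables = flatten_sketch(data, name="users")
```

Here `main` holds a single row for Alice, and `tables["users_orders"]` holds two rows whose `_parent_id` equals that row's `_id`, which is how the parent-child relationship survives the flattening.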

In-Memory vs Streaming

  1. flatten(data, name, config) - Flatten data in memory. As the calls below show, data can be a file path or records already in memory.

    result = tm.flatten("data.json", name="products")
    result = tm.flatten([{"id": 1}, {"id": 2}])
    result.save("output.parquet")

  2. flatten_stream(data, output_path, name, output_format) - Stream directly to disk

    tm.flatten_stream("large.jsonl", "output/", name="events", output_format="parquet")
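
The difference between the two entry points can be pictured with a small standard-library sketch. This is an assumption about the general technique, not Transmog's internals: the hypothetical helper `iter_batches` reads JSON Lines input lazily and yields fixed-size batches, so only one batch is held in memory at a time.

```python
import io
import json
from itertools import islice


def iter_batches(lines, batch_size=1000):
    """Yield lists of at most batch_size parsed records from an iterable of JSONL lines."""
    # Generator expression: lines are parsed lazily, blank lines are skipped.
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for an open JSONL file; a real run would pass a file object instead.
jsonl = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
batch_sizes = [len(batch) for batch in iter_batches(jsonl, batch_size=2)]
```

In a streaming pipeline each batch would be flattened and written out before the next one is read, which is what keeps memory use roughly proportional to the batch size rather than to the file size.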
    

Configuration

config = tm.TransmogConfig(
    # Array handling
    array_mode=tm.ArrayMode.SMART,   # SMART (default), SEPARATE, INLINE, SKIP

    # ID generation and metadata fields
    id_generation="random",          # random (default), natural, hash, or ["field1", "field2"]
    id_field="_id",                  # Field name for record IDs
    parent_field="_parent_id",       # Field name for parent references
    time_field="_timestamp",         # Field name for timestamps (None to disable)

    # Data transformation
    include_nulls=False,             # Include null/empty values in output
    stringify_values=False,          # Convert all leaf values to strings

    # Processing controls
    max_depth=100,                   # Maximum recursion depth
    batch_size=1000                  # Records per batch for streaming
)

result = tm.flatten(data, config=config)

Array Modes

| Mode | Behavior |
|------|----------|
| SMART | Preserve simple arrays, extract complex arrays to child tables |
| SEPARATE | Extract all arrays to child tables |
| INLINE | Serialize arrays as JSON strings |
| SKIP | Omit arrays from output |
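
The four behaviors can be compared side by side with a short pure-Python sketch. It is illustrative only and does not call Transmog: `apply_array_mode` and `is_simple` are hypothetical helpers, modes are plain strings rather than `tm.ArrayMode` members, and "simple" is assumed to mean an array whose elements are all scalars.

```python
import json

MODES = ("SMART", "SEPARATE", "INLINE", "SKIP")


def is_simple(array):
    # Assumption: "simple" means no element is a dict or a list.
    return all(not isinstance(item, (dict, list)) for item in array)


def apply_array_mode(record, mode):
    """Return (row, extracted): fields kept on the row, arrays bound for child tables."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    row, extracted = {}, {}
    for key, value in record.items():
        if not isinstance(value, list):
            row[key] = value                 # non-array fields always stay on the row
        elif mode == "SKIP":
            continue                         # arrays are dropped entirely
        elif mode == "INLINE":
            row[key] = json.dumps(value)     # arrays become JSON strings
        elif mode == "SEPARATE" or not is_simple(value):
            extracted[key] = value           # array goes to a child table
        else:
            row[key] = value                 # SMART keeps simple arrays in place
    return row, extracted


record = {"sku": "A1", "tags": ["new", "sale"], "reviews": [{"stars": 5}]}
results = {mode: apply_array_mode(record, mode) for mode in MODES}
```

With this record, SMART keeps `tags` on the row and extracts `reviews`, SEPARATE extracts both, INLINE turns both into JSON strings, and SKIP leaves only `sku`.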

ID Generation

| Strategy | Description |
|----------|-------------|
| random | Generate random UUID (default) |
| natural | Use existing ID field from data |
| hash | Deterministic hash of entire record |
| ["field1", ...] | Deterministic hash of specified fields |
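
The practical difference between these strategies is whether re-running the same input yields the same IDs. The sketch below shows that with the standard library only. It is an illustration under stated assumptions, not Transmog's algorithm: `make_id` is a hypothetical helper, SHA-256 over key-sorted JSON is an assumed hashing scheme, and `natural_field="id"` is an assumed name for the existing ID field.

```python
import hashlib
import json
import uuid


def make_id(record, strategy="random", natural_field="id"):
    """Illustrative ID generation; strategy is a string or a list of field names."""
    if strategy == "random":
        return str(uuid.uuid4())              # different on every call
    if strategy == "natural":
        return str(record[natural_field])     # reuse an ID already in the data
    if strategy == "hash":
        payload = record                      # hash the entire record
    else:
        # List of field names: hash only those fields.
        payload = {field: record.get(field) for field in strategy}
    # Sorting keys makes the hash independent of key order.
    canonical = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"id": 7, "email": "a@example.com", "visits": 1}
second = {"id": 7, "email": "a@example.com", "visits": 2}

same_fields = make_id(first, ["id", "email"]) == make_id(second, ["id", "email"])
same_record = make_id(first, "hash") == make_id(second, "hash")
```

Here `same_fields` is `True` because only `id` and `email` are hashed, while `same_record` is `False` because `visits` differs. Field-based hashing is the usual choice when a record may be updated but should keep a stable ID across runs.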

Documentation

Full documentation: scottdraper8.github.io/transmog

License

MIT License - see LICENSE file for details.

