A data transformation library for flattening complex nested structures into tabular formats while preserving hierarchical relationships

Transmog - Flatten Nested JSON to Tabular Formats

Python 3.10+ | License: MIT


A configurable data flattening tool that transforms nested JSON data into flat, tabular formats while preserving parent-child relationships.


Installation

# Full install (CSV, Parquet, ORC, Avro output)
pip install transmog

# CSV only (no pyarrow, fastavro, or cramjam)
pip install "transmog[minimal]"

Quick Start

import transmog as tm

data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
result = tm.flatten(data, name="users")

result.main                    # Main table
result.tables["users_orders"]  # Child tables
result.save("output.csv")      # Save to file
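
To make the parent-child relationship concrete, here is a minimal stand-alone sketch of the idea in plain Python. It is not Transmog's implementation: the helper split_tables is hypothetical, the integer IDs are a simplification, and the _id and _parent_id column names are borrowed from the configuration defaults shown later on this page.

```python
from itertools import count


def split_tables(record, name):
    """Illustrative only: split one nested record into a main row and child rows."""
    ids = count(1)  # simple integer IDs; an assumption made for this sketch
    main_row = {"_id": next(ids)}
    tables = {}
    for key, value in record.items():
        is_object_array = isinstance(value, list) and all(
            isinstance(v, dict) for v in value
        )
        if is_object_array:
            # Each array element becomes a child row that points back to its parent.
            tables[f"{name}_{key}"] = [
                {"_id": next(ids), "_parent_id": main_row["_id"], **item}
                for item in value
            ]
        else:
            main_row[key] = value
    return [main_row], tables


data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
main, tables = split_tables(data, "users")
print(main)
print(tables["users_orders"])
```

In this sketch the two order rows carry the parent's _id in _parent_id, which is what lets the child table be joined back to the main table.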

In-Memory vs Streaming

  1. flatten(data, name, config): flatten data in memory and return a result object

    # Input can be a file path or in-memory records
    result = tm.flatten("data.json", name="products")
    result = tm.flatten([{"id": 1}, {"id": 2}])
    result.save("output.parquet")
    
  2. flatten_stream(data, output_path, name, output_format): stream results directly to disk

    tm.flatten_stream("large.jsonl", "output/", name="events", output_format="parquet")
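
The practical difference between the two functions is memory use: a streaming approach handles records in batches and writes each batch out before reading more. The sketch below shows that general pattern with the standard library only. It is an assumption about the approach, not Transmog's code, and stream_jsonl_to_csv and its parameters are hypothetical.

```python
import csv
import io
import json
from itertools import islice


def stream_jsonl_to_csv(lines, out, fieldnames, batch_size=1000):
    """Illustrative only: write JSONL records to CSV one batch at a time."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    # Lazy generator: a line is parsed only when a batch asks for it.
    records = (json.loads(line) for line in lines if line.strip())
    batches = 0
    while True:
        batch = list(islice(records, batch_size))  # at most batch_size records in memory
        if not batch:
            break
        writer.writerows(batch)
        batches += 1
    return batches


source = io.StringIO(
    '{"id": 1, "kind": "a"}\n{"id": 2, "kind": "b"}\n{"id": 3, "kind": "c"}\n'
)
sink = io.StringIO()
n_batches = stream_jsonl_to_csv(source, sink, ["id", "kind"], batch_size=2)
```

With three input records and batch_size=2, the loop writes two batches and never holds more than two parsed records at once.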
    

Configuration

config = tm.TransmogConfig(
    # Array handling
    array_mode=tm.ArrayMode.SMART,   # SMART (default), SEPARATE, INLINE, SKIP

    # ID generation and metadata fields
    id_generation="random",          # random (default), natural, hash, or ["field1", "field2"]
    id_field="_id",                  # Field name for record IDs
    parent_field="_parent_id",       # Field name for parent references
    time_field="_timestamp",         # Field name for timestamps (None to disable)

    # Data transformation
    include_nulls=False,             # Include null/empty values in output
    stringify_values=False,          # Convert all leaf values to strings

    # Processing controls
    max_depth=100,                   # Maximum recursion depth
    batch_size=1000                  # Records per batch for streaming
)

result = tm.flatten(data, config=config)
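
To illustrate what the data transformation and depth options control, here is a small stand-alone sketch of object flattening. It is not the library's code: flatten_object is a hypothetical helper, joining nested keys with an underscore is an assumption, and raising an error when max_depth is exceeded is only one possible behavior.

```python
def flatten_object(obj, include_nulls=False, stringify_values=False,
                   max_depth=100, prefix="", depth=0):
    """Illustrative only: flatten nested dicts into one row of scalar columns."""
    if depth > max_depth:
        raise RecursionError(f"nesting deeper than max_depth={max_depth}")
    row = {}
    for key, value in obj.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Assumption: nested keys are joined with an underscore.
            row.update(flatten_object(value, include_nulls, stringify_values,
                                      max_depth, f"{column}_", depth + 1))
        elif (value is None or value == "") and not include_nulls:
            continue  # drop null/empty values
        elif stringify_values and value is not None:
            row[column] = str(value)
        else:
            row[column] = value
    return row


record = {
    "id": 7,
    "profile": {"name": "Alice", "nickname": None, "address": {"city": "Oslo"}},
}
default_row = flatten_object(record)
verbose_row = flatten_object(record, include_nulls=True, stringify_values=True)
```

With the defaults, the null nickname is dropped and id stays an integer. With both flags enabled, the nickname column is kept as None and id becomes the string "7".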

Array Modes

Mode      Behavior
SMART     Preserve simple arrays, extract complex arrays to child tables
SEPARATE  Extract all arrays to child tables
INLINE    Serialize arrays as JSON strings
SKIP      Omit arrays from output
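
The four modes can be compared side by side with a short stand-alone sketch. The ArrayMode enum below is a local stand-in rather than the library's class, route_arrays is a hypothetical helper, and treating a "simple" array as one that contains only scalars is an assumption.

```python
import json
from enum import Enum


class ArrayMode(Enum):  # local stand-in for illustration, not the library's enum
    SMART = "smart"
    SEPARATE = "separate"
    INLINE = "inline"
    SKIP = "skip"


def route_arrays(record, mode):
    """Illustrative only: decide whether each array stays in the row or moves out."""
    row, children = {}, {}
    for key, value in record.items():
        if not isinstance(value, list):
            row[key] = value
            continue
        # Assumption: a "simple" array holds only scalars (no dicts or lists).
        simple = all(not isinstance(v, (dict, list)) for v in value)
        if mode is ArrayMode.SKIP:
            continue
        if mode is ArrayMode.INLINE:
            row[key] = json.dumps(value)
        elif mode is ArrayMode.SMART and simple:
            row[key] = value
        else:
            children[key] = value  # SEPARATE, or SMART with a complex array
    return row, children


record = {"user": "Alice", "tags": ["a", "b"], "orders": [{"id": 101}]}
results = {mode.name: route_arrays(record, mode) for mode in ArrayMode}
```

Here results["SMART"] keeps tags in the row and moves orders to a child table, while results["SEPARATE"] moves both arrays out.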

ID Generation

Strategy         Description
random           Generate random UUID (default)
natural          Use existing ID field from data
hash             Deterministic hash of entire record
["field1", ...]  Deterministic hash of specified fields
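
The strategies differ mainly in whether the same input yields the same ID on every run. The sketch below illustrates that with the standard library. The hash function (SHA-256 over key-sorted JSON), the make_id helper, and the natural_field parameter are assumptions, since this page does not say how Transmog computes its hashes.

```python
import hashlib
import json
import uuid


def make_id(record, strategy="random", natural_field="id"):
    """Illustrative only: produce a record ID under each strategy."""
    if strategy == "random":
        return str(uuid.uuid4())
    if strategy == "natural":
        return str(record[natural_field])  # assumption: the existing ID lives in "id"
    if strategy == "hash":
        payload = record
    else:
        # Otherwise the strategy is a list of field names to hash.
        payload = {field: record.get(field) for field in strategy}
    # Key-sorted JSON so that equal content always hashes to the same ID.
    canonical = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


order = {"id": 101, "user": "Alice", "total": 25}
same_order = {"total": 25, "user": "Alice", "id": 101}  # same content, different key order
changed_total = {"id": 101, "user": "Alice", "total": 30}
```

In this sketch, hashing the whole record is stable across key order but changes when any value changes, while hashing ["id", "user"] ignores the change to total.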

Documentation

Full documentation: scottdraper8.github.io/transmog

License

MIT License - see LICENSE file for details.
