
A data transformation library for flattening complex nested structures into tabular formats while preserving hierarchical relationships


Transmog - Flatten Nested JSON to Tabular Formats

Python 3.10+ | License: MIT


A configurable data flattening tool that transforms nested JSON data into flat, tabular formats while preserving parent-child relationships.


Installation

# Full install (CSV, Parquet, ORC, Avro output)
pip install transmog

# CSV only (no pyarrow, fastavro, or cramjam)
pip install transmog[minimal]

Quick Start

import transmog as tm

data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
result = tm.flatten(data, name="users")

result.main                    # Main table
result.tables["users_orders"]  # Child table built from the "orders" array
result.save("output.csv")      # Save to file
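
To make the parent-child idea concrete, here is a minimal pure-Python sketch of the kind of split the Quick Start describes. It is not Transmog's implementation: `flatten_sketch` is a hypothetical helper, the `_id` and `_parent_id` column names are borrowed from the configuration defaults shown later, and joining nested object keys with `_` is an assumption.

```python
import uuid


def flatten_sketch(record, name, parent_id=None, tables=None):
    """Split one nested record into flat rows, grouped by table name."""
    tables = {} if tables is None else tables
    row = {"_id": str(uuid.uuid4())}
    if parent_id is not None:
        row["_parent_id"] = parent_id
    for key, value in record.items():
        if isinstance(value, dict):
            # Assumption: one level of nested object is folded into prefixed columns.
            for sub_key, sub_value in value.items():
                row[f"{key}_{sub_key}"] = sub_value
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # Arrays of objects become rows in a child table that point back at this row.
            for child in value:
                flatten_sketch(child, f"{name}_{key}", row["_id"], tables)
        else:
            row[key] = value
    tables.setdefault(name, []).append(row)
    return tables


data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
tables = flatten_sketch(data, "users")
main_row = tables["users"][0]
order_rows = tables["users_orders"]
```

Each row in `order_rows` carries a `_parent_id` equal to `main_row["_id"]`, which is what allows the child table to be joined back to the main table after export.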

In-Memory vs Streaming

  1. flatten(data, name, config) — Flatten data in memory

    result = tm.flatten("data.json", name="products")
    result = tm.flatten([{"id": 1}, {"id": 2}])
    result.save("output.parquet")
    
  2. flatten_stream(data, output_path, name, output_format) — Stream directly to disk

    tm.flatten_stream("large.jsonl", "output/", name="events", output_format="parquet")
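
The reason streaming keeps memory bounded can be sketched with a small batching generator. This is an illustration only, assuming JSON Lines input; `iter_batches` is a hypothetical helper, not part of Transmog, and its `batch_size` parameter merely mirrors the configuration option of the same name.

```python
import json
from itertools import islice


def iter_batches(lines, batch_size=1000):
    """Yield lists of parsed JSON Lines records, at most batch_size per list."""
    # Parse lazily so only one batch of records is held in memory at a time.
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


sample = ['{"id": 1}', '{"id": 2}', "", '{"id": 3}']
batches = list(iter_batches(sample, batch_size=2))
```

In a real pipeline `lines` would be an open file handle, and each batch would be flattened and written out before the next one is read.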
    

Configuration

config = tm.TransmogConfig(
    # Array handling
    array_mode=tm.ArrayMode.SMART,   # SMART (default), SEPARATE, INLINE, SKIP

    # ID generation and metadata fields
    id_generation="random",          # random (default), natural, hash, or ["field1", "field2"]
    id_field="_id",                  # Field name for record IDs
    parent_field="_parent_id",       # Field name for parent references
    time_field="_timestamp",         # Field name for timestamps (None to disable)

    # Data transformation
    include_nulls=False,             # Include null/empty values in output
    stringify_values=False,          # Convert all leaf values to strings

    # Processing controls
    max_depth=100,                   # Maximum recursion depth
    batch_size=1000                  # Records per batch for streaming
)

result = tm.flatten(data, config=config)

Array Modes

| Mode | Behavior |
|------|----------|
| SMART | Preserve simple arrays, extract complex arrays to child tables |
| SEPARATE | Extract all arrays to child tables |
| INLINE | Serialize arrays as JSON strings |
| SKIP | Omit arrays from output |
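
The four modes can be read as a small decision function. The sketch below is an interpretation, not the library's code: `Mode` is a hypothetical stand-in for `tm.ArrayMode`, "simple" is assumed to mean an array containing only scalars, and the `value` column used for scalars in a child table is an invented name.

```python
import json
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for tm.ArrayMode."""
    SMART = "smart"
    SEPARATE = "separate"
    INLINE = "inline"
    SKIP = "skip"


def is_simple(array):
    # Assumption: a "simple" array holds only scalars (no objects or nested arrays).
    return all(not isinstance(item, (dict, list)) for item in array)


def handle_array(array, mode):
    """Return (value kept in the parent row, rows destined for a child table)."""
    if mode is Mode.SKIP:
        return None, []
    if mode is Mode.INLINE:
        return json.dumps(array), []
    if mode is Mode.SMART and is_simple(array):
        return array, []
    # SEPARATE, or SMART with a complex array: every element becomes a child row.
    rows = [item if isinstance(item, dict) else {"value": item} for item in array]
    return None, rows


tags = ["a", "b"]
orders = [{"id": 101}]
smart_tags = handle_array(tags, Mode.SMART)
smart_orders = handle_array(orders, Mode.SMART)
separate_tags = handle_array(tags, Mode.SEPARATE)
inline_tags = handle_array(tags, Mode.INLINE)
skipped = handle_array(tags, Mode.SKIP)
```

Under these assumptions, SMART and SEPARATE differ only for scalar arrays such as `tags`: SMART keeps them in the parent row, while SEPARATE moves them to a child table.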

ID Generation

| Strategy | Description |
|----------|-------------|
| random | Generate random UUID (default) |
| natural | Use existing ID field from data |
| hash | Deterministic hash of entire record |
| ["field1", ...] | Deterministic hash of specified fields |
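
The difference between the strategies is easiest to see side by side. The following is a rough sketch under stated assumptions, not Transmog's algorithm: `make_id` is a hypothetical helper, SHA-256 over key-sorted JSON is one possible way to get a deterministic hash, and `natural_field="id"` is a guess at how an existing ID field might be selected.

```python
import hashlib
import json
import uuid


def make_id(record, strategy="random", natural_field="id"):
    """Illustrate the four ID strategies on a single record."""
    if strategy == "random":
        return str(uuid.uuid4())
    if strategy == "natural":
        # Assumption: the existing ID lives in a known field of the record.
        return str(record[natural_field])
    if strategy == "hash":
        payload = record
    elif isinstance(strategy, (list, tuple)):
        # Hash only the listed fields, so other fields do not affect the ID.
        payload = {field: record.get(field) for field in strategy}
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Sorting keys makes the hash independent of key order in the input.
    canonical = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"id": 7, "user": "Alice", "city": "Oslo"}
b = {"city": "Oslo", "user": "Alice", "id": 7}
same_record_hash = make_id(a, "hash") == make_id(b, "hash")
same_field_hash = make_id(a, ["user"]) == make_id({"id": 8, "user": "Alice"}, ["user"])
natural_id = make_id(a, "natural")
```

Deterministic strategies matter when the same source data is flattened more than once: random IDs change on every run, while hashed or natural IDs stay stable and let downstream loads deduplicate or upsert.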

Documentation

Full documentation: scottdraper8.github.io/transmog

License

MIT License - see LICENSE file for details.
