A data transformation library for flattening complex nested structures into tabular formats while preserving hierarchical relationships

Transmog

Flatten nested JSON data into tabular formats while preserving parent-child relationships.

Installation

# Standard install (includes Parquet and ORC support)
pip install transmog

# Minimal install (CSV output only)
# (quoted so that shells such as zsh do not expand the brackets)
pip install "transmog[minimal]"

Quick Start

import transmog as tm

data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
result = tm.flatten(data, name="users")

result.main                    # Main table
result.tables["users_orders"]  # Child tables
result.save("output.csv")      # Save to file

How it works: Nested JSON is flattened into related tables with foreign key relationships:

%%{init: {'theme': 'dark', 'themeVariables': {
    'primaryColor': '#ff79c6',
    'secondaryColor': '#bd93f9',
    'tertiaryColor': '#44475a',
    'mainBkg': '#282a36',
    'nodeBorder': '#ff79c6',
    'clusterBkg': '#44475a',
    'clusterBorder': '#bd93f9',
    'textColor': '#f8f8f2'
}}}%%
flowchart LR
    subgraph Input["INPUT"]
        JSON["user: Alice
        orders: [
          • id: 101
          • id: 102
        ]"]
    end

    Input --> |flatten| ERD

    subgraph ERD["OUTPUT"]
        direction LR

        users["users
        ━━━━━━━━━━━━━━
        _id PK
        user
        _timestamp"]

        users_orders["users_orders
        ━━━━━━━━━━━━━━━━
        _id PK
        _parent_id FK
        id
        _timestamp"]

        users -->|1:N| users_orders
    end

    style Input fill:#44475a,stroke:#ff79c6,stroke-width:3px
    style ERD fill:#44475a,stroke:#bd93f9,stroke-width:3px
    style JSON fill:#282a36,stroke:#ff79c6,stroke-width:2px,color:#f8f8f2
    style users fill:#282a36,stroke:#50fa7b,stroke-width:2px,color:#f8f8f2
    style users_orders fill:#282a36,stroke:#8be9fd,stroke-width:2px,color:#f8f8f2
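
The parent-child relationship in the diagram can be illustrated without the library. The following is a minimal pure-Python sketch, not Transmog's implementation: the helper `flatten_record` is hypothetical, it handles only one level of arrays of objects, and it leaves out the `_timestamp` column. Only the `_id` and `_parent_id` column names and the `users_orders` table name are taken from the diagram above.

```python
import uuid


def flatten_record(record, name):
    """Split one record into a main row and child-table rows (illustrative only)."""
    main_row = {"_id": str(uuid.uuid4())}
    child_tables = {}
    for key, value in record.items():
        is_object_array = (
            isinstance(value, list)
            and len(value) > 0
            and all(isinstance(item, dict) for item in value)
        )
        if is_object_array:
            # Each array element becomes a child row that points back at its parent.
            child_tables[f"{name}_{key}"] = [
                {"_id": str(uuid.uuid4()), "_parent_id": main_row["_id"], **item}
                for item in value
            ]
        else:
            main_row[key] = value
    return main_row, child_tables


data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
main, children = flatten_record(data, "users")
```

Every row in `children["users_orders"]` carries the parent's `_id` in its `_parent_id` column, which is what allows the two tables to be joined again later.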

Features

  • Flatten nested JSON to CSV, Parquet, or ORC
  • Smart array handling: simple arrays are preserved, complex arrays are extracted to child tables
  • Read JSON, JSON Lines, JSON5, HJSON files
  • Stream processing for large datasets
  • Configurable ID generation strategies

API

flatten(data, name, config) — Flatten data in memory

result = tm.flatten("data.json", name="products")
result = tm.flatten([{"id": 1}, {"id": 2}])
result.save("output.parquet")

flatten_stream(data, output_path, name, output_format) — Stream directly to disk

tm.flatten_stream("large.jsonl", "output/", name="events", output_format="parquet")
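
To show why streaming helps with large inputs, here is a rough standard-library sketch of the general idea: read JSON Lines one record at a time and write rows out in batches, so that only one batch is held in memory. It does not use or reproduce `flatten_stream`. The function `stream_jsonl_to_csv` and its parameters are hypothetical, and it assumes records that are already flat.

```python
import csv
import json
import os
import tempfile


def stream_jsonl_to_csv(src_path, dst_path, fieldnames, batch_size=1000):
    """Copy flat JSON Lines records to CSV, holding at most batch_size rows in memory."""
    rows_written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        batch = []
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            batch.append(json.loads(line))
            if len(batch) >= batch_size:
                writer.writerows(batch)
                rows_written += len(batch)
                batch = []
        if batch:  # flush the final partial batch
            writer.writerows(batch)
            rows_written += len(batch)
    return rows_written


# Small demonstration using a temporary directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "events.jsonl")
dst_path = os.path.join(workdir, "events.csv")
with open(src_path, "w", encoding="utf-8") as f:
    for i in range(5):
        f.write(json.dumps({"event": "click", "seq": i}) + "\n")

count = stream_jsonl_to_csv(src_path, dst_path, ["event", "seq"], batch_size=2)
```

The `batch_size` argument here plays a role similar in spirit to the `batch_size` option shown under Configuration, though how the library uses that option internally is not described in this README.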

Configuration

config = tm.TransmogConfig(
    array_mode=tm.ArrayMode.SMART,   # SMART, SEPARATE, INLINE, SKIP
    id_generation="random",          # random, natural, hash, or ["field1", "field2"]
    id_field="_id",
    parent_field="_parent_id",
    time_field="_timestamp",
    include_nulls=False,
    max_depth=100,
    batch_size=1000
)

result = tm.flatten(data, config=config)

Array Modes

| Mode | Behavior |
| --- | --- |
| SMART | Preserve simple arrays, extract complex arrays to child tables |
| SEPARATE | Extract all arrays to child tables |
| INLINE | Serialize arrays as JSON strings |
| SKIP | Omit arrays from output |
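
The four modes can be compared side by side with a small pure-Python sketch. This is an illustration of the behaviors listed above, not the library's code. The helpers `is_simple` and `apply_array_mode` are hypothetical, modes are passed as plain strings rather than `tm.ArrayMode` members, and "simple" is assumed to mean an array containing only scalar values.

```python
import json


def is_simple(array):
    """Assumption: a 'simple' array contains only scalars (no dicts or lists)."""
    return all(not isinstance(item, (dict, list)) for item in array)


def apply_array_mode(record, mode):
    """Return (row, extracted_arrays) for one record under the given mode name."""
    row, extracted = {}, {}
    for key, value in record.items():
        if not isinstance(value, list):
            row[key] = value
        elif mode == "SKIP":
            continue  # arrays are dropped entirely
        elif mode == "INLINE":
            row[key] = json.dumps(value)  # array kept as a JSON string
        elif mode == "SEPARATE" or (mode == "SMART" and not is_simple(value)):
            extracted[key] = value  # destined for a child table
        else:
            row[key] = value  # SMART with a simple array: kept on the row
    return row, extracted


record = {"user": "Alice", "tags": ["a", "b"], "orders": [{"id": 101}]}
results = {
    mode: apply_array_mode(record, mode)
    for mode in ("SMART", "SEPARATE", "INLINE", "SKIP")
}
```

In this sketch, `results["SMART"]` keeps `tags` on the row and extracts only `orders`, while `results["SEPARATE"]` extracts both arrays.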

ID Generation

| Strategy | Description |
| --- | --- |
| random | Generate random UUID (default) |
| natural | Use existing ID field from data |
| hash | Deterministic hash of entire record |
| ["field1", ...] | Deterministic hash of specified fields |

Documentation

Full documentation: scottdraper8.github.io/transmog

License

MIT License - see LICENSE file for details.
