A data transformation library for flattening complex nested structures into tabular formats while preserving hierarchical relationships
Transmog - Flatten Nested JSON to Tabular Formats
A configurable data flattening tool that transforms nested JSON data into flat, tabular formats while preserving parent-child relationships.
Installation
# Full install (CSV, Parquet, ORC, Avro output)
pip install transmog
# CSV only (no pyarrow, fastavro, or cramjam)
pip install transmog[minimal]
Quick Start
import transmog as tm
data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
result = tm.flatten(data, name="users")
result.main # Main table
result.tables["users_orders"] # Child tables
result.save("output.csv") # Save to file
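To make the parent-child relationship concrete, here is a minimal pure-Python sketch of the idea behind the Quick Start output. It is not Transmog's implementation: `flatten_record` is a hypothetical helper, and only the `_id` / `_parent_id` field names and the `users_orders` table name are taken from this README.

```python
import uuid


def flatten_record(record, name, tables, parent_id=None):
    """Split one nested record into a row for `name` plus rows in child tables.

    Hypothetical helper for illustration only; not part of the transmog API.
    """
    row = {"_id": str(uuid.uuid4())}
    if parent_id is not None:
        row["_parent_id"] = parent_id

    def walk(obj, prefix):
        for key, value in obj.items():
            column = f"{prefix}{key}"
            if isinstance(value, dict):
                # Nested objects are folded into the same row with prefixed names.
                walk(value, f"{column}_")
            elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
                # Arrays of objects become rows in a child table that point back
                # to this row through _parent_id.
                for child in value:
                    flatten_record(child, f"{name}_{column}", tables, row["_id"])
            else:
                row[column] = value

    walk(record, "")
    tables.setdefault(name, []).append(row)
    return tables


data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
tables = flatten_record(data, "users", {})
```

Each row in `tables["users_orders"]` carries the `_id` of the single `users` row in its `_parent_id` column, which is what allows the flat tables to be joined back together.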
In-Memory vs Streaming
- flatten(data, name, config): Flatten data in memory
result = tm.flatten("data.json", name="products")
result = tm.flatten([{"id": 1}, {"id": 2}])
result.save("output.parquet")
- flatten_stream(data, output_path, name, output_format): Stream directly to disk
tm.flatten_stream("large.jsonl", "output/", name="events", output_format="parquet")
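The difference between the two entry points is mostly about memory: streaming processes a bounded batch of records at a time instead of holding the whole dataset. The following standard-library sketch shows that general pattern for flat JSONL records written to CSV. It is an assumption about the technique, not Transmog's code; `stream_jsonl_to_csv` and its parameters are hypothetical names.

```python
import csv
import json
import os
import tempfile
from itertools import islice


def stream_jsonl_to_csv(src_path, dst_path, fieldnames, batch_size=1000):
    """Copy flat JSONL records to CSV, holding at most one batch in memory.

    Hypothetical helper for illustration only; not part of the transmog API.
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        # Lazily parse one line at a time; blank lines are skipped.
        records = (json.loads(line) for line in src if line.strip())
        while True:
            batch = list(islice(records, batch_size))
            if not batch:
                break
            writer.writerows(batch)
            written += len(batch)
    return written


# Small demonstration with a temporary input file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "events.jsonl")
    dst = os.path.join(tmp, "events.csv")
    with open(src, "w", encoding="utf-8") as f:
        for i in range(5):
            f.write(json.dumps({"id": i, "kind": "click"}) + "\n")
    count = stream_jsonl_to_csv(src, dst, ["id", "kind"], batch_size=2)
    with open(dst, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
```

Peak memory in this pattern depends on the batch size rather than on the size of the input file.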
Configuration
config = tm.TransmogConfig(
    # Array handling
    array_mode=tm.ArrayMode.SMART,  # SMART (default), SEPARATE, INLINE, SKIP
    # ID generation and metadata fields
    id_generation="random",  # random (default), natural, hash, or ["field1", "field2"]
    id_field="_id",  # Field name for record IDs
    parent_field="_parent_id",  # Field name for parent references
    time_field="_timestamp",  # Field name for timestamps (None to disable)
    # Data transformation
    include_nulls=False,  # Include null/empty values in output
    stringify_values=False,  # Convert all leaf values to strings
    # Processing controls
    max_depth=100,  # Maximum recursion depth
    batch_size=1000  # Records per batch for streaming
)
result = tm.flatten(data, config=config)
Array Modes
| Mode | Behavior |
|---|---|
| SMART | Preserve simple arrays, extract complex arrays to child tables |
| SEPARATE | Extract all arrays to child tables |
| INLINE | Serialize arrays as JSON strings |
| SKIP | Omit arrays from output |
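The four behaviors in the table above can be illustrated on a single record with a small pure-Python sketch. This is an approximation of the described behavior, not Transmog's implementation: `apply_array_mode` is a hypothetical function, modes are passed as plain strings rather than `tm.ArrayMode` members, "simple" is assumed to mean an array of scalars, and the `value` column used for extracted scalars is an invented name.

```python
import json


def apply_array_mode(record, mode):
    """Return (row, child_tables) for one record under the given mode name.

    Hypothetical helper for illustration only; not part of the transmog API.
    """
    row, children = {}, {}
    for key, value in record.items():
        if not isinstance(value, list):
            row[key] = value
            continue
        # Assumption: a "simple" array contains only scalars.
        is_simple = all(not isinstance(v, (dict, list)) for v in value)
        if mode == "SKIP":
            continue  # arrays are dropped entirely
        if mode == "INLINE":
            row[key] = json.dumps(value)  # array kept as a JSON string
        elif mode == "SMART" and is_simple:
            row[key] = value  # simple array stays on the parent row
        else:
            # SEPARATE, or SMART with a complex array: extract to a child table.
            children[key] = [v if isinstance(v, dict) else {"value": v} for v in value]
    return row, children


record = {"id": 1, "tags": ["a", "b"], "orders": [{"id": 101}]}
smart = apply_array_mode(record, "SMART")
separate = apply_array_mode(record, "SEPARATE")
inline = apply_array_mode(record, "INLINE")
skip = apply_array_mode(record, "SKIP")
```

Under these assumptions, `SMART` keeps `tags` on the row and extracts only `orders`, while `SEPARATE` extracts both.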
ID Generation
| Strategy | Description |
|---|---|
| random | Generate random UUID (default) |
| natural | Use existing ID field from data |
| hash | Deterministic hash of entire record |
| ["field1", ...] | Deterministic hash of specified fields |
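The practical difference between these strategies is whether re-running the same input yields the same IDs. The sketch below illustrates that using only the standard library. It does not reproduce Transmog's actual ID format: `make_id` and `natural_field` are hypothetical names, and SHA-256 over canonical JSON is an assumed hashing scheme.

```python
import hashlib
import json
import uuid


def make_id(record, strategy="random", natural_field="id"):
    """Produce a record ID under one of the four strategies.

    Hypothetical helper for illustration only; not part of the transmog API.
    """
    if strategy == "random":
        return str(uuid.uuid4())  # different on every call
    if strategy == "natural":
        return str(record[natural_field])  # reuse an ID already in the data
    if strategy == "hash":
        payload = record  # hash the entire record
    else:
        # A list of field names: hash only those fields.
        payload = {field: record.get(field) for field in strategy}
    # sort_keys makes the serialization independent of key order.
    canonical = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"id": 7, "name": "x", "seen": 1}
b = {"seen": 2, "name": "x", "id": 7}

full_hash_differs = make_id(a, "hash") != make_id(b, "hash")
field_hash_matches = make_id(a, ["id", "name"]) == make_id(b, ["id", "name"])
```

Hashing a chosen subset of fields keeps the ID stable when unrelated fields such as `seen` change, which is useful for incremental loads that need to match rows across runs.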
Documentation
Full documentation: scottdraper8.github.io/transmog
- Getting Started
- Configuration
- Array Handling
- ID Management
- Error Handling
- Output Formats
- Streaming
- API Reference
- Contributing
License
MIT License - see LICENSE file for details.