Transmogrify
An ETL/ELT transformation library for converting complex nested JSON data into flat, tabular formats.
Features
- Flatten deeply nested JSON/dict structures with customizable delimiter options (see the sketch after this list)
- Transform values during processing with custom functions
- Native Formats: output to PyArrow Tables, Python dictionaries, or JSON objects
- Bytes Output: serialize directly to Parquet, CSV, or JSON bytes
- File Export: write to various file formats (JSON, CSV, Parquet)
- Recover from errors in malformed data with customizable strategies
- Optimize for performance with optional dependencies
- Stream large datasets efficiently
- Deterministic ID generation for data consistency across processing runs
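To make the first feature above concrete, here is a minimal, library-independent sketch of what delimiter-based flattening produces. The flatten helper below is illustrative only and is not Transmogrify's API:

def flatten(obj, delimiter="_", prefix=""):
    # Recursively walk a nested dict, joining key paths with the delimiter.
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, delimiter, path))
        else:
            flat[path] = value
    return flat

print(flatten({"user": {"contact": {"email": "john@example.com"}}}))
# {'user_contact_email': 'john@example.com'}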
Installation
pip install transmogrify
For minimal installation without optional dependencies:
pip install transmogrify[minimal]
For development installation:
pip install transmogrify[dev]
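Note that some shells (notably zsh) treat square brackets as glob patterns, so quoting the extras is safest:

pip install "transmogrify[dev]"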
See the installation guide for more details.
Quick Example
import transmogrify as tm
# Sample nested data
data = {
    "user": {
        "id": 1,
        "name": "John Doe",
        "contact": {
            "email": "john@example.com"
        },
        "orders": [
            {"id": 101, "amount": 99.99},
            {"id": 102, "amount": 45.50}
        ]
    }
}
# Process the data
processor = tm.Processor()
result = processor.process(data)
# Native data structure output
tables = result.to_dict() # Get all tables as Python dictionaries
pa_tables = result.to_pyarrow_tables() # Get as PyArrow Tables
# Access the data in memory
main_table = tables["main"] # Main table as Python dict
orders = tables["user_orders"] # Child table as Python dict
# Bytes output for direct writing
json_bytes = result.to_json_bytes(indent=2) # Get all tables as JSON bytes
csv_bytes = result.to_csv_bytes() # Get all tables as CSV bytes
parquet_bytes = result.to_parquet_bytes() # Get all tables as Parquet bytes
# Direct write to files
with open("main_table.json", "wb") as f:
    f.write(json_bytes["main"])
# Or use PyArrow tables directly
pa_table = pa_tables["main"] # Work with PyArrow Table directly
print(f"Table has {pa_table.num_rows} rows and {pa_table.num_columns} columns")
# File output (still supported)
result.write_all_json("output_dir/json")
result.write_all_csv("output_dir/csv")
result.write_all_parquet("output_dir/parquet")
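The bytes methods appear to return a mapping of table name to serialized bytes (the json_bytes["main"] lookup above suggests as much), so persisting every table is a short loop. The output directory here is purely illustrative:

import pathlib

out = pathlib.Path("output_dir/csv")
out.mkdir(parents=True, exist_ok=True)
for name, payload in result.to_csv_bytes().items():
    # One file per flattened table, e.g. "main" and "user_orders"
    (out / f"{name}.csv").write_bytes(payload)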
Deterministic ID Generation
Transmogrify can generate consistent IDs for records across multiple processing runs:
# Configure deterministic IDs based on specific fields
processor = tm.Processor(
    deterministic_id_fields={
        "": "id",            # Root level uses the "id" field
        "user_orders": "id"  # Order records use the "id" field
    }
)

# Process the data - IDs will be consistent across runs
result = processor.process(data)

# For complex ID generation logic, use a custom function
import uuid

def custom_id_generator(record):
    # Generate a custom ID based on record contents
    if "id" in record:
        return f"CUSTOM-{record['id']}"
    return str(uuid.uuid4())  # Fallback for records without an "id"

processor = tm.Processor(id_generation_strategy=custom_id_generator)
See the deterministic IDs guide for more information.
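For instance, hashing stable business keys yields IDs that survive reprocessing without depending on a single field. This sketch assumes only what the example above shows, namely that the strategy receives the record dict and returns a string; the choice of key fields is hypothetical:

import hashlib

def hashed_id(record):
    # Illustrative strategy: identical business keys always map to the
    # same ID, no matter when the record is processed.
    key = f"{record.get('id')}|{record.get('name')}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]

processor = tm.Processor(id_generation_strategy=hashed_id)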
Output Format Options
Transmogrify provides three main categories of output formats:
- Native Data Structures - Python objects like dictionaries and PyArrow Tables

  result.to_dict()            # Python dictionaries
  result.to_json_objects()    # JSON-serializable Python objects
  result.to_pyarrow_tables()  # PyArrow Tables

- Bytes Serialization - Raw bytes in JSON, CSV, or Parquet format

  result.to_json_bytes()     # JSON bytes
  result.to_csv_bytes()      # CSV bytes
  result.to_parquet_bytes()  # Parquet bytes

- File Output - Direct writing to files in different formats

  result.write_all_json()     # Write to JSON files
  result.write_all_csv()      # Write to CSV files
  result.write_all_parquet()  # Write to Parquet files
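Because the native output includes PyArrow Tables, results plug straight into the Arrow ecosystem. For example, to_pandas() is standard PyArrow (it requires pandas to be installed):

pa_tables = result.to_pyarrow_tables()
df = pa_tables["main"].to_pandas()  # standard PyArrow -> pandas conversion
print(df.head())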
Documentation
- Installation Guide
- Getting Started
- Output Formats
- In-Memory Processing
- Deterministic IDs
- API Reference
- Examples
Use Cases
- Data ETL pipelines
- API response processing
- JSON/CSV conversion
- Preparing nested data for tabular analysis
- Data normalization and standardization
- Integration with data processing frameworks
- In-memory data transformation
- Cloud-based serverless processing
- Incremental data processing with consistent IDs
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please make sure to update tests as appropriate.
License
Distributed under the MIT License. See LICENSE for more information.
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file transmog-0.1.0.tar.gz.
File metadata
- Download URL: transmog-0.1.0.tar.gz
- Upload date:
- Size: 58.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6625ea0d960892b6fc024cf65400253bbfb87c1610f127caf6814ed0e1690e9a |
| MD5 | 7deaf43cb5c90a8e891b1ab87f783c71 |
| BLAKE2b-256 | c384b04119fbcbb2682ad6238a0894a71338bcae106fd47d5650e25609bf81c2 |
File details
Details for the file transmog-0.1.0-py3-none-any.whl.
File metadata
- Download URL: transmog-0.1.0-py3-none-any.whl
- Upload date:
- Size: 69.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3df8d8432292ca0acbc7a797b564588fb600cc181cfc2ded1e81332589974a58 |
| MD5 | c00100291dff1a6939263a88cf6c3554 |
| BLAKE2b-256 | 0a285bd970c6d6decb2e45e777ce0e8ef485fad45d8168a1388af8865d9bca1e |