Flatten Anything 🔨

Stop writing custom parsers for every data format. Flatten anything.

PyPI Python 3.10+ License: MIT

The Problem

Every data pipeline starts the same way: "I have this nested JSON file, and I need to flatten it." Then next week: "Now it's XML." Then: "The client sent Excel files." Before you know it, you have 200 lines of custom parsing code for each format.

The Solution

from flatten_anything import flatten, ingest

# That's it. That's the whole library.
data = ingest('your_nightmare_file.json')
flat = flatten(data)

It works the same way for every supported format, no matter how deeply the data is nested.

What's New in v1.1

🚀 Streaming Support

Process files larger than memory by streaming them in chunks:

# Stream a 10GB CSV file
for chunk in ingest('huge_file.csv', stream=True):
    flat = flatten(chunk)
    # Process each chunk without loading entire file

🎯 Smarter Flattening

The new records parameter controls whether a list is flattened record by record or as a single structure:

# Automatically flattens each record separately (new default!)
data = ingest('users.csv')
flat = flatten(data)  # Returns list of flattened records

# Or treat as single structure when needed
flat = flatten(data, records=False)  # Flattens entire structure

Installation

Basic Installation

# Core installation (JSON, CSV, YAML, XML, API support)
pip install flatten-anything

With Optional Format Support

# Add Parquet support
pip install flatten-anything[parquet]

# Add Excel support
pip install flatten-anything[excel]

# Install everything
pip install flatten-anything[all]

Format Support Matrix

| Format | Core Install | Optional Install | Streaming |
|---|---|---|---|
| JSON/JSONL | ✅ Included | - | ✅ JSONL only |
| CSV/TSV | ✅ Included | - | ✅ Yes |
| YAML | ✅ Included | - | ❌ No |
| XML | ✅ Included | - | ❌ No |
| API/URLs | ✅ Included | - | ❌ No |
| Parquet | - | pip install flatten-anything[parquet] | ✅ Yes |
| Excel | - | pip install flatten-anything[excel] | ❌ No |
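
Since only some formats support streaming, it can help to check a file before asking for stream=True. The helper below is a minimal sketch, not part of flatten-anything: can_stream and the extension set are hypothetical, and the mapping from file extensions to the formats in the matrix is an assumption.

```python
from pathlib import Path

# Assumed extensions for the formats the matrix marks as streamable
# (CSV/TSV, JSONL, Parquet). Plain .json is left out on purpose.
STREAMABLE_EXTENSIONS = {".csv", ".tsv", ".jsonl", ".parquet"}


def can_stream(path: str) -> bool:
    """Return True if the file extension maps to a streamable format."""
    return Path(path).suffix.lower() in STREAMABLE_EXTENSIONS
```

A caller could check can_stream(path) first and fall back to a regular, non-streaming ingest() call when it returns False.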

Quick Start

Basic Usage

from flatten_anything import flatten, ingest

# Load any supported file format
data = ingest('data.json')

# Flatten it (automatically handles single vs multiple records)
flat = flatten(data)

Streaming Large Files

# Process huge files in chunks
for chunk in ingest('massive.csv', stream=True, chunk_size=10000):
    flat_records = flatten(chunk)
    # process_records is a placeholder for your own handler (write to a database, analyze, etc.)
    process_records(flat_records)

Real-world Example

# Your horrible nested JSON
data = {
    "user": {
        "name": "John",
        "contacts": {
            "emails": ["john@example.com", "john@work.com"],
            "phones": {
                "home": "555-1234",
                "work": "555-5678"
            }
        }
    },
    "metrics": [1, 2, 3]
}

flat = flatten(data)
# {
#     'user.name': 'John',
#     'user.contacts.emails.0': 'john@example.com',
#     'user.contacts.emails.1': 'john@work.com',
#     'user.contacts.phones.home': '555-1234',
#     'user.contacts.phones.work': '555-5678',
#     'metrics.0': 1,
#     'metrics.1': 2,
#     'metrics.2': 3
# }

Multiple Records Handling

# CSV data with multiple records
users = [
    {"name": "Alice", "age": 30, "city": "NYC"},
    {"name": "Bob", "age": 25, "city": "LA"}
]

# Default: flatten each record (records=True)
flat = flatten(users)
# [
#     {"name": "Alice", "age": 30, "city": "NYC"},
#     {"name": "Bob", "age": 25, "city": "LA"}
# ]

# Flatten as single structure (records=False)
flat = flatten(users, records=False)
# {
#     "0.name": "Alice", "0.age": 30, "0.city": "NYC",
#     "1.name": "Bob", "1.age": 25, "1.city": "LA"
# }

Advanced Usage

Integrate with pandas

import pandas as pd

# Method 1: Load entire file
data = ingest('data.csv')
flat = flatten(data)
df = pd.DataFrame(flat)

# Method 2: Stream a large file in chunks (the combined DataFrame must still fit in memory)
dfs = []
for chunk in ingest('huge.csv', stream=True, chunk_size=5000):
    flat_chunk = flatten(chunk)
    dfs.append(pd.DataFrame(flat_chunk))
final_df = pd.concat(dfs, ignore_index=True)

Control Empty Lists

data = {"items": [], "count": 0}

# Preserve empty lists (default)
flatten(data, preserve_empty_lists=True)
# {"items": [], "count": 0}

# Remove empty lists
flatten(data, preserve_empty_lists=False)
# {"count": 0}

Memory-Efficient Pipeline

from pathlib import Path

# Process directory of large files without memory issues
for filepath in Path('data/').glob('*.csv'):
    for chunk in ingest(filepath, stream=True):
        flat = flatten(chunk)
        # Process and immediately discard to save memory
        send_to_database(flat)
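
send_to_database in the pipeline above is left to the reader. One possible shape for it, using only the standard library, is sketched below. Everything here is an assumption for illustration: the table name, the choice to store each flattened record as a JSON string, and the extra conn argument (the pipeline above calls it with a single argument).

```python
import json
import sqlite3


def send_to_database(conn: sqlite3.Connection, records: list[dict]) -> int:
    """Store each flattened record as one JSON text row; return rows written."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute(
            "CREATE TABLE IF NOT EXISTS flat_records (payload TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO flat_records (payload) VALUES (?)",
            [(json.dumps(record),) for record in records],
        )
    return len(records)


# Usage: an in-memory database stands in for a real one.
conn = sqlite3.connect(":memory:")
written = send_to_database(
    conn,
    [{"user.name": "John", "metrics.0": 1}, {"user.name": "Ann"}],
)
```

Storing JSON text sidesteps the fact that flattened records from different files may not share the same keys; a fixed column layout would need a known schema.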

API Reference

ingest()

ingest(source, format=None, stream=False, chunk_size=5000, **kwargs)
  • source: File path or URL to ingest
  • format: Optional format override. Auto-detected if not specified
  • stream: Enable streaming for large files (supported formats only)
  • chunk_size: Records per chunk when streaming
  • Returns: A list of records, or a generator that yields chunks of records when stream=True
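
To make the stream and chunk_size parameters concrete, here is a minimal sketch of chunked reading with the standard library. It is not the library's implementation: iter_chunks is a hypothetical helper, and csv.DictReader stands in for whatever reader ingest() uses internally.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator


def iter_chunks(rows: Iterable[dict], chunk_size: int = 5000) -> Iterator[list[dict]]:
    """Yield lists of at most chunk_size rows, reading lazily from rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(rows)
    while True:
        # islice pulls only the next chunk_size rows from the source
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Usage: an in-memory CSV stands in for a large file on disk.
sample = io.StringIO("name,age\nAlice,30\nBob,25\nCara,41\n")
chunks = list(iter_chunks(csv.DictReader(sample), chunk_size=2))
```

Only one chunk is held in memory at a time, which is what makes this pattern suitable for files larger than RAM.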

flatten()

flatten(data, prefix="", preserve_empty_lists=True, records=True)
  • data: Data structure to flatten
  • prefix: Key prefix (used internally for recursion)
  • preserve_empty_lists: Keep or remove empty lists
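  • records: Treat a list as multiple records (True) or as a single structure (False)
  • Returns: A list of flattened records when records=True and data is a list of records; otherwise a single flat dictionary with dot-separated keys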
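
The parameters above can be illustrated with a small reference sketch. This is not the library's source: flatten_sketch and _flatten_one are hypothetical names, and the handling of edge cases is an assumption (only a list made up entirely of dicts is treated as records, and empty dicts are dropped). It does reproduce the outputs shown in the examples earlier in this README.

```python
from typing import Any


def _flatten_one(data: Any, prefix: str, preserve_empty_lists: bool, out: dict) -> dict:
    """Recursively copy leaf values into out under dot-separated keys."""
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, list):
        if not data:
            # Assumption: an empty list is kept as a leaf value or dropped
            if preserve_empty_lists and prefix:
                out[prefix] = []
            return out
        items = enumerate(data)  # list indices become key segments
    else:
        out[prefix] = data  # scalar leaf
        return out
    for key, value in items:
        child = f"{prefix}.{key}" if prefix else str(key)
        _flatten_one(value, child, preserve_empty_lists, out)
    return out


def flatten_sketch(data, prefix="", preserve_empty_lists=True, records=True):
    """Flatten nested dicts and lists; see the assumptions in the text above."""
    is_record_list = isinstance(data, list) and all(isinstance(i, dict) for i in data)
    if records and is_record_list:
        return [_flatten_one(item, prefix, preserve_empty_lists, {}) for item in data]
    return _flatten_one(data, prefix, preserve_empty_lists, {})
```

Passing an accumulator dictionary down the recursion avoids building and merging a new dictionary at every nesting level.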
