Stop writing custom parsers for every data format. Flatten anything.
Flatten Anything 🔨
The Problem
Every data pipeline starts the same way: "I have this nested JSON file, and I need to flatten it." Then next week: "Now it's XML." Then: "The client sent Excel files." Before you know it, you have 200 lines of custom parsing code for each format.
The Solution
from flatten_anything import flatten, ingest
# That's it. That's the whole library.
data = ingest('your_nightmare_file.json')
flat = flatten(data)
It just works: the same two calls for every supported format, no matter how deeply the data is nested.
What's New in v1.1
🚀 Streaming Support
Process files larger than memory without breaking a sweat:
# Stream a 10GB CSV file
for chunk in ingest('huge_file.csv', stream=True):
    flat = flatten(chunk)
    # Process each chunk without loading the entire file
🎯 Smarter Flattening
The new records parameter controls how a list of records is flattened:
# Automatically flattens each record separately (new default!)
data = ingest('users.csv')
flat = flatten(data) # Returns list of flattened records
# Or treat as single structure when needed
flat = flatten(data, records=False) # Flattens entire structure
Installation
Basic Installation
# Core installation (JSON, CSV, YAML, XML, API support)
pip install flatten-anything
With Optional Format Support
# Add Parquet support (quotes stop shells such as zsh from treating [] as a glob)
pip install "flatten-anything[parquet]"
# Add Excel support
pip install "flatten-anything[excel]"
# Install everything
pip install "flatten-anything[all]"
Format Support Matrix
| Format | Core Install | Optional Install | Streaming |
|---|---|---|---|
| JSON/JSONL | ✅ Included | - | ✅ JSONL only |
| CSV/TSV | ✅ Included | - | ✅ Yes |
| YAML | ✅ Included | - | ❌ No |
| XML | ✅ Included | - | ❌ No |
| API/URLs | ✅ Included | - | ❌ No |
| Parquet | ❌ | pip install flatten-anything[parquet] | ✅ Yes |
| Excel | ❌ | pip install flatten-anything[excel] | ❌ No |
Quick Start
Basic Usage
from flatten_anything import flatten, ingest
# Load any supported file format
data = ingest('data.json')
# Flatten it (automatically handles single vs multiple records)
flat = flatten(data)
Streaming Large Files
# Process huge files in chunks
for chunk in ingest('massive.csv', stream=True, chunk_size=10000):
    flat_records = flatten(chunk)
    # Process the chunk (e.g., write to a database, analyze, etc.);
    # process_records stands in for your own code
    process_records(flat_records)
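To make "chunks" concrete, here is a minimal standard-library sketch of a generator that yields lists of CSV rows, at most chunk_size rows at a time. It is not flatten-anything's implementation; iter_csv_chunks is a hypothetical name, and the sketch only illustrates why memory use is bounded by the chunk size rather than the file size.

```python
import csv
import io
from itertools import islice
from typing import Iterator, TextIO


def iter_csv_chunks(handle: TextIO, chunk_size: int = 5000) -> Iterator[list[dict[str, str]]]:
    """Yield lists of at most chunk_size rows (hypothetical helper, not the library API)."""
    reader = csv.DictReader(handle)
    while True:
        # islice pulls at most chunk_size rows; the rest of the file stays unread
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk


# In-memory stand-in for a large CSV file on disk
sample = io.StringIO("name,age\nAlice,30\nBob,25\nCara,41\n")
chunks = list(iter_csv_chunks(sample, chunk_size=2))
```

With a real file, open it with open(path, newline='') and pass the handle in; only the current chunk of rows is held in memory. Note that the csv module returns every value as a string.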
Real-world Example
# Your horrible nested JSON
data = {
    "user": {
        "name": "John",
        "contacts": {
            "emails": ["john@example.com", "john@work.com"],
            "phones": {
                "home": "555-1234",
                "work": "555-5678"
            }
        }
    },
    "metrics": [1, 2, 3]
}
flat = flatten(data)
# {
# 'user.name': 'John',
# 'user.contacts.emails.0': 'john@example.com',
# 'user.contacts.emails.1': 'john@work.com',
# 'user.contacts.phones.home': '555-1234',
# 'user.contacts.phones.work': '555-5678',
# 'metrics.0': 1,
# 'metrics.1': 2,
# 'metrics.2': 3
# }
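The dot-separated keys above can be produced by a simple recursive walk: dictionary keys and list positions are joined with "." until a scalar value is reached. The sketch below reproduces the output shown for this example using only the standard library. It is an illustration under that assumption, not flatten-anything's source; flatten_sketch is a hypothetical name, and unlike the library's default it silently drops empty lists and dicts.

```python
from typing import Any


def flatten_sketch(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into dot-separated keys (illustrative only)."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        # List positions become key segments: emails.0, emails.1, ...
        items = enumerate(obj)
    else:
        # Scalar leaf: store it under the accumulated key
        return {prefix: obj}
    flat: dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten_sketch(value, path))
    return flat
```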
Multiple Records Handling
# A list of records, like rows loaded from a CSV file
users = [
    {"name": "Alice", "age": 30, "city": "NYC"},
    {"name": "Bob", "age": 25, "city": "LA"}
]
# Default: flatten each record (records=True)
flat = flatten(users)
# [
# {"name": "Alice", "age": 30, "city": "NYC"},
# {"name": "Bob", "age": 25, "city": "LA"}
# ]
# Flatten as single structure (records=False)
flat = flatten(users, records=False)
# {
# "0.name": "Alice", "0.age": 30, "0.city": "NYC",
# "1.name": "Bob", "1.age": 25, "1.city": "LA"
# }
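The difference between the two modes can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: flatten_records and _flatten are hypothetical names, and the sketch only mirrors the behavior in the output comments above (a top-level list is split into records when records=True, and flattened as one structure with index-prefixed keys otherwise).

```python
from typing import Any


def _flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    # Compact recursive helper: dict keys and list indices joined with "."
    if isinstance(obj, dict):
        pairs = obj.items()
    elif isinstance(obj, list):
        pairs = enumerate(obj)
    else:
        return {prefix: obj}
    out: dict[str, Any] = {}
    for key, value in pairs:
        out.update(_flatten(value, f"{prefix}.{key}" if prefix else str(key)))
    return out


def flatten_records(data: Any, records: bool = True) -> Any:
    """Illustrate the records switch: per-record flattening vs one flat dict."""
    if records and isinstance(data, list):
        # Each list element is treated as an independent record
        return [_flatten(item) for item in data]
    # Otherwise the whole structure, list indices included, becomes one dict
    return _flatten(data)
```

The per-record mode is what makes the result directly usable as rows (for example, one dict per DataFrame row), while records=False suits a single document that merely happens to be a list.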
Advanced Usage
Integrate with pandas
import pandas as pd
from flatten_anything import flatten, ingest
# Method 1: Load entire file
data = ingest('data.csv')
flat = flatten(data)
df = pd.DataFrame(flat)
# Method 2: Stream large files
dfs = []
for chunk in ingest('huge.csv', stream=True, chunk_size=5000):
    flat_chunk = flatten(chunk)
    dfs.append(pd.DataFrame(flat_chunk))
final_df = pd.concat(dfs, ignore_index=True)
Control Empty Lists
data = {"items": [], "count": 0}
# Preserve empty lists (default)
flatten(data, preserve_empty_lists=True)
# {"items": [], "count": 0}
# Remove empty lists
flatten(data, preserve_empty_lists=False)
# {"count": 0}
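Why the option exists: in a recursive flattener an empty list contains no leaf values, so without special handling its key would simply vanish from the output. A minimal sketch of that special case follows; flatten_with_empty is a hypothetical name, and this is an assumption about how such a switch can work, not the library's implementation.

```python
from typing import Any


def flatten_with_empty(obj: Any, prefix: str = "", preserve_empty_lists: bool = True) -> dict[str, Any]:
    """Dot-key flattening with a hypothetical preserve_empty_lists switch."""
    if isinstance(obj, list) and not obj:
        # An empty list has no leaves, so it must be kept (or dropped) explicitly
        return {prefix: []} if preserve_empty_lists else {}
    if isinstance(obj, dict):
        pairs = obj.items()
    elif isinstance(obj, list):
        pairs = enumerate(obj)
    else:
        return {prefix: obj}
    out: dict[str, Any] = {}
    for key, value in pairs:
        path = f"{prefix}.{key}" if prefix else str(key)
        out.update(flatten_with_empty(value, path, preserve_empty_lists))
    return out
```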
Memory-Efficient Pipeline
from pathlib import Path
from flatten_anything import flatten, ingest
# Process a directory of large files without memory issues
for filepath in Path('data/').glob('*.csv'):
    for chunk in ingest(filepath, stream=True):
        flat = flatten(chunk)
        # Hand each chunk off immediately so it can be discarded;
        # send_to_database stands in for your own code
        send_to_database(flat)
API Reference
ingest()
ingest(source, format=None, stream=False, chunk_size=5000, **kwargs)
- source: File path or URL to ingest
- format: Optional format override; auto-detected if not specified
- stream: Enable streaming for large files (supported formats only)
- chunk_size: Records per chunk when streaming
- Returns: A list of records, or a generator yielding chunks of records when stream=True
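The reference says the format is auto-detected when format is omitted, without saying how. One common approach, shown here purely as an assumption, is to map the file extension to a format name and let an explicit format argument override it. detect_format and the extension table are hypothetical, not flatten-anything's actual detection logic.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical extension table; the real library may detect formats differently
EXTENSION_FORMATS = {
    ".json": "json",
    ".jsonl": "jsonl",
    ".csv": "csv",
    ".tsv": "tsv",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".xml": "xml",
    ".parquet": "parquet",
    ".xlsx": "excel",
    ".xls": "excel",
}


def detect_format(source: str, format: Optional[str] = None) -> str:
    """Return the explicit format if given, else guess from the file extension."""
    if format is not None:
        # An explicit override always wins over auto-detection
        return format.lower()
    # For URLs, look only at the path part (ignore query strings like ?page=2)
    path = urlparse(source).path if "://" in source else source
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return EXTENSION_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Cannot detect format for {source!r}; pass format= explicitly") from None
```

An API endpoint without a file extension would need another signal, such as the response's Content-Type header, which this sketch leaves out.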
flatten()
flatten(data, prefix="", preserve_empty_lists=True, records=True)
- data: Data structure to flatten
- prefix: Key prefix (used internally for recursion)
- preserve_empty_lists: Keep or remove empty lists
- records: Treat a list as multiple records (True) or as a single structure (False)
Download files
File details
Details for the file flatten_anything-1.1.1.tar.gz.
File metadata
- Download URL: flatten_anything-1.1.1.tar.gz
- Size: 26.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 333f4fb68cf0d9ff157269ecfb1b2e1db47a143fe97d485d4194ce1427cdbd14 |
| MD5 | e044660e92f52a9537b28a1b093a7491 |
| BLAKE2b-256 | 15ff5f01a71369fbc7a6beff097833886215a7aa5906554b91da867970d652f6 |
File details
Details for the file flatten_anything-1.1.1-py3-none-any.whl.
File metadata
- Download URL: flatten_anything-1.1.1-py3-none-any.whl
- Size: 33.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 07cda183ab0859a9ba51474cc7ecb96372275b83cfb6095b695bfaf93bc167c0 |
| MD5 | 52f0d4a16e21c12caca21f1aaec1ea91 |
| BLAKE2b-256 | 5ea447299e5bb1b884b5a162bc47eefbe8192d3a5522f2cf8927720addea8a96 |