A library to flatten nested data.

dataflat

A library to flatten nested keys and columns from Python Dictionaries, Pandas DataFrames, Polars DataFrames, PyArrow Tables, and PySpark DataFrames into a set of relational tables.

Installation

pip install dataflat

For Polars support:

pip install dataflat polars

For Pandas support:

pip install dataflat pandas pyarrow

For PyArrow support:

pip install dataflat pyarrow

For PySpark support:

pip install dataflat pyspark
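
Since every backend is an optional install, it can help to check which ones are importable before choosing a flattener. The following is a minimal sketch using only the standard library; `available_backends` is a hypothetical helper and not part of dataflat.

```python
from importlib.util import find_spec

# Top-level modules provided by the optional packages listed above.
OPTIONAL_BACKENDS = ("pandas", "polars", "pyarrow", "pyspark")


def available_backends() -> dict[str, bool]:
    """Report which optional backend packages can be imported here."""
    return {name: find_spec(name) is not None for name in OPTIONAL_BACKENDS}


for name, installed in available_backends().items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```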

Get started

Import handler, FlattenerOptions, and optionally CaseTranslatorOptions from dataflat:

from dataflat import handler, FlattenerOptions, CaseTranslatorOptions

FlattenerOptions selects the backend:

Option	Input type
`FlattenerOptions.DICTIONARY`	Python `dict`
`FlattenerOptions.PANDAS_DF`	Pandas `DataFrame`
`FlattenerOptions.POLARS_DF`	Polars `DataFrame`
`FlattenerOptions.PYARROW_TABLE`	PyArrow `Table`
`FlattenerOptions.PYSPARK_DF`	PySpark `DataFrame`
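
When the input type is only known at run time, the option can be picked from the table above by inspecting the object. This is a sketch under assumptions: `option_name_for` is a hypothetical helper (not part of dataflat), and it identifies each backend by the package that defines the object's class, so no backend has to be imported.

```python
def option_name_for(data: object) -> str:
    """Return the FlattenerOptions member name that matches `data`."""
    if isinstance(data, dict):
        return "DICTIONARY"
    # Identify the backend by the package that defines the class.
    package = type(data).__module__.split(".")[0]
    kind = type(data).__name__
    table = {
        ("pandas", "DataFrame"): "PANDAS_DF",
        ("polars", "DataFrame"): "POLARS_DF",
        ("pyarrow", "Table"): "PYARROW_TABLE",
        ("pyspark", "DataFrame"): "PYSPARK_DF",
    }
    try:
        return table[(package, kind)]
    except KeyError:
        raise TypeError(f"unsupported input type: {package}.{kind}") from None


print(option_name_for({"id": 1}))
```

The returned name could then be resolved with `getattr(FlattenerOptions, option_name_for(data))`.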

CaseTranslatorOptions (optional) translates key / column names after flattening:

Option	Example output
`SNAKE`	`order_id`
`CAMEL`	`orderId`
`PASCAL`	`OrderId`
`KEBAB`	`order-id`
`HUMAN`	`Order id`
`LOWER`	`orderid`
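
To make the table concrete, here is a rough sketch of how a name could be translated between these styles. It is not dataflat's implementation: `split_words` and `translate` are hypothetical helpers, the style is passed as a plain string, and the source case is detected from the name itself instead of being given as `from_case`.

```python
import re

# Matches acronyms ("HTTP"), capitalised words ("Order") and lowercase/digit runs ("id").
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+")


def split_words(name: str) -> list[str]:
    """Split 'orderId', 'order_id' or 'order-id' into lowercase words."""
    return [word.lower() for word in _WORD.findall(name)]


def translate(name: str, to_case: str) -> str:
    """Re-join the words of `name` in the requested style."""
    words = split_words(name)
    if not words:
        return name
    if to_case == "SNAKE":
        return "_".join(words)
    if to_case == "KEBAB":
        return "-".join(words)
    if to_case == "CAMEL":
        return words[0] + "".join(w.capitalize() for w in words[1:])
    if to_case == "PASCAL":
        return "".join(w.capitalize() for w in words)
    if to_case == "HUMAN":
        return " ".join(words).capitalize()
    if to_case == "LOWER":
        return "".join(words)
    raise ValueError(f"unknown case: {to_case}")


for style in ("SNAKE", "CAMEL", "PASCAL", "KEBAB", "HUMAN", "LOWER"):
    print(style, translate("orderId", style))
```

Note that `LOWER` discards word boundaries, so a name translated to it cannot be split back into words.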

Instantiate a flattener

Use handler() to get a configured flattener instance:

from dataflat import handler, FlattenerOptions, CaseTranslatorOptions

flattener = handler(
    custom_flattener=FlattenerOptions.DICTIONARY,  # or PANDAS_DF / POLARS_DF / PYARROW_TABLE / PYSPARK_DF
    from_case=CaseTranslatorOptions.CAMEL,         # optional
    to_case=CaseTranslatorOptions.SNAKE,           # optional
    remove_special_chars=False,                    # strip non-alphanumeric chars (default False)
)

Or import the concrete class directly if you prefer:

# Import only the one you need: each line rebinds the name CustomFlattener.
from dataflat.dictionary import CustomFlattener   # dict
from dataflat.pandas import CustomFlattener       # Pandas
from dataflat.polars import CustomFlattener       # Polars
from dataflat.pyarrow import CustomFlattener      # PyArrow
from dataflat.pyspark import CustomFlattener      # PySpark

Flatten data

All flatteners share the same flatten() signature:

flatten_data = flattener.flatten(
    data=data,                                      # dict, Pandas / Polars / PySpark DataFrame, or PyArrow Table
    primary_key="id",                               # optional; auto-generates UUID column when omitted
    entity_name="data",                             # root entity name, default "data"
    partition_keys=["date"],                        # columns propagated to all children
    black_list=["keys.to", "be.ignored"],           # fields excluded from output
)

Parameters

primary_key: identifies each root record and is propagated to every child entity as <entity_name>.<primary_key> (e.g. data.id). When None (the default), a UUID column is generated instead, and its name follows to_case: dataflat_id_column (SNAKE), dataflat-id-column (KEBAB), dataflatIdColumn (CAMEL), DataflatIdColumn (PASCAL), Dataflat id column (HUMAN), dataflatidcolumn (LOWER).
entity_name: name of the root entity (default "data"); child entities and inherited columns are prefixed with it (e.g. data.orders, data.id).
partition_keys: additional root columns (e.g. ["date"]) inherited by every child entity.
black_list: dot-joined field paths excluded from all output (e.g. ["summary.totalClients"]).
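
The effect of black_list and of an omitted primary_key can be sketched in plain Python. This is an illustration only, under stated assumptions: `drop_blacklisted` and `with_primary_key` are hypothetical helpers, items inside a list are assumed to share the path of the list itself, and the generated UUID is assumed to be stored as a string.

```python
import uuid
from typing import Any, Optional


def drop_blacklisted(value: Any, black_list: list[str], path: str = "") -> Any:
    """Return a copy of `value` without fields whose dot-joined path is listed."""
    if isinstance(value, dict):
        kept = {}
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            if child_path not in black_list:
                kept[key] = drop_blacklisted(child, black_list, child_path)
        return kept
    if isinstance(value, list):
        # Assumption: list items share the path of the list itself.
        return [drop_blacklisted(item, black_list, path) for item in value]
    return value


def with_primary_key(record: dict, primary_key: Optional[str] = None,
                     generated_name: str = "dataflat_id_column") -> tuple[dict, str]:
    """Return (record, key name), adding a UUID column when no key is given."""
    if primary_key is not None:
        return record, primary_key
    return {**record, generated_name: str(uuid.uuid4())}, generated_name


record = {"id": 1, "summary": {"totalClients": 3, "region": "eu"}}
print(drop_blacklisted(record, ["summary.totalClients"]))
```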

Return value

flatten() returns dict[str, entity]: one key per entity, named by its dot-joined path (e.g. data, data.orders, data.orders.products).
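
Because the result is keyed by entity, it maps naturally onto one table or file per entity. A minimal sketch, assuming the dictionary backend where each entity is a list of row dicts (the shape of the example output in this section); `write_entities` is a hypothetical helper, not part of dataflat.

```python
import csv
from pathlib import Path


def write_entities(result: dict[str, list[dict]], out_dir: str) -> list[Path]:
    """Write one CSV file per entity, named after the entity's dot-joined path."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for entity_name, rows in result.items():
        # Use the union of keys so rows with missing fields still fit the header.
        columns = list(dict.fromkeys(key for row in rows for key in row))
        path = target / f"{entity_name}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=columns)
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written
```

Child files can later be joined back to their parent on the inherited key column (for example data.id).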

Nested fields are handled as follows:

Struct / object fields are expanded inline using dot-notation (payment.method, client.address.city).
Lists of objects produce a child entity (e.g. data.orders.products) with a positional index column (0-based, restarting for each parent). A parent's index is carried into its children as <parent entity>.index (e.g. data.orders.index).
Lists of scalars (strings, ints, floats, booleans) are joined with "|" into a single string column in the parent entity; no child entity is created.

{
  "data": [{"id": 1, "date": "2024-01-01", "tags": "featured|sale", "total": 1900}],
  "data.orders": [
    {"id": "abc123", "total": 700, "notes": "fragile|urgent", "data.id": 1, "data.date": "2024-01-01", "index": 0},
    {"id": "dfg456", "total": 1200, "notes": "gift", "data.id": 1, "data.date": "2024-01-01", "index": 1}
  ],
  "data.orders.products": [
    {"id": "ab", "price": 200, "data.id": 1, "data.date": "2024-01-01", "data.orders.index": 0, "index": 0},
    {"id": "cd", "price": 500, "data.id": 1, "data.date": "2024-01-01", "data.orders.index": 0, "index": 1},
    {"id": "fg", "price": 1200, "data.id": 1, "data.date": "2024-01-01", "data.orders.index": 1, "index": 0}
  ]
}

Produced from the following input, flattened with primary_key="id" and partition_keys=["date"]:

{
  "id": 1,
  "date": "2024-01-01",
  "tags": ["featured", "sale"],
  "orders": [
    {
      "id": "abc123",
      "products": [
        {"id": "ab", "price": 200},
        {"id": "cd", "price": 500}
      ],
      "total": 700,
      "notes": ["fragile", "urgent"]
    },
    {
      "id": "dfg456",
      "products": [
        {"id": "fg", "price": 1200}
      ],
      "total": 1200,
      "notes": ["gift"]
    }
  ],
  "total": 1900
}
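
The three rules above can be sketched in pure Python for a single dict record. This is an illustration of the described behaviour, not dataflat's implementation: `flatten_sketch` is a hypothetical function, it requires the primary key to be present, and it treats an empty list like a list of scalars.

```python
from typing import Any


def flatten_sketch(record: dict, primary_key: str = "id", entity_name: str = "data",
                   partition_keys: tuple[str, ...] = ()) -> dict[str, list[dict]]:
    """Flatten one nested record into {entity path: [rows]}."""
    entities: dict[str, list[dict]] = {}
    # Columns every child row inherits, e.g. "data.id" and "data.date".
    inherited = {f"{entity_name}.{key}": record[key]
                 for key in (primary_key, *partition_keys)}

    def walk(obj: dict, entity: str, extra: dict) -> None:
        row: dict[str, Any] = {}
        children = []  # (child entity path, list of objects)

        def expand(value: Any, column: str) -> None:
            if isinstance(value, dict):
                # Structs are expanded inline with dot-notation.
                for key, child in value.items():
                    expand(child, f"{column}.{key}")
            elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
                # Lists of objects become a child entity.
                children.append((f"{entity}.{column}", value))
            elif isinstance(value, list):
                # Lists of scalars are joined with "|".
                row[column] = "|".join(str(v) for v in value)
            else:
                row[column] = value

        for key, value in obj.items():
            expand(value, key)
        row.update(extra)
        entities.setdefault(entity, []).append(row)

        # Children get the root columns plus this row's index as "<entity>.index".
        carried = dict(inherited)
        for key, value in extra.items():
            carried[f"{entity}.index" if key == "index" else key] = value
        for child_entity, items in children:
            for index, item in enumerate(items):
                walk(item, child_entity, {**carried, "index": index})

    walk(record, entity_name, {})
    return entities


record = {
    "id": 1,
    "date": "2024-01-01",
    "client": {"address": {"city": "Lima"}},
    "orders": [{"id": "abc123", "notes": ["fragile", "urgent"]}],
}
for name, rows in flatten_sketch(record, partition_keys=("date",)).items():
    print(name, rows)
```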

Recommendations

For PYSPARK_DF it is recommended to enable case-sensitive mode in Spark:
```
spark.conf.set("spark.sql.caseSensitive", True)
```
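
Spark resolves column names case-insensitively by default, so names that differ only by letter case become ambiguous. A small pre-check along these lines can show whether a dataset is affected; `case_collisions` is a hypothetical helper and not part of dataflat or PySpark.

```python
from collections import defaultdict


def case_collisions(columns: list[str]) -> dict[str, list[str]]:
    """Group column names that are identical once letter case is ignored."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in columns:
        groups[name.lower()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


print(case_collisions(["orderId", "orderid", "total", "Total", "date"]))
```
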
For POLARS_DF with nullable struct columns, load data via pl.read_ndjson instead of pl.from_dicts — the latter does not handle nullable structs correctly.
```
import polars as pl
df = pl.read_ndjson("data.ndjson")
```
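
If the records are already in memory as a list of dicts (the case where pl.from_dicts would be tempting), one option is to write them out as NDJSON first and then load the file with pl.read_ndjson. A minimal sketch using only the standard library; `write_ndjson` is a hypothetical helper, not part of dataflat or Polars.

```python
import json
from pathlib import Path


def write_ndjson(records: list[dict], path: str) -> Path:
    """Write one compact JSON object per line (NDJSON)."""
    target = Path(path)
    with target.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return target
```
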
For PANDAS_DF, the flattener internally converts the DataFrame to a PyArrow Table to reliably distinguish str, dict, and list columns (the numpy_nullable dtype backend marks all three as object). The result is returned with the pyarrow dtype backend (pd.ArrowDtype), giving properly nullable typed columns.

For PYARROW_TABLE with single JSON objects (not NDJSON), use pa.Table.from_pylist:

import json
import pyarrow as pa
with open("data.json") as f:
    data = json.load(f)
table = pa.Table.from_pylist([data])

For NDJSON or multiple records use pyarrow.json.read_json:

import pyarrow.json as pa_json
table = pa_json.read_json("data.ndjson")

Release history

Version	Release date
3.1.0	Apr 1, 2026
3.0.0 (this version)	Mar 31, 2026
2.0.0	Sep 6, 2024
1.1.0	Jul 17, 2023
1.0.6	Jun 29, 2023
1.0.5	Jun 29, 2023
1.0.4	Jun 9, 2023
1.0.3	Jun 9, 2023
1.0.2	Jun 9, 2023
1.0.1	Mar 23, 2023

Download files

Source Distribution

dataflat-3.0.0.tar.gz (134.2 kB)

Uploaded Mar 31, 2026 Source

Built Distribution

dataflat-3.0.0-py3-none-any.whl (36.4 kB)

Uploaded Mar 31, 2026 Python 3

File details

Details for the file dataflat-3.0.0.tar.gz.

File metadata

Download URL: dataflat-3.0.0.tar.gz
Upload date: Mar 31, 2026
Size: 134.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dataflat-3.0.0.tar.gz
Algorithm	Hash digest
SHA256	`daedca7bd95e6fb2cfa290257e6ad933b2a3b1703192a82986eda66e3fa07b26`
MD5	`fa5e4388edb47bc6ad0801290d422162`
BLAKE2b-256	`ca2a78b786fce0dd3d02fbace8efd0c6ada7c6558d111dcfa313ff0051d56d81`

File details

Details for the file dataflat-3.0.0-py3-none-any.whl.

File metadata

Download URL: dataflat-3.0.0-py3-none-any.whl
Upload date: Mar 31, 2026
Size: 36.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dataflat-3.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`01a4d22af83aa1f53a65fa857dce747f3c59e9fed3a3501a405903d8bab046e3`
MD5	`f1ea2a856de3826999f53939a1600d39`
BLAKE2b-256	`25fdd277335aba89e48947235bb94a309957acdaa9a9a7c153d2907e4fb74ab6`
