A Polars plugin for flattening nested data

polars-schema-index

A Polars plugin for flattening nested columns with stable numeric indexing.

polars-schema-index provides a systematic way to explode and unnest nested Polars DataFrames without overwriting columns that share the same name (LazyFrames are not yet supported). It achieves this by:

  • Attaching a custom schema_index namespace to your DataFrame.
  • Appending a numeric suffix to every column whose name does not already end in digits.
  • Iteratively flattening Struct columns (and optionally exploding list[struct] columns first), so every nested field becomes a separate top-level column.
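The renaming rule in the second bullet can be illustrated without Polars. The following is a minimal sketch, not the library's code: `add_index_suffixes` is a hypothetical helper, and the `_N` suffix format and the counter starting at 0 are assumptions inferred from the example output in the Usage section.

```python
import re
from itertools import count


def add_index_suffixes(names, counter):
    """Append `_N` to every name that does not already end in digits.

    Hypothetical helper for illustration only; `counter` is shared across
    calls so that indices keep increasing between passes.
    """
    return [
        name if re.search(r"\d+$", name) else f"{name}_{next(counter)}"
        for name in names
    ]


counter = count()  # assumption: numbering starts at 0

# First pass: top-level columns get their suffixes.
first = add_index_suffixes(["body", "type_ignores"], counter)
print(first)  # ['body_0', 'type_ignores_1']

# Second pass: already-suffixed names are left alone, new ones continue the count.
second = add_index_suffixes(first + ["type", "test"], counter)
print(second)  # ['body_0', 'type_ignores_1', 'type_2', 'test_3']
```

Because names that already end in digits are skipped, running the rule again does not rename anything twice, which is what keeps the suffixes stable across passes.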

Installation

pip install polars-schema-index[polars]

On older CPUs run:

pip install polars-schema-index[polars-lts-cpu]

Usage

import polars as pl
from polars_schema_index import flatten_nested_data

# Example: flatten a deeply nested JSON structure
df = pl.read_ndjson(
    source=b'''{
        "body": [
            {
                "type": "If",
                "test": {
                    "type": "Compare",
                    "left": {
                        "type": "Name",
                        "id": "x",
                        "ctx": { "type": "Load" }
                    },
                    "ops": [{ "type": "IsNot" }],
                    "comparators": [{ "type": "Constant", "value": null }]
                },
                "body": [{ "type": "Pass" }],
                "orelse": []
            }
        ],
        "type_ignores": []
    }
    '''.replace(b"\n", b"")
)
flattened = flatten_nested_data(df)
print(flattened)

This gives a DataFrame in which every nested field is expanded into its own column, each with a unique, monotonically increasing numeric suffix:

┌────────────────┬────────┬────────────┬─────────┬───┬─────────┬──────────┬──────────┬─────────┐
│ type_ignores_1 ┆ type_2 ┆ orelse_5   ┆ type_6  ┆ … ┆ type_14 ┆ type_15  ┆ value_16 ┆ type_17 │
│ ---            ┆ ---    ┆ ---        ┆ ---     ┆   ┆ ---     ┆ ---      ┆ ---      ┆ ---     │
│ list[null]     ┆ str    ┆ list[null] ┆ str     ┆   ┆ str     ┆ str      ┆ null     ┆ str     │
╞════════════════╪════════╪════════════╪═════════╪═══╪═════════╪══════════╪══════════╪═════════╡
│ []             ┆ If     ┆ []         ┆ Compare ┆ … ┆ IsNot   ┆ Constant ┆ null     ┆ Load    │
└────────────────┴────────┴────────────┴─────────┴───┴─────────┴──────────┴──────────┴─────────┘

What It Solves

  • No more silent overwrites of common keys (like "type") when unnesting.
  • Stable numeric suffixes for each column, so even if you run multiple flatten passes, names remain unique.
  • Optional exploding of list-of-struct columns before flattening them.
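The silent overwrite mentioned in the first bullet can be reproduced with plain Python dicts. This is only an illustration of the problem, under the assumption that nested keys are merged into one flat namespace; `naive_flatten` is a hypothetical helper and is not part of this library.

```python
# A trimmed-down version of the nested record from the Usage example.
record = {
    "type": "If",
    "test": {
        "type": "Compare",
        "left": {"type": "Name", "id": "x"},
    },
}


def naive_flatten(d):
    """Merge all nested keys into one flat dict, with no renaming."""
    flat = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Nested keys land in the same namespace as their parents,
            # so a repeated key such as "type" replaces the earlier value.
            flat.update(naive_flatten(value))
        else:
            flat[key] = value
    return flat


flat = naive_flatten(record)
print(flat)  # {'type': 'Name', 'id': 'x'}
```

Three different "type" values went in and only the innermost one survives. Giving each column its own numeric suffix before merging avoids this.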

Key Functions

  1. flatten_nested_data(df, explode_lists=True, max_passes=1000)
    Iteratively flattens all Struct columns in a DataFrame (LazyFrames are not yet supported), and explodes any list[struct] columns first if explode_lists=True. Continues until no Struct columns remain or max_passes is reached.

  2. df.schema_index.append_unnest_relabel(df, column=...)
    Moves one column to the end via .permute, unnests it, then relabels the newly created columns with numeric suffixes.
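The pass-based loop described above can be sketched on a single row represented as a dict. This is an illustration under stated assumptions, not the library's implementation: `flatten_row` is a hypothetical name, nested dicts stand in for Struct columns, list explosion is left out, and the numbering order is inferred from the example output in the Usage section.

```python
import re
from itertools import count


def flatten_row(row, max_passes=1000):
    """Flatten nested dicts in passes, numbering every new field."""
    counter = count()

    def relabel(name):
        # Names that already end in digits keep their suffix.
        return name if re.search(r"\d+$", name) else f"{name}_{next(counter)}"

    row = {relabel(key): value for key, value in row.items()}
    for _ in range(max_passes):
        nested = [key for key, value in row.items() if isinstance(value, dict)]
        if not nested:
            break  # nothing left to unnest
        for key in nested:
            # Remove the nested field and append its children at the end,
            # each under a freshly numbered name.
            for child, value in row.pop(key).items():
                row[relabel(child)] = value
    return row


row = {
    "type": "If",
    "test": {
        "type": "Compare",
        "left": {"type": "Name", "id": "x", "ctx": {"type": "Load"}},
    },
    "orelse": [],
}
result = flatten_row(row)
print(list(result))
# ['type_0', 'orelse_2', 'type_3', 'type_5', 'id_6', 'type_8']
```

The gaps in the numbering (1, 4 and 7) belong to the intermediate nested fields that were unnested and removed, which matches the gaps visible in the example output.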

Note

  • Column Renaming: The library appends numeric suffixes to all columns that lack them, even if they are already scalar columns. That ensures flattening never creates collisions, but it does mean your top-level columns will also gain suffixes.
  • LazyFrame Support: By default, the plugin is registered for DataFrame. If you want to use this on LazyFrames, you can register a similar namespace for LazyFrame or manually attach the plugin’s logic. I may end up supporting both.

Contributing

  1. Issues & Discussions: Please open a GitHub issue for bugs, feature requests, or questions.
  2. Pull Requests: PRs are welcome! Add tests under tests/, update the docs, and ensure you run pytest locally.

License

This project is licensed under the MIT License.
