A Polars plugin for flattening nested data

polars-schema-index

A Polars plugin for flattening nested columns with stable numeric indexing.

polars-schema-index provides a systematic way to explode/unnest nested Polars DataFrames (does not yet support LazyFrames) without overwriting columns that share the same name. It achieves this by:

  • Attaching a custom schema_index namespace to your DataFrame.
  • Renaming every column that does not already end in digits by appending a numeric suffix.
  • Iteratively flattening Struct columns (and optionally exploding list[struct] columns first), so every nested field becomes a separate top-level column.
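
To make the renaming step concrete, here is a minimal plain-Python sketch of one way such suffixing could work. It is not the library's code: `add_numeric_suffixes` is a hypothetical helper, and the order in which the real plugin hands out numbers may differ (the example output further down numbers columns differently).

```python
import re


def add_numeric_suffixes(names, start=1):
    """Append _<n> to each name that does not already end in a digit.

    Hypothetical helper for illustration only, not part of polars-schema-index.
    Returns the renamed list and the next unused counter value.
    """
    counter = start
    renamed = []
    for name in names:
        if re.search(r"\d$", name):
            # Already indexed on an earlier pass: leave it untouched.
            renamed.append(name)
        else:
            renamed.append(f"{name}_{counter}")
            counter += 1
    return renamed, counter


first_pass, next_index = add_numeric_suffixes(["body", "type_ignores"])
# A later pass sees one already-suffixed column plus two freshly unnested fields.
second_pass, _ = add_numeric_suffixes(
    ["type_ignores_2", "type", "test"], start=next_index
)
print(first_pass)
print(second_pass)
```

Because the counter only ever increases and suffixed names are skipped, repeated passes never reuse a name.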

Installation

pip install polars-schema-index[polars]

On older CPUs, run:

pip install polars-schema-index[polars-lts-cpu]

Usage

import polars as pl
from polars_schema_index import flatten_nested_data

# Example: flatten a deeply nested JSON structure
df = pl.read_ndjson(
    source=b'''{
        "body": [
            {
                "type": "If",
                "test": {
                    "type": "Compare",
                    "left": {
                        "type": "Name",
                        "id": "x",
                        "ctx": { "type": "Load" }
                    },
                    "ops": [{ "type": "IsNot" }],
                    "comparators": [{ "type": "Constant", "value": null }]
                },
                "body": [{ "type": "Pass" }],
                "orelse": []
            }
        ],
        "type_ignores": []
    }
    '''.replace(b"\n", b"")
)
flattened = flatten_nested_data(df)
print(flattened)

This gives a DataFrame in which every nested field has been expanded into its own top-level column. Each column name carries a unique numeric suffix, and the suffixes increase monotonically:

┌────────────────┬────────┬────────────┬─────────┬───┬─────────┬──────────┬──────────┬─────────┐
│ type_ignores_1 ┆ type_2 ┆ orelse_5   ┆ type_6  ┆ … ┆ type_14 ┆ type_15  ┆ value_16 ┆ type_17 │
│ ---            ┆ ---    ┆ ---        ┆ ---     ┆   ┆ ---     ┆ ---      ┆ ---      ┆ ---     │
│ list[null]     ┆ str    ┆ list[null] ┆ str     ┆   ┆ str     ┆ str      ┆ null     ┆ str     │
╞════════════════╪════════╪════════════╪═════════╪═══╪═════════╪══════════╪══════════╪═════════╡
│ []             ┆ If     ┆ []         ┆ Compare ┆ … ┆ IsNot   ┆ Constant ┆ null     ┆ Load    │
└────────────────┴────────┴────────────┴─────────┴───┴─────────┴──────────┴──────────┴─────────┘

What It Solves

  • No more silent overwrites of common keys (like "type") when unnesting.
  • Stable numeric suffixes for each column, so even if you run multiple flatten passes, names remain unique.
  • Optional exploding of list-of-struct columns before flattening them.
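
The first point is easiest to see with plain Python dicts standing in for a nested row. The snippet below is only an analogy, not Polars or plugin code: merging a nested mapping into its parent loses the outer "type", while suffixing keys before merging keeps both values.

```python
record = {"type": "If", "test": {"type": "Compare", "ops": "IsNot"}}

# Naive flattening: merge the nested dict straight into its parent.
naive = {k: v for k, v in record.items() if k != "test"}
naive.update(record["test"])  # the outer "type" is silently replaced

# Suffix every key before merging so the two "type" fields stay distinct.
safe = {"type_1": record["type"]}
for i, (key, value) in enumerate(record["test"].items(), start=2):
    safe[f"{key}_{i}"] = value

print(naive)
print(safe)
```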

Key Functions

  1. flatten_nested_data(df, explode_lists=True, max_passes=1000) Iteratively flattens all Struct columns in a DataFrame (LazyFrames are not yet supported) and, if explode_lists=True, explodes any list[struct] columns first. It repeats until no Struct columns remain or max_passes is reached.

  2. df.schema_index.append_unnest_relabel(df, column=...) Moves one column to the end via .permute, unnests it, then relabels the newly created columns with numeric suffixes.
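
The pass-by-pass behaviour described above can be sketched in plain Python, with a list of dicts standing in for a DataFrame. This is an illustrative analogy under stated assumptions, not the plugin's implementation: `flatten_records` is a hypothetical name, the input is assumed to be non-empty with the same keys in every row, and each pass handles a single column.

```python
from itertools import count


def flatten_records(rows, explode_lists=True, max_passes=1000):
    """Flatten dict fields (and optionally explode lists of dicts) pass by pass."""
    ids = count(1)

    def is_struct_list(value):
        return isinstance(value, list) and any(isinstance(x, dict) for x in value)

    # Give every top-level column a numeric suffix up front.
    top = {key: f"{key}_{next(ids)}" for key in rows[0]}
    rows = [{top[k]: v for k, v in row.items()} for row in rows]

    for _ in range(max_passes):
        columns = list(rows[0])
        to_explode = next(
            (c for c in columns if any(is_struct_list(r[c]) for r in rows)), None
        )
        to_unnest = next(
            (c for c in columns if any(isinstance(r[c], dict) for r in rows)), None
        )
        if explode_lists and to_explode is not None:
            # One output row per list element; empty or missing lists yield None.
            rows = [
                {**r, to_explode: item}
                for r in rows
                for item in (r[to_explode] or [None])
            ]
        elif to_unnest is not None:
            # Collect field names in first-seen order, then give each a fresh suffix.
            fields = {}
            for r in rows:
                for name in r[to_unnest] or {}:
                    fields.setdefault(name, None)
            new_names = {name: f"{name}_{next(ids)}" for name in fields}
            unnested = []
            for r in rows:
                struct = r[to_unnest] or {}
                row = {k: v for k, v in r.items() if k != to_unnest}
                # New columns are appended at the end of the row.
                row.update({new_names[n]: struct.get(n) for n in fields})
                unnested.append(row)
            rows = unnested
        else:
            break  # nothing left to explode or unnest
    return rows


record = {
    "type": "If",
    "body": [{"type": "Pass"}, {"type": "Return"}],
    "test": {"type": "Compare", "left": {"id": "x"}},
}
flat = flatten_records([record])
kept_lists = flatten_records([record], explode_lists=False)
print(flat)
print(kept_lists)
```

With explode_lists=False in this sketch, the list-of-dict column is left in place and only the dict-valued fields are flattened.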

Note

  • Column Renaming: The library appends numeric suffixes to all columns that lack them, even if they are already scalar columns. That ensures flattening never creates collisions, but it does mean your top-level columns will also gain suffixes.
  • LazyFrame Support: By default, the plugin is registered for DataFrame. If you want to use this on LazyFrames, you can register a similar namespace for LazyFrame or manually attach the plugin’s logic. I may end up supporting both.

Contributing

  1. Issues & Discussions: Please open a GitHub issue for bugs, feature requests, or questions.
  2. Pull Requests: PRs are welcome! Add tests under tests/, update the docs, and ensure you run pytest locally.

License

This project is licensed under the MIT License.
