Polars plugin providing Map operations on List(Struct({key, value})) columns

polars-map

Polars plugin providing a Map extension type and functions. Maps represent a mapping from unique keys of any type to values, and are stored as List(Struct({key, value})) columns. Most functions in the .map namespace accept either the Map extension type or the underlying List(Struct). The type-preserving methods (filter, filter_keys, filter_values, merge, intersection, difference) don't remap the key or value types, so they reuse the input's dtype instead of re-inferring it (which would lock the GIL). On the expression API that reuse requires the Map extension as input; cast a plain List(Struct) first if needed, e.g. with .map.from_entries(). The equivalent Series methods accept either.
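The storage model described above can be pictured without Polars at all. What follows is a plain-Python sketch of one row, not the plugin's implementation: the helper names entries_to_dict and dict_to_entries are hypothetical, and the first-occurrence-wins rule for duplicate keys is an assumption taken from the from_entries example in the Usage section.

```python
from typing import Any, Hashable

Entry = dict[str, Any]  # one {"key": ..., "value": ...} struct


def entries_to_dict(entries: list[Entry]) -> dict[Hashable, Any]:
    """Collapse a list of key/value structs into a mapping with unique keys.

    Assumption: the first occurrence of a duplicated key is kept.
    """
    out: dict[Hashable, Any] = {}
    for entry in entries:
        out.setdefault(entry["key"], entry["value"])
    return out


def dict_to_entries(mapping: dict[Hashable, Any]) -> list[Entry]:
    """Expand a mapping back into the List(Struct)-style row layout."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


row = [
    {"key": "a", "value": 1},
    {"key": "a", "value": 2},  # duplicate key, dropped under the assumption above
    {"key": "b", "value": 3},
]
as_dict = entries_to_dict(row)
print(as_dict)                   # {'a': 1, 'b': 3}
print(dict_to_entries(as_dict))  # [{'key': 'a', 'value': 1}, {'key': 'b', 'value': 3}]
```

Each row of a Map column is one such list of entries; the extension type only adds the guarantee that keys within a row are unique.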

Installation

pip install polars-map

Supported operations (.map.*)

Category    Methods
Accessors   entries, keys, values, len, get, contains_key
Filtering   filter, filter_keys, filter_values
Transform   eval, eval_keys, eval_values
Set ops     merge, intersection, difference
Conversion  from_entries
Iteration   __iter__, to_list (Series only)
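To make the difference between the filtering and transform categories concrete, here is a plain-Python sketch of their per-row semantics. It is an illustration only: the functions are hypothetical stand-ins operating on Python lists, not the plugin's expression API, and the real methods take Polars expressions rather than callables.

```python
from typing import Any, Callable

Entry = dict[str, Any]  # one {"key": ..., "value": ...} struct


def filter_keys(entries: list[Entry], pred: Callable[[Any], bool]) -> list[Entry]:
    """Keep entries whose key satisfies pred; key and value types are unchanged."""
    return [e for e in entries if pred(e["key"])]


def filter_values(entries: list[Entry], pred: Callable[[Any], bool]) -> list[Entry]:
    """Keep entries whose value satisfies pred; key and value types are unchanged."""
    return [e for e in entries if pred(e["value"])]


def eval_values(entries: list[Entry], fn: Callable[[Any], Any]) -> list[Entry]:
    """Apply fn to every value; the value type may change as a result."""
    return [{"key": e["key"], "value": fn(e["value"])} for e in entries]


row = [{"key": "a", "value": 1}, {"key": "b", "value": 2}]

kept = filter_values(row, lambda v: v >= 2)
print(kept)    # [{'key': 'b', 'value': 2}]

scaled = eval_values(row, lambda v: v * 0.5)
print(scaled)  # [{'key': 'a', 'value': 0.5}, {'key': 'b', 'value': 1.0}]
```

Filtering only drops entries, so the result can reuse the input dtype. A transform such as the int-to-float example can change the value type, which is why the result dtype has to be worked out again.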

Arrow conversion

Function                 Description
from_arrow(table)        Arrow Table/RecordBatch to Polars DataFrame, preserving map<> as Map
from_arrow_array(array)  Arrow Array to Polars Series, preserving map<> as Map
to_arrow(frame)          Polars DataFrame to Arrow Table, converting Map back to map<>
to_arrow_array(series)   Polars Series to Arrow Array, converting Map back to map<>
scan_arrow(source)       Lazy scan from an Arrow source with Map preservation

Usage

import polars as pl
import pyarrow as pa
from polars_map import Map, from_arrow, to_arrow, scan_arrow

# Construction
ser = pl.Series(
    "m",
    [
        [{"key": "a", "value": 1}, {"key": "b", "value": 2}],
        [{"key": "x", "value": 10}],
    ],
    dtype=Map(pl.String(), pl.Int64()),
)
df = pl.DataFrame([ser])

# Accessors
df.select(pl.col("m").map.keys())    # [["a", "b"], ["x"]]
df.select(pl.col("m").map.values())  # [[1, 2], [10]]
df.select(pl.col("m").map.len())     # [2, 1]

# Lookup
df.select(pl.col("m").map.get("a"))           # [1, None]
df.select(pl.col("m").map.contains_key("a"))  # [True, False]

# Filtering
df.select(pl.col("m").map.filter(pl.element().struct["value"] > 1))
df.select(pl.col("m").map.filter_keys(pl.element() > "a"))
df.select(pl.col("m").map.filter_values(pl.element() >= 2))

# Transform keys or values
df.select(pl.col("m").map.eval_keys(pl.element().str.to_uppercase()))
df.select(pl.col("m").map.eval_values(pl.element() * 2))

# Merge (right-side wins on key conflict)
left = pl.Series("l", [[{"key": "a", "value": 1}, {"key": "b", "value": 2}]], dtype=Map(pl.String(), pl.Int64()))
right = pl.Series("r", [[{"key": "a", "value": 99}, {"key": "c", "value": 3}]], dtype=Map(pl.String(), pl.Int64()))
pl.DataFrame([left, right]).select(pl.col("l").map.merge(pl.col("r")))
# [{"a": 99, "b": 2, "c": 3}]

# Set operations
pl.DataFrame([left, right]).select(pl.col("l").map.intersection(pl.col("r")))  # keys in both
pl.DataFrame([left, right]).select(pl.col("l").map.difference(pl.col("r")))    # keys only in left

# Convert to/from plain List(Struct)
df.select(pl.col("m").map.entries())   # strip Map -> List(Struct)

# from_entries is the inverse: it wraps a raw List(Struct) column into a Map
entries = pl.Series(
    "e",
    [[{"key": "a", "value": 1}, {"key": "a", "value": 2}]],
    dtype=pl.List(pl.Struct({"key": pl.String, "value": pl.Int64})),
)
pl.DataFrame([entries]).select(pl.col("e").map.from_entries())  # Map, deduped to {"a": 1}

# Series iteration yields Python dicts
for d in ser.map:
    print(d)  # {"a": 1, "b": 2}, {"x": 10}

# Arrow table with map column → Polars DataFrame
table = pa.table({"m": pa.array([[("a", 1)]], type=pa.map_(pa.string(), pa.int64()))})
df = from_arrow(table)          # Map(String, Int64) dtype preserved
table2 = to_arrow(df)           # roundtrips back to arrow map<>

# Lazy scanning from an Arrow source
lf = scan_arrow(lambda: [table])
result = lf.collect()
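The merge and set-operation examples above can be restated on plain Python dicts to show what each one keeps. This is a sketch of the semantics under stated assumptions, not the plugin's code: the function names mirror the .map methods but are hypothetical local helpers, and taking the left-hand value in intersection is an assumption, since the examples only say which keys survive.

```python
from typing import Any, Hashable

Mapping = dict[Hashable, Any]


def merge(left: Mapping, right: Mapping) -> Mapping:
    """Union of keys; the right-hand value wins on a key conflict."""
    out = dict(left)
    out.update(right)
    return out


def intersection(left: Mapping, right: Mapping) -> Mapping:
    """Keys present in both maps (assumption: values come from the left map)."""
    return {k: v for k, v in left.items() if k in right}


def difference(left: Mapping, right: Mapping) -> Mapping:
    """Keys present only in the left map."""
    return {k: v for k, v in left.items() if k not in right}


left_row = {"a": 1, "b": 2}
right_row = {"a": 99, "c": 3}

print(merge(left_row, right_row))         # {'a': 99, 'b': 2, 'c': 3}
print(intersection(left_row, right_row))  # {'a': 1}
print(difference(left_row, right_row))    # {'b': 2}
```

None of the three changes the key or value type, which is consistent with their being listed among the type-preserving methods.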

Caveats

  • Extension types: used to wrap the underlying List(Struct) storage with a semantic Map dtype. They are not yet stabilized and may change across Polars releases.
  • pl.dtype_of: used to efficiently cast to the extension type after some operations. It is also unstable.
  • GIL: required to automatically wrap an expression as the extension type, so operations that could change the underlying key or value types briefly lock the GIL to do the cast. This may also prevent the Polars engine from reasoning about the type.
  • LongMap: Arrow currently supports only Map, not LongMap. Polars generally uses LongList, but if a frame is ever converted to Arrow with offsets that don't fit in 32 bits, exporting will raise an error.
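The offset limit in the last caveat comes from how list-like Arrow arrays are laid out: one running offset per row boundary into a flattened entries array. The sketch below shows that arithmetic in plain Python. The helpers list_offsets and fits_32bit_offsets are hypothetical, and the default ceiling of 2**31 - 1 assumes signed 32-bit offsets; treat the exact limit as an assumption and pass a different one if needed.

```python
INT32_MAX = 2**31 - 1  # assumed ceiling for a signed 32-bit offset


def list_offsets(lengths: list[int]) -> list[int]:
    """Running offsets for rows with the given entry counts (starts at 0)."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def fits_32bit_offsets(lengths: list[int], limit: int = INT32_MAX) -> bool:
    """True if the final offset (the total entry count) stays within limit."""
    return sum(lengths) <= limit


lengths = [2, 1]  # entries per row, as in the first Usage example
print(list_offsets(lengths))        # [0, 2, 3]
print(fits_32bit_offsets(lengths))  # True

# Three rows whose combined entry count is 2**31 + 1 exceed the assumed ceiling.
print(fits_32bit_offsets([2**30, 2**30, 1]))  # False
```

The limit applies to the total number of entries across all rows in one array, not to the size of any single map.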
