polars-config-meta
A Polars plugin for persistent DataFrame-level metadata.
polars-config-meta offers a simple way to store and propagate Python-side metadata for Polars DataFrames. It achieves this by:
- Registering a custom config_meta namespace on each DataFrame (via @register_dataframe_namespace).
- Keeping an internal dictionary keyed by id(df), with automatic weak-reference cleanup to avoid memory leaks.
- Providing a “fallthrough” mechanism so you can write df.config_meta.some_polars_method(...) and have the resulting new DataFrame automatically inherit the old metadata, with no manual copying required.
- Optionally embedding that metadata in file-level Parquet metadata when you call df.config_meta.write_parquet(...), and retrieving it with read_parquet_with_meta(...) (eager) or scan_parquet_with_meta(...) (lazy).
Installation
pip install polars-config-meta[polars]
On older CPUs, use the polars-lts-cpu extra:
pip install polars-config-meta[polars-lts-cpu]
To read and write Parquet file-level metadata, add the pyarrow extra:
pip install polars-config-meta[pyarrow]
Key Points
- No Monkey-Patching or Subclassing: We do not modify Polars’ built-in classes at runtime or create a custom subclass of DataFrame. Everything is implemented through a plugin namespace.
- Weak-Reference Based: We store metadata in class-level dictionaries keyed by id(df) and hold a weakref to the DataFrame. Once the DataFrame is garbage-collected, the metadata is removed too.
- Automatic Metadata Copying:
  - When you call df.config_meta.with_columns(...) (or any other Polars method) through the config_meta namespace, we intercept the result.
  - If it’s a new DataFrame, the plugin copies the old one’s metadata forward.
- Parquet Integration:
  - df.config_meta.write_parquet("file.parquet") automatically embeds the plugin metadata into the Arrow schema’s metadata.
  - read_parquet_with_meta("file.parquet") reads the file, extracts that metadata, and reattaches it to the returned DataFrame.
  - scan_parquet_with_meta("file.parquet") scans the file, extracts that metadata, and reattaches it to the returned LazyFrame.
- Opt-In Only:
  - If you call df.with_columns(...) without .config_meta. in front, Polars has no knowledge of this plugin, so metadata will not copy forward.
  - If you want transformations to preserve metadata, call them via df.config_meta.<method>(...).
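The interception described under Automatic Metadata Copying and Opt-In Only can be pictured with a small stand-alone sketch. This is not the plugin's code: Frame and ConfigMeta are hypothetical stand-ins, and a weakref.WeakKeyDictionary replaces the id(df)-keyed store for brevity. It only illustrates how a namespace object might forward unknown method calls and copy metadata onto any new frame that comes back.

```python
import weakref


class Frame:
    """Hypothetical stand-in for a DataFrame with two illustrative methods."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, predicate):
        return Frame(r for r in self.rows if predicate(r))

    def height(self):
        return len(self.rows)


# Simplification: frame -> metadata dict, entries vanish with the frame.
_meta = weakref.WeakKeyDictionary()


class ConfigMeta:
    """Hypothetical namespace object wrapping one Frame."""

    def __init__(self, frame):
        self._frame = frame

    def set(self, **kwargs):
        _meta.setdefault(self._frame, {}).update(kwargs)

    def get_metadata(self):
        return dict(_meta.get(self._frame, {}))

    def __getattr__(self, name):
        # Only reached for names not defined on ConfigMeta itself.
        attr = getattr(self._frame, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            result = attr(*args, **kwargs)
            if isinstance(result, Frame) and result is not self._frame:
                # A new frame came back: copy the old metadata forward.
                _meta[result] = self.get_metadata()
            return result

        return wrapper


df = Frame([1, 2, 3])
ns = ConfigMeta(df)
ns.set(owner="Alice")
df2 = ns.filter(lambda r: r > 1)       # goes through the wrapper
df3 = df.filter(lambda r: r > 1)       # bypasses the namespace
print(ConfigMeta(df2).get_metadata())  # {'owner': 'Alice'}
print(ConfigMeta(df3).get_metadata())  # {}
print(ns.height())                     # 3 (not a Frame, returned untouched)
```

The df3 line shows why the behaviour is opt-in: a call made directly on the frame never reaches the wrapper, so nothing is copied.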
Basic Usage
import polars as pl
import polars_config_meta # this registers the plugin
df = pl.DataFrame({"a": [1, 2, 3]})
df.config_meta.set(owner="Alice", confidence=0.95)
# Use the plugin to transform; the returned DataFrame inherits metadata:
df2 = df.config_meta.with_columns(doubled=pl.col("a") * 2)
print(df2.config_meta.get_metadata())
# -> {'owner': 'Alice', 'confidence': 0.95}
# Write to Parquet, storing the metadata in file-level metadata:
df2.config_meta.write_parquet("output.parquet")
# Later, read it back:
from polars_config_meta import read_parquet_with_meta
df_in = read_parquet_with_meta("output.parquet")
print(df_in.config_meta.get_metadata())
# -> {'owner': 'Alice', 'confidence': 0.95}
Storage and Garbage Collection
Internally, the plugin stores metadata in a global dictionary, _df_id_to_meta, keyed by id(df),
and also keeps a weakref to each DataFrame. As soon as a DataFrame is out of scope and
garbage-collected, the entry in _df_id_to_meta is automatically removed. This prevents memory
leaks and keeps the plugin usage simple.
Common Patterns
- Setting Metadata: df.config_meta.set(key1="val1", key2="val2", ...)
- Retrieving Metadata: df.config_meta.get_metadata() (returns a dict)
- Updating Metadata From a Dict: df.config_meta.update({"some_key": "new_val", ...})
- Merging Metadata From Other DataFrames:
  df3 = pl.DataFrame(...)
  df3.config_meta.merge(df1, df2)
  This copies all key–value pairs from df1 and df2 into df3’s metadata.
- Transformations: df.config_meta.with_columns(...), df.config_meta.select(...), df.config_meta.filter(...), etc. For any method that returns a new DataFrame, the plugin copies metadata forward. If the method returns something else (like a Series or plain Python object), the plugin does nothing.
Caveats
- Must Use df.config_meta.<method>: If you call Polars methods directly on df, the plugin can’t intercept the result, so metadata will not be inherited.
- Not an Official Polars Feature: This is purely at the Python layer. Polars doesn’t guarantee stable IDs or official hooks for such metadata.
- Arrow/IPC/CSV: For other formats, you’d need to write your own logic to embed or retrieve the metadata. Currently, only Parquet is supported out of the box via df.config_meta.write_parquet and read_parquet_with_meta / scan_parquet_with_meta.
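For formats without built-in support, one possible approach is a JSON sidecar file next to the data file. The sketch below is an assumption-laden illustration, not part of the plugin: write_csv_with_meta, read_csv_with_meta, and the .meta.json suffix are all hypothetical, and the standard-library csv module stands in for Polars' own CSV reader and writer.

```python
import csv
import json
import tempfile
from pathlib import Path


def _sidecar_path(path):
    # Hypothetical convention: "output.csv" -> "output.csv.meta.json"
    return path.with_name(path.name + ".meta.json")


def write_csv_with_meta(path, header, rows, metadata):
    """Write rows to `path` and the metadata dict to a JSON sidecar."""
    path = Path(path)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    _sidecar_path(path).write_text(json.dumps(metadata))


def read_csv_with_meta(path):
    """Return (header, rows, metadata); metadata is {} if no sidecar exists."""
    path = Path(path)
    with path.open(newline="") as f:
        rows = list(csv.reader(f))
    sidecar = _sidecar_path(path)
    metadata = json.loads(sidecar.read_text()) if sidecar.exists() else {}
    return rows[0], rows[1:], metadata


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output.csv"
    write_csv_with_meta(
        target, ["a"], [[1], [2], [3]], {"owner": "Alice", "confidence": 0.95}
    )
    header, rows, meta = read_csv_with_meta(target)
    print(meta)  # {'owner': 'Alice', 'confidence': 0.95}
```

With the plugin, the read side would presumably finish by passing the recovered dict to df.config_meta.update(...) on the freshly loaded DataFrame. Note that JSON restricts metadata values to JSON-serializable types.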
Contributing
- Issues & Discussions: Please open a GitHub issue for bugs, ideas, or questions.
- Pull Requests: PRs are welcome! This plugin is a community-driven approach to persist DataFrame-level metadata in Polars.
Polars development
There is ongoing work in Polars to support file-level metadata when writing Parquet; see this PR for details.
License
This project is licensed under the MIT License.
File details
Details for the file polars_config_meta-0.1.5.tar.gz.
File metadata
- Download URL: polars_config_meta-0.1.5.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.22.3 CPython/3.12.6 Linux/5.15.0-125-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4eb6960a0e039ff82dbd6d2d5b14a1a65272e60318dd459a92b0e4f799f9aee8 |
| MD5 | 090f7a2e6a458f2efdd505c9e6e670a9 |
| BLAKE2b-256 | 7aa1812d6dfabefb80eedd5e44521f0a988ce6cf2a8818f7259158529b3e81fe |
File details
Details for the file polars_config_meta-0.1.5-py3-none-any.whl.
File metadata
- Download URL: polars_config_meta-0.1.5-py3-none-any.whl
- Upload date:
- Size: 6.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.22.3 CPython/3.12.6 Linux/5.15.0-125-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91fdea79eaa469fe99ae297ae164465c453f79811c8bd9cbf6596bc1af11ac17 |
| MD5 | be1ef29a7b723402727aeceec0c29db1 |
| BLAKE2b-256 | a35f497adf95cd96ed35a57b5cae0be5250574f31a2fb7c51ca2be2b483c4e84 |