Polars KDE
Polars KDE provides Kernel Density Estimation (KDE) functionalities powered by the Polars DataFrame library. Under the hood it uses the kernel_density_estimation crate.
Installation
Install the Polars KDE package using pip:
pip install polars_kde
or
uv add polars_kde
Examples
Here are some examples of how to use Polars KDE. In general, there are three ways to calculate KDEs:
- Static Evaluations: Calculates KDEs at the same fixed set of evaluation points for every group. Works on already aggregated data of type pl.List(pl.Float32).
- Aggregated KDE: Calculates KDEs at a fixed set of evaluation points for each group and aggregates the results. Works on grouped DataFrames, where each group contains data of type pl.Float32.
- Dynamic Evaluations: Calculates KDEs at a set of evaluation points that can vary from group to group. Works on already aggregated data of type pl.List(pl.Float32).
Example 1: Static Evaluations
import polars as pl
import polars_kde as pkde

# Sample DataFrame with a as pl.Float32
df = pl.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "id": [0, 0, 1, 1, 1],
    },
    schema={"a": pl.Float32, "id": pl.Int32},  # Explicitly set the dtypes
)

# Evaluation points
eval_points = [1.0, 2.0, 3.0, 4.0, 5.0]

# Group by 'id' and apply KDE
df_kde = df.group_by("id").agg(
    pl.col("a")
).with_columns(
    kde=pkde.kde_static_evals(
        pl.col("a"),
        eval_points=eval_points,
    )
)
print(df_kde)
Example 2: Aggregated KDE
import polars as pl
import polars_kde as pkde

# Sample DataFrame with a as pl.Float32
df = pl.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "id": [0, 0, 1, 1, 1],
    },
    schema={"a": pl.Float32, "id": pl.Int32},  # Explicitly set the dtypes
)

# Evaluation points
eval_points = [1.0, 2.0, 3.0, 4.0, 5.0]

# Group by 'id' and apply aggregated KDE
df_kde = df.group_by("id").agg(
    kde=pkde.kde(
        pl.col("a"),
        eval_points=eval_points,
    )
)
print(df_kde)
Example 3: Dynamic Evaluations
import polars as pl
import polars_kde as pkde

# Sample DataFrame with a as pl.List(pl.Float32)
df = pl.DataFrame(
    {
        "a": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
        "eval_points": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    },
    schema={"a": pl.List(pl.Float32), "eval_points": pl.List(pl.Float32)},
)

# Apply dynamic KDE evaluations
df_kde = df.with_columns(
    kde=pkde.kde_dynamic_evals(
        pl.col("a"),
        pl.col("eval_points"),
    )
)
print(df_kde)
Benchmark
After various tests and experiments, it turns out that it is usually a good idea to use kde to calculate KDEs. Because of the way Polars is built, the individual groups are handled in parallel.
The following non-representative benchmark compares the performance of the different KDE implementations on DataFrames of various sizes. We also compare against the scipy implementation, which is not parallelized and is applied using map_elements in Polars.
The benchmark measures the total time taken for the KDE computations across different numbers of rows and groups. With, say, 1000 rows and 10 groups, we calculate one KDE per group and get 10 KDEs, each evaluated at the same fixed set of evaluation points. Note that kde_dynamic_evals also allows for a variable number of evaluation points per group.
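To make the shape of that workload concrete, here is a minimal pure-Python sketch of "1000 rows, 10 groups, one KDE per group". It is not the code in benchmark.py and does not use polars_kde; the helper names kde_fixed_bandwidth and kde_per_group and the fixed bandwidth of 0.5 are assumptions made only for illustration.

```python
import random
import time
from collections import defaultdict
from statistics import NormalDist, fmean


def kde_fixed_bandwidth(samples, eval_points, bandwidth=0.5):
    """Gaussian KDE with a fixed, hypothetical bandwidth (not the package's rule)."""
    # The estimate at x is the average of normal densities centred on each sample.
    return [
        fmean(NormalDist(s, bandwidth).pdf(x) for s in samples)
        for x in eval_points
    ]


def kde_per_group(rows, eval_points):
    """Collect the values of each group, then evaluate one KDE per group."""
    groups = defaultdict(list)
    for group_id, value in rows:
        groups[group_id].append(value)
    return {gid: kde_fixed_bandwidth(vals, eval_points) for gid, vals in groups.items()}


random.seed(0)
n_rows, n_groups = 1000, 10
rows = [(i % n_groups, random.gauss(0.0, 1.0)) for i in range(n_rows)]
eval_points = [-2.0, -1.0, 0.0, 1.0, 2.0]

start = time.perf_counter()
result = kde_per_group(rows, eval_points)
elapsed = time.perf_counter() - start
print(f"{len(result)} KDEs, {len(eval_points)} points each, {elapsed:.4f} s")
```

The loop over groups here runs sequentially, which is also what happens when a Python function is applied group by group; the package's expressions instead let Polars process the groups in parallel.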
If you want to run the benchmark yourself, you can use the following command:
make bench
NOTE: The benchmark.py file is actually a marimo notebook. Since it was created in sandbox mode the dependencies are part of the header information and we do not need to include any additional dependencies in the pyproject.toml file. Great.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Limitations and further improvements
- The current implementation only supports float32 dtypes and uses the Gaussian kernel with Silverman bandwidth estimation. It should be extended to support other kernels, bandwidths and dtypes.
- The current implementation only supports 1D KDEs. It should be extended to support 2D KDEs.
- The underlying Rust implementation is not yet optimized for performance, especially for large datasets.
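For readers unfamiliar with the terms in the first point, the following is a minimal pure-Python sketch of a 1D Gaussian KDE with a Silverman-style bandwidth. It is an illustration under stated assumptions, not the package's or the crate's implementation: the function names are hypothetical, and the bandwidth uses the form h = sigma * (4 / (3 n))^(1/5), which is what scipy calls "silverman". The crate may use a different variant of the rule, so values are not guaranteed to match polars_kde output.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """One common form of Silverman's rule: h = sigma * (4 / (3 n)) ** (1 / 5).

    Assumption: sigma is the sample standard deviation (ddof=1).
    """
    n = len(samples)
    sigma = statistics.stdev(samples)
    return sigma * (4.0 / (3.0 * n)) ** 0.2


def gaussian_kde(samples, eval_points):
    """Evaluate a Gaussian kernel density estimate at each evaluation point."""
    h = silverman_bandwidth(samples)
    # Normalisation so that the estimate integrates to 1.
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        for x in eval_points
    ]


samples = [1.0, 2.0, 3.0, 4.0, 5.0]
print(gaussian_kde(samples, eval_points=[1.0, 3.0, 5.0]))
```

Supporting other kernels would amount to replacing the exponential term, and supporting other bandwidths to replacing silverman_bandwidth, which is why the two are kept as separate functions in this sketch.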