Polars KDE

Polars KDE provides Kernel Density Estimation (KDE) functionality powered by the Polars DataFrame library. Under the hood, it uses the kernel_density_estimation crate.

Installation

Install the Polars KDE package using pip:

pip install polars_kde

or

uv add polars_kde

Examples

Here are some examples of how to use Polars KDE. In general, there are three ways to calculate KDEs:

  1. Static Evaluations: Calculates KDEs at the same fixed set of evaluation points for every group. Works on already aggregated data of type pl.List(pl.Float32).
  2. Aggregated KDE: Calculates KDEs at the same fixed set of evaluation points for every group and aggregates the results. Works on grouped DataFrames, where each group contains data of type pl.Float32.
  3. Dynamic Evaluations: Calculates KDEs at a set of evaluation points that can differ from group to group. Works on already aggregated data of type pl.List(pl.Float32).

Example 1: Static Evaluations

import polars as pl
import polars_kde as pkde

# Sample DataFrame with column "a" as pl.Float32
df = pl.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "id": [0, 0, 1, 1, 1],
    },
    schema={"a": pl.Float32, "id": pl.Int32},  # Explicitly set the dtypes
)

# Evaluation points
eval_points = [1.0, 2.0, 3.0, 4.0, 5.0]

# Group by 'id' and apply KDE
df_kde = df.group_by("id").agg(
    pl.col("a")
).with_columns(
    kde=pkde.kde_static_evals(
        pl.col("a"),
        eval_points=eval_points,
    )
)

print(df_kde)

Example 2: Aggregated KDE

import polars as pl
import polars_kde as pkde

# Sample DataFrame with column "a" as pl.Float32
df = pl.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "id": [0, 0, 1, 1, 1],
    },
    schema={"a": pl.Float32, "id": pl.Int32},  # explicitly set the dtypes
)

# Evaluation points
eval_points = [1.0, 2.0, 3.0, 4.0, 5.0]

# Group by 'id' and apply aggregated KDE
df_kde = df.group_by("id").agg(
    kde=pkde.kde(
        pl.col("a"),
        eval_points=eval_points,
    )
)

print(df_kde)

Example 3: Dynamic Evaluations

import polars as pl
import polars_kde as pkde

# Sample DataFrame with column "a" as pl.List(pl.Float32)
df = pl.DataFrame(
    {
        "a": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
        "eval_points": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    },
    schema={"a": pl.List(pl.Float32), "eval_points": pl.List(pl.Float32)},
)

# Apply dynamic KDE evaluations
df_kde = df.with_columns(
    kde=pkde.kde_dynamic_evals(
        pl.col("a"),
        pl.col("eval_points"),
    )
)

print(df_kde)

Benchmark

After various tests and experiments, it turns out that kde is usually the best choice for calculating KDEs. Because of the way Polars is built, the individual groups are processed in parallel.

The following non-representative benchmark compares the performance of the different KDE implementations across various DataFrame sizes. We also compare against the SciPy implementation, which is not parallelized and is applied using map_elements in Polars.

Benchmark Results

The benchmark measures the total time taken for KDE computations across different numbers of rows and groups. With, say, 1000 rows and 10 groups, we calculate one KDE per group and get 10 KDEs evaluated at a fixed set of evaluation points. Note that kde_dynamic_evals also allows for a variable number of evaluation points per group.

If you want to run the benchmark yourself, you can use the following command:

make bench

NOTE: The benchmark.py file is actually a marimo notebook. Since it was created in sandbox mode, its dependencies are declared in the file's header, so no additional dependencies need to be added to the pyproject.toml file.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Limitations and further improvements

  • The current implementation only supports float32 dtypes and uses the Gaussian kernel with Silverman bandwidth estimation. It should be extended to support other kernels, bandwidths and dtypes.
  • The current implementation only supports 1D KDEs. It should be extended to support 2D KDEs.
  • The underlying Rust implementation is not yet optimized for performance, especially for large datasets.
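
To make the first point concrete, here is a minimal pure-Python sketch of a 1D Gaussian KDE with a Silverman-style bandwidth. It is not the package's or the crate's code. It assumes the common rule of thumb h = 0.9 * min(std, IQR / 1.34) * n^(-1/5), which may differ from the variant the kernel_density_estimation crate uses. The function names silverman_bandwidth and gaussian_kde_1d are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth (assumed variant, see the note above)."""
    n = len(samples)
    std = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    # Fall back to the standard deviation if the IQR is degenerate
    spread = min(std, iqr / 1.34) if iqr > 0 else std
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde_1d(samples, eval_points):
    """Evaluate a Gaussian kernel density estimate at each evaluation point."""
    h = silverman_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        for x in eval_points
    ]


samples = [3.0, 4.0, 5.0]
densities = gaussian_kde_1d(samples, [1.0, 2.0, 3.0, 4.0, 5.0])
print(densities)
```

Supporting other kernels would amount to swapping the exponential term, and other bandwidth rules would replace silverman_bandwidth.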
