Polars KDE

Polars KDE provides Kernel Density Estimation (KDE) functionality powered by the Polars DataFrame library. Under the hood, it uses the Rust crate kernel_density_estimation.
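For reference, this is the standard kernel density estimator. It is written out here with the Gaussian kernel and Silverman bandwidth named under Limitations; the exact Silverman variant used by the kernel_density_estimation crate is an assumption (the common Gaussian-reference form is shown). For samples x_1 through x_n with sample standard deviation \hat{\sigma}, the density at an evaluation point x is:

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad K(u) = \frac{1}{\sqrt{2\pi}} \, e^{-u^2/2},
\qquad h = \left(\frac{4}{3n}\right)^{1/5} \hat{\sigma} \approx 1.06 \, \hat{\sigma} \, n^{-1/5}
```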

Installation

Install the Polars KDE package using pip:

pip install polars_kde

or

uv add polars_kde

Examples

Here are some examples of how to use Polars KDE. There are three ways to compute KDEs:

  1. Static Evaluations (kde_static_evals): computes one KDE per row at a fixed, shared set of evaluation points. Works on already-aggregated data of type pl.List(pl.Float32).
  2. Aggregated KDE (kde): computes one KDE per group at a fixed set of evaluation points, directly inside group_by().agg(). Works on grouped DataFrames where each group holds values of type pl.Float32.
  3. Dynamic Evaluations (kde_dynamic_evals): computes one KDE per row at that row's own evaluation points, whose number may vary from row to row. Works on already-aggregated data of type pl.List(pl.Float32), with the evaluation points supplied as a second pl.List(pl.Float32) column.
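To make the input and output shapes of the three methods concrete, here is a pure-Python reference sketch. It does not call polars_kde: every function name below is hypothetical, and the Gaussian kernel with Silverman's rule is an assumed stand-in for the crate's exact implementation.

```python
import math
import statistics
from collections import defaultdict


def silverman_bandwidth(samples):
    # Silverman's rule of thumb for a Gaussian kernel (assumed variant);
    # needs at least two samples with non-zero spread
    n = len(samples)
    return (4.0 / (3.0 * n)) ** 0.2 * statistics.stdev(samples)


def gaussian_kde(samples, eval_points):
    # Density estimate of `samples` at every point in `eval_points`
    h = silverman_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)
        for x in eval_points
    ]


def static_evals(sample_lists, eval_points):
    # 1. Static: one list of samples per row, one shared evaluation grid
    return [gaussian_kde(samples, eval_points) for samples in sample_lists]


def aggregated(keys, values, eval_points):
    # 2. Aggregated: flat (key, value) rows are grouped first, one KDE per group
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: gaussian_kde(vals, eval_points) for key, vals in groups.items()}


def dynamic_evals(sample_lists, eval_point_lists):
    # 3. Dynamic: every row carries its own, possibly differently sized, grid
    return [gaussian_kde(s, e) for s, e in zip(sample_lists, eval_point_lists)]


grid = [1.0, 2.0, 3.0, 4.0, 5.0]
print(static_evals([[1.0, 2.0], [3.0, 4.0, 5.0]], grid))
print(aggregated([0, 0, 1, 1, 1], [1.0, 2.0, 3.0, 4.0, 5.0], grid))
print(dynamic_evals([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[1.0, 2.0], [4.0, 5.0, 6.0]]))
```

The static and dynamic variants differ only in whether the evaluation grid is shared or comes per row; the aggregated variant differs in where the grouping happens.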

Example 1: Static Evaluations

import polars as pl
import polars_kde as pkde

# Sample DataFrame with column "a" as pl.Float32
df = pl.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "id": [0, 0, 1, 1, 1],
    },
    schema={"a": pl.Float32, "id": pl.Int32},  # Explicitly set the dtypes
)

# Evaluation points
eval_points = [1.0, 2.0, 3.0, 4.0, 5.0]

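# Collect each group's values into a pl.List(pl.Float32) column, then
# evaluate each group's KDE at the shared evaluation points
df_kde = df.group_by("id", maintain_order=True).agg(
    pl.col("a")
).with_columns(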
    kde=pkde.kde_static_evals(
        pl.col("a"),
        eval_points=eval_points,
    )
)

print(df_kde)

Example 2: Aggregated KDE

import polars as pl
import polars_kde as pkde

# Sample DataFrame with column "a" as pl.Float32
df = pl.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "id": [0, 0, 1, 1, 1],
    },
    schema={"a": pl.Float32, "id": pl.Int32},  # explicitly set the dtypes
)

# Evaluation points
eval_points = [1.0, 2.0, 3.0, 4.0, 5.0]

# Group by 'id' and compute one KDE per group directly in the aggregation
df_kde = df.group_by("id", maintain_order=True).agg(
    kde=pkde.kde(
        pl.col("a"),
        eval_points=eval_points,
    )
)

print(df_kde)

Example 3: Dynamic Evaluations

import polars as pl
import polars_kde as pkde

# Sample DataFrame where each row holds a list of samples ("a") and its own evaluation points
df = pl.DataFrame(
    {
        "a": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
        "eval_points": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    },
    schema={"a": pl.List(pl.Float32), "eval_points": pl.List(pl.Float32)},
)

# Apply dynamic KDE evaluations
df_kde = df.with_columns(
    kde=pkde.kde_dynamic_evals(
        pl.col("a"),
        pl.col("eval_points"),
    )
)

print(df_kde)

Benchmark

In our tests and experiments, kde was usually the best choice for computing KDEs. Because Polars evaluates group_by aggregations in parallel, the individual groups are processed concurrently.

The following (non-representative) benchmark compares the performance of the different KDE implementations across various DataFrame sizes. We also compare against the SciPy implementation, which is not parallelized and is applied to each group via map_elements in Polars.

Benchmark Results

The benchmark measures the total time taken by the KDE computations for different numbers of rows and groups. For example, with 1,000 rows and 10 groups, we compute one KDE per group and obtain 10 KDEs, each evaluated at the same fixed set of evaluation points. Note that kde_dynamic_evals additionally allows a variable number of evaluation points per group.
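The shape of the timed workload can be sketched in plain Python. This is not the project's benchmark.py: the data generator, the round-robin group assignment, the evaluation grid, and the helper names are all hypothetical, and the Gaussian/Silverman KDE is an assumed stand-in. Pure Python also runs this single-threaded, so the sketch shows only the structure of the work (one KDE per group on a shared grid), not the parallel speedup Polars provides.

```python
import math
import random
import statistics
import time


def gaussian_kde(samples, eval_points):
    # Gaussian kernel with Silverman's rule-of-thumb bandwidth (assumed variant)
    n = len(samples)
    h = (4.0 / (3.0 * n)) ** 0.2 * statistics.stdev(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)
        for x in eval_points
    ]


def make_workload(n_rows, n_groups, seed=0):
    # Hypothetical generator: standard-normal samples assigned round-robin to groups
    rng = random.Random(seed)
    groups = {g: [] for g in range(n_groups)}
    for i in range(n_rows):
        groups[i % n_groups].append(rng.gauss(0.0, 1.0))
    return groups


def run(n_rows, n_groups, eval_points):
    # Time only the KDE step: one KDE per group, all on the same grid
    groups = make_workload(n_rows, n_groups)
    start = time.perf_counter()
    kdes = {g: gaussian_kde(vals, eval_points) for g, vals in groups.items()}
    elapsed = time.perf_counter() - start
    return kdes, elapsed


grid = [-3.0 + 0.5 * i for i in range(13)]  # -3.0 to 3.0 in steps of 0.5
kdes, elapsed = run(1000, 10, grid)
print(f"{len(kdes)} KDEs x {len(grid)} points in {elapsed:.4f} s")
```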

If you want to run the benchmark yourself, you can use the following command:

make bench

NOTE: benchmark.py is a marimo notebook. Because it was created in sandbox mode, its dependencies are declared in the file's header, so no additional dependencies need to be added to pyproject.toml.
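Sandbox mode stores a notebook's dependencies as inline script metadata (the PEP 723 format) in a comment block at the top of the file. The header below only sketches that format; the package list and the Python version bound are illustrative assumptions, not the actual contents of benchmark.py.

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "marimo",
#     "polars",
#     "polars_kde",
#     "scipy",
# ]
# ///
```

A notebook with such a header can be opened with marimo edit --sandbox benchmark.py, which uses uv to install the declared dependencies into an isolated environment.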

License

This project is licensed under the MIT License. See the LICENSE file for details.

Limitations and further improvements

  • The current implementation only supports Float32 dtypes and uses the Gaussian kernel with Silverman's rule-of-thumb bandwidth. It should be extended to support other kernels, bandwidth selection methods, and dtypes.
  • The current implementation only supports 1D KDEs. It should be extended to support 2D KDEs.
  • The underlying Rust implementation is not yet optimized for performance, especially for large datasets.
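One way to address the first limitation is to make the kernel and the bandwidth rule parameters. The pure-Python sketch below is hypothetical (none of these names exist in polars_kde); it uses the Epanechnikov kernel and Scott's rule (sample standard deviation times n^(-1/5), the 1D factor used by scipy.stats.gaussian_kde) as example alternatives to the Gaussian kernel and Silverman's rule.

```python
import math
import statistics
from typing import Callable, List, Sequence

Kernel = Callable[[float], float]
BandwidthRule = Callable[[Sequence[float]], float]


def gaussian(u: float) -> float:
    # Standard normal density
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov(u: float) -> float:
    # Compact support on [-1, 1]; integrates to 1
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def silverman(samples: Sequence[float]) -> float:
    # Gaussian-reference rule of thumb: (4 / (3n))^(1/5) * sigma
    return (4.0 / (3.0 * len(samples))) ** 0.2 * statistics.stdev(samples)


def scott(samples: Sequence[float]) -> float:
    # Scott's factor n^(-1/5) times the sample standard deviation (1D case)
    return len(samples) ** -0.2 * statistics.stdev(samples)


def kde(
    samples: Sequence[float],
    eval_points: Sequence[float],
    kernel: Kernel = gaussian,
    bandwidth: BandwidthRule = silverman,
) -> List[float]:
    # Generic 1D KDE: average of scaled kernels centred on each sample
    h = bandwidth(samples)
    n = len(samples)
    return [sum(kernel((x - xi) / h) for xi in samples) / (n * h) for x in eval_points]


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(kde(data, [2.0, 3.0, 4.0]))
print(kde(data, [2.0, 3.0, 4.0], kernel=epanechnikov, bandwidth=scott))
```

Keeping kernels and bandwidth rules as plain callables lets new ones be added without touching the estimator. Note that Silverman's and Scott's rules are derived for a Gaussian reference, so pairing them with other kernels is a convenience rather than an optimal choice.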

Download files

Download the file for your platform.

Source Distribution

  • polars_kde-0.1.3.tar.gz (101.3 kB)

Built Distributions

  • polars_kde-0.1.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB): CPython 3.8+, manylinux (glibc 2.17+), x86-64
  • polars_kde-0.1.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.8 MB): CPython 3.8+, manylinux (glibc 2.17+), ARM64
  • polars_kde-0.1.3-cp38-abi3-macosx_11_0_arm64.whl (3.5 MB): CPython 3.8+, macOS 11.0+, ARM64
  • polars_kde-0.1.3-cp38-abi3-macosx_10_12_x86_64.whl (3.8 MB): CPython 3.8+, macOS 10.12+, x86-64

