polars_kde

Provides kernel density estimation (KDE) as a Polars plugin. Under the hood, it uses the kernel_density_estimation crate.

Installation

Install polars_kde using pip or uv:

pip install polars_kde
# or 
uv add polars_kde

See the uv documentation for more information on managing your Python dependencies with uv.

Examples

Here are some examples of how to use polars_kde. The library provides three main methods to calculate KDEs:

  1. Static Evaluations (kde_static_evals): Calculates KDEs at a fixed set of evaluation points shared by all groups. Works on already aggregated data of type pl.List(pl.Float32).

  2. Aggregated KDE (kde): Calculates KDEs at a fixed set of evaluation points for each group and aggregates the results. Works on grouped DataFrames, where each group contains data of type pl.Float32.

  3. Dynamic Evaluations (kde_dynamic_evals): Calculates KDEs at a set of evaluation points that can differ per group. Works on already aggregated data of type pl.List(pl.Float32).

In most scenarios, you will probably want to use the kde method, which works on grouped DataFrames and parallelizes the KDE calculations across groups.

Example 1: Static Evaluations

import polars as pl
import polars_kde as pkde

# Sample DataFrame with a as pl.Float32
df = pl.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "id": [0, 0, 1, 1, 1],
    },
    schema={"a": pl.Float32, "id": pl.Int32},  # Explicitly set the dtypes
)

# Evaluation points
eval_points = [1.0, 2.0, 3.0, 4.0, 5.0]

# Group by 'id' and apply KDE
df_kde = (
    df
    .group_by("id")
    .agg(
        "a"
    )
    .with_columns(
        kde=pkde.kde_static_evals(
            "a",
            eval_points=eval_points,
        )
    )
)

print(df_kde)

Example 2: Aggregated KDE

import polars as pl
import polars_kde as pkde

# Sample DataFrame with a as pl.Float32
df = pl.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "id": [0, 0, 1, 1, 1],
    },
    schema={"a": pl.Float32, "id": pl.Int32},  # explicitly set the dtypes
)

# Evaluation points
eval_points = [1.0, 2.0, 3.0, 4.0, 5.0]

# Group by 'id' and apply aggregated KDE
df_kde = (
    df
    .group_by("id")
    .agg(
        kde=pkde.kde(
            "a",
            eval_points=eval_points,
        )
    )
)

print(df_kde)

Example 3: Dynamic Evaluations

import polars as pl
import polars_kde as pkde

# Sample DataFrame with a as pl.List(pl.Float32)
df = pl.DataFrame(
    {
        "a": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
        "eval_points": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    },
    schema={"a": pl.List(pl.Float32), "eval_points": pl.List(pl.Float32)},
)

# Apply dynamic KDE evaluations
df_kde = (
    df
    .with_columns(
        kde=pkde.kde_dynamic_evals(
            "a",
            "eval_points",
        )
    )
)

print(df_kde)

Benchmark

After various tests and experiments, it turns out that kde is usually the best choice for calculating KDEs. Because of how Polars executes grouped aggregations, the individual groups are handled in parallel.

The following non-representative benchmark compares the performance of the different KDE implementations across various DataFrame sizes. We also compare against the scipy implementation, which is not parallelized and is applied using map_elements in Polars.

Benchmark Results

The benchmark measures the total time taken for KDE computations across different numbers of rows and groups. For example, with 1000 rows and 10 groups, we calculate one KDE per group and obtain 10 KDEs, each evaluated at the same fixed set of evaluation points. Note that kde_dynamic_evals also allows for a variable number of evaluation points per group.

If you want to run the benchmark yourself, you can use the following command:

make bench

NOTE: The benchmark.py file is actually a marimo notebook. Since it was created in sandbox mode, its dependencies are declared in the file header, so no additional dependencies need to be added to the pyproject.toml file.

Limitations and further improvements

  • The current implementation only supports float32 dtypes and uses the Gaussian kernel with Silverman bandwidth estimation. It should be extended to support other kernels, bandwidths, and dtypes.
  • The current implementation only supports 1D KDEs. It should be extended to support 2D KDEs.
  • The underlying Rust implementation is not yet optimized for performance, especially for large datasets.
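To make the first limitation concrete, here is a plain-Python sketch of a 1D Gaussian KDE with a Silverman-style bandwidth. It is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, and the bandwidth uses one common form of Silverman's rule of thumb, h = 0.9 * min(std, IQR / 1.34) * n^(-1/5), which may differ in detail from the variant in the Rust crate.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """One common form of Silverman's rule of thumb (assumed variant)."""
    n = len(samples)
    std = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    spread = min(std, (q3 - q1) / 1.34)
    if spread <= 0.0:
        raise ValueError("samples have zero spread; bandwidth is undefined")
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde_1d(samples, eval_points, bandwidth=None):
    """Evaluate a Gaussian KDE of `samples` at each point in `eval_points`."""
    h = silverman_bandwidth(samples) if bandwidth is None else bandwidth
    # Normalization so that the estimated density integrates to 1
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        for x in eval_points
    ]


samples = [1.0, 2.0, 3.0, 4.0, 5.0]
density = gaussian_kde_1d(samples, eval_points=[1.0, 2.0, 3.0, 4.0, 5.0])
print(density)
```

Supporting another kernel would amount to swapping the exponential term and its normalization constant, and a different bandwidth rule can already be tried here by passing bandwidth explicitly.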
