High-performance statistical testing and regression for Polars DataFrames, powered by Rust

polars-statistics

License: MIT | Python 3.9+

Note: This extension is in early-stage development. APIs may change, and some features are experimental.

Features

  • Native Polars Expressions: Full support for group_by, over, and lazy evaluation
  • Statistical Tests: Parametric, non-parametric, distributional, and forecast comparison tests
  • Regression Models: OLS, Ridge, Elastic Net, WLS, GLMs, ALM (24+ distributions)
  • Formula Syntax: R-style formulas with polynomial and interaction effects
  • High Performance: Rust-powered with zero-copy data transfer

Installation

pip install polars-statistics

Quick Start

All functions work as Polars expressions, integrating with group_by and over:

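import numpy as np
import polars as pl
import polars_statistics as ps

# Synthetic example data: two groups of 50 rows each
rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

df = pl.DataFrame({
    "group": ["A"] * 50 + ["B"] * 50,
    "y": y,
    "x1": x1,
    "x2": x2,
})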

# Run OLS regression per group
result = df.group_by("group").agg(
    ps.ols("y", "x1", "x2").alias("model")
)

# Extract results from struct
result.with_columns(
    pl.col("model").struct.field("r_squared"),
    pl.col("model").struct.field("coefficients"),
)
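To make the per-group output concrete, the sketch below reproduces in plain Python what a per-group OLS fit computes: coefficients from the normal equations and R². It is a reference illustration only, not the library's Rust implementation. The helper names `solve`, `ols_fit`, and `ols_by_group` are hypothetical, and fitting with an intercept is an assumption about the default behaviour of `ps.ols`.

```python
from collections import defaultdict


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations apply to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def ols_fit(y: list[float], *xs: list[float]) -> dict:
    """OLS with an intercept via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0, *vals] for vals in zip(*xs)]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return {"coefficients": coef, "r_squared": 1.0 - ss_res / ss_tot}


def ols_by_group(groups, y, x1, x2) -> dict:
    """Split rows by group key and fit one model per group (split-apply-combine)."""
    buckets = defaultdict(lambda: ([], [], []))
    for g, yi, a, b in zip(groups, y, x1, x2):
        by, bx1, bx2 = buckets[g]
        by.append(yi)
        bx1.append(a)
        bx2.append(b)
    return {g: ols_fit(by, bx1, bx2) for g, (by, bx1, bx2) in buckets.items()}
```

For example, `ols_fit([1.0, 3.0, 2.0, 5.0, 4.0], [1.0, 2.0, 3.0, 4.0, 5.0])` gives an intercept of 0.6, a slope of 0.8, and an R² of 0.64. The sketch does not guard against singular designs (perfectly collinear predictors).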

Statistical Tests

Statistical tests are powered by anofox-statistics, providing full API parity with R's statistical functions and validated against R implementations.

# Parametric tests
ps.ttest_ind("treatment", "control", alternative="two-sided")
ps.ttest_paired("before", "after")

# Non-parametric tests
ps.mann_whitney_u("x", "y")
ps.kruskal_wallis("group1", "group2", "group3")

# Normality tests
ps.shapiro_wilk("x")

# Forecast comparison
ps.diebold_mariano("errors1", "errors2", horizon=1)

All tests return a struct with statistic and p_value fields.
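As an illustration of the statistic/p_value pair, here is a minimal standard-library sketch of the original Diebold-Mariano test for horizon 1, assuming squared-error loss and a two-sided alternative. It omits the Harvey-Leybourne-Newbold small-sample correction and uses a normal reference distribution, so its numbers will generally differ from R's `forecast::dm.test` and may differ from `ps.diebold_mariano`. The function name `dm_test_h1` is hypothetical.

```python
import math
from statistics import NormalDist


def dm_test_h1(errors1: list[float], errors2: list[float]) -> dict:
    """Diebold-Mariano test for forecast horizon 1 with squared-error loss."""
    if len(errors1) != len(errors2):
        raise ValueError("error series must have equal length")
    # Loss differential: positive values mean forecast 1 did worse.
    d = [e1 ** 2 - e2 ** 2 for e1, e2 in zip(errors1, errors2)]
    n = len(d)
    d_bar = sum(d) / n
    # For h = 1 only the lag-0 autocovariance enters the long-run variance.
    gamma0 = sum((di - d_bar) ** 2 for di in d) / n
    if gamma0 == 0:
        raise ValueError("loss differential is constant; statistic is undefined")
    statistic = d_bar / math.sqrt(gamma0 / n)
    # Two-sided p-value from the standard normal distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(statistic)))
    return {"statistic": statistic, "p_value": p_value}
```

Swapping the two error series flips the sign of the statistic and leaves the two-sided p-value unchanged.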

Regression Models

Regression models are powered by anofox-regression, with implementations validated against R.

Expression API

# Linear models
ps.ols("y", "x1", "x2")
ps.ridge("y", "x1", "x2", lambda_=1.0)
ps.elastic_net("y", "x1", "x2", lambda_=1.0, alpha=0.5)

# GLM models
ps.logistic("y", "x1", "x2")      # Binary classification
ps.poisson("y", "x1", "x2")       # Count data

# ALM - 24+ distributions
ps.alm("y", "x1", "x2", distribution="laplace")  # Robust to outliers
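For reference, the penalized objectives behind the ridge and elastic-net calls are written below in the common glmnet-style parametrization, where `lambda_` (λ) sets the overall penalty strength and `alpha` (α) mixes the L1 and L2 terms (α = 1 gives the lasso, α = 0 recovers ridge). Whether polars-statistics scales the loss by 1/(2n), leaves the intercept unpenalized, or standardizes predictors exactly this way is an assumption here; docs/API_REFERENCE.md should be treated as the authority.

```latex
\begin{aligned}
\hat\beta_{\text{ridge}} &= \arg\min_{\beta}\ \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \frac{\lambda}{2}\lVert\beta\rVert_2^2 \\
\hat\beta_{\text{enet}} &= \arg\min_{\beta}\ \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\left(\alpha\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\lVert\beta\rVert_2^2\right)
\end{aligned}
```

Setting α = 0 in the second line reproduces the first, which is why a single `lambda_` scale can be shared by both calls under this convention.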

Formula Syntax

R-style formulas with polynomial and interaction effects:

# Main effects + interaction
ps.ols_formula("y ~ x1 * x2")  # Expands to: x1 + x2 + x1:x2

# Polynomial regression (centered per group)
ps.ols_formula("y ~ poly(x, 2)")

# Explicit transform
ps.ols_formula("y ~ x1 + I(x^2)")
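The `*` operator follows R's formula convention: `a * b` expands to every main effect plus every interaction among them. Below is a minimal sketch of that expansion for a right-hand side made only of `*`-joined variable names. The helper `expand_star` is hypothetical and does not handle `poly()`, `I()`, `+`, or `-` terms, which the library's formula parser presumably treats separately.

```python
from itertools import combinations


def expand_star(rhs: str) -> list[str]:
    """Expand an R-style 'a * b * c' term into main effects and all interactions."""
    factors = [name.strip() for name in rhs.split("*")]
    terms = []
    # Order terms by interaction degree: main effects first, then 2-way, 3-way, ...
    for degree in range(1, len(factors) + 1):
        for combo in combinations(factors, degree):
            terms.append(":".join(combo))
    return terms
```

For example, `" + ".join(expand_star("x1 * x2"))` returns `"x1 + x2 + x1:x2"`, matching the expansion noted in the comment above, and a three-way product yields seven terms.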

Predictions with Intervals

df.with_columns(
    ps.ols_predict("y", "x1", "x2", interval="prediction", level=0.95)
        .over("group").alias("pred")
).unnest("pred")  # Columns: prediction, lower, upper
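For OLS with Gaussian errors, the textbook interval at a new point x₀ (a column vector that includes the intercept term) with p estimated coefficients is shown below. `interval="prediction"` presumably corresponds to the first line, which adds the residual variance for a single new observation; an interval for the mean response drops the leading 1. Treat this as the standard result rather than a confirmed description of what `ols_predict` computes.

```latex
\begin{aligned}
\text{prediction:}\quad & \hat y_0 \pm t_{1-\alpha/2,\,n-p}\,\hat\sigma\sqrt{1 + x_0^\top (X^\top X)^{-1} x_0} \\
\text{confidence:}\quad & \hat y_0 \pm t_{1-\alpha/2,\,n-p}\,\hat\sigma\sqrt{x_0^\top (X^\top X)^{-1} x_0} \\
\text{where}\quad & \hat y_0 = x_0^\top\hat\beta,\qquad \hat\sigma^2 = \frac{1}{n-p}\sum_{i=1}^{n}\left(y_i - x_i^\top\hat\beta\right)^2,\qquad \alpha = 1 - \text{level}
\end{aligned}
```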

Tidy Coefficient Summary

df.group_by("group").agg(
    ps.ols_summary("y", "x1", "x2").alias("coef")
).explode("coef").unnest("coef")
# Columns: term, estimate, std_error, statistic, p_value
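In a tidy summary of this kind, statistic = estimate / std_error, with each std_error taken from the diagonal of σ̂²(XᵀX)⁻¹. Below is a standard-library sketch for the one-predictor case with an intercept. The function name `simple_ols_summary` and its term labels are hypothetical, and the p_value column is omitted because it needs a Student-t CDF (for example `scipy.stats.t.sf`), which the standard library lacks.

```python
import math


def simple_ols_summary(y: list[float], x: list[float]) -> list[dict]:
    """Tidy rows (term, estimate, std_error, statistic) for y ~ 1 + x."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom (two estimated coefficients).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    return [
        {"term": "intercept", "estimate": intercept, "std_error": se_intercept,
         "statistic": intercept / se_intercept},
        {"term": "x", "estimate": slope, "std_error": se_slope,
         "statistic": slope / se_slope},
    ]
```

For y = [1, 3, 2, 5, 4] against x = [1, 2, 3, 4, 5], this gives a slope of 0.8 with standard error √0.12 ≈ 0.346, the same values R's `summary(lm(y ~ x))` reports for that data.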

Model Classes

For direct model access outside Polars expressions:

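import numpy as np
from polars_statistics import OLS, Ridge, Logistic, ALM

# Synthetic design matrix (100 rows, 2 predictors) and response
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -0.5]) + rng.normal(scale=0.1, size=100)
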
# Fit model
model = OLS(compute_inference=True).fit(X, y)
print(model.coefficients, model.r_squared, model.p_values)

# ALM with various distributions
alm = ALM.laplace().fit(X, y)  # Robust to outliers
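The "robust to outliers" comment reflects a standard fact: maximizing a Laplace likelihood for a location parameter minimizes the sum of absolute deviations, whose minimizer is the sample median, while a Gaussian likelihood leads to the mean. The standard-library sketch below covers only the intercept-only case with hypothetical helper names; it is not ALM itself, but it shows how a single outlier moves each estimate.

```python
from statistics import mean, median


def laplace_location(values: list[float]) -> float:
    """Location MLE under a Laplace likelihood: minimizes sum(|v - m|), i.e. the median."""
    # For an even count any value between the two middle points minimizes the sum;
    # statistics.median returns their midpoint.
    return median(values)


def gaussian_location(values: list[float]) -> float:
    """Location MLE under a Gaussian likelihood: minimizes sum((v - m) ** 2), i.e. the mean."""
    return mean(values)


def sum_abs_dev(values: list[float], m: float) -> float:
    """Laplace negative log-likelihood for fixed scale, up to constants."""
    return sum(abs(v - m) for v in values)


clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean + [100.0]  # one gross outlier
```

With the outlier added, the Gaussian estimate jumps from 10.0 to 25.0 while the Laplace estimate only moves from 10.0 to 10.05.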

Test Model Classes

Statistical tests are also available as classes that expose a .fit() method, .statistic and .p_value attributes, and a .summary() method:

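from polars_statistics import TTestInd, ShapiroWilk, KruskalWallis
import numpy as np

# Synthetic samples
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, size=30)
y = rng.normal(loc=0.5, size=30)
g1, g2, g3 = (rng.normal(loc=m, size=20) for m in (0.0, 0.3, 0.6))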

# Two-sample t-test
test = TTestInd(alternative="two-sided").fit(x, y)
print(test.statistic, test.p_value)
print(test.summary())

# Normality test
test = ShapiroWilk().fit(x)
print(test.p_value)

# Multi-group comparison
test = KruskalWallis().fit(g1, g2, g3)
print(test.summary())

Available test classes: TTestInd, TTestPaired, BrownForsythe, YuenTest, MannWhitneyU, WilcoxonSignedRank, KruskalWallis, BrunnerMunzel, ShapiroWilk, DAgostino.
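The fit-then-inspect pattern these classes share can be sketched in plain Python. `MannWhitneyUSketch` below is a hypothetical stand-in, not one of the library's classes: it computes the Mann-Whitney U statistic for the first sample (using average ranks for ties) and a two-sided p-value from the normal approximation without tie or continuity corrections, so its p-values can differ from R's `wilcox.test` and from the library's `MannWhitneyU`.

```python
import math
from statistics import NormalDist


class MannWhitneyUSketch:
    """Hypothetical test class mirroring the .fit() / .statistic / .p_value / .summary() pattern."""

    def __init__(self, alternative: str = "two-sided"):
        if alternative != "two-sided":
            raise ValueError("this sketch only implements the two-sided alternative")
        self.alternative = alternative
        self.statistic = None
        self.p_value = None

    def fit(self, x: list[float], y: list[float]) -> "MannWhitneyUSketch":
        n1, n2 = len(x), len(y)
        # Pool both samples, remembering which values came from x.
        pooled = sorted((v, i < n1) for i, v in enumerate(list(x) + list(y)))
        # Assign average 1-based ranks to runs of tied values.
        rank_sum_x = 0.0
        i = 0
        while i < len(pooled):
            j = i
            while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
                j += 1
            avg_rank = (i + j) / 2 + 1
            rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1])
            i = j + 1
        self.statistic = rank_sum_x - n1 * (n1 + 1) / 2  # U for the first sample
        mu = n1 * n2 / 2
        sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
        z = (self.statistic - mu) / sigma
        self.p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
        return self  # returning self enables Test().fit(...) chaining

    def summary(self) -> str:
        return (f"Mann-Whitney U (normal approx., {self.alternative}): "
                f"U = {self.statistic:.3f}, p = {self.p_value:.4f}")
```

Returning `self` from `fit` is what allows the one-line `Test(...).fit(x, y)` style used in the examples above.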

API Reference

See docs/API_REFERENCE.md for complete documentation of all functions, parameters, and output structures.

Performance

Built on high-performance Rust libraries and design choices:

  • faer: Fast linear algebra with SIMD
  • Zero-copy: Direct memory sharing between Python and Rust
  • Automatic parallelization: For group_by operations

Development

git clone https://github.com/DataZooDE/polars-statistics.git
cd polars-statistics
python -m venv .venv && source .venv/bin/activate
pip install maturin numpy polars pytest
maturin develop --release
pytest

License

MIT License - see LICENSE for details.

Download files

Source Distribution

polars_statistics-0.2.0.tar.gz (141.4 kB)

Uploaded: Source

Built Distributions

polars_statistics-0.2.0-cp39-abi3-win_amd64.whl (7.1 MB)

Uploaded: CPython 3.9+, Windows x86-64

polars_statistics-0.2.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB)

Uploaded: CPython 3.9+, manylinux: glibc 2.17+ x86-64

polars_statistics-0.2.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (5.7 MB)

Uploaded: CPython 3.9+, manylinux: glibc 2.17+ ARM64

polars_statistics-0.2.0-cp39-abi3-macosx_11_0_arm64.whl (5.5 MB)

Uploaded: CPython 3.9+, macOS 11.0+ ARM64

polars_statistics-0.2.0-cp39-abi3-macosx_10_12_x86_64.whl (6.1 MB)

Uploaded: CPython 3.9+, macOS 10.12+ x86-64

File details

Details for the file polars_statistics-0.2.0.tar.gz.

File metadata

  • Download URL: polars_statistics-0.2.0.tar.gz
  • Size: 141.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for polars_statistics-0.2.0.tar.gz
Algorithm Hash digest
SHA256 47c8b5156d057c4e2f5508e31294fc5c0eadc9465a869251c911c1984541f5c0
MD5 dbb0ff23ca13d30c5a5fd253e4ab88b3
BLAKE2b-256 6301e315aa4a6e79f18bb569d7084547e619345b0085bb447786c5be7d4a54c0

Provenance

The following attestation bundles were made for polars_statistics-0.2.0.tar.gz:

Publisher: publish.yml on DataZooDE/polars-statistics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file polars_statistics-0.2.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for polars_statistics-0.2.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 39e9b67ff5da036590c41253c5c7d98f0f6d8074d6514a3fac85db5c64fdc0fa
MD5 46d04ec85b3d61072edca1b820883be8
BLAKE2b-256 3bd2f4fb87518be6734860450882fbd5840efaf428f1c5bfad628b2320f110c8

Provenance

The following attestation bundles were made for polars_statistics-0.2.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on DataZooDE/polars-statistics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file polars_statistics-0.2.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for polars_statistics-0.2.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 8a778985307482fc11b086e9eb1d5baf26d81102240f5a7ef7c35d877d4d1835
MD5 42e71384df5b6acef43edb1052d454dc
BLAKE2b-256 0b5311595ebbaa2c3a8a9a7a02e9da66b0192c3f44dd9a348b05dc698d1298e3

Provenance

The following attestation bundles were made for polars_statistics-0.2.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on DataZooDE/polars-statistics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file polars_statistics-0.2.0-cp39-abi3-win_amd64.whl.

File hashes

Hashes for polars_statistics-0.2.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 fbce339d865d0c64f1ba9264e6522c614ad22a3e505a051f5e6bf8f697db87ed
MD5 c0cfe27a7dab3a9f7e56883c937812e7
BLAKE2b-256 a09b38288930d3ace86e2951db72b8a92fad6283819c377af3b40001c5ae4c92

Provenance

The following attestation bundles were made for polars_statistics-0.2.0-cp39-abi3-win_amd64.whl:

Publisher: publish.yml on DataZooDE/polars-statistics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file polars_statistics-0.2.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for polars_statistics-0.2.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c5d8e1835a28a6e0cdc28b4a2663c2aeb9955daa55ab20b97968e63054facb7d
MD5 e6e261ade8290a7458a89e44e79e78cd
BLAKE2b-256 5b66520a76890979cf81adcdc5b2e11f3a032082fd587435302ddabd0f2b67be

Provenance

The following attestation bundles were made for polars_statistics-0.2.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on DataZooDE/polars-statistics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file polars_statistics-0.2.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for polars_statistics-0.2.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 c057f5e85ac3f842054e7ee01f02eb6aa2c7dfabfc99d6e016fe58b4cdbc1ba0
MD5 c08e1a97792d6185bab2c75aa1d1686d
BLAKE2b-256 a12ea2017e217707ef438e78e7b6fde43e31ebea704c1a848f14419c02aa8a55

Provenance

The following attestation bundles were made for polars_statistics-0.2.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on DataZooDE/polars-statistics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file polars_statistics-0.2.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for polars_statistics-0.2.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3086a4bd9a81d05c3366568ee9c5575e4e766cc3a1e780b6257a50bc7a7529e7
MD5 dfdff4bb21f83c56428a7eac1ebb4468
BLAKE2b-256 c45058e55996f0db87a9f3d89fe2b367bfea7bc4704e4c0177e7b55658585b11

Provenance

The following attestation bundles were made for polars_statistics-0.2.0-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: publish.yml on DataZooDE/polars-statistics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file polars_statistics-0.2.0-cp39-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for polars_statistics-0.2.0-cp39-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 d74e3c07515cebcb63f0904db1d003deeeaa46c7dadf910ffd93b1989d847c19
MD5 fb4f803e27cc022fc95a8e1d23156ff2
BLAKE2b-256 138188f803f1da0043570160b48faac31a6fef6e03f2657399ac5eda315628fb

Provenance

The following attestation bundles were made for polars_statistics-0.2.0-cp39-abi3-macosx_10_12_x86_64.whl:

Publisher: publish.yml on DataZooDE/polars-statistics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
