polars-statistics

Note: This extension is in early-stage development. APIs may change, and some features are experimental.

High-performance statistical testing and regression for Polars DataFrames, powered by Rust.

Features

  • Native Polars Expressions: Full support for group_by, over, and lazy evaluation
  • Statistical Tests: Parametric, non-parametric, distributional, and forecast comparison tests
  • Regression Models: OLS, Ridge, Elastic Net, WLS, GLMs, ALM (24+ distributions)
  • Formula Syntax: R-style formulas with polynomial and interaction effects
  • High Performance: Rust-powered with zero-copy data transfer

Installation

pip install polars-statistics

Quick Start

All functions work as Polars expressions, integrating with group_by and over:

import polars as pl
import polars_statistics as ps

df = pl.DataFrame({
    "group": ["A"] * 50 + ["B"] * 50,
    "y": [...],
    "x1": [...],
    "x2": [...],
})

# Run OLS regression per group
result = df.group_by("group").agg(
    ps.ols("y", "x1", "x2").alias("model")
)

# Extract results from struct
result.with_columns(
    pl.col("model").struct.field("r_squared"),
    pl.col("model").struct.field("coefficients"),
)
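To make concrete what a grouped regression expression computes, here is a minimal pure-Python sketch of a per-group least-squares fit with a single predictor. It does not call polars-statistics; the helper names `fit_simple_ols` and `fit_by_group`, the result keys, and the toy data are assumptions for illustration only.

```python
from collections import defaultdict


def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x, returned as a dict."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return {"intercept": intercept, "slope": slope, "r_squared": 1.0 - ss_res / ss_tot}


def fit_by_group(groups, x, y):
    """Split the rows by group label and fit one model per group."""
    buckets = defaultdict(lambda: ([], []))
    for g, xi, yi in zip(groups, x, y):
        buckets[g][0].append(xi)
        buckets[g][1].append(yi)
    return {g: fit_simple_ols(xs, ys) for g, (xs, ys) in buckets.items()}


# Toy data (made up): group A follows y = 1 + 2x, group B follows y = -x.
groups = ["A"] * 4 + ["B"] * 4
x = [0.0, 1.0, 2.0, 3.0] * 2
y = [1.0, 3.0, 5.0, 7.0, 0.0, -1.0, -2.0, -3.0]
models = fit_by_group(groups, x, y)
```

Each group gets its own independent fit, which is the same shape of result as one struct per group in the aggregated DataFrame.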

Statistical Tests

# Parametric tests
ps.ttest_ind("treatment", "control", alternative="two-sided")
ps.ttest_paired("before", "after")

# Non-parametric tests
ps.mann_whitney_u("x", "y")
ps.kruskal_wallis("group1", "group2", "group3")

# Normality tests
ps.shapiro_wilk("x")

# Forecast comparison
ps.diebold_mariano("errors1", "errors2", horizon=1)

All tests return a struct with statistic and p_value fields.
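As an illustration of where a `statistic` and `p_value` pair comes from, the following sketch computes a two-sided Mann-Whitney U test in pure Python using the normal approximation. It assumes there are no tied values and is not the library's implementation, which may use exact p-values or tie and continuity corrections; the function name simply mirrors the expression above.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation (assumes no ties)."""
    n1, n2 = len(x), len(y)
    # Rank the pooled sample; ranks start at 1. Label 0 marks values from x.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(rank for rank, (_, label) in enumerate(pooled, start=1) if label == 0)
    # U statistic for the first sample.
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"statistic": u, "p_value": p_value}


# Made-up samples in which every value of the first lies below the second.
result = mann_whitney_u([1.1, 2.3, 3.2, 4.8, 5.5], [6.1, 7.4, 8.0, 9.9, 10.2])
```

With complete separation between the samples, U for the first sample is 0 and the p-value is small.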

Regression Models

Expression API

# Linear models
ps.ols("y", "x1", "x2")
ps.ridge("y", "x1", "x2", lambda_=1.0)
ps.elastic_net("y", "x1", "x2", lambda_=1.0, alpha=0.5)

# GLM models
ps.logistic("y", "x1", "x2")      # Binary classification
ps.poisson("y", "x1", "x2")       # Count data

# ALM - 24+ distributions
ps.alm("y", "x1", "x2", distribution="laplace")  # Robust to outliers
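To show what the `lambda_` penalty does in ridge regression, here is a small pure-Python sketch for one predictor with an unpenalized intercept, where the slope has the closed form Sxy / (Sxx + lambda). The helper `ridge_slope` is hypothetical, and the library may scale the penalty or standardize predictors differently, so the numbers are not expected to match its output.

```python
def ridge_slope(x, y, lambda_):
    """Slope minimizing sum((y - b0 - b1*x)^2) + lambda_ * b1^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / (sxx + lambda_)


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x

ols_slope = ridge_slope(x, y, 0.0)     # no penalty: ordinary least squares, 2.0
shrunk_slope = ridge_slope(x, y, 5.0)  # penalty pulls the slope toward zero, 1.0
```

Setting the penalty to zero recovers the OLS slope; larger values shrink it toward zero.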

Formula Syntax

R-style formulas with polynomial and interaction effects:

# Main effects + interaction
ps.ols_formula("y ~ x1 * x2")  # Expands to: x1 + x2 + x1:x2

# Polynomial regression (centered per group)
ps.ols_formula("y ~ poly(x, 2)")

# Explicit transform
ps.ols_formula("y ~ x1 + I(x^2)")

Predictions with Intervals

df.with_columns(
    ps.ols_predict("y", "x1", "x2", interval="prediction", level=0.95)
        .over("group").alias("pred")
).unnest("pred")  # Columns: prediction, lower, upper
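For intuition about the `prediction`, `lower`, and `upper` columns, the sketch below builds a prediction interval for a one-predictor least-squares fit in pure Python. It is an approximation under stated assumptions: it uses a normal quantile where the exact interval uses a t quantile with n - 2 degrees of freedom, so it is slightly too narrow for small samples. The helper name and the data are made up.

```python
from math import sqrt
from statistics import NormalDist


def predict_with_interval(x, y, x_new, level=0.95):
    """Point prediction and approximate prediction interval at x_new."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    residual_var = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal stand-in for the t quantile
    prediction = intercept + slope * x_new
    half_width = z * sqrt(residual_var * (1 + 1 / n + (x_new - mean_x) ** 2 / sxx))
    return {"prediction": prediction, "lower": prediction - half_width, "upper": prediction + half_width}


x = [float(i) for i in range(10)]
noise = [0.5, -0.5] * 5
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]

at_mean = predict_with_interval(x, y, 4.5)
far_out = predict_with_interval(x, y, 20.0)
```

The interval is narrowest at the mean of the predictor and widens as `x_new` moves away from it, which is why `far_out` is wider than `at_mean`.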

Tidy Coefficient Summary

df.group_by("group").agg(
    ps.ols_summary("y", "x1", "x2").alias("coef")
).explode("coef").unnest("coef")
# Columns: term, estimate, std_error, statistic, p_value
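The explode-then-unnest step turns one list of coefficient records per group into one row per coefficient. The same reshaping can be sketched with plain Python dicts; the numbers, the `intercept` term name, and the reduced set of fields below are placeholders, not library output.

```python
# One entry per group, each holding a list of coefficient records
# (the shape of the aggregated result before explode/unnest).
nested = [
    {"group": "A", "coef": [{"term": "intercept", "estimate": 1.0},
                            {"term": "x1", "estimate": 2.0}]},
    {"group": "B", "coef": [{"term": "intercept", "estimate": 0.0},
                            {"term": "x1", "estimate": -1.0}]},
]

# "explode": one row per list element; "unnest": promote record fields to columns.
tidy = [{"group": row["group"], **coef} for row in nested for coef in row["coef"]]
```

The result has one flat row per group and term, which is convenient for filtering or plotting coefficients across groups.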

Model Classes

For direct model access outside Polars expressions:

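import numpy as np
from polars_statistics import OLS, Ridge, Logistic, ALM

# Illustrative data (an assumption, not from the project): X has shape (n_samples, n_features)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=100)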

# Fit model
model = OLS(compute_inference=True).fit(X, y)
print(model.coefficients, model.r_squared, model.p_values)

# ALM with various distributions
alm = ALM.laplace().fit(X, y)  # Robust to outliers
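The "robust to outliers" remark has a simple explanation in the intercept-only case: maximizing a Gaussian likelihood minimizes squared error and yields the mean, while maximizing a Laplace likelihood minimizes absolute error and yields the median. The sketch below uses only the standard library and made-up data; it illustrates the principle rather than the ALM fitting routine.

```python
from statistics import mean, median

clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean + [100.0]  # one gross outlier

gaussian_location = mean(contaminated)   # least squares: dragged toward the outlier
laplace_location = median(contaminated)  # least absolute deviations: barely moves
```

A single outlier moves the mean from 10 to 25, while the median stays near 10.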

Test Model Classes

Statistical tests are also available as model classes with .fit(), .statistic, .p_value, and .summary():

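from polars_statistics import TTestInd, ShapiroWilk, KruskalWallis
import numpy as np

# Illustrative samples (an assumption, not from the project): 1-D NumPy arrays
rng = np.random.default_rng(0)
x, y = rng.normal(size=50), rng.normal(loc=0.5, size=50)
g1, g2, g3 = (rng.normal(loc=m, size=30) for m in (0.0, 0.2, 0.4))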

# Two-sample t-test
test = TTestInd(alternative="two-sided").fit(x, y)
print(test.statistic, test.p_value)
print(test.summary())

# Normality test
test = ShapiroWilk().fit(x)
print(test.p_value)

# Multi-group comparison
test = KruskalWallis().fit(g1, g2, g3)
print(test.summary())

Available test classes: TTestInd, TTestPaired, BrownForsythe, YuenTest, MannWhitneyU, WilcoxonSignedRank, KruskalWallis, BrunnerMunzel, ShapiroWilk, DAgostino.
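The fit-then-inspect pattern shared by these classes can be illustrated with a small stand-in. `SignTest` below is hypothetical and not part of polars-statistics; it implements an exact two-sided paired sign test in pure Python purely to show how `.fit()`, `.statistic`, `.p_value`, and `.summary()` relate to one another.

```python
from math import comb


class SignTest:
    """Hypothetical paired sign test exposing fit/statistic/p_value/summary."""

    def fit(self, before, after):
        # Keep the non-zero paired differences.
        diffs = [a - b for b, a in zip(before, after) if a != b]
        n = len(diffs)
        positives = sum(d > 0 for d in diffs)
        # Exact two-sided p-value under Binomial(n, 0.5).
        k = min(positives, n - positives)
        tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
        self.n_ = n
        self.statistic = positives
        self.p_value = min(1.0, 2 * tail)
        return self  # returning self allows SignTest().fit(...) chaining

    def summary(self):
        return f"SignTest: n={self.n_}, statistic={self.statistic}, p_value={self.p_value:.4f}"


before = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2]
after = [5.6, 5.0, 6.4, 5.4, 6.5, 6.9]
test = SignTest().fit(before, after)
```

Because `fit()` returns the fitted object, results can be read as attributes right after construction, as in the examples above.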

API Reference

See docs/API_REFERENCE.md for complete documentation of all functions, parameters, and output structures.

Performance

Built on high-performance Rust libraries:

  • faer: Fast linear algebra with SIMD
  • Zero-copy: Direct memory sharing between Python and Rust
  • Automatic parallelization: For group_by operations

Development

git clone https://github.com/DataZooDE/polars-statistics.git
cd polars-statistics
python -m venv .venv && source .venv/bin/activate
pip install maturin numpy polars pytest
maturin develop --release
pytest

License

MIT License - see LICENSE for details.
