
Statistical tests, regression, and machine learning for Polars - t-tests, ANOVA, chi-square, correlation, OLS, GLM, quantile regression, and more


polars-statistics

License: MIT. Requires Python 3.9+.

Note: This extension is in early-stage development. APIs may change, and some features are experimental.

High-performance statistical testing and regression for Polars DataFrames, powered by Rust.

Features

  • Native Polars Expressions: Full support for group_by, over, and lazy evaluation
  • Statistical Tests: Parametric, non-parametric, distributional, and forecast comparison tests
  • Regression Models: OLS, Ridge, Elastic Net, WLS, Quantile, Isotonic, GLMs, ALM (24+ distributions)
  • Diagnostics: Condition number, quasi-separation detection for GLMs
  • Formula Syntax: R-style formulas with polynomial and interaction effects
  • High Performance: Rust-powered with zero-copy data transfer

Installation

pip install polars-statistics

Quick Start

All functions work as Polars expressions, integrating with group_by and over:

import polars as pl
import polars_statistics as ps

df = pl.DataFrame({
    "group": ["A"] * 50 + ["B"] * 50,
    "y": [...],
    "x1": [...],
    "x2": [...],
})

# Run OLS regression per group
result = df.group_by("group").agg(
    ps.ols("y", "x1", "x2").alias("model")
)

# Extract results from struct
result.with_columns(
    pl.col("model").struct.field("r_squared"),
    pl.col("model").struct.field("coefficients"),
)
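
As a rough illustration of what the per-group aggregation above computes, the following standard-library sketch fits a one-predictor least-squares line separately for each group and collects the results in a dict shaped loosely like the returned struct. It does not use polars-statistics. The helper name `fit_simple_ols`, the toy rows, and the restriction to a single predictor are assumptions made for brevity, and the library's actual struct layout may differ.

```python
from collections import defaultdict

def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns coefficients and R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return {"coefficients": [b0, b1], "r_squared": 1.0 - ss_res / ss_tot}

# Toy rows of (group, x, y); hypothetical data, not from the project
rows = [
    ("A", 0.0, 1.0), ("A", 1.0, 3.0), ("A", 2.0, 5.0),  # y = 1 + 2x
    ("B", 0.0, 3.0), ("B", 1.0, 2.0), ("B", 2.0, 1.0),  # y = 3 - x
]

# Collect the x and y values of each group, then fit one model per group
grouped = defaultdict(lambda: ([], []))
for g, x, y in rows:
    grouped[g][0].append(x)
    grouped[g][1].append(y)

models = {g: fit_simple_ols(xs, ys) for g, (xs, ys) in grouped.items()}
```

Each entry of `models` plays the role of one row of the `model` column: one fitted result per group key.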

Statistical Tests

Statistical tests are powered by anofox-statistics, providing full API parity with R's statistical functions and validated against R implementations.

# Parametric tests
ps.ttest_ind("treatment", "control", alternative="two-sided")
ps.ttest_paired("before", "after")

# Non-parametric tests
ps.mann_whitney_u("x", "y")
ps.kruskal_wallis("group1", "group2", "group3")

# Normality tests
ps.shapiro_wilk("x")

# Forecast comparison
ps.diebold_mariano("errors1", "errors2", horizon=1)

# Correlation tests
ps.pearson("x", "y")                    # Pearson correlation with CI
ps.spearman("x", "y")                   # Spearman rank correlation
ps.kendall("x", "y", variant="b")       # Kendall's tau
ps.distance_cor("x", "y")               # Distance correlation (detects nonlinear)
ps.partial_cor("x", "y", ["z1", "z2"])  # Partial correlation

# Categorical tests
ps.binom_test(successes=7, n=10, p0=0.5)  # Exact binomial test
ps.chisq_test("counts", n_rows=2, n_cols=2)  # Chi-square independence
ps.fisher_exact(a=10, b=2, c=3, d=15)   # Fisher's exact test
ps.mcnemar_test(a=45, b=15, c=5, d=35)  # McNemar's test
ps.cohen_kappa("counts", n_categories=3) # Inter-rater agreement
ps.cramers_v("counts", n_rows=3, n_cols=3) # Association strength

All tests return a struct with statistic and p_value fields.
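
To make the meaning of a returned p_value concrete, here is a standard-library reference computation for the exact binomial case listed above (successes=7, n=10, p0=0.5). It is a sketch independent of polars-statistics: the function name `exact_binom_test` is hypothetical, and the two-sided rule follows the convention of R's binom.test. Whether the library uses the same convention is an assumption.

```python
from math import comb

def exact_binom_test(successes, n, p0=0.5):
    """Two-sided exact binomial test.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under H0 (the convention used by R's binom.test).
    """
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small relative tolerance so ties are not lost to floating-point noise
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-7))
    return {"estimate": successes / n, "p_value": min(1.0, p_value)}

result = exact_binom_test(successes=7, n=10, p0=0.5)
print(result)
```

With 7 successes in 10 trials, the outcomes 0 to 3 and 7 to 10 are at most as likely as the observed one, so the p-value is 352/1024 = 0.34375.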

TOST Equivalence Tests

Test for practical equivalence using the Two One-Sided Tests (TOST) procedure:

# t-test based equivalence
ps.tost_t_test_two_sample("x", "y", delta=0.5, alpha=0.05)
ps.tost_t_test_paired("before", "after", bounds_type="cohen_d", delta=0.3)

# Correlation equivalence (test if correlation is near zero)
ps.tost_correlation("x", "y", delta=0.3, method="pearson")

# Proportion equivalence
ps.tost_prop_two(successes1=45, n1=100, successes2=48, n2=100, delta=0.1)

# Non-parametric and robust equivalence
ps.tost_wilcoxon_paired("x", "y", delta=0.5)
ps.tost_yuen("x", "y", trim=0.2, delta=0.5)  # Trimmed means
ps.tost_bootstrap("x", "y", n_bootstrap=1000)  # Bootstrap-based

Each test returns a struct with the fields estimate, ci_lower, ci_upper, tost_p_value, and equivalent.
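
The TOST logic itself is short enough to sketch with the standard library. The example below applies a normal approximation to the two-proportion case shown above and returns a dict with the same field names. It is not the library's implementation: the function name `tost_two_proportions`, the unpooled standard error, and the normal (rather than exact) reference distribution are all assumptions.

```python
from math import sqrt
from statistics import NormalDist

def tost_two_proportions(successes1, n1, successes2, n2, delta, alpha=0.05):
    """Normal-approximation TOST for the difference of two proportions."""
    p1, p2 = successes1 / n1, successes2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error
    z = NormalDist()
    # H0a: diff <= -delta, rejected when diff lies far enough above -delta
    p_lower = 1.0 - z.cdf((diff + delta) / se)
    # H0b: diff >= +delta, rejected when diff lies far enough below +delta
    p_upper = z.cdf((diff - delta) / se)
    tost_p = max(p_lower, p_upper)  # both one-sided tests must reject
    half_width = z.inv_cdf(1 - alpha) * se  # (1 - 2 * alpha) confidence interval
    return {
        "estimate": diff,
        "ci_lower": diff - half_width,
        "ci_upper": diff + half_width,
        "tost_p_value": tost_p,
        "equivalent": tost_p < alpha,
    }

result = tost_two_proportions(45, 100, 48, 100, delta=0.1)
print(result)
```

Equivalence is declared only when both one-sided nulls are rejected, which is the same as the (1 - 2 * alpha) confidence interval falling entirely inside (-delta, +delta). For these inputs the interval extends below -0.1, so the sketch reports no equivalence.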

Regression Models

Regression models are powered by anofox-regression, which provides implementations validated against R.

Expression API

# Linear models
ps.ols("y", "x1", "x2")
ps.ridge("y", "x1", "x2", lambda_=1.0)
ps.elastic_net("y", "x1", "x2", lambda_=1.0, alpha=0.5)

# Robust regression
ps.quantile("y", "x1", "x2", tau=0.5)  # Median regression
ps.isotonic("y", "x")                   # Monotonic regression

# GLM models (with optional Ridge regularization)
ps.logistic("y", "x1", "x2", lambda_=0.1)  # Binary classification
ps.poisson("y", "x1", "x2")                 # Count data

# ALM - 24+ distributions
ps.alm("y", "x1", "x2", distribution="laplace")  # Robust to outliers

# Diagnostics
ps.condition_number("x1", "x2")            # Multicollinearity check
ps.check_binary_separation("y", "x1")      # Quasi-separation detection
ps.check_count_sparsity("y", "x1")         # Sparse count data check

Formula Syntax

R-style formulas with polynomial and interaction effects:

# Main effects + interaction
ps.ols_formula("y ~ x1 * x2")  # Expands to: x1 + x2 + x1:x2

# Polynomial regression (centered per group)
ps.ols_formula("y ~ poly(x, 2)")

# Explicit transform
ps.ols_formula("y ~ x1 + I(x^2)")
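
The `*` expansion noted in the first comment above can be sketched in a few lines of plain Python. The helper `expand_formula` is hypothetical and deliberately naive: it splits on `+` and `*` textually, so it would mishandle operators nested inside `I(...)` or `poly(...)`, and it says nothing about how polars-statistics actually parses formulas or orders terms.

```python
from itertools import combinations

def expand_formula(formula):
    """Split 'y ~ rhs' and expand each 'a * b' into main effects plus interactions."""
    response, rhs = (part.strip() for part in formula.split("~"))
    terms = []
    for chunk in rhs.split("+"):
        factors = [f.strip() for f in chunk.split("*")]
        # a * b * c -> every non-empty combination, ordered by interaction degree
        for size in range(1, len(factors) + 1):
            for combo in combinations(factors, size):
                term = ":".join(combo)
                if term not in terms:  # skip terms already listed
                    terms.append(term)
    return response, terms

print(expand_formula("y ~ x1 * x2"))
```

For `"y ~ x1 * x2"` the helper returns the response `y` and the terms `x1`, `x2`, and `x1:x2`, matching the expansion described in the comment.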

Predictions with Intervals

df.with_columns(
    ps.ols_predict("y", "x1", "x2", interval="prediction", level=0.95)
        .over("group").alias("pred")
).unnest("pred")  # Columns: prediction, lower, upper

Tidy Coefficient Summary

df.group_by("group").agg(
    ps.ols_summary("y", "x1", "x2").alias("coef")
).explode("coef").unnest("coef")
# Columns: term, estimate, std_error, statistic, p_value

Model Classes

For direct model access outside Polars expressions:

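import numpy as np
from polars_statistics import OLS, Ridge, Logistic, ALM

# Synthetic inputs for illustration (assumed: X is a 2-D array, y is 1-D)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=100)

# Fit model
model = OLS(compute_inference=True).fit(X, y)
print(model.coefficients, model.r_squared, model.p_values)

# ALM with various distributions
alm = ALM.laplace().fit(X, y)  # Robust to outliers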

Test Model Classes

Statistical tests are also available as model classes with .fit(), .statistic, .p_value, and .summary():

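from polars_statistics import TTestInd, ShapiroWilk, KruskalWallis
import numpy as np

# Synthetic samples for illustration only
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=50)
y = rng.normal(0.5, 1.0, size=50)
g1, g2, g3 = (rng.normal(mu, 1.0, size=30) for mu in (0.0, 0.3, 0.6))

# Two-sample t-test
test = TTestInd(alternative="two-sided").fit(x, y)
print(test.statistic, test.p_value)
print(test.summary())

# Normality test
test = ShapiroWilk().fit(x)
print(test.p_value)

# Multi-group comparison
test = KruskalWallis().fit(g1, g2, g3)
print(test.summary())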

Available test classes: TTestInd, TTestPaired, BrownForsythe, YuenTest, MannWhitneyU, WilcoxonSignedRank, KruskalWallis, BrunnerMunzel, ShapiroWilk, DAgostino.

Documentation

For the legacy monolithic reference, see docs/API_REFERENCE.md.

Performance

Performance comes from the Rust implementation:

  • faer: Fast linear algebra with SIMD
  • Zero-copy: Direct memory sharing between Python and Rust
  • Automatic parallelization: For group_by operations

Development

git clone https://github.com/DataZooDE/polars-statistics.git
cd polars-statistics
python -m venv .venv && source .venv/bin/activate
pip install maturin numpy polars pytest
maturin develop --release
pytest

License

MIT License - see LICENSE for details.


