High-performance statistical operations for Polars DataFrames

causers

causers is a statistical library for Python that implements regression and causal inference methods directly on Polars DataFrames. Its core is written in Rust for speed and memory safety.

Purpose

Data scientists working with Polars often face a friction point when they need to run statistical models: they must convert their efficient Polars DataFrames into pandas or NumPy arrays to use libraries like statsmodels or scikit-learn. This conversion can be costly in terms of memory and time, especially for large datasets.

causers solves this by providing native statistical routines that operate directly on Polars data. The linear algebra runs in Rust, and the library supports robust standard errors, fixed effects, and bootstrap methods without the overhead of data conversion.

Installation

You can install causers via pip. Pre-built wheels are available for Linux, macOS, and Windows.

# Standard installation
pip install causers

# To include pandas support (if you need to pass pandas DataFrames)
# (the quotes keep shells such as zsh from expanding the brackets)
pip install "causers[pandas]"

Usage

Here is a practical example of running a linear regression with robust standard errors on a Polars DataFrame.

import polars as pl
import causers

# Create a sample DataFrame
df = pl.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [0.5, 0.5, 1.0, 1.0, 1.5, 1.5],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    "group": [1, 1, 2, 2, 3, 3]
})

# Run OLS regression: y ~ x1 + x2
# Using HC3 robust standard errors by default
result = causers.linear_regression(df, x_cols=["x1", "x2"], y_col="y")

print(f"R²: {result.r_squared:.4f}")
for i, (coef, se) in enumerate(zip(result.coefficients, result.standard_errors)):
    print(f"x{i+1}: {coef:.4f} ± {se:.4f}")

# Run with cluster-robust standard errors
clustered_result = causers.linear_regression(
    df, x_cols=["x1", "x2"], y_col="y", cluster="group"
)

The library supports Python type hints, so your IDE should provide autocompletion for function arguments and result objects.
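
To make the HC3 default concrete, here is a minimal NumPy sketch of how HC3 standard errors are computed for OLS. It illustrates the estimator only and is not the causers implementation; the helper name `ols_hc3` is hypothetical, and the data reuses `x1` and `y` from the example above.

```python
import numpy as np

def ols_hc3(X, y):
    """OLS with an intercept and HC3 standard errors (illustrative helper, not the causers API)."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_ii = x_i' (X'X)^-1 x_i, i.e. the diagonal of the hat matrix
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # HC3 divides each squared residual by (1 - h_ii)^2
    omega = (resid / (1.0 - leverage)) ** 2
    meat = X.T @ (X * omega[:, None])
    cov = XtX_inv @ meat @ XtX_inv  # sandwich covariance estimator
    return beta, np.sqrt(np.diag(cov))

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
beta, se = ols_hc3(x1.reshape(-1, 1), y)
print(beta, se)
```

Because HC3 inflates the residuals of high-leverage observations, it tends to be more conservative than HC0 or HC1 in small samples.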

Features

Regression Models

  • Linear Regression (OLS): Supports single and multiple covariates.
  • Logistic Regression: Implemented via Newton-Raphson optimization for binary outcomes.
  • Robust Inference: HC3 heteroskedasticity-consistent standard errors are used by default for OLS.
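
As a rough illustration of the Newton-Raphson approach mentioned above, the following NumPy sketch fits a binary logistic regression on simulated data. The function name `logistic_newton`, the iteration limit, and the tolerance are assumptions made for this example; the library's actual stopping rules and safeguards may differ.

```python
import numpy as np

def logistic_newton(X, y, max_iter=25, tol=1e-8):
    """Fit logistic regression by Newton-Raphson (illustrative helper, not the causers API)."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))        # predicted probabilities
        score = X.T @ (y - p)                         # gradient of the log-likelihood
        weights = p * (1.0 - p)
        information = X.T @ (X * weights[:, None])    # negative Hessian
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data with true intercept -0.5 and true slope 1.5
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))
y = (rng.random(n) < p_true).astype(float)

beta_hat = logistic_newton(x.reshape(-1, 1), y)
print(beta_hat)
```

This plain version has no step-size control, so it can diverge when the classes are perfectly separable.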

Panel Data & Fixed Effects

  • Fixed Effects: Absorb high-dimensional fixed effects (e.g., unit and time) efficiently.
  • Clustered Standard Errors: Compute cluster-robust standard errors for grouped data.
  • Bootstrap Inference: Implements Wild Cluster Bootstrap for linear models and Score Bootstrap for logistic models (recommended for small cluster counts).
  • Mundlak Approach: Supports fixed effects in logistic regression via the Mundlak transformation.
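
The idea behind absorbing fixed effects can be sketched with the one-way within transformation: demeaning each variable by group gives the same slope as including one dummy per group. The helper `demean_by_group` is a hypothetical name, and the sketch covers only a single fixed-effect dimension; absorbing several dimensions (e.g., unit and time) generally needs an iterative scheme that is not shown here.

```python
import numpy as np

def demean_by_group(values, groups):
    """Subtract each group's mean from its members (one-way within transformation)."""
    _, codes = np.unique(groups, return_inverse=True)
    group_means = np.bincount(codes, weights=values) / np.bincount(codes)
    return values - group_means[codes]

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
group = np.array([1, 1, 2, 2, 3, 3])

# Slope from the demeaned data (no intercept needed after demeaning)
x_w = demean_by_group(x1, group)
y_w = demean_by_group(y, group)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Same slope from an explicit dummy-variable regression
dummies = (group[:, None] == np.unique(group)[None, :]).astype(float)
design = np.column_stack([x1, dummies])
beta_dummy = np.linalg.lstsq(design, y, rcond=None)[0][0]

print(beta_within, beta_dummy)
```

Demeaning avoids building the dummy matrix, which is what makes the approach practical when there are many groups.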

Causal Inference

  • Synthetic Difference-in-Differences (SDID): Implements the Arkhangelsky et al. (2021) estimator with placebo bootstrap for inference.
  • Synthetic Control (SC): Includes Traditional, Penalized, Robust, and Augmented variants.
  • Double Machine Learning (DML): Debiased inference using cross-fitting (Chernozhukov et al., 2018).
  • Instrumental Variables (2SLS): Two-Stage Least Squares estimation for endogeneity correction.
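
For readers unfamiliar with 2SLS, the sketch below shows the two stages on simulated data with one endogenous regressor and one instrument. The function name `two_stage_least_squares` and the data-generating process are assumptions for illustration, not the causers API. The sketch returns point estimates only, because standard errors taken naively from the second-stage regression are not valid.

```python
import numpy as np

def two_stage_least_squares(y, x_endog, z):
    """Point estimates from 2SLS with one endogenous regressor and one instrument."""
    const = np.ones(len(y))
    Z = np.column_stack([const, z])
    # Stage 1: project the endogenous regressor onto the instrument
    x_hat = Z @ np.linalg.lstsq(Z, x_endog, rcond=None)[0]
    # Stage 2: regress the outcome on the fitted values
    X_hat = np.column_stack([const, x_hat])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Simulated data: u confounds x and y, z shifts x but affects y only through x
rng = np.random.default_rng(0)
n = 20000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + u + 0.5 * rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)

beta_iv = two_stage_least_squares(y, x, z)
beta_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0]
print("2SLS slope:", beta_iv[1], "OLS slope:", beta_ols[1])
```

In this setup the OLS slope is biased upward by the confounder, while the 2SLS slope should land close to the true value of 2.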

Diagnostics

  • Covariate Balance (balance_check): Computes group means, variances, standard deviations, Standardized Mean Differences (SMD), variance ratios, and effective sample sizes (ESS) for treatment vs. control groups. Supports weighted analysis (e.g., inverse-propensity weights), automatic categorical expansion, and boolean covariates.
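
As a point of reference for the balance statistics listed above, here is an unweighted sketch of the standardized mean difference and the variance ratio for a single covariate. It uses one common definition of the SMD (pooled standard deviation from the average of the two sample variances); the definitions used by `balance_check` may differ, and the function names below are hypothetical.

```python
import numpy as np

def standardized_mean_difference(treated, control):
    """SMD using the square root of the average of the two sample variances."""
    pooled_sd = np.sqrt((np.var(treated, ddof=1) + np.var(control, ddof=1)) / 2.0)
    return (np.mean(treated) - np.mean(control)) / pooled_sd

def variance_ratio(treated, control):
    """Ratio of the treated sample variance to the control sample variance."""
    return np.var(treated, ddof=1) / np.var(control, ddof=1)

treated = np.array([2.0, 4.0, 6.0])
control = np.array([1.0, 2.0, 3.0])

smd = standardized_mean_difference(treated, control)
ratio = variance_ratio(treated, control)
print(smd, ratio)
```

A frequently used rule of thumb treats an absolute SMD below 0.1 as adequate balance, although the appropriate threshold depends on the application.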

Performance

  • Rust Core: All heavy lifting (matrix factorization, optimization loops) happens in Rust.
  • Parallelism: Bootstrap methods (like Wild Cluster Bootstrap) utilize multi-threading via Rayon.
  • Memory Efficiency: Zero-copy data access where possible.
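
Bootstrap replications do not depend on one another, which is why they parallelize well. The sequential NumPy sketch below shows the structure of one simple variant, a wild cluster bootstrap with Rademacher weights used to estimate standard errors. It is an assumption-laden illustration: the function name `wild_cluster_bootstrap_se` is hypothetical, and the library may use a different variant (for example, one that imposes the null hypothesis and bootstraps a t-statistic).

```python
import numpy as np

def wild_cluster_bootstrap_se(X, y, clusters, n_boot=999, seed=0):
    """Bootstrap standard errors for OLS using cluster-level Rademacher sign flips."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta
    resid = y - fitted
    _, codes = np.unique(clusters, return_inverse=True)
    n_clusters = codes.max() + 1
    draws = np.empty((n_boot, X.shape[1]))
    # Each iteration is independent of the others, so the loop could run in parallel
    for b in range(n_boot):
        signs = rng.choice([-1.0, 1.0], size=n_clusters)  # one sign per cluster
        y_star = fitted + resid * signs[codes]
        draws[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]
    return beta, draws.std(axis=0, ddof=1)

# Simulated data: 20 clusters of 10 observations with a shared cluster-level shock
rng = np.random.default_rng(1)
n_clusters, per_cluster = 20, 10
clusters = np.repeat(np.arange(n_clusters), per_cluster)
x = rng.normal(size=n_clusters * per_cluster)
cluster_shock = rng.normal(size=n_clusters)[clusters]
y = 1.0 + 2.0 * x + cluster_shock + rng.normal(size=x.size)

beta, se = wild_cluster_bootstrap_se(x.reshape(-1, 1), y, clusters)
print(beta, se)
```

Flipping signs for whole clusters at once preserves the within-cluster correlation of the residuals in every bootstrap sample.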

Documentation

Full documentation, including API references and theoretical background for the implemented methods, is available at causers.readthedocs.io.

Development

To build causers from source, you will need the Rust toolchain (cargo) and a Python environment.

# Clone the repository
git clone https://github.com/causers/causers.git
cd causers

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install development dependencies and build the Rust extension
pip install -e ".[dev]"
maturin develop --release

Running Tests

The test suite uses pytest.

# Run all tests
pytest tests/

# Run performance benchmarks (skipped by default)
pytest tests/test_performance.py

License

This project is licensed under the MIT License. See the LICENSE file for details.
