
High-performance statistical computing library written in Rust


StatOxide: High-performance statistical computing in Rust with Python bindings

StatOxide is a modern, high-performance statistical computing library written in Rust, with comprehensive Python bindings via PyO3. Designed for data scientists, statisticians, and researchers who need both performance and productivity.

🚀 Features

📊 Core Data Structures

  • Series: Columnar data with metadata (name, dtype, levels)
  • DataFrame: Tabular data structure with column operations
  • Formula: R-style formula parsing for model specification
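To make the Formula bullet concrete, here is a minimal pure-Python sketch of what R-style formula parsing involves: splitting on `~`, collecting additive terms, and tracking the intercept. `ParsedFormula` and `parse_formula` are hypothetical names for illustration, not StatOxide's API, and the sketch deliberately ignores interactions, `^`, function calls, and random-effect terms such as `(1 | group)`.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedFormula:
    """Hypothetical container for a parsed formula (not StatOxide's type)."""
    response: str
    terms: list[str] = field(default_factory=list)
    intercept: bool = True


def parse_formula(text: str) -> ParsedFormula:
    """Split 'y ~ a + b - 1' into a response and additive terms.

    Only '~', '+', '-' and the intercept markers '1', '0', '-1' are handled.
    """
    if text.count("~") != 1:
        raise ValueError("formula must contain exactly one '~'")
    lhs, rhs = (side.strip() for side in text.split("~"))
    if not lhs:
        raise ValueError("formula needs a response on the left of '~'")
    parsed = ParsedFormula(response=lhs)
    # Rewrite '-' as '+-' so that every term can be split on '+'
    for raw in rhs.replace("-", "+-").split("+"):
        term = raw.strip().replace(" ", "")
        if not term:
            continue
        if term in ("0", "-1"):
            parsed.intercept = False          # drop the intercept
        elif term == "1":
            parsed.intercept = True           # explicit intercept
        elif term.startswith("-"):
            name = term[1:]                   # '-x' removes an earlier term
            if name in parsed.terms:
                parsed.terms.remove(name)
        elif term not in parsed.terms:
            parsed.terms.append(term)         # duplicates are ignored
    return parsed


print(parse_formula("y ~ x + z - 1"))
```

As in R, `- 1` removes the intercept and `- x` removes a previously listed term.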

📈 Statistical Functions

  • Descriptive Statistics: Mean, variance, skewness, kurtosis, quantiles
  • Probability Distributions: 12 continuous + 6 discrete distributions
  • Statistical Tests: t-test, chi-square, ANOVA, correlation tests
  • Correlation Measures: Pearson, Spearman, Kendall tau
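The three correlation measures differ in what they compare: Pearson uses the raw values, Spearman applies Pearson to ranks, and Kendall counts concordant and discordant pairs. The sketch below shows the textbook definitions in plain Python (tau-b variant, which corrects for ties); the function names are hypothetical and this is not StatOxide's implementation.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's r; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted nowhere
        if dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    n0 = concordant + discordant
    return (concordant - discordant) / math.sqrt((n0 + ties_x) * (n0 + ties_y))
```

For x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5] these give r ≈ 0.775, rho ≈ 0.738 and tau-b ≈ 0.671; the rank-based measures are lower here because the ties in y carry less ordering information.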

🧮 Statistical Models

  • Linear Models: OLS, Ridge, Lasso, Elastic Net with proper inference
  • Generalized Linear Models: Logistic, Poisson, Gamma, Negative Binomial regression
  • Mixed Effects Models: Linear mixed models and generalized linear mixed models (GLMMs), estimated with the EM algorithm
  • Robust Statistics: M-estimators, S-estimators, MM-estimators
  • Nonparametric Methods: Kernel regression, local regression, smoothing splines
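OLS and Ridge both have closed-form solutions from the normal equations, (XᵀX + λI)β = Xᵀy, with λ = 0 giving OLS. The sketch below solves them in plain Python with Gaussian elimination; it is an illustration under the assumption that the first column of X is an intercept (left unpenalized), not StatOxide's solver. Lasso and Elastic Net have no closed form and typically need an iterative method such as coordinate descent, which is omitted here.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge(
    X: list[list[float]],
    y: list[float],
    lam: float = 0.0,
    penalize_first: bool = False,
) -> list[float]:
    """Ridge estimate from (X'X + lam*I) beta = X'y; lam = 0 gives OLS.

    Column 0 is treated as the intercept and is not penalized unless
    penalize_first is True.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for i in range(p):
        if i > 0 or penalize_first:
            xtx[i][i] += lam
    return solve(xtx, xty)


X = [[1, 1], [1, 2], [1, 3]]
y = [5, 8, 11]
print(ridge(X, y))           # OLS
print(ridge(X, y, lam=1.0))  # slope shrunk toward zero
```

With the data from the Python example below (y = 2 + 3x exactly), OLS recovers [2, 3], while λ = 1 shrinks the slope to 2 and moves the intercept to 4. In practice ridge is usually applied to standardized predictors so that the penalty treats them equally.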

📉 Time Series Analysis

  • Core Structures: TimeSeries with datetime indexing
  • ARIMA Models: AR, MA, ARMA, ARIMA, SARIMA
  • GARCH Models: ARCH, GARCH for volatility modeling
  • Decomposition: STL, moving averages, Hodrick-Prescott filter
  • Forecasting: Point forecasts, prediction intervals
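GARCH volatility modeling and variance forecasting both rest on one recursion. For GARCH(1,1), σ²ₜ = ω + α·ε²ₜ₋₁ + β·σ²ₜ₋₁, and beyond one step ahead the expected squared shock equals the variance, so forecasts follow σ²ₜ₊ₕ = ω + (α + β)·σ²ₜ₊ₕ₋₁. The sketch below evaluates this with fixed, hypothetical parameters; estimating ω, α and β (usually by maximum likelihood) is omitted, and this is not StatOxide's code.

```python
def garch11_variance(returns: list[float], omega: float, alpha: float,
                     beta: float) -> list[float]:
    """Conditional variances sigma2[t] for mean-zero shocks returns[t].

    The recursion starts at the unconditional variance omega / (1 - alpha - beta),
    which exists only when alpha + beta < 1.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    sigma2 = [omega / (1 - alpha - beta)]
    for eps in returns[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


def garch11_forecast(sigma2_last: float, eps_last: float, omega: float,
                     alpha: float, beta: float, horizon: int) -> list[float]:
    """Variance forecasts for steps 1..horizon after the last observation."""
    forecasts = [omega + alpha * eps_last ** 2 + beta * sigma2_last]
    for _ in range(horizon - 1):
        forecasts.append(omega + (alpha + beta) * forecasts[-1])
    return forecasts


returns = [1.0, 2.0, 0.0]
s2 = garch11_variance(returns, omega=0.1, alpha=0.1, beta=0.8)
print(s2)
print(garch11_forecast(s2[-1], returns[-1], 0.1, 0.1, 0.8, horizon=3))
```

Because α + β < 1, multi-step forecasts decay geometrically toward the unconditional variance ω / (1 − α − β).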

๐Ÿ› ๏ธ Utilities

  • Linear Algebra: Matrix operations, solvers, decompositions
  • Random Generation: Distributions, bootstrap, train-test split
  • Data Validation: Type checking, missing value detection
  • Numerical Methods: Softmax, standardization, normalization
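The numerical utilities listed above have short textbook definitions. The sketch below shows one common choice for each, in plain Python; the names are hypothetical, they are not the `statoxide.utils` functions, and details such as using the sample standard deviation (ddof = 1) or rounding the test-set size are assumptions of this sketch.

```python
import math
import random


def softmax(z: list[float]) -> list[float]:
    """Softmax; subtracting the max first keeps exp() from overflowing."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def standardize(x: list[float], ddof: int = 1) -> list[float]:
    """z-scores: (x - mean) / sd, with sd using n - ddof in the denominator."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - ddof))
    return [(v - mean) / sd for v in x]


def min_max_normalize(x: list[float]) -> list[float]:
    """Rescale to [0, 1]; undefined (ZeroDivisionError) for constant input."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]


def train_test_split(data, test_size: float, seed=None):
    """Shuffle a copy of data and hold out round(test_size * n) items (at least 1)."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be strictly between 0 and 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # seeded, so the split is reproducible
    n_test = max(1, round(test_size * len(items)))
    return items[n_test:], items[:n_test]


print(softmax([1000.0, 1001.0, 1002.0]))  # no overflow despite large inputs
print(train_test_split([1.0, 2.0, 3.0, 4.0, 5.0], 0.2, seed=0))
```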

๐Ÿ Python API

StatOxide provides a complete Python interface through PyO3 bindings:

import statoxide
import statoxide.core as soc
import statoxide.stats as sos
import statoxide.models as som
import statoxide.tsa as sot
import statoxide.utils as sou

# Core data structures
df = soc.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 5.0, 4.0, 5.0]
})

series = df.get_column("x")
print(f"Mean of x: {series.mean():.2f}")
print(f"Std of x: {series.std(1.0):.2f}")

# Statistical functions (computed outside the f-string: a multi-line
# expression inside a single-quoted f-string is a SyntaxError before Python 3.12)
r = sos.correlation(df.get_column("x").to_list(), df.get_column("y").to_list())
print(f"Correlation: {r:.3f}")

summary = sos.descriptive_summary([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"Summary: {summary}")

# Formula parsing
formula = soc.Formula("y ~ x + x^2")
print(f"Formula variables: {formula.variables()}")

# Models: the first design-matrix column is the intercept, so y = 2 + 3x exactly
result = som.linear_regression([[1, 1], [1, 2], [1, 3]], [5, 8, 11])
print(f"Regression coefficients: {result['coefficients']}")

# Mixed effects models
# `data` is not defined in this snippet: supply a DataFrame with columns y, x and group
mixed_results = som.mixed_effects("y ~ x + (1 | group)", data)
print(f"Random effect variance: {mixed_results.random_variances}")

# Time series: the order arguments are presumably (p, d, q) = (1, 0, 1)
arima_result = sot.fit_arima([1.0, 2.0, 3.0, 4.0, 5.0], 1, 0, 1)
print(f"ARIMA AIC: {arima_result['aic']}")

# Utilities: 0.2 is presumably the test-set fraction
train, test = sou.train_test_split([1.0, 2.0, 3.0, 4.0, 5.0], 0.2)
print(f"Train: {train}, Test: {test}")

๐Ÿ—๏ธ Architecture

StatOxide is organized as a multi-crate Rust workspace:

statoxide/
├── Cargo.toml              # Workspace configuration
├── crates/
│   ├── so-core/            # Core data structures & formula parsing
│   ├── so-linalg/          # Linear algebra abstraction
│   ├── so-stats/           # Statistical functions & distributions
│   ├── so-models/          # Statistical models (regression, GLM, mixed effects, etc.)
│   ├── so-tsa/             # Time series analysis
│   ├── so-utils/           # Utility functions
│   └── so-python/          # Python bindings (PyO3)
├── assets/logo.png         # Project logo
├── LICENSE-MIT             # MIT license
└── LICENSE-APACHE-2.0      # Apache 2.0 license

📦 Installation

Prerequisites

  1. Rust Toolchain: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  2. Python Development Files:
    • Ubuntu/Debian: sudo apt-get install python3-dev python3.11-dev
    • macOS: brew install python@3.11
  3. Maturin (recommended): pip install maturin
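A quick way to check whether the Python development files from step 2 are installed for a particular interpreter is to look for `Python.h` in that interpreter's include directory, which the standard-library `sysconfig` module reports. This is a minimal diagnostic sketch, not part of StatOxide; run it with the same interpreter you plan to build against.

```python
import os
import sysconfig


def python_headers_available() -> bool:
    """True if Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


print(sysconfig.get_paths()["include"], python_headers_available())
```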

Building from Source

# Clone the repository
git clone https://github.com/EthanNOV56/StatOxide.git
cd StatOxide

# Build Python bindings with maturin
cd crates/so-python
maturin develop  # Editable install for development
# or
maturin build --release  # Build wheel for distribution
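`maturin develop` installs into the currently activated virtual environment (or conda environment) and refuses to run without one. Activating a venv exports `VIRTUAL_ENV` and activating a conda environment exports `CONDA_PREFIX`, so a small pre-flight check can look for either; this is a hedged sketch, not maturin's own detection logic.

```python
import os


def active_environment():
    """Path of the activated venv or conda env, or None if neither is active."""
    return os.environ.get("VIRTUAL_ENV") or os.environ.get("CONDA_PREFIX")


env = active_environment()
if env:
    print(f"maturin develop would install into: {env}")
else:
    print("No active environment; e.g. run: python3 -m venv .venv && source .venv/bin/activate")
```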

Direct Cargo Build

cd /path/to/statoxide
export PYO3_PYTHON=python3.11
cargo build --release --package so-python

On Linux, the shared library will be at target/release/libso_python.so (on macOS Cargo names it libso_python.dylib).
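The file name of the built library depends on the platform: Cargo names a `cdylib` `lib<name>.so` on Linux, `lib<name>.dylib` on macOS and `<name>.dll` on Windows. The helper below encodes that rule so build scripts can locate the artifact; it is an illustrative sketch, and `so_python` is taken from the path above. Python only imports an extension whose file name matches its module name and extension suffix, which maturin handles automatically; that is one reason it is the recommended route.

```python
import sys


def cdylib_filename(lib_name: str, platform: str = sys.platform) -> str:
    """File name Cargo gives a cdylib on the given sys.platform value."""
    if platform.startswith("win"):
        return f"{lib_name}.dll"
    if platform == "darwin":
        return f"lib{lib_name}.dylib"
    return f"lib{lib_name}.so"


print(f"target/release/{cdylib_filename('so_python')}")
```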

🧪 Testing

Rust Tests

cargo test --all

Python Tests

After installation:

python -c "import statoxide; print(statoxide.version())"
python crates/so-python/test_api.py  # API demonstration
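The one-line check above fails with a traceback if the extension is missing. A slightly more forgiving smoke test reports what went wrong instead; this sketch looks up `version()` defensively because its presence is taken from the command above rather than guaranteed, and `smoke_test` is a hypothetical helper, not part of the package.

```python
import importlib


def smoke_test(module_name: str = "statoxide") -> str:
    """Import a module and report its version() result if it has one."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return f"{module_name} is not importable: {exc}"
    version = getattr(module, "version", None)
    if callable(version):
        return f"{module_name} {version()}"
    return f"{module_name} imported (no version() function found)"


print(smoke_test())
```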

📚 Documentation

  • API Reference: Run cargo doc --all --no-deps --open for Rust documentation
  • Python Docstrings: All Python functions include detailed docstrings
  • Examples: See crates/so-python/test_api.py for usage examples

🎯 Design Principles

  1. Performance: Leverage Rust's zero-cost abstractions and LLVM optimizations
  2. Safety: Memory safety guarantees without garbage collection
  3. Interoperability: Seamless Python integration with minimal overhead
  4. Modularity: Independent crates for clear separation of concerns
  5. API Consistency: Familiar interfaces inspired by R, pandas, and statsmodels

🔧 Development Status

Module      Status       Notes
so-core     ✅ Complete  Data structures, formula parsing
so-linalg   ✅ Complete  Linear algebra abstraction
so-stats    ✅ Complete  Statistical functions & distributions
so-models   ✅ Complete  Regression, GLM, mixed effects, robust, nonparametric
so-tsa      ✅ Complete  ARIMA, GARCH, decomposition, forecasting
so-utils    ✅ Complete  Random generation, validation, numerical methods
so-python   ✅ Complete  Full Python bindings implemented

📄 License

StatOxide is dual-licensed under:

  • MIT License (LICENSE-MIT)
  • Apache License, Version 2.0 (LICENSE-APACHE-2.0)

You may use StatOxide under either license at your option.

๐Ÿ™ Acknowledgments

  • R and statsmodels for statistical API inspiration
  • pandas for DataFrame design patterns
  • PyO3 team for excellent Rust-Python interop
  • ndarray and faer for numerical computing foundations

๐Ÿค Contributing

Contributions are welcome!

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: cargo test --all
  5. Submit a pull request

📞 Support


High-performance statistics meets Python productivity



Download files


Source Distribution

statoxide-0.3.0.tar.gz (527.7 kB)

Uploaded Source

File details

Details for the file statoxide-0.3.0.tar.gz.

File metadata

  • Download URL: statoxide-0.3.0.tar.gz
  • Upload date:
  • Size: 527.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.7

File hashes

Hashes for statoxide-0.3.0.tar.gz
Algorithm Hash digest
SHA256 98796cd81a275085d57d681989c03d51cd88ab160ce19b5237da2e8499d2dfff
MD5 c9483f0e8a6a70c3b7f74683f20ad52c
BLAKE2b-256 cbab73ef8da4cc7e769be4b76928e118c3d7c8f59d6e9f1a838e2c013cae0ddc
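To confirm that a downloaded archive matches the SHA256 digest listed above, hash the file locally and compare. This sketch uses only the standard-library `hashlib` module and reads the file in chunks so memory use stays bounded; it assumes the archive sits in the current directory under its published name.

```python
import hashlib
import os

ARCHIVE = "statoxide-0.3.0.tar.gz"
EXPECTED_SHA256 = "98796cd81a275085d57d681989c03d51cd88ab160ce19b5237da2e8499d2dfff"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunk_size pieces."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if os.path.exists(ARCHIVE):
    actual = sha256_of(ARCHIVE)
    print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
```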

