Ultra-fast Rust-powered statistics and time-series utilities for Python.
💥 bunker-stats
A Rust-powered statistical toolkit with a Python API and pandas Styler integration.
bunker-stats is a hybrid Rust/Python library providing fast, numerically-stable statistical primitives, rolling-window analytics, distribution tools, and pandas Styler visualizations — all backed by Rust for correctness and performance.
Project Philosophy & Status (v0.1)
bunker-stats is intentionally released early.
The goal is not to replace NumPy or pandas, but to build a Rust-accelerated analytics toolkit that grows feature-by-feature. This first release focuses on correctness, clear API design, and a solid suite of statistical primitives.
Future releases will focus on:
performance tuning (SIMD, fused loops, BLAS-backed ops)
smarter rolling-window pipelines
more visualization helpers
NaN-safe variants of all ops
multi-column Rust kernels
improved correlation-matrix engine
This library is actively evolving, and v0.1 is the foundation everything else will build on.
🚀 Features
Core statistics (Rust)
Mean, variance, std (sample vs population)
Z-scores
MAD (Median Absolute Deviation)
Percentiles & quantiles
IQR & Tukey outlier fences
Covariance / correlation
Welford one-pass mean/variance
EWMA (exponentially weighted moving average)
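As a point of reference for the Welford entry above, here is a minimal pure-Python sketch of the one-pass mean/variance update. The function name `welford_mean_var` is hypothetical and is not part of the bunker-stats API; the Rust kernel may differ in detail.

```python
import math


def welford_mean_var(values):
    """Return (mean, sample_variance) in a single pass over `values`."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # second factor uses the updated mean
    if count < 2:
        return mean, math.nan  # sample variance is undefined for n < 2
    return mean, m2 / (count - 1)


mean, var = welford_mean_var([1.0, 2.0, 3.0, 10.0])
print(mean)                      # 4.0
print(round(math.sqrt(var), 5))  # 4.08248
```

The update never forms a large sum of squares, which is why this scheme is usually preferred over the naive `sum(x*x) - n*mean**2` formula for numerical stability.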
Rolling window analytics
Rolling mean / std / z-score
Rolling covariance / correlation
Fused rolling pipelines in Rust (planned)
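To illustrate what a rolling mean computes, the sketch below uses a running sum so that each step costs O(1) regardless of window size. It is a pure-Python reference, not the library's Rust code. The name `rolling_mean_ref` is hypothetical, and the NaN padding of the first `window - 1` outputs is an assumption that mirrors pandas' default behavior.

```python
import math


def rolling_mean_ref(values, window):
    """Rolling mean via a running sum; the first window-1 outputs are NaN."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = [math.nan] * len(values)
    running = 0.0
    for i, x in enumerate(values):
        running += x
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out[i] = running / window
    return out


print(rolling_mean_ref([1.0, 2.0, 3.0, 4.0, 5.0], 3))
# [nan, nan, 2.0, 3.0, 4.0]
```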
Distribution tools
ECDF (empirical CDF)
Gaussian KDE
Quantile binning
Winsorization
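For readers unfamiliar with the ECDF, a minimal pure-Python sketch follows. The function name `ecdf_ref` and the `(sorted values, probabilities)` return shape are assumptions for illustration; the bunker-stats function may return its result differently.

```python
def ecdf_ref(values):
    """Return (sorted_values, cumulative_probabilities) for a sample."""
    xs = sorted(values)
    n = len(xs)
    # The i-th smallest value (1-based) has cumulative probability i / n.
    probs = [(i + 1) / n for i in range(n)]
    return xs, probs


xs, probs = ecdf_ref([3.0, 1.0, 2.0, 10.0])
print(xs)     # [1.0, 2.0, 3.0, 10.0]
print(probs)  # [0.25, 0.5, 0.75, 1.0]
```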
Transforms
Robust scaling (Median + MAD)
diff / pct_change / cumsum / cummean
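Robust scaling replaces the mean and standard deviation with the median and MAD, so a single outlier has little effect on the scale. The sketch below is a pure-Python reference under one assumption: it divides by the raw MAD, without the 1.4826 normal-consistency constant. Whether bunker-stats applies that constant is not stated here. The name `robust_scale_ref` is hypothetical.

```python
import statistics


def robust_scale_ref(values):
    """Scale values as (x - median) / MAD, using the unscaled MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust scaling is undefined")
    return [(x - med) / mad for x in values]


print(robust_scale_ref([1.0, 2.0, 3.0, 10.0]))
# [-1.5, -0.5, 0.5, 7.5]
```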
pandas Styler integration
demean_style(df, column)
zscore_style(df, column, threshold=…)
iqr_outlier_style(df, column)
corr_heatmap(df)
robust_scale_column(df, column)
📦 Installation (from source)
    git clone https://github.com//bunker-stats.git
    cd bunker-stats
    python -m venv .venv
    source .venv/bin/activate  # or .venv\Scripts\activate on Windows
    pip install maturin
    maturin develop
🔍 Usage Examples
NumPy stats (Rust backend)
    import numpy as np
    import bunker_stats as bs

    x = np.array([1.0, 2.0, 3.0, 10.0], dtype="float64")

    print(bs.mean_np(x))    # 4.0
    print(bs.std_np(x))     # 4.08248... (sample std, n - 1 denominator)
    print(bs.zscore_np(x))  # approx. [-0.73, -0.49, -0.24, 1.47]
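The values in the comments above can be cross-checked without bunker-stats installed. The sketch below uses only Python's standard library and assumes the sample (n - 1) standard deviation, which is the convention that yields 4.08248 for this input.

```python
import statistics

x = [1.0, 2.0, 3.0, 10.0]

mean = statistics.mean(x)
std = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
zscores = [(v - mean) / std for v in x]

print(mean)                            # 4.0
print(round(std, 5))                   # 4.08248
print([round(z, 2) for z in zscores])  # [-0.73, -0.49, -0.24, 1.47]
```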
pandas Styler
    import pandas as pd
    import bunker_stats as bs

    df = pd.DataFrame({"sales": [10, 12, 15, 9, 8, 20]})

    styled = bs.pandas.demean_style(df, "sales")
    styled  # displays color-coded DataFrame in Jupyter
📊 Benchmark Results (v0.1)
All benchmarks are reproducible via:
    python benchmarks/bench_bunker_stats.py
    python benchmarks/test_advanced_ops.py
Environment: Windows 10, Python 3.10, NumPy 1.x
✅ Correctness Checks
bunker-stats matches NumPy/pandas across:
mean, std, z-score
percentiles
IQR & Tukey fences
MAD
diff / pct_change / cumsum / cummean
ECDF
covariance, correlation
rolling covariance/correlation
KDE (integral ≈ 1.0)
EWMA
Welford one-pass stats
All tests pass with tight tolerances (1e-12 where appropriate).
⚡ Performance Summary
1D statistics (1,000,000 elements)
| Operation | NumPy | bunker-stats |
|---|---|---|
| mean | 1.51 ms | 6.73 ms |
| std (ddof=1) | 7.30 ms | 16.27 ms |
| z-score | 11.9 ms | 35.8 ms |
Interpretation: NumPy's reductions are heavily optimized C with low overhead. For these simple one-pass operations NumPy is roughly 2× to 4.5× faster, which is expected for a v0.1 Rust library accessed via FFI.
Rolling windows (1,000,000 elements, window=50)
| Operation | pandas | bunker-stats |
|---|---|---|
| Rolling mean | 34.62 ms | 18.31 ms |
🔥 bunker-stats rolling mean is ~1.9× faster than pandas.
This is where Rust shines: fused loops, zero Python overhead, no index machinery.
Covariance / Correlation (large vectors)
| Operation | Size | pandas | bunker-stats |
|---|---|---|---|
| Covariance | 100k | n/a | 1.86 ms |
| Correlation | 100k | n/a | 7.63 ms |
Cov/corr for individual vector pairs are very fast, often competitive with NumPy.
Correlation Matrix (100,000 × 10)
| Operation | pandas | bunker-stats |
|---|---|---|
| corr matrix | 34.0 ms | 439.8 ms |
Interpretation: bunker-stats currently uses a straightforward Rust implementation (correct but not optimized). Future versions will incorporate column-wise precomputations + SIMD.
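One way to read "column-wise precomputations" is to center and normalize each column once, so that every pairwise correlation reduces to a plain dot product. The pure-Python sketch below illustrates that idea only. It is an assumption about the general approach, not the planned bunker-stats implementation, and `corr_matrix_ref` is a hypothetical name.

```python
import math


def corr_matrix_ref(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    # Precompute once per column: center, then scale to unit Euclidean norm.
    unit = []
    for col in columns:
        mean = sum(col) / n
        centered = [v - mean for v in col]
        norm = math.sqrt(sum(c * c for c in centered))
        # A constant column has zero norm and raises ZeroDivisionError here.
        unit.append([c / norm for c in centered])
    k = len(columns)
    out = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            r = sum(a * b for a, b in zip(unit[i], unit[j]))
            out[i][j] = out[j][i] = r  # symmetric, so fill both halves
    return out


m = corr_matrix_ref([[1.0, 2.0, 3.0, 4.0],
                     [2.0, 4.0, 6.0, 8.0],
                     [4.0, 3.0, 2.0, 1.0]])
print([[round(v, 6) for v in row] for row in m])
```

With the per-column work hoisted out of the pair loop, the remaining cost is k(k-1)/2 dot products of length n, which is the part a SIMD or BLAS-backed kernel would accelerate.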
Advanced Ops
| Operation | Input Size | Time |
|---|---|---|
| RobustScaler | 100k | 26.34 ms |
| Winsorization | 100k | 207.6 ms |
| Quantile binning (5 bins) | 100k | 735.8 ms |
| ECDF | 10k | 8.51 ms |
| KDE (Gaussian, 5k → 512 grid) | 5k | 54.44 ms |
| rolling_cov (window=50) | 100k | 120.49 ms |
| rolling_corr (window=50) | 100k | 322.33 ms |
| diff(1M) | 1M | 18.47 ms |
| pct_change(1M) | 1M | 28.98 ms |
| cumsum(1M) | 1M | 16.01 ms |
| cummean(1M) | 1M | 20.83 ms |
All advanced ops validated against NumPy/pandas or pure Python equivalents.
🎯 What bunker-stats is (and isn’t)
bunker-stats is:
A Rust-backed analytics toolkit specialized for:
rolling statistics
outlier detection
robust scaling
distribution analysis
feature binning & KDE
pandas-friendly visualization
A numerically correct, well-tested foundation you can trust.
bunker-stats is not (yet):
A total replacement for NumPy’s C vectorized primitives
A drop-in for full pandas DataFrame operations
An optimized correlation-matrix engine (coming soon)
What's new in 0.2.2
- Added full benchmarking suite (bench_bunker_v022.py) comparing bunker-stats to NumPy, pandas, and SciPy.
- Optimized axis-wise skipna logic (mean_axis, var_axis, std_axis).
- Implemented nd-rolling helpers: rolling_mean_last_axis and rolling_std_last_axis.
- New outlier/scaling utilities: iqr_outliers, zscore_outliers, minmax_scale, robust_scale, winsorize.
- Added rolling covariance/correlation for 1D series.
- Improved KDE implementation using Scott’s bandwidth rule.
- Added pandas-friendly helpers: col_mean, row_mean, rolling_mean_series, etc. (requires pandas to be installed).
- Comprehensive syntax comparison in README: NumPy vs pandas vs bunker-stats.
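To make the Scott's-rule item above concrete, here is a pure-Python sketch of a 1D Gaussian KDE. It assumes the convention h = sample_std × n^(-1/5), which is the form SciPy's `gaussian_kde` uses for one-dimensional data; whether bunker-stats uses exactly this constant is an assumption. The function name `gaussian_kde_ref` is hypothetical. The final check mirrors the "integral ≈ 1.0" validation listed under Correctness Checks.

```python
import math
import statistics


def gaussian_kde_ref(sample, grid):
    """Evaluate a Gaussian KDE on `grid` with a Scott's-rule bandwidth."""
    n = len(sample)
    # Scott's rule in one dimension: h = sample_std * n ** (-1/5).
    h = statistics.stdev(sample) * n ** (-1.0 / 5.0)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / h) ** 2) for s in sample)
        for g in grid
    ]


sample = [1.0, 2.0, 3.0, 10.0]
lo, hi, points = -30.0, 40.0, 2001
step = (hi - lo) / (points - 1)
grid = [lo + i * step for i in range(points)]
density = gaussian_kde_ref(sample, grid)

# Trapezoidal integration over the grid; the area should be close to 1.0.
area = step * (sum(density) - 0.5 * (density[0] + density[-1]))
print(round(area, 6))
```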
🧪 Testing
To run the full suite:
    pytest -q  # if you add tests/ folder
    python benchmarks/bench_bunker_stats.py
    python benchmarks/test_advanced_ops.py
🛣️ Roadmap
SIMD-optimized rolling statistics
Optimized correlation matrix (BLAS-backed)
Fused rolling mean+std+zscore in one pass
Multi-column Styler helpers
NaN-robust implementations across all functions
Polars DataFrame integration
PyO3 async variants where appropriate
❤️ Contributing
PRs welcome — especially for vectorization, algorithmic improvements, and new statistical transforms.