
Ultra-fast Rust-powered statistics and time-series utilities for Python.


💥 bunker-stats

A Rust-powered statistical toolkit with a Python API and pandas Styler integration.


🔧 Overview

bunker-stats is a hybrid Rust and Python library providing:

  • Fast statistical primitives
  • Rolling window analytics
  • Distribution tools
  • pandas Styler visualizations

The numerical core is implemented in Rust for speed and correctness, and it is exposed through a Python API.


🧭 Project Philosophy and Status

v0.1 is an intentionally early release.

This library focuses on correctness, clean APIs, and solid statistical foundations.

🔮 Future Focus

  • Performance tuning (SIMD, fused loops, BLAS ops)
  • Smarter rolling window engines
  • More visualization helpers
  • NaN-safe variants
  • Multi-column Rust kernels
  • Faster correlation matrix engine

🚀 Features

Core statistics (Rust)

  • Mean, variance, standard deviation
  • Sample vs population versions
  • Z-scores
  • MAD (median absolute deviation)
  • Percentiles and quantiles
  • IQR and Tukey fences
  • Covariance, correlation
  • Welford one-pass algorithms
  • EWMA (exponentially weighted moving average)
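Welford's one-pass update keeps a running mean and a running sum of squared deviations, so the variance can be computed in a single pass without the cancellation problems of the naive sum-of-squares formula. The pure-Python sketch below only illustrates the technique; it is not bunker-stats' Rust implementation, and the function name welford_mean_var is hypothetical. It assumes the sample (ddof=1) convention that the function table attributes to bs.var.

```python
import math
from typing import Iterable, Tuple


def welford_mean_var(values: Iterable[float]) -> Tuple[float, float]:
    """Single-pass mean and sample variance (ddof=1) via Welford's update.

    Hypothetical helper for illustration; not part of bunker-stats.
    Returns (nan, nan) for empty input and (x, nan) for a single value.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / n         # update the running mean
        m2 += delta * (x - mean)  # old-mean deviation times new-mean deviation
    if n == 0:
        return math.nan, math.nan
    if n == 1:
        return mean, math.nan
    return mean, m2 / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, var = welford_mean_var(data)
print(mean, var, math.sqrt(var))
```

Because each value is visited once and only three numbers are stored, the same update also works on streams that do not fit in memory.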

Rolling analytics

  • Rolling mean, std, z-score
  • Rolling covariance, correlation
  • Fused pipelines (planned)
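To make the rolling semantics concrete, here is a plain-Python reference for a rolling mean, sample standard deviation and z-score. Two assumptions are not confirmed by this page. First, there is one output per full window, giving len(x) - window + 1 values, in line with the "truncated length" note for bs.rolling_mean. Second, the z-score compares each window's last element with that window's mean and std. The helper name rolling_stats is hypothetical, and the two-pass computation per window favours clarity over speed (O(n * window)).

```python
import math
from typing import List, Sequence, Tuple


def rolling_stats(x: Sequence[float], window: int) -> Tuple[List[float], List[float], List[float]]:
    """Reference rolling mean, sample std (ddof=1) and z-score.

    Hypothetical helper. One value per full window, so the output length is
    len(x) - window + 1. The z-score compares each window's last element
    with that window's mean and std (NaN when the std is zero).
    """
    if window < 2:
        raise ValueError("window must be >= 2 for a sample std")
    means, stds, zs = [], [], []
    for end in range(window, len(x) + 1):
        w = x[end - window:end]                      # current window
        m = sum(w) / window                          # window mean
        s = math.sqrt(sum((v - m) ** 2 for v in w) / (window - 1))
        means.append(m)
        stds.append(s)
        zs.append((w[-1] - m) / s if s > 0 else math.nan)
    return means, stds, zs


means, stds, zs = rolling_stats([1.0, 2.0, 3.0, 4.0, 10.0], window=3)
print(means, stds, zs)
```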

Distribution tools

  • ECDF
  • Gaussian KDE
  • Quantile binning
  • Winsorization
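A minimal pure-Python sketch of two of the distribution tools follows. The ECDF pairs each sorted value with the step height (i + 1) / n. Winsorization clips values to a lower and an upper quantile. The quantile here uses linear interpolation (NumPy's default "linear" method). That choice is an assumption, and bunker-stats may interpolate differently. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ecdf(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Sorted values and ECDF step heights (i + 1) / n.

    With tied values, only the last copy of a value reaches F(v) = P(X <= v).
    """
    vals = sorted(x)
    n = len(vals)
    probs = [(i + 1) / n for i in range(n)]
    return vals, probs


def quantile_linear(sorted_vals: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics (0 <= q <= 1)."""
    n = len(sorted_vals)
    h = (n - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def winsorize(x: Sequence[float], lower_q: float = 0.05, upper_q: float = 0.95) -> List[float]:
    """Clip every value into [quantile(lower_q), quantile(upper_q)]; keeps len(x)."""
    s = sorted(x)
    if not s:
        return []
    lo = quantile_linear(s, lower_q)
    hi = quantile_linear(s, upper_q)
    return [min(max(v, lo), hi) for v in x]


vals, probs = ecdf([3.0, 1.0, 2.0])
clipped = winsorize([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], lower_q=0.1, upper_q=0.9)
print(vals, probs)
print(clipped)
```

SciPy's scipy.stats.mstats.winsorize takes its limits as fractions of observations to replace at each end. It therefore works by count rather than by interpolated quantile, so its output can differ from this clipping approach.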

Transforms

  • Robust scaling using median and MAD
  • diff, pct_change, cumsum, cummean
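Robust scaling centres data on the median and divides by the MAD, so a few extreme values barely move the result. The sketch below assumes that scale_factor multiplies the MAD in the denominator. The default 1.4826 (approximately 1 / Φ⁻¹(0.75)) is the usual constant that makes the scaled MAD a consistent estimate of the standard deviation for normal data. It is not a documented bunker-stats default. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def mad(x: Sequence[float]) -> float:
    """Median absolute deviation: median(|x_i - median(x)|)."""
    med = statistics.median(x)
    return statistics.median([abs(v - med) for v in x])


def robust_scale(x: Sequence[float], scale_factor: float = 1.4826) -> Tuple[List[float], float, float]:
    """Return ((x - median) / (scale_factor * MAD), median, MAD).

    Hypothetical helper: the default 1.4826 makes scale_factor * MAD estimate
    the standard deviation when the data are normally distributed.
    """
    med = statistics.median(x)
    m = mad(x)
    denom = scale_factor * m
    if denom == 0:
        raise ValueError("MAD is zero, so robust scaling is undefined")
    return [(v - med) / denom for v in x], med, m


scaled, med, m = robust_scale([1, 2, 3, 4, 100], scale_factor=1.0)
# The median and MAD match those of [1, 2, 3, 4, 5]: the outlier 100 does not pull them.
print(scaled, med, m)
```

Returning the median and MAD alongside the scaled values lets the same transform be reapplied to new data or inverted later.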

pandas Styler

  • demean_style(df, column)
  • zscore_style(df, column, threshold=...)
  • iqr_outlier_style(df, column)
  • corr_heatmap(df)
  • robust_scale_column(df, column)
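This page does not describe how the Styler helpers are built. As an illustration only, a z-score highlighter can be written as a plain function that returns one CSS string per cell. That is the shape pandas' Styler.apply expects when it calls a function column by column. The function zscore_css, its css parameter, and the "price" column in the usage comment are hypothetical. The sketch uses the sample standard deviation and ignores NaN cells.

```python
import math
import statistics
from typing import List, Sequence


def zscore_css(values: Sequence[float], threshold: float = 3.0,
               css: str = "background-color: #f4cccc") -> List[str]:
    """One CSS string per cell: `css` where |z| > threshold, '' elsewhere.

    Hypothetical helper, not bunker-stats' API. NaN cells are excluded from
    the mean/std and never highlighted; the std is the sample std (ddof=1).
    """
    finite = [v for v in values if not math.isnan(v)]
    if len(finite) < 2:
        return [""] * len(values)
    mu = statistics.mean(finite)
    sd = statistics.stdev(finite)
    out = []
    for v in values:
        flagged = sd > 0 and not math.isnan(v) and abs(v - mu) / sd > threshold
        out.append(css if flagged else "")
    return out


styles = zscore_css([1.0, 1.0, 1.0, 1.0, 10.0], threshold=1.5)
print(styles)

# With pandas, Styler.apply calls the function once per column and forwards
# extra keyword arguments, e.g.:
# df.style.apply(zscore_css, subset=["price"], threshold=3.0)
```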

| Function | bunker-stats syntax | NumPy equivalent | pandas equivalent | Unique feature in bunker-stats |
|---|---|---|---|---|
| mean | bs.mean(x) | np.mean(x) | s.mean() | 1D mean helper; always treats input as 1D numeric; thin Rust-backed wrapper. |
| mean_skipna | bs.mean_skipna(x) | np.nanmean(x) or a manual mask | s.mean(skipna=True) | NaN-aware mean with explicit "skipna" semantics, matching the pandas mental model. |
| var | bs.var(x) | np.var(x, ddof=1) | s.var(ddof=1) | 1D sample variance (ddof=1) by default, matching statistics textbooks. |
| var_skipna | bs.var_skipna(x) | np.nanvar(x, ddof=1) or a mask | s.var(skipna=True, ddof=1) | NaN-aware sample variance in one call. |
| std | bs.std(x) | np.std(x, ddof=1) | s.std(ddof=1) | 1D sample std with fixed ddof=1, consistent with var. |
| std_skipna | bs.std_skipna(x) | np.nanstd(x, ddof=1) or a mask | s.std(skipna=True, ddof=1) | NaN-aware sample std; no need to write masks every time. |
| percentile | bs.percentile(x, q=0.95) | np.quantile(x, 0.95) or np.percentile(x, 95) | s.quantile(0.95) | Clean 1D percentile using bunker-stats' own interpolation; integrated with the other robust statistics. |
| mad | bs.mad(x) | np.median(np.abs(x - np.median(x))) | (s - s.median()).abs().median() (the old s.mad(), removed in pandas 2.0, was the mean absolute deviation) | True median absolute deviation, as used by robust_scale. |
| iqr | q1, q3, iqr = bs.iqr(x) | np.percentile(x, [25, 75]) or scipy.stats.iqr(x, rng=(25, 75)) | s.quantile([0.25, 0.75]) | Returns (q1, q3, iqr) in one call; no juggling of multiple calls or indices. |
| mean_axis | bs.mean_axis(X, axis=0, skipna=False) | np.mean(X, axis=0) | df.mean(axis=0, skipna=...) | Axis-wise mean for 1D/2D arrays with optional skipna. |
| var_axis | bs.var_axis(X, axis=1, skipna=True) | np.var(X, axis=1, ddof=1) (no skipna argument; use np.nanvar to skip NaN) | df.var(axis=1, skipna=...) | Axis-wise sample variance with built-in NaN handling. |
| std_axis | bs.std_axis(X, axis=1, skipna=True) | np.std(X, axis=1, ddof=1) (no skipna argument; use np.nanstd to skip NaN) | df.std(axis=1, skipna=...) | Axis-wise sample std with skipna; brings the pandas mental model to NumPy arrays. |
| mean_last_axis | bs.mean_last_axis(X) (if exposed) | np.mean(X, axis=-1) | df.to_numpy().mean(axis=-1) | N-D mean over the last axis, consistent with the N-D rolling API. |
| rolling_mean_last_axis | bs.rolling_mean_last_axis(X, window=3) | manual reshape + loop, or np.apply_along_axis | no built-in; needs groupby+apply or custom logic | Shape-preserving N-D rolling mean over the last axis (e.g. (batch, feat, time)). |
| rolling_std_last_axis | bs.rolling_std_last_axis(X, window=3) | same as above | same as above | N-D rolling std over the last axis; suited to batched time series and ML tensors. |
| rolling_mean | bs.rolling_mean(x, window=5) | manual loop or np.convolve trick | s.rolling(5).mean() | Fast 1D rolling mean (truncated output length) with no index overhead. |
| rolling_std | bs.rolling_std(x, window=5) | manual loop | s.rolling(5).std() | 1D rolling std at Rust speed, using the sample-variance convention. |
| rolling_zscore | bs.rolling_zscore(x, window=20) | manual window loop | s.rolling(20).apply(custom) | Rolling z-score in a single function; avoids apply/UDF overhead. |
| ewma | bs.ewma(x, alpha=0.1) | manual recurrence | s.ewm(alpha=0.1).mean() | Minimal EWMA for plain numeric arrays, without pandas object overhead. |
| df_rolling_mean | bs.df_rolling_mean(df, window=5) | np.convolve per column | df.rolling(5).mean() | DataFrame in, DataFrame out, with each column computed by the Rust rolling mean. |
| df_rolling_std | bs.df_rolling_std(df, window=5) | manual per-column loop | df.rolling(5).std() | Same for std; uses the Rust rolling core but preserves the pandas index. |
| df_ewma | bs.df_ewma(df, alpha=0.1) | manual per-column EWMA | df.ewm(alpha=0.1).mean() | Per-column EWMA with the Rust engine; lighter than the full pandas EWM machinery. |
| col_mean | bs.col_mean(df, skipna=True) | np.nanmean(df.to_numpy(), axis=0) | df.mean(axis=0, skipna=True) | Column-wise mean; internally uses mean_axis with skipna and returns a labeled Series. |
| row_mean | bs.row_mean(df, skipna=True) | np.nanmean(df.to_numpy(), axis=1) | df.mean(axis=1, skipna=True) | Row-wise mean with the Rust numeric core plus the pandas index. |
| cov_df | bs.cov_df(df) | np.cov(df.to_numpy().T, ddof=1) | df.cov() | Full covariance matrix via the Rust cov_matrix, returned as a DataFrame. |
| corr_df | bs.corr_df(df) | np.corrcoef(df.to_numpy().T) | df.corr() | Correlation matrix backed by the Rust correlation engine. |
| rolling_mean_series | bs.rolling_mean_series(s, window=10) | manual 1D loop | s.rolling(10).mean() | Series-in, Series-out convenience wrapper around the Rust rolling mean. |
| rolling_std_series | bs.rolling_std_series(s, window=10) | manual 1D loop | s.rolling(10).std() | Same for std; keeps index alignment and uses the Rust core. |
| iqr_outliers | bs.iqr_outliers(x, k=1.5) | iqr = scipy.stats.iqr(x); mask = ... | quantiles + boolean mask | Returns a boolean outlier mask in one call using the IQR (Tukey fence) rule. |
| zscore_outliers | bs.zscore_outliers(x, threshold=3.0) | np.abs((x - x.mean()) / x.std(ddof=1)) > 3 | same logic on a Series | One-liner z-score outlier mask; shares the mean/std semantics of bs.mean and bs.std. |
| minmax_scale | scaled, mn, mx = bs.minmax_scale(x) | manual (x - mn) / (mx - mn) | sklearn MinMaxScaler | Returns both the scaled data and the (min, max) used, for inverse transforms or reuse. |
| robust_scale | scaled, med, mad = bs.robust_scale(x, scale_factor) | manual MAD calculation | sklearn RobustScaler (which scales by the IQR, not the MAD) or custom code | All-in-one robust scaling that also returns (median, MAD); pairs with bs.mad. |
| winsorize | bs.winsorize(x, lower_q=0.05, upper_q=0.95) | scipy.stats.mstats.winsorize(x, limits=...) | custom quantile clipping, e.g. s.clip(s.quantile(0.05), s.quantile(0.95)) | 1D winsorization in Rust; a single call returns the full adjusted array. |
| diff | bs.diff(x, periods=1) | np.diff(x, n=1) (shorter output) or manual padding | s.diff(periods=1) | Full-length diff with NaNs where needed; supports negative periods. |
| pct_change | bs.pct_change(x, periods=1) | manual (x[i] - x[i-p]) / x[i-p] | s.pct_change(periods=1) | Maps division by zero to NaN; handles positive and negative lags symmetrically. |
| cumsum | bs.cumsum(x) | np.cumsum(x) | s.cumsum() | Rust implementation; the benefit is performance on large 1D arrays. |
| cummean | bs.cummean(x) | np.cumsum(x) / np.arange(1, len(x) + 1) | s.expanding().mean() | Streaming cumulative mean without constructing expanding windows. |
| ecdf | vals, probs = bs.ecdf(x) | manual sort + rank | custom rank / value_counts | Returns sorted values and CDF values in one call, ready for ECDF plots. |
| quantile_bins | bins = bs.quantile_bins(x, n_bins=10) | manual rank + binning | pd.qcut(x, q=10) (Categorical; labels=False gives integer codes) | Returns plain integer bin labels 0..n_bins-1 as a NumPy array (ML-friendly). |
| sign_mask | mask = bs.sign_mask(x) | np.sign(x).astype(np.int8) | (s > 0).astype(int) - (s < 0).astype(int) | Encodes the sign as {-1, 0, 1}; useful for discrete signal features. |
| demean_with_signs | demeaned, signs = bs.demean_with_signs(x) | (x - x.mean(), np.sign(x - x.mean())) | custom | Returns both the demeaned data and its sign mask in one pass. |
| cov | bs.cov(x, y) | np.cov(x, y, ddof=1)[0, 1] | s1.cov(s2) | 1D sample covariance as a simple scalar function. |
| corr | bs.corr(x, y) | np.corrcoef(x, y)[0, 1] | s1.corr(s2) | 1D Pearson correlation built on the same var/std core. |
| cov_skipna | bs.cov_skipna(x, y) | manual pairwise dropna + np.cov | s1.cov(s2) (pandas already excludes NaN pairs) | Pairwise NaN dropping built in for 1D covariance. |
| corr_skipna | bs.corr_skipna(x, y) | manual pairwise dropna + np.corrcoef | s1.corr(s2) (pandas already excludes NaN pairs) | Same for correlation; hides the mask bookkeeping. |
| cov_matrix | bs.cov_matrix(X) | np.cov(X, rowvar=False, ddof=1) | df.cov() | Symmetric covariance matrix computed with Rust loops; tuned for tabular X. |
| corr_matrix | bs.corr_matrix(X) | np.corrcoef(X, rowvar=False) | df.corr() | Correlation matrix built on the same cov/std code, so behaviour is consistent across code paths. |
| rolling_cov | bs.rolling_cov(x, y, window=50) | manual sliding window + np.cov | df['x'].rolling(50).cov(df['y']) | Rolling 1D covariance without pandas overhead; good for streaming statistics. |
| rolling_corr | bs.rolling_corr(x, y, window=50) | manual sliding window + np.corrcoef | df['x'].rolling(50).corr(df['y']) | Rolling 1D correlation in one Rust call; no custom Python loop needed. |
| kde_gaussian | grid, dens = bs.kde_gaussian(x, n_points=256) | scipy.stats.gaussian_kde(x) plus evaluation on a grid | no direct built-in (needs SciPy) | Lightweight 1D Gaussian KDE; returns (grid, density) using a simple bandwidth rule by default. |
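The full-length convention for diff and pct_change can be pinned down with a short reference implementation. Position i is compared with position i - periods, and any position whose partner falls outside the array becomes NaN, so negative periods compare with later values. This mirrors the index arithmetic of pandas' Series.diff and Series.pct_change. Following the table's description, a zero base in pct_change yields NaN here, whereas pandas would typically produce an infinite value. The function names diff_full and pct_change_full are hypothetical, and this is not the library's Rust code.

```python
import math
from typing import List, Sequence


def diff_full(x: Sequence[float], periods: int = 1) -> List[float]:
    """x[i] - x[i - periods], NaN where i - periods falls outside the array.

    Output length equals len(x); negative periods compare with later values.
    """
    n = len(x)
    return [x[i] - x[i - periods] if 0 <= i - periods < n else math.nan
            for i in range(n)]


def pct_change_full(x: Sequence[float], periods: int = 1) -> List[float]:
    """(x[i] - x[j]) / x[j] with j = i - periods; NaN if j is out of range or x[j] == 0."""
    n = len(x)
    out = []
    for i in range(n):
        j = i - periods
        if 0 <= j < n and x[j] != 0:
            out.append((x[i] - x[j]) / x[j])
        else:
            out.append(math.nan)  # missing partner or division by zero
    return out


x = [10.0, 12.0, 0.0, 5.0]
print(diff_full(x, 1), diff_full(x, -1))
print(pct_change_full(x, 1))
```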

📦 Installation

git clone https://github.com/bunker-stats.git
cd bunker-stats

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

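pip install maturin
maturin develop   # builds the Rust extension and installs it into the active venv (needs a Rust toolchain; add --release for an optimized build)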

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


bunker_stats_rs-0.2a0-cp310-cp310-win_amd64.whl (168.0 kB)

Uploaded for CPython 3.10, Windows x86-64

File details

Hashes for bunker_stats_rs-0.2a0-cp310-cp310-win_amd64.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 63875022226cfc92f1e0e2f4340f9b6f7b93e75e96a4d5c8ccda225e6c2a4e2b |
| MD5 | 396427a138a979641362f78f9d8c9f08 |
| BLAKE2b-256 | bb73237c548ef93a5b40b654659eed4e0827eb091503ceb58667801a8269951d |

