
Tangent-space Wasserstein-geodesic distributional forecasting for crypto returns


wasserstein-btc

Distributional forecasting for crypto returns via geodesics on the 2-Wasserstein manifold of probability measures

A small, falsifiable, interpretable distributional forecaster — ~4 hyperparameters, no learned weights, no neural net.



Figure: WGeo-Ensemble vs GARCH-N vs Static on ETH/USDT, h=21d. Cumulative-mean CRPS over 6.75 years of walk-forward out-of-sample evaluation; the WGeo-Ensemble curve lies below both baselines.

wasserstein-btc forecasts the whole conditional distribution of future log-returns — not the mean and not the variance — for liquid crypto pairs at horizons of 1, 5 and 21 days. The market is modelled as a trajectory on the 2-Wasserstein manifold of probability measures, the forecast is the tangent-space extrapolation of recent quantile vectors, and the result is scored with strictly proper rules (CRPS) against an explicit panel of baselines (Static, RW-Drift, Historical-Simulation Bootstrap, GARCH-N, GARCH-t, GJR-GARCH-t).
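To make the scoring step concrete, here is a minimal sketch of how a forecast stored as a quantile vector can be scored with CRPS, using the identity that CRPS equals twice the pinball loss integrated over quantile levels. The function names, the 99-point level grid and the synthetic return window are illustrative assumptions, not the package's API.

```python
import numpy as np

def quantile_vector(returns, levels):
    """Empirical quantile function of a return window, sampled at `levels`."""
    return np.quantile(np.asarray(returns, dtype=float), levels)

def crps_from_quantiles(q, y, levels):
    """Approximate CRPS as twice the mean pinball loss over an even level grid."""
    diff = y - np.asarray(q, dtype=float)
    pinball = np.maximum(levels * diff, (levels - 1.0) * diff)
    return 2.0 * float(np.mean(pinball))

# Hypothetical grid: midpoints of 99 equal-width bins on (0, 1).
levels = (np.arange(99) + 0.5) / 99

# Synthetic stand-in for a window of daily log-returns.
rng = np.random.default_rng(0)
window = rng.normal(0.0, 0.02, size=500)
q = quantile_vector(window, levels)

score_near = crps_from_quantiles(q, 0.0, levels)  # realised return near the centre
score_far = crps_from_quantiles(q, 0.2, levels)   # realised return far in the tail
print(score_near, score_far)
```

Lower is better: the same forecast is penalised much more when the realised return lands far outside the bulk of the predicted distribution. A point forecast (all quantiles equal) reduces to absolute error under this formula.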

What it is: a small, falsifiable, interpretable distributional forecaster — ~4 hyperparameters, no learned weights, no neural net. What it is not: a trading-signal generator, a multivariate risk system, or a benchmark against state-of-the-art realised-volatility models (see docs/RESEARCH_REPORT.md §6 for what is not claimed, and ROADMAP.md for the v0.4 priorities that would close that gap).


Headline result

On the v0.4 panel (BTC + ETH + SOL + BNB × h ∈ {1, 5, 21} × 6.75 years walk-forward; 1380–2470 test days per cell), the WGeo family beats the best non-WGeo baseline (best of Static / RW-Drift / HS-Bootstrap / GARCH-N / GARCH-t / GJR-GARCH-t) in 12 / 12 cells by 0.1% to 3.2% mean CRPS.

The v0.4 cycle adds two things. (a) WGeoEnsemble, the W₂ barycentre of the v0.3 trio in quantile-function coordinates. Because CRPS is convex in the forecast quantile function, Jensen's inequality guarantees that the ensemble's CRPS is no worse than the average CRPS of its components. (b) A residualised Diebold-Mariano test (Giacomini-White 2006) that projects out shared volatility-clustering noise via |y|, y², y plus peer-method losses, preserving the equal-predictive-ability (EPA) null while strictly reducing HAC variance. Together these lift the panel's statistical evidence:

                                       v0.3           v0.4
Cells WGeo-family wins on CRPS         12 / 12        12 / 12
Cells with vanilla DM p<0.05           1 / 12 (8%)    4 / 12 (33%)
Cells with residualised DM p_r<0.05                   8 / 12 (67%)
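The Jensen argument for the quantile-coordinate ensemble can be checked numerically. The sketch below assumes equal weights and three synthetic Gaussian components standing in for the v0.3 trio; in one dimension the equal-weight W₂ barycentre is the pointwise average of the quantile functions. All names here are hypothetical and the CRPS is the pinball-loss approximation on an even grid.

```python
import numpy as np

def crps_from_quantiles(q, y, levels):
    """Approximate CRPS as twice the mean pinball loss over an even level grid."""
    diff = y - np.asarray(q, dtype=float)
    return 2.0 * float(np.mean(np.maximum(levels * diff, (levels - 1.0) * diff)))

levels = (np.arange(99) + 0.5) / 99
rng = np.random.default_rng(1)

# Three hypothetical component forecasts, each stored as a quantile vector.
params = [(0.0, 0.02), (0.005, 0.03), (-0.005, 0.05)]
components = [np.quantile(rng.normal(mu, sd, 2000), levels) for mu, sd in params]

# Equal-weight W2 barycentre in 1D: average the quantile vectors.
barycentre = np.mean(components, axis=0)

gaps = []
for y in (-0.08, 0.0, 0.03):  # arbitrary realised returns
    ensemble_score = crps_from_quantiles(barycentre, y, levels)
    average_score = np.mean([crps_from_quantiles(c, y, levels) for c in components])
    gaps.append(average_score - ensemble_score)  # non-negative by convexity

print(gaps)
```

The inequality holds for every realised value, not just on average, because the pinball loss is convex in the forecast quantile at each level. It says nothing about whether the ensemble beats its best component.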

Per-cell headline numbers, regime-conditional DM tables, and the full falsification verdict against docs/THEORY.md §4 are in docs/RESULTS_LONG.md. Methods-paper-style writeup in docs/RESEARCH_REPORT.md.
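For readers unfamiliar with the vanilla Diebold-Mariano test, a minimal sketch follows. It assumes a Bartlett-kernel (Newey-West) long-run variance with h − 1 lags and a normal reference distribution; the residualised variant described above is not reproduced here, and the function name and the synthetic loss series are hypothetical.

```python
import math
import numpy as np

def diebold_mariano(loss_a, loss_b, horizon=1):
    """DM statistic and two-sided p-value for H0: equal expected loss.

    A negative statistic means method A has the lower average loss.
    """
    d = np.asarray(loss_a, dtype=float) - np.asarray(loss_b, dtype=float)
    n = d.size
    d_bar = d.mean()
    u = d - d_bar
    # Newey-West long-run variance with Bartlett weights and horizon - 1 lags.
    lags = max(horizon - 1, 0)
    lrv = np.dot(u, u) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        lrv += 2.0 * weight * np.dot(u[k:], u[:-k]) / n
    if lrv <= 0.0:
        raise ValueError("long-run variance estimate is not positive")
    stat = d_bar / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided normal tail
    return stat, p_value

# Synthetic per-day CRPS series: method A is better by 0.1 on average.
rng = np.random.default_rng(2)
loss_b = rng.gamma(2.0, 1.0, size=2000)
loss_a = loss_b - 0.1 + rng.normal(0.0, 0.5, size=2000)

stat, p_value = diebold_mariano(loss_a, loss_b, horizon=5)
print(stat, p_value)
```

Overlapping multi-day horizons induce serial correlation in the loss differential, which is why the variance estimate includes lagged autocovariances rather than the plain sample variance.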

Install

The supported workflow uses uv (fast and reproducible). The package itself works under any Python ≥3.11.

git clone https://github.com/AccursedGalaxy/wasserstein-btc
cd wasserstein-btc
uv sync          # creates .venv with locked deps
uv run wbtc test # 53 tests, ~10 seconds

A PyPI release (pip install wbtc) is wired up via .github/workflows/publish-pypi.yml and ships on the next signed tag — see ROADMAP.md.

Quick start — CLI

uv run wbtc info                              # what data do I have?
uv run wbtc fetch BTC/USDT ETH/USDT SOL/USDT  # fetch / update from Binance
uv run wbtc forecast BTC/USDT -H 5 --plot     # forecast & fan-chart PNG
uv run wbtc forecast BTC/USDT -H 5 --json     # JSON for scripting
uv run wbtc backtest --quick                  # fast single-symbol backtest
uv run wbtc backtest-long                     # full multi-asset (~30 min)
uv run wbtc extended-baselines                # HAR-RV/CAViaR/MS/FIGARCH/SV/BVAR vs WGeo on BTC (~2h)
uv run wbtc sweep                             # hyperparameter robustness

Quick start — Python

from wbtc import forecast, available_symbols, default_forecaster

available_symbols()
# ['BNB/USDT', 'BTC/USDT', 'ETH/USDT', 'SOL/USDT', 'XRP/USDT']

fc = forecast("BTC/USDT", horizon=5)
fc.median, fc.quantile(0.05), fc.quantile(0.95)
fc.to_dict()  # JSON-safe summary

# Pick a specific variant explicitly:
from wbtc import WassersteinGeodesicEWMA
fc = forecast("BTC/USDT", horizon=5,
              forecaster=WassersteinGeodesicEWMA(window=90, lookback=20))

default_forecaster(horizon) returns the recommended variant per horizon (see RESEARCH_REPORT.md §7).

What's novel

  • Per-quantile time-regression on the W₂ manifold. The 1D-W₂-as-quantile-function isometry is textbook (Villani 2009 ch. 6); applying it to time-series tangent extrapolation of return distributions appears to be under-published. The closest published method (Saluzzi & Soize 2025, arXiv:2507.07570) uses a Koopman/EDMD-spectral approach with no regime adaptation, applied to housing prices.
  • Cosine-curvature gate. Continuous, non-Markovian gating that blends geodesic extrapolation with a static-empirical fallback when consecutive tangent vectors become orthogonal. Pays off at h=1.
  • Theil-Sen robust slope on the tangent. 29.3% breakdown point; robust to recent-history outliers without explicit regime modelling.
  • Quantile-coordinate ensemble with GARCH. Convex combination in quantile-function space is an exact W₂-geodesic interpolation (McCann 1997) — not a moment-matched or kernel-mixed surrogate.
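Two of the ideas above can be sketched in a few lines: a Theil-Sen slope fitted per quantile level and extrapolated forward, and the fact that a convex combination of quantile vectors moves along the W₂ geodesic. This is an illustration under stated assumptions (equally spaced days, an even quantile grid, sorting to restore monotonicity), not the package's implementation; every function name is hypothetical.

```python
import numpy as np

def theil_sen_slope(y):
    """Median of all pairwise slopes of y against equally spaced times 0..n-1."""
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(y.size, k=1)  # all index pairs with i < j
    return float(np.median((y[j] - y[i]) / (j - i)))

def extrapolate_quantiles(history, horizon):
    """Extrapolate a (T, K) array of daily quantile vectors `horizon` steps ahead."""
    history = np.asarray(history, dtype=float)
    slopes = np.array([theil_sen_slope(col) for col in history.T])
    # Sorting restores a monotone quantile vector if extrapolated levels cross
    # (an assumption of this sketch, not necessarily the package's choice).
    return np.sort(history[-1] + horizon * slopes)

def w2_distance(q_a, q_b):
    """1D 2-Wasserstein distance between quantile vectors on the same even grid."""
    return float(np.sqrt(np.mean((np.asarray(q_a) - np.asarray(q_b)) ** 2)))

def geodesic_point(q_a, q_b, t):
    """Point at fraction t along the W2 geodesic from q_a to q_b."""
    return (1.0 - t) * np.asarray(q_a) + t * np.asarray(q_b)

# Synthetic history: five quantile levels drifting by 0.01 per day, one bad day.
base = np.linspace(-1.0, 1.0, 5)
history = base[None, :] + 0.01 * np.arange(20)[:, None]
history[10] += 5.0  # a single corrupted observation

forecast = extrapolate_quantiles(history, horizon=5)
quarter = geodesic_point(history[0], forecast, 0.25)
print(forecast)
print(w2_distance(history[0], quarter), w2_distance(history[0], forecast))
```

The single corrupted day does not move the fitted slope, since most pairwise slopes are unaffected by it. The second print shows the geodesic property: the point at t = 0.25 sits at one quarter of the total W₂ distance.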

Documents

docs/
  THEORY.md           math (§2.6–2.8 are the v0.3 sections, §4 lists
                      explicit falsification criteria)
  RESEARCH_REPORT.md  paper-style writeup of the v0.3 contributions
  RESULTS_LONG.md     auto-regenerated 4-asset × 3-horizon evidence
  RESULTS.md          legacy v0.1 single-year report (superseded)
  INDEX.md            one-paragraph orientation to every doc
ROADMAP.md            v0.4 + v0.5 priorities (what would make it
                      competitive vs. production risk systems)
CONTRIBUTING.md       the conventions PRs must follow
CHANGELOG.md          v0.1 → v0.2 → v0.3 history

Honest limitations

  • Academic baselines only. The headline panel benchmarks against textbook baselines (Static / RW / HS / GARCH-N / GARCH-t / GJR-GARCH-t across 4 assets × 3 horizons, in docs/RESULTS_LONG.md). A broader named-econometric panel on BTC is in docs/RESULTS_EXTENDED.md: HAR-RV (Corsi 2009), CAViaR-SAV (Engle-Manganelli 2004), 2-state Markov-switching Normal (Hamilton 1989), FIGARCH(1,d,0) (Baillie-Bollerslev-Mikkelsen 1996), AR(1) Stochastic Volatility (Taylor 1982 / Harvey-Ruiz-Shephard 1994 via Kalman QML), and a bivariate VAR+GARCH using BTC + ETH jointly. This rounds out the academic panel; any production-risk-system claim is still unsupported.
  • Daily-only. Intraday volatility dynamics are different.
  • Univariate only. The 1D-W₂ isometry doesn't extend cleanly to higher dimensions; multivariate is a v0.5 research item.
  • No trading P&L claim. Distributional-forecast quality is necessary but not sufficient for tradeable alpha.
  • Heteroskedastic-dispersion variant (WGeo-Hetero) was a documented dead end — see RESEARCH_REPORT.md §4.4 for why (empirical-quantile-based dispersion already encodes the regime; multiplying by GARCH double-counts). The boundary is reusable.

Citation

If you use this software in academic work, please cite it. CITATION.cff is the structured form; the BibTeX-shaped quick form:

@software{wasserstein_btc_2026,
  author       = {Robin Bohrer (AccursedGalaxy)},
  title        = {wasserstein-btc: tangent-space Wasserstein-geodesic
                  distributional forecasting for crypto returns},
  version      = {0.4.0},
  year         = {2026},
  url          = {https://github.com/AccursedGalaxy/wasserstein-btc}
}

License

MIT.

Disclaimer

This is research code. Not financial advice. Falsification criteria are documented in docs/THEORY.md §4 and tested against the full long-horizon backtest in docs/RESULTS_LONG.md. Documented failures are in docs/RESEARCH_REPORT.md §4.2.
