Fast parallel GSADF bubble detection (PSY 2015) with wild bootstrap critical values

pygsadf

The first Python package to deliver production-grade, parallelised Generalised Sup ADF testing. Detects explosive bubbles in financial time series at 90%, 95%, and 99% confidence levels.

Why pygsadf?

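| Feature              | pygsadf         | R exuber   | Stata gsadf | EViews      |
|----------------------|-----------------|------------|-------------|-------------|
| Parallel bootstrap   | Yes (all cores) | Limited    | No          | No          |
| 99% critical values  | Yes             | Yes        | No          | No          |
| Compute kernel       | Numba JIT       | C++ (Rcpp) | Mata        | Proprietary |
| Pointwise BSADF CVs  | Yes             | Yes        | No          | No          |
| CLI tool             | Yes             | No         | No          | No          |
| One-line API         | Yes             | No         | No          | No          |
| Free & open source   | MIT             | GPL        | $$          | $$$         |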

Performance: 1499 bootstrap replications on a T=3500 series take ~12 minutes on 192 cores, versus ~8 hours sequentially (roughly a 40x speedup).

Installation

pip install pygsadf                # core (NumPy + joblib)
pip install "pygsadf[fast]"        # + Numba JIT (10-50x faster); quotes keep shells such as zsh from expanding the brackets
pip install "pygsadf[full]"        # + Numba + matplotlib + statsmodels + tqdm
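
To see which of the optional packages named by the extras are importable in the current environment, a small check like the following can help. It is a sketch that uses only the standard library; the helper name `optional_status` is made up for this example and is not part of pygsadf.

```python
from importlib.util import find_spec

# Optional packages named by the extras above (fast: numba; full adds the rest).
OPTIONAL = ("numba", "matplotlib", "statsmodels", "tqdm")

def optional_status(names=OPTIONAL):
    """Map each package name to True if it can be imported in this environment."""
    return {name: find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, present in optional_status().items():
        print(f"{name:12s} {'installed' if present else 'missing'}")
```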

Quick Start

import numpy as np
import pandas as pd
import pygsadf

# Load prices and convert them to a log-price series
prices = pd.read_csv("eth_prices.csv", index_col=0, parse_dates=True)
log_prices = np.log(prices["close"])

# Run GSADF test (one line)
result = pygsadf.gsadf(log_prices)

# Results
print(result)                    # Full summary
result.reject_h0(0.95)           # True = bubble detected at 95%
result.reject_h0(0.99)           # True = bubble detected at 99%
result.bubbles                   # List of detected episodes (BubbleEpisode objects)
result.plot()                    # Publication-ready figure

# Save / load
result.to_pickle("gsadf_result.pkl")
loaded = pygsadf.GSADFResult.from_pickle("gsadf_result.pkl")

Command Line

# Full run with 1499 bootstrap replications
pygsadf --csv data.csv --col log_price --B 1499 --out result.pkl --plot bsadf.png

# Quick test (B=199)
pygsadf --csv data.csv --col close --log --B 199

# Use fewer cores
pygsadf --csv data.csv --col log_price --n-jobs 8

API Reference

pygsadf.gsadf(y, B=1499, ...)

Main entry point. Accepts a NumPy array or a pandas Series.

Parameters:

  • y — Log-price series (array or pd.Series with DatetimeIndex)
  • B — Bootstrap replications (default 1499; use 199 for quick tests)
  • max_lag — Maximum ADF augmentation lags (BIC selects optimal)
  • quantiles — Confidence levels, default (0.90, 0.95, 0.99)
  • seed — RNG seed for reproducibility
  • n_jobs — Parallel workers (-1 = all cores)

Returns: GSADFResult with:

  • .gsadf_stat — Scalar GSADF statistic
  • .cv — Dict of critical values {"90%": ..., "95%": ..., "99%": ...}
  • .bsadf — Full BSADF sequence (ndarray)
  • .bsadf_cv — Pointwise CV sequences
  • .bubbles — List of BubbleEpisode objects
  • .reject_h0(confidence) — Boolean hypothesis test
  • .plot() — Matplotlib figure
  • .summary() — Formatted text output
  • .to_pickle() / .from_pickle() — Serialisation
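
To make the relationship between `.gsadf_stat`, `.cv`, and `.reject_h0(confidence)` concrete, here is a minimal sketch of a right-tailed decision rule. It is a standalone function written for illustration, assuming the `{"90%": ..., "95%": ..., "99%": ...}` key format listed above; it is not the package's own implementation. The numbers are the pygsadf values from the validation table in this README.

```python
def reject_h0(gsadf_stat, cv, confidence=0.95):
    """Right-tailed test: reject the unit-root null if the statistic exceeds the CV."""
    key = f"{round(confidence * 100)}%"   # 0.95 -> "95%"
    if key not in cv:
        raise KeyError(f"no critical value stored for {key}")
    return gsadf_stat > cv[key]

# Critical values and statistic taken from the validation table in this README.
cv = {"90%": 9.016, "95%": 10.270, "99%": 12.628}
print(reject_h0(16.370363, cv, 0.95))   # True
print(reject_h0(11.0, cv, 0.99))        # False (11.0 is a made-up statistic)
```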

pygsadf.wild_bootstrap_cv(y, r0, B=1499, ...)

Low-level bootstrap function for custom workflows.

pygsadf.date_stamp_bubbles(bsadf, cv, dates, min_duration=5)

Date-stamp explosive episodes from BSADF vs critical value sequences.
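
The idea behind date-stamping can be sketched in a few lines of NumPy: mark where the BSADF sequence exceeds its critical value, find the contiguous runs, and keep those that last at least `min_duration` observations. The function name `stamp_episodes` and its return type (plain tuples) are assumptions made for this example; the package returns `BubbleEpisode` objects and may handle edge cases differently.

```python
import numpy as np

def stamp_episodes(bsadf, cv, dates, min_duration=5):
    """Return (start, end) pairs for runs where bsadf > cv lasting >= min_duration points."""
    above = np.asarray(bsadf) > np.asarray(cv)
    # Pad with False so every run has a detectable rising and falling edge.
    padded = np.concatenate(([False], above, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)        # first index of each run
    ends = np.flatnonzero(edges == -1) - 1     # last index of each run
    return [(dates[s], dates[e]) for s, e in zip(starts, ends)
            if e - s + 1 >= min_duration]

# Toy inputs: one run of length 2 (too short) and one run of length 5.
bsadf = np.array([0.1, 1.5, 1.6, 0.2, 1.4, 1.7, 1.9, 2.1, 1.8, 0.3])
cv = np.full(10, 1.0)
dates = list(range(10))
print(stamp_episodes(bsadf, cv, dates, min_duration=5))  # [(4, 8)]
```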

How It Works

  1. BSADF Computation: For each endpoint, compute the supremum of right-tailed ADF statistics over all valid start points (PSY 2015, Section 3)
  2. GSADF: The overall supremum of the BSADF sequence
  3. Wild Bootstrap: Generate synthetic unit-root series using Rademacher weights, compute GSADF on each, and take empirical quantiles as critical values
  4. Date-Stamping: Flag episodes where BSADF exceeds the pointwise 95% CV for at least log(T) consecutive observations
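
Steps 1 and 2 can be illustrated with a deliberately simple, unoptimised sketch. It assumes a lag-0 ADF regression with an intercept (no BIC lag selection, no Numba) and sets the minimum window from the r0 = 0.01 + 1.8/√T rule of thumb suggested in PSY (2015). The function names are invented for this example and do not mirror the package's internals.

```python
import numpy as np

def adf_stat(y):
    """t-statistic on b in  dy_t = a + b*y_{t-1} + e_t  (lag order 0)."""
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack((np.ones_like(ylag), ylag))
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ dy
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - 2)          # residual variance, 2 regressors
    return beta[1] / np.sqrt(sigma2 * XtX_inv[1, 1])

def bsadf_sequence(y, min_window):
    """For each endpoint, the sup of ADF statistics over all admissible start points."""
    T = len(y)
    out = np.full(T, np.nan)                        # undefined before the first full window
    for end in range(min_window, T + 1):            # window is y[start:end]
        out[end - 1] = max(adf_stat(y[start:end])
                           for start in range(end - min_window + 1))
    return out

rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(120))             # toy random walk, not real prices
T = len(y)
min_window = int((0.01 + 1.8 / np.sqrt(T)) * T)     # PSY (2015) rule of thumb for r0
bsadf = bsadf_sequence(y, min_window)
gsadf = np.nanmax(bsadf)                            # step 2: sup of the BSADF sequence
print(min_window, round(float(gsadf), 3))
```

The double loop is O(T²) regressions, which is why a compiled kernel and parallel bootstrap matter for series with thousands of observations.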

The bootstrap is embarrassingly parallel — each replication is independent with its own deterministic RNG seed, giving identical results whether run on 1 core or 192.
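
One way to get this kind of core-count-independent reproducibility is to derive a separate child seed for every replication up front, so the random draws do not depend on which worker runs which task. The sketch below assumes NumPy's `SeedSequence.spawn` for the child seeds and uses the standard library's `ThreadPoolExecutor` in place of joblib so it runs without extra installs. The statistic is a stand-in (the path maximum), not GSADF, and the resampling scheme is a simplified wild bootstrap on first differences.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def one_replication(args):
    """Build one wild-bootstrap unit-root path and return a statistic on it."""
    dy, child_seed = args
    rng = np.random.default_rng(child_seed)          # RNG private to this replication
    w = rng.choice([-1.0, 1.0], size=dy.size)        # Rademacher weights
    y_star = np.cumsum(w * dy)                       # resampled path under the null
    return y_star.max()                              # stand-in for the GSADF statistic

def bootstrap_stats(y, B, seed, workers):
    dy = np.diff(y)
    children = np.random.SeedSequence(seed).spawn(B) # one independent stream per replication
    tasks = [(dy, c) for c in children]
    if workers == 1:
        return np.array([one_replication(t) for t in tasks])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.array(list(pool.map(one_replication, tasks)))

y = np.cumsum(np.random.default_rng(1).standard_normal(200))
serial = bootstrap_stats(y, B=199, seed=42, workers=1)
threaded = bootstrap_stats(y, B=199, seed=42, workers=4)
print(np.array_equal(serial, threaded))              # True
print(np.quantile(serial, [0.90, 0.95, 0.99]))       # empirical critical values
```

Because each task carries its own seed and `pool.map` returns results in task order, the output array is identical for any number of workers.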

Architecture

pygsadf/
├── __init__.py          # Public API
├── core.py              # gsadf() + GSADFResult
├── adf.py               # Numba-JIT ADF kernel with BIC lag selection
├── bsadf.py             # GSADF + BSADF computation
├── bootstrap.py         # Parallel wild bootstrap
├── datestamp.py         # Bubble episode detection
└── cli.py               # Command-line interface

Validation Against R exuber

pygsadf has been validated against the R exuber package (v0.4.2+) on a synthetic series with a known embedded explosive regime (T=500, AR coefficient 1.05 at t=200–299).
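
For readers who want a feel for this kind of test series, the following sketch builds a random walk that switches to an explosive AR(1) with coefficient 1.05 on t=200–299. The starting level, unit-variance Gaussian noise, seed, and the absence of a collapse after the bubble are all assumptions of this example; the series actually used for validation is produced by `generate_test_data.py` and may differ.

```python
import numpy as np

def simulate_bubble_series(T=500, start=200, end=299, rho=1.05, seed=0):
    """Random walk with an explosive AR(1) stretch on t = start..end (inclusive)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    y = np.empty(T)
    y[0] = 100.0 + eps[0]                         # assumed starting level
    for t in range(1, T):
        phi = rho if start <= t <= end else 1.0   # explosive only inside the window
        y[t] = phi * y[t - 1] + eps[t]
    return y

y = simulate_bubble_series()
print(len(y), y[199], y[299])                     # level before and after the explosive regime
```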

Apples-to-apples comparison (both fixed lag=1, B=999):

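| Metric                  | pygsadf   | R exuber  | Difference / agreement |
|-------------------------|-----------|-----------|------------------------|
| GSADF statistic         | 16.370363 | 16.370400 | 0.0002%                |
| BSADF correlation       | -         | -         | 0.9987                 |
| BSADF MAE               | -         | -         | 0.033                  |
| CV 90% (wild bootstrap) | 9.016     | 9.425     | 4.3%                   |
| CV 95% (wild bootstrap) | 10.270    | 10.760    | 4.6%                   |
| CV 99% (wild bootstrap) | 12.628    | 14.096    | 10.4%                  |
| Reject H₀ at 95%        | Yes       | Yes       | Match                  |
| Reject H₀ at 99%        | Yes       | Yes       | Match                  |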
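
The percentage column can be reproduced from the two value columns. The table does not state the denominator, so the snippet below assumes the difference is taken relative to the R exuber value; that assumption matches the reported 4.3%, 4.6%, and 10.4%.

```python
# (pygsadf value, R exuber value) pairs copied from the table above.
rows = {
    "GSADF statistic": (16.370363, 16.370400),
    "CV 90%": (9.016, 9.425),
    "CV 95%": (10.270, 10.760),
    "CV 99%": (12.628, 14.096),
}

def pct_diff(py_value, r_value):
    """Absolute difference as a percentage of the R exuber value."""
    return abs(py_value - r_value) / abs(r_value) * 100

for name, (py_value, r_value) in rows.items():
    print(f"{name:16s} {pct_diff(py_value, r_value):.4f}%")
```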

Key findings:

  • GSADF statistic agrees to about six significant figures (16.370363 vs 16.370400, a 0.0002% difference)
  • BSADF sequence correlation: 0.9987 (MAE 0.033), so the two time-varying sequences track each other closely
  • Rejection decisions agree at both the 95% and 99% levels
  • CV differences (4–10%) are attributed to the different RNG implementations Python and R use for the bootstrap Rademacher draws; the bootstrap quantile estimates converge as B → ∞
  • pygsadf's default BIC lag selection produces higher GSADF values than exuber's fixed lag=1 default, because BIC can select lag=0 for some windows, yielding sharper test statistics. This is a methodological choice, not a discrepancy: both are valid implementations of PSY (2015)

The full validation suite is in validation/, including the synthetic dataset, both Python and R scripts, and an automated comparison tool. To reproduce:

cd validation/
python generate_test_data.py        # create validation_series.csv
python run_pygsadf_lag1.py           # pygsadf with fixed lag=1
Rscript run_exuber.R                 # R exuber (requires R + exuber package)
python compare_results.py            # side-by-side comparison

Citation

If you use pygsadf in academic work, please cite:

@software{pygsadf,
  title  = {pygsadf: Fast Parallel GSADF Bubble Detection},
  author = {Madkhali, Ali},
  year   = {2025},
  url    = {https://github.com/alixecon/pygsadf},
}

And the original methodology:

@article{psy2015,
  title   = {Testing for Multiple Bubbles: Historical Episodes of
             Exuberance and Collapse in the {S\&P} 500},
  author  = {Phillips, Peter C.B. and Shi, Shuping and Yu, Jun},
  journal = {International Economic Review},
  volume  = {56},
  number  = {4},
  pages   = {1043--1078},
  year    = {2015},
}

License

MIT

