Fast parallel GSADF bubble detection (PSY 2015) with wild bootstrap critical values

Project description

pygsadf

The first Python package to deliver production-grade, parallelised Generalised Sup ADF testing. Detects explosive bubbles in financial time series at 90%, 95%, and 99% confidence levels.

Why pygsadf?

| Feature | pygsadf | R exuber | Stata gsadf | EViews |
|---|---|---|---|---|
| Parallel bootstrap | Yes (all cores) | Limited | No | No |
| 99% critical values | Yes | Yes | No | No |
| Numba JIT kernel | Yes | C++ (Rcpp) | Mata | Proprietary |
| Pointwise BSADF CVs | Yes | Yes | No | No |
| CLI tool | Yes | No | No | No |
| One-line API | Yes | No | No | No |
| Free & open source | MIT | GPL | $$ | $$$ |

Performance: 1499 bootstrap replications on a series of length T=3500 take ~12 minutes on 192 cores, versus ~8 hours when run sequentially.
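As a rough reading of those figures, the implied speedup and per-core efficiency can be worked out directly. This is only back-of-the-envelope arithmetic, and it assumes the sequential and parallel timings were taken on comparable hardware and settings, which the description does not state.

```python
# Timings quoted in the performance note (both approximate).
sequential_minutes = 8 * 60   # ~8 hours sequential
parallel_minutes = 12         # ~12 minutes on 192 cores
cores = 192

speedup = sequential_minutes / parallel_minutes   # how many times faster
efficiency = speedup / cores                      # fraction of ideal linear scaling

print(f"speedup ~ {speedup:.0f}x, parallel efficiency ~ {efficiency:.0%}")
# speedup ~ 40x, parallel efficiency ~ 21%
```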

Installation

pip install pygsadf              # core (NumPy + joblib)
pip install pygsadf[fast]        # + Numba JIT (10-50x faster)
pip install pygsadf[full]        # + Numba + matplotlib + statsmodels + tqdm
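To see which optional dependencies ended up in the current environment, a small check like the following may help. It is a sketch that uses only the standard library; the module names are taken from the extras listed above, and `optional_backends` is a hypothetical helper, not part of pygsadf.

```python
import importlib.util

def optional_backends(names=("numba", "matplotlib", "statsmodels", "tqdm")):
    """Return {module_name: bool} showing which optional modules are importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, present in optional_backends().items():
        print(f"{name:12s} {'available' if present else 'missing'}")
```

If `numba` shows as missing, only the core install is present and the JIT speedup mentioned for `pygsadf[fast]` presumably does not apply.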

Quick Start

import numpy as np
import pandas as pd
import pygsadf

# Load prices and convert to logs (gsadf expects a log-price series)
prices = pd.read_csv("eth_prices.csv", index_col=0, parse_dates=True)
log_prices = np.log(prices["close"])

# Run GSADF test (one line)
result = pygsadf.gsadf(log_prices)

# Results
print(result)                    # Full summary
result.reject_h0(0.95)          # True = bubble detected at 95%
result.reject_h0(0.99)          # True = bubble detected at 99%
result.bubbles                   # List of BubbleEpisode objects (start, end)
result.plot()                    # Publication-ready figure

# Save / load
result.to_pickle("gsadf_result.pkl")
loaded = pygsadf.GSADFResult.from_pickle("gsadf_result.pkl")

Command Line

# Full run with 1499 bootstrap replications
pygsadf --csv data.csv --col log_price --B 1499 --out result.pkl --plot bsadf.png

# Quick test (B=199)
pygsadf --csv data.csv --col close --log --B 199

# Use fewer cores
pygsadf --csv data.csv --col log_price --n-jobs 8
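The commands above select a column by name, so the input file needs a header row containing that name. A minimal sketch for producing such a file with the standard library follows. It assumes a date column first and the column names `close` and `log_price` used in the examples; `write_log_price_csv` is a hypothetical helper, and the exact layout the CLI accepts is not documented here.

```python
import csv
import math

def write_log_price_csv(path, dates, closes):
    """Write date, close and log_price columns; prices must be strictly positive."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["date", "close", "log_price"])
        for d, c in zip(dates, closes):
            if c <= 0:
                raise ValueError(f"non-positive price on {d}: {c}")
            writer.writerow([d, c, math.log(c)])

if __name__ == "__main__":
    write_log_price_csv(
        "data.csv",
        ["2024-01-01", "2024-01-02", "2024-01-03"],
        [100.0, 101.5, 99.8],
    )
```

With a precomputed `log_price` column the `--log` flag is not needed; with only a `close` column, the second command above shows `--log` being passed instead.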

API Reference

pygsadf.gsadf(y, B=1499, ...)

Main entry point. Accepts a NumPy array or a pandas Series.

Parameters:

  • y — Log-price series (array or pd.Series with DatetimeIndex)
  • B — Bootstrap replications (default 1499; use 199 for quick tests)
  • max_lag — Maximum ADF augmentation lags (BIC selects optimal)
  • quantiles — Confidence levels, default (0.90, 0.95, 0.99)
  • seed — RNG seed for reproducibility
  • n_jobs — Parallel workers (-1 = all cores)

Returns: GSADFResult with:

  • .gsadf_stat — Scalar GSADF statistic
  • .cv — Dict of critical values {"90%": ..., "95%": ..., "99%": ...}
  • .bsadf — Full BSADF sequence (ndarray)
  • .bsadf_cv — Pointwise CV sequences
  • .bubbles — List of BubbleEpisode objects
  • .reject_h0(confidence) — Boolean hypothesis test
  • .plot() — Matplotlib figure
  • .summary() — Formatted text output
  • .to_pickle() / .from_pickle() — Serialisation
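The relationship between `.gsadf_stat`, `.cv` and `.reject_h0()` can be pictured as a right-tailed comparison. The sketch below is an illustration only: `rejects_null` is a hypothetical stand-in, the critical values are made-up numbers, and the key format follows the `{"90%": ..., "95%": ..., "99%": ...}` layout listed above.

```python
def rejects_null(gsadf_stat, cv, confidence=0.95):
    """Right-tailed test: reject the unit-root null when the statistic exceeds the CV."""
    key = f"{confidence:.0%}"          # 0.95 -> "95%"
    return gsadf_stat > cv[key]

# Made-up critical values, for illustration only.
cv = {"90%": 1.9, "95%": 2.2, "99%": 2.9}

print(rejects_null(2.5, cv, 0.95))   # True
print(rejects_null(2.5, cv, 0.99))   # False
```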

pygsadf.wild_bootstrap_cv(y, r0, B=1499, ...)

Low-level bootstrap function for custom workflows.

pygsadf.date_stamp_bubbles(bsadf, cv, dates, min_duration=5)

Date-stamp explosive episodes from BSADF vs critical value sequences.
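One way to picture date-stamping is as a scan for runs where the BSADF statistic stays above its critical value for at least `min_duration` points. The following is a minimal pure-Python sketch under that assumption; `stamp_episodes` is a hypothetical name, it returns plain tuples rather than the package's episode objects, and the actual implementation may differ.

```python
def stamp_episodes(bsadf, cv, dates, min_duration=5):
    """Return (start, end) dates of runs with bsadf > cv lasting >= min_duration points.

    bsadf, cv and dates are assumed to have equal length.
    """
    episodes, run_start = [], None
    n = len(bsadf)
    for i in range(n):
        if bsadf[i] > cv[i]:             # NaN comparisons are False, so NaNs end a run
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_duration:
                episodes.append((dates[run_start], dates[i - 1]))
            run_start = None
    # Close a run that is still open at the end of the sample.
    if run_start is not None and n - run_start >= min_duration:
        episodes.append((dates[run_start], dates[n - 1]))
    return episodes
```

Short exceedances are discarded, which keeps isolated spikes from being reported as bubbles.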

How It Works

  1. BSADF computation: for each endpoint, compute the supremum of right-tailed ADF statistics over all valid start points (PSY 2015, Section 3)
  2. GSADF: the overall supremum of the BSADF sequence
  3. Wild bootstrap: generate synthetic unit-root series using Rademacher weights, compute GSADF on each, and take empirical quantiles as critical values
  4. Date-stamping: episodes where BSADF exceeds the pointwise 95% CV for at least log(T) consecutive observations
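Steps 1 and 2 can be sketched in pure Python as a double loop over end points and start points. This is a deliberately simplified illustration, not the package's kernel: it uses a plain Dickey-Fuller regression with a constant and no lag augmentation (so no BIC selection), and the function names are hypothetical. The minimum window follows the r0 = 0.01 + 1.8/sqrt(T) rule recommended in PSY (2015); whether pygsadf uses that default is an assumption.

```python
import math

def df_tstat(y):
    """t-statistic on rho in  dy_t = a + rho * y_{t-1} + e_t  (needs len(y) >= 4)."""
    x = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(x)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    rho = sxy / sxx
    a = md - rho * mx
    ssr = sum((di - a - rho * xi) ** 2 for xi, di in zip(x, dy))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return rho / se

def psy_min_window(T):
    """Smallest window length implied by r0 = 0.01 + 1.8 / sqrt(T)."""
    return int((0.01 + 1.8 / math.sqrt(T)) * T)

def bsadf_sequence(y, min_window):
    """For each end point, the sup of DF statistics over all admissible start points."""
    return [
        max(df_tstat(y[start:end]) for start in range(0, end - min_window + 1))
        for end in range(min_window, len(y) + 1)
    ]

def gsadf(y, min_window):
    """Overall supremum of the BSADF sequence."""
    return max(bsadf_sequence(y, min_window))
```

The nested loops run on the order of T^2 regressions, which is why a compiled kernel and parallel bootstrap matter for long series.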

The bootstrap is embarrassingly parallel — each replication is independent with its own deterministic RNG seed, giving identical results whether run on 1 core or 192.
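The reproducibility property comes from giving every replication its own generator, so the result does not depend on which worker runs it or in what order. A minimal sketch of that idea with the standard library follows. It assumes the simplest null model (sign-flipped first differences, cumulated), derives seeds as `base_seed + b`, and takes the test statistic as a callable; all names are hypothetical and the package's scheme may differ.

```python
import math
import random

def rademacher_path(y, seed):
    """One bootstrap series under the unit-root null: cumulate sign-flipped differences."""
    rng = random.Random(seed)              # private generator for this replication
    path = [y[0]]
    for prev, cur in zip(y[:-1], y[1:]):
        w = 1.0 if rng.random() < 0.5 else -1.0   # Rademacher weight (+1 or -1)
        path.append(path[-1] + w * (cur - prev))
    return path

def bootstrap_cv(y, statistic, B=199, base_seed=0, quantiles=(0.90, 0.95, 0.99)):
    """Empirical right-tail critical values of `statistic` over B bootstrap paths."""
    stats = sorted(statistic(rademacher_path(y, base_seed + b)) for b in range(B))
    # Empirical quantile: the ceil(q * B)-th order statistic.
    return {f"{q:.0%}": stats[min(B - 1, math.ceil(q * B) - 1)] for q in quantiles}
```

Because `rademacher_path(y, seed)` depends only on its arguments, the replications can be handed to any parallel map without changing the critical values. In a NumPy setting, `numpy.random.SeedSequence(seed).spawn(B)` is a common way to derive independent per-replication streams.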

Architecture

pygsadf/
├── __init__.py          # Public API
├── core.py              # gsadf() + GSADFResult
├── adf.py               # Numba-JIT ADF kernel with BIC lag selection
├── bsadf.py             # GSADF + BSADF computation
├── bootstrap.py         # Parallel wild bootstrap
├── datestamp.py         # Bubble episode detection
└── cli.py               # Command-line interface

Citation

If you use pygsadf in academic work, please cite:

@software{pygsadf,
  title  = {pygsadf: Fast Parallel GSADF Bubble Detection},
  author = {Madkhali, Ali},
  year   = {2025},
  url    = {https://github.com/alixecon/pygsadf},
}

And the original methodology:

@article{psy2015,
  title   = {Testing for Multiple Bubbles: Historical Episodes of
             Exuberance and Collapse in the {S\&P} 500},
  author  = {Phillips, Peter C.B. and Shi, Shuping and Yu, Jun},
  journal = {International Economic Review},
  volume  = {56},
  number  = {4},
  pages   = {1043--1078},
  year    = {2015},
}

License

MIT

Download files

Source Distribution

pygsadf-2.0.1.tar.gz (8.7 kB)

File details

Details for the file pygsadf-2.0.1.tar.gz.

File metadata

  • Download URL: pygsadf-2.0.1.tar.gz
  • Upload date:
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for pygsadf-2.0.1.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | a6908e1e3e6fb8b12bb27788bcecf09fe197c877fa795ee7cb4c9b01b3756495 |
| MD5 | 46c3f32e1b915ad5e2a2f214a87c88b6 |
| BLAKE2b-256 | 087facb57abe48e8a4d20fec1a6078f336441b99817f1d323e5430e76759729b |
