
Deflated Sharpe Ratio and statistical gates for quantitative strategy validation

deflated-sharpe

Is your backtest real?

Requires Python 3.10+.

You tested 1,000 parameter combinations and found a Sharpe 2.0 strategy. Is it real — or did you just test enough times to get lucky?

deflated-sharpe implements the Deflated Sharpe Ratio (Bailey & Lopez de Prado, 2014) and related statistical gates. Pure Python, zero required dependencies, designed to be the last check before you deploy a strategy.

Install

pip install deflated-sharpe

Quick Start

Deflated Sharpe Ratio

from deflated_sharpe import deflated_sharpe_ratio

dsr, p_value = deflated_sharpe_ratio(observed_sr=2.0, num_trials=1000, num_obs=252)
print(f"DSR={dsr:.2f}, p={p_value:.4f}")

A positive DSR with p < 0.05 suggests that your observed Sharpe is unlikely to be explained by the number of trials you ran. A negative DSR means the maximum Sharpe expected from random chance alone exceeds your observed Sharpe.

Minimum Backtest Length

from deflated_sharpe import min_backtest_length

min_obs = min_backtest_length(target_sr=1.5, num_trials=500)
print(f"Need at least {min_obs} observations")

Before running a large search, compute how many observations you need. If your backtest window is shorter than min_obs, a strategy whose Sharpe is at or below target_sr cannot pass the DSR gate for that number of trials.

Regime Decay Detection

from deflated_sharpe import RegimeDecayDetector, StrategyBaseline, TradeResult

baseline = StrategyBaseline(win_rate=0.55, trade_count=200, max_drawdown_pct=12.0)
detector = RegimeDecayDetector(baseline=baseline)
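# training_features: supplied by you, a list of (atr_ratio, trend_pct, atr_percentile) tuples
detector.fit_market_baseline(training_features)

# live_trades: supplied by you, the trades recorded since deployment
for trade in live_trades:
    detector.add_trade(trade)
assessment = detector.assess()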
print(f"Decay confirmed: {assessment.decay_confirmed} ({assessment.signals_fired}/3 signals)")

The detector is a triple-confirmation system: Bayesian win rate decay, drawdown exceedance (1.5x backtest MDD), and Mahalanobis out-of-distribution detection. At least two of the three signals must fire at the same time to confirm decay.

How DSR Saved Us

In March 2026, we ran a grid search over 19,200 parameter combinations on BTCUSDT 1H walk-forward data (23 periods, each with 3 months in-sample and 3 months out-of-sample). Multiple strategies showed Sharpe ratios above 1.5 in-sample. The DSR gate rejected every single one, correctly preventing deployment of overfitted strategies.

The math was simple: with M = 19,200 trials and only 30-50 trades per window, the expected maximum Sharpe from pure chance exceeded every observed value. We then tested LLM-guided search (M ≈ 30 per period, 640x fewer trials) and found the same result: trade count, not search method, was the binding constraint. DSR saved us from deploying strategies that looked profitable but had no statistical significance.
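The arithmetic behind that conclusion can be sketched with the standard library alone. This is an illustration, not the package's code: it uses the expected-maximum approximation from Bailey & Lopez de Prado (2014) and assumes normally distributed returns, a per-trade (non-annualized) Sharpe ratio, and a null standard error of 1/sqrt(T - 1). The helper names `expected_max_z` and `noise_floor_sharpe` are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def expected_max_z(num_trials: int) -> float:
    """Approximate E[max] of num_trials independent standard normal draws."""
    if num_trials < 2:
        raise ValueError("num_trials must be at least 2")
    inv = NormalDist().inv_cdf
    return ((1 - EULER_GAMMA) * inv(1 - 1 / num_trials)
            + EULER_GAMMA * inv(1 - 1 / (num_trials * math.e)))


def noise_floor_sharpe(num_trials: int, num_obs: int) -> float:
    """Sharpe the luckiest of num_trials zero-skill strategies is expected to show.

    ASSUMPTION: Sharpe estimates under the null have standard error
    1 / sqrt(num_obs - 1).
    """
    return expected_max_z(num_trials) / math.sqrt(num_obs - 1)


# Compare the grid search (M = 19,200) with the guided search (M = 30).
for trials in (19_200, 30):
    for trades in (30, 50):
        floor = noise_floor_sharpe(trials, trades)
        print(f"M={trials:>6}, trades={trades}: per-trade Sharpe floor ~ {floor:.2f}")
```

Under these assumptions, cutting M by a factor of 640 only roughly halves the floor, because the expected maximum grows with about sqrt(ln M), while the floor falls with sqrt(T). That is consistent with trade count being the binding constraint.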

Full analysis: Phase 15 Case Study

[Figure: Before and After DSR]

Tools

deflated_sharpe_ratio(observed_sr, num_trials, num_obs, skewness, kurtosis)

Computes the Deflated Sharpe Ratio per Bailey & Lopez de Prado (2014). Adjusts the observed Sharpe for selection bias from multiple testing, accounting for return non-normality via skewness and kurtosis corrections.

The key insight: when you test M strategies, the maximum Sharpe you expect from pure luck grows as O(sqrt(ln(M))). DSR subtracts this expected maximum from your observed Sharpe and normalizes by the standard error.
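For reference, these quantities are usually written as follows in the notation of Bailey & Lopez de Prado (2014), where γ ≈ 0.5772 is the Euler-Mascheroni constant, Φ is the standard normal CDF, and γ̂₃ and γ̂₄ are the skewness and kurtosis of returns. The symbols are the paper's; the package's internal names may differ.

```latex
\mathbb{E}\left[\max_{m} Z_m\right] \approx (1-\gamma)\,\Phi^{-1}\!\left(1-\frac{1}{M}\right) + \gamma\,\Phi^{-1}\!\left(1-\frac{1}{M e}\right)

SR_0 = \sqrt{\mathbb{V}\left[\widehat{SR}_m\right]}\;\mathbb{E}\left[\max_{m} Z_m\right]

\widehat{DSR} = \Phi\!\left(\frac{\left(\widehat{SR}-SR_0\right)\sqrt{T-1}}{\sqrt{1-\hat{\gamma}_3\,\widehat{SR}+\frac{\hat{\gamma}_4-1}{4}\,\widehat{SR}^{2}}}\right)
```

In the paper, DSR is the probability Φ(·). Since the dsr value described in this README can be negative, it presumably corresponds to the argument of Φ rather than the probability itself; that reading is an inference, not something the README states.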

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| observed_sr | float | required | Observed Sharpe ratio |
| num_trials | int | required | Number of strategies tested (M) |
| num_obs | int | required | Number of observations (T) |
| skewness | float | 0.0 | Return skewness (0 = normal) |
| kurtosis | float | 3.0 | Return kurtosis (3 = normal) |

Returns (dsr, p_value). DSR > 0 with p < 0.05 indicates statistical significance.

Reference: Bailey, D.H. & Lopez de Prado, M. (2014), Eq. 2-4.
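A minimal standard-library sketch of how such a statistic might be computed is shown below. It is not the package's implementation. It assumes a per-observation (non-annualized) Sharpe ratio and, because the cross-trial variance of Sharpe estimates is not an input, approximates it by the null sampling variance 1/(T - 1). The function name `sketch_deflated_sharpe` is hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def sketch_deflated_sharpe(observed_sr, num_trials, num_obs,
                           skewness=0.0, kurtosis=3.0):
    """Return (z, p_value) for an observed per-observation Sharpe ratio.

    ASSUMPTION: the spread of Sharpe estimates across trials is taken to be
    the null sampling error 1 / sqrt(num_obs - 1); the paper instead uses
    the empirical variance across the trials actually run.
    """
    if num_trials < 2 or num_obs < 2:
        raise ValueError("need at least 2 trials and 2 observations")

    # Expected maximum of num_trials standard normal draws.
    inv = _STD_NORMAL.inv_cdf
    e_max = ((1 - EULER_GAMMA) * inv(1 - 1 / num_trials)
             + EULER_GAMMA * inv(1 - 1 / (num_trials * math.e)))

    # Sharpe ratio expected from the luckiest zero-skill trial.
    sr0 = e_max / math.sqrt(num_obs - 1)

    # Variance term with the skewness and kurtosis correction (kurtosis 3 = normal).
    var_term = 1 - skewness * observed_sr + (kurtosis - 1) / 4 * observed_sr ** 2
    if var_term <= 0:
        raise ValueError("skewness/kurtosis imply a non-positive variance")
    std_err = math.sqrt(var_term / (num_obs - 1))

    z = (observed_sr - sr0) / std_err
    p_value = 1 - _STD_NORMAL.cdf(z)  # one-sided: is the Sharpe above the luck threshold?
    return z, p_value


z, p = sketch_deflated_sharpe(observed_sr=0.15, num_trials=100, num_obs=1000)
print(f"z={z:.2f}, p={p:.4f}")
```

In this sketch, negative skewness or fat tails widen the standard error, so the same observed Sharpe yields a weaker statistic.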

min_backtest_length(target_sr, num_trials, alpha, skewness, kurtosis)

Runs a binary search for the minimum number of observations T at which a strategy with Sharpe target_sr passes the DSR gate (DSR > 0) at significance level alpha. Use this to check whether your backtest window is long enough before running a parameter search.

from deflated_sharpe import min_backtest_length

# "I want Sharpe 1.5 after testing 200 strategies. How much data do I need?"
min_obs = min_backtest_length(target_sr=1.5, num_trials=200, alpha=0.05)
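The search itself can be sketched as a standard binary search over a monotone pass/fail test. The version below is illustrative only: `passes_gate`, `sketch_min_backtest_length`, and the `max_obs` cap are hypothetical names, returns are assumed normal, the Sharpe ratio is assumed to be per observation, and "passing" is taken to mean a one-sided p-value below alpha.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def passes_gate(sr, num_trials, num_obs, alpha):
    """True if per-observation Sharpe `sr` clears the deflated test (normal returns assumed)."""
    inv = _STD_NORMAL.inv_cdf
    e_max = ((1 - EULER_GAMMA) * inv(1 - 1 / num_trials)
             + EULER_GAMMA * inv(1 - 1 / (num_trials * math.e)))
    root_t = math.sqrt(num_obs - 1)
    # Same statistic as (sr - e_max / root_t) divided by its standard error.
    z = (sr * root_t - e_max) / math.sqrt(1 + 0.5 * sr ** 2)
    return 1 - _STD_NORMAL.cdf(z) < alpha


def sketch_min_backtest_length(target_sr, num_trials, alpha=0.05, max_obs=10_000_000):
    """Smallest num_obs for which passes_gate is True, or None if max_obs is not enough."""
    if num_trials < 2:
        raise ValueError("num_trials must be at least 2")
    if target_sr <= 0 or not passes_gate(target_sr, num_trials, max_obs, alpha):
        return None
    low, high = 2, max_obs          # invariant: the answer lies in [low, high]
    while low < high:
        mid = (low + high) // 2
        if passes_gate(target_sr, num_trials, mid, alpha):
            high = mid              # mid passes, so the answer is mid or smaller
        else:
            low = mid + 1           # mid fails, so the answer is larger
    return low


print(sketch_min_backtest_length(target_sr=0.1, num_trials=200))
```

Binary search is valid here because, for a positive target Sharpe, the statistic increases monotonically with the number of observations, so the pass/fail outcome flips exactly once.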

benjamini_hochberg(p_values, alpha)

Benjamini-Hochberg FDR correction for evaluating multiple strategies simultaneously. When you have N candidate strategies each with a DSR p-value, BH controls the false discovery rate at level alpha.

from deflated_sharpe import deflated_sharpe_ratio, benjamini_hochberg

p_values = [
    deflated_sharpe_ratio(sr, num_trials=50, num_obs=500)[1]
    for sr in [1.2, 0.8, 1.5, 0.3]
]
results = benjamini_hochberg(p_values, alpha=0.05)
for idx, p, sig in results:
    print(f"Strategy {idx}: p={p:.4f}, significant={sig}")
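The step-up procedure behind a Benjamini-Hochberg correction can be sketched in a few lines. The function below is a hypothetical stand-in, not the package's code. It returns (index, p_value, significant) tuples to mirror the loop above, and returning them in the original input order is an assumption.

```python
def sketch_benjamini_hochberg(p_values, alpha=0.05):
    """Return [(index, p_value, significant), ...] in the original input order."""
    m = len(p_values)
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k (1-based) whose p-value is at or below (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Step-up rule: every hypothesis ranked at or below the cutoff is rejected,
    # even if its own p-value missed its own threshold.
    significant = set(order[:cutoff_rank])
    return [(i, p_values[i], i in significant) for i in range(m)]


for idx, p, sig in sketch_benjamini_hochberg([0.01, 0.04, 0.03, 0.20]):
    print(f"Strategy {idx}: p={p:.4f}, significant={sig}")
```

The step-up rule is what distinguishes BH from a simple per-rank threshold: once the largest passing rank is found, all smaller p-values are declared significant as well.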

RegimeDecayDetector

Live monitoring for strategy regime decay. Three independent signals with 2/3 majority vote:

  • S1 Win Rate Decay: Bayesian Beta updating with backtest prior. Fires when P(win_rate < breakeven) exceeds threshold.
  • S2 Drawdown Exceedance: Fires when current drawdown exceeds dd_multiplier (default 1.5x) times backtest maximum drawdown.
  • S3 Market OOD: Mahalanobis distance on market features (ATR ratio, trend, ATR percentile). Fires when the distance for recent trades exceeds the 95th percentile of the training distribution.

Anti-false-positive measures: minimum 20 trades before assessment, cooling period of 5 trades after trigger, Bonferroni correction for multiple strategies.
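The first two signals and the vote can be sketched with the standard library as follows. This is an illustration under stated assumptions, not the detector's code: all function names are hypothetical, the backtest is treated as a Beta prior with one pseudo-count per backtest trade, the breakeven win rate is assumed to be 0.5, and S3 is passed in as a boolean.

```python
import math


def beta_cdf(x, a, b, steps=20_000):
    """P(X <= x) for X ~ Beta(a, b), by midpoint-rule integration of the density."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    width = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * width
        total += math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t))
    return total * width


def win_rate_signal(prior_win_rate, prior_trades, wins, losses,
                    breakeven=0.5, threshold=0.80):
    """S1: fire when the posterior P(win_rate < breakeven) exceeds the threshold."""
    # ASSUMPTION: one pseudo-count per backtest trade; the real prior weight may differ.
    a = prior_win_rate * prior_trades + wins
    b = (1 - prior_win_rate) * prior_trades + losses
    return beta_cdf(breakeven, a, b) > threshold


def drawdown_signal(current_dd_pct, backtest_mdd_pct, dd_multiplier=1.5):
    """S2: fire when the live drawdown exceeds dd_multiplier times the backtest maximum."""
    return current_dd_pct > dd_multiplier * backtest_mdd_pct


def decay_confirmed(s1, s2, s3):
    """Two-of-three vote over the signals."""
    return sum((s1, s2, s3)) >= 2


s1 = win_rate_signal(prior_win_rate=0.55, prior_trades=200, wins=0, losses=60)
s2 = drawdown_signal(current_dd_pct=20.0, backtest_mdd_pct=12.0)
print(decay_confirmed(s1, s2, s3=False))
```

Requiring two signals trades some detection speed for fewer false alarms, since any single signal can fire on ordinary variance.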

| Config parameter | Default | Description |
| --- | --- | --- |
| min_trades | 20 | Minimum trades before assessment |
| cooling_period | 5 | Trades to skip after trigger |
| win_rate_decay_prob_threshold | 0.80 | P(wr < breakeven) threshold |
| dd_multiplier | 1.5 | Drawdown exceedance multiplier |
| ood_percentile | 95.0 | Mahalanobis percentile threshold |
| num_strategies | 1 | N for Bonferroni correction |

Paper Verification

The DSR implementation is verified against the original paper's mathematics: Gumbel approximation for E[Z_max], standard error with non-normality correction, and the full DSR test statistic. See tests/test_paper_verification.py for numerical checks against known values from Bailey & Lopez de Prado (2014).

Zero Dependencies

The core library uses only the Python standard library (math, dataclasses). The _math.py module implements norm_cdf, matrix inversion, and Mahalanobis distance from scratch to avoid pulling in NumPy/SciPy for basic usage.
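To give a sense of what such from-scratch helpers involve, here is a minimal sketch of the three operations using only the math module. It is not the contents of _math.py. The function name norm_cdf is taken from the text above, while `invert_matrix`, `mahalanobis`, and all signatures are assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def invert_matrix(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest absolute entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis(point, mean, inv_cov):
    """sqrt((x - mu)^T * inv_cov * (x - mu)) for plain Python lists."""
    diff = [p - m for p, m in zip(point, mean)]
    n = len(diff)
    quad = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(quad, 0.0))  # clamp tiny negative rounding errors


inv_cov = invert_matrix([[2.0, 0.0], [0.0, 4.0]])
print(mahalanobis([2.0, 4.0], [0.0, 0.0], inv_cov))
```

For the three market features used by the detector, the covariance matrix is only 3x3, so a direct elimination like this is cheap.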

For the regime detector's Bayesian win rate signal (S1), scipy.stats.beta is used if available; otherwise a point-estimate fallback is used. Install the optional dependency:

pip install "deflated-sharpe[scipy]"
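An optional dependency of this kind is typically handled with a guarded import. The sketch below shows the general pattern only: the helper name `prob_win_rate_below` is hypothetical, and the point-estimate fallback (comparing the posterior mean with the breakeven level) is an assumption about what such a fallback could look like, not a description of the package's behavior.

```python
try:
    from scipy.stats import beta as _beta  # optional dependency
except ImportError:
    _beta = None


def prob_win_rate_below(breakeven, a, b):
    """P(win_rate < breakeven) under a Beta(a, b) posterior."""
    if _beta is not None:
        # Exact posterior probability when SciPy is installed.
        return float(_beta.cdf(breakeven, a, b))
    # ASSUMED fallback: a point estimate based on the posterior mean.
    posterior_mean = a / (a + b)
    return 1.0 if posterior_mean < breakeven else 0.0


print(prob_win_rate_below(0.5, 10, 90))
```

With the fallback, the signal becomes all-or-nothing, so a probability threshold such as 0.80 no longer carries its usual meaning.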

References

Bailey, D. H., & Lopez de Prado, M. (2014). "The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality." Journal of Portfolio Management, 40(5), 94-107. DOI: 10.3905/jpm.2014.40.5.094

License

Apache-2.0
