
ml4t-diagnostic

Python 3.11+ | License: MIT

Comprehensive diagnostics and statistical validation for quantitative trading strategies. Covers the complete ML workflow from feature analysis through portfolio performance.

Features

  • Feature Analysis: Importance (MDI, PFI, MDA, SHAP), interactions, drift detection
  • Signal Analysis: IC analysis, quantile returns, turnover, multi-signal comparison
  • Trade Diagnostics: SHAP-based error pattern discovery, trade analysis
  • Portfolio Analysis: Rolling metrics, drawdown analysis, risk metrics
  • Statistical Validation: DSR, CPCV, RAS, PBO, FDR corrections
  • Time Series Diagnostics: Stationarity, ACF, volatility, distribution tests
  • Binary Metrics: Precision, recall, lift, coverage with Wilson intervals
  • Performance: Polars-powered for 10-100x faster analysis than pandas

Installation

# Core library
pip install ml4t-diagnostic

# With ML dependencies (SHAP, importance analysis)
pip install ml4t-diagnostic[ml]

# With visualization (Plotly reports)
pip install ml4t-diagnostic[viz]

# Everything
pip install ml4t-diagnostic[all]

Quick Start

Trade Diagnostics

from ml4t.diagnostic.evaluation import TradeAnalysis, TradeShapAnalyzer

# Identify worst trades from backtest
analyzer = TradeAnalysis(trade_records)
worst_trades = analyzer.worst_trades(n=20)

# Explain with SHAP
shap_analyzer = TradeShapAnalyzer(model, features_df, shap_values)
result = shap_analyzer.explain_worst_trades(worst_trades)

# Get actionable hypotheses
for pattern in result.error_patterns:
    print(f"Pattern: {pattern.hypothesis}")
    print(f"  Actions: {pattern.actions}")
    print(f"  Potential savings: ${pattern.potential_impact:,.2f}")

Feature Importance

from ml4t.diagnostic.evaluation import analyze_ml_importance

# Combines MDI, PFI, MDA, SHAP methods
results = analyze_ml_importance(model, X, y)

# Consensus ranking
print(results.consensus_ranking)
# [('momentum', 1.2), ('volatility', 2.1), ...]

# Warnings and interpretation
print(results.warnings)
print(results.interpretation)

Statistical Validation (DSR)

from ml4t.diagnostic.evaluation import stats

# Deflated Sharpe Ratio - accounts for multiple testing
dsr_result = stats.compute_dsr(
    returns=strategy_returns,
    benchmark_sr=0.0,
    n_trials=100,         # Number of strategies tested
    expected_max_sharpe=1.5
)

print(f"Sharpe Ratio: {dsr_result['sr']:.2f}")
print(f"Deflated Sharpe: {dsr_result['dsr']:.2f}")
print(f"Significant: {dsr_result['is_significant']}")

Signal Analysis

from ml4t.diagnostic.evaluation import SignalAnalysis

analyzer = SignalAnalysis(
    signal=factor_data,
    returns=forward_returns,
    periods=[1, 5, 21],  # 1D, 1W, 1M
)

# IC analysis with HAC adjustment
ic_result = analyzer.compute_ic_analysis()
print(f"IC Mean: {ic_result.ic_mean:.4f}")
print(f"HAC t-stat: {ic_result.hac_tstat:.2f}")

# Quantile returns
quantile_result = analyzer.compute_quantile_analysis()
print(f"Q5-Q1 spread: {quantile_result.spread:.2%}")

Portfolio Analysis

from ml4t.diagnostic.evaluation import PortfolioAnalysis

portfolio = PortfolioAnalysis(returns, benchmark=spy_returns)

# Summary metrics
metrics = portfolio.compute_summary_stats()
print(f"Sharpe: {metrics.sharpe_ratio:.2f}")
print(f"Max Drawdown: {metrics.max_drawdown:.2%}")

# Rolling metrics
rolling = portfolio.compute_rolling_metrics(window=252)

# Generate tear sheet
portfolio.generate_tear_sheet()

Time Series Diagnostics

from ml4t.diagnostic.evaluation import (
    analyze_stationarity,
    analyze_autocorrelation,
    analyze_volatility,
)

# Stationarity: ADF, KPSS, Phillips-Perron with consensus
result = analyze_stationarity(returns)
print(f"Consensus: {result.consensus}")  # 'stationary', 'non_stationary'

# Autocorrelation: ACF/PACF with significance
acf_result = analyze_autocorrelation(returns, nlags=20)
print(f"Significant lags: {acf_result.significant_lags}")

# Volatility: ARCH-LM test, GARCH fitting
vol_result = analyze_volatility(returns)
print(f"ARCH effects: {vol_result.has_arch_effects}")

Four-Tier Framework

Tier 1: Feature Analysis (Pre-Modeling)
├── Time series diagnostics (stationarity, ACF, volatility)
├── Distribution analysis (moments, normality, tails)
├── Feature importance (MDI, PFI, MDA, SHAP)
└── Feature interactions (Conditional IC, H-stat)

Tier 2: Signal Analysis (Model Outputs)
├── IC analysis (time series, histogram, heatmap)
├── Quantile returns (bar, violin, cumulative)
├── Turnover analysis (autocorrelation)
└── Multi-signal comparison and ranking

Tier 3: Backtest Analysis (Post-Modeling)
├── Trade analysis (win/loss, PnL, holding periods)
├── Statistical validity (DSR, RAS, PBO)
├── Trade-SHAP diagnostics (error patterns)
└── Excursion analysis (TP/SL optimization)

Tier 4: Portfolio Analysis (Production)
├── Performance metrics (Sharpe, Sortino, Calmar)
├── Drawdown analysis (underwater, top drawdowns)
├── Rolling metrics (Sharpe, volatility, beta)
└── Risk metrics (VaR, CVaR, tail ratio)

Statistical Methods

  • DSR (Deflated Sharpe Ratio): corrects for multiple testing bias
  • CPCV (Combinatorial Purged CV): leak-free time series cross-validation
  • RAS (Rademacher Anti-Serum): backtest overfitting detection
  • PBO: probability of backtest overfitting
  • HAC-adjusted IC: autocorrelation-robust information coefficient
  • FDR control: multiple-comparison correction (Benjamini-Hochberg)
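
The FDR correction can be illustrated with the standard Benjamini-Hochberg procedure. The sketch below uses statsmodels rather than this library's own stats helpers, since the exact function name is not documented here.

from statsmodels.stats.multitest import multipletests

# p-values from testing many candidate signals (e.g. IC != 0 for each factor)
pvals = [0.001, 0.012, 0.034, 0.21, 0.44]

# Benjamini-Hochberg keeps the expected share of false discoveries below alpha
reject, pvals_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(reject)     # which signals survive the multiple-testing correction
print(pvals_adj)  # FDR-adjusted p-values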

Performance

  • 5-fold CV, 1M rows: <10 seconds
  • Feature importance, 100 features: <5 seconds
  • CPCV backtest, 100K bars: <30 seconds
  • DSR calculation, 252 returns: <50 ms

API Reference

Feature Analysis

from ml4t.diagnostic.evaluation import (
    analyze_ml_importance,      # Combined importance analysis
    compute_shap_importance,    # SHAP values
    analyze_interactions,       # Feature interactions
    analyze_stationarity,       # Stationarity tests
    analyze_autocorrelation,    # ACF/PACF
    analyze_volatility,         # ARCH effects
    analyze_distribution,       # Distribution tests
)
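
analyze_distribution and analyze_interactions are exported above but not shown in the Quick Start. A minimal sketch follows; the call signatures (a return series for the distribution tests, a fitted model plus feature frame for the interactions) are assumptions for illustration, not a documented contract.

from ml4t.diagnostic.evaluation import analyze_distribution, analyze_interactions

# Tier 1 distribution diagnostics: moments, normality, tail behavior
# (argument layout assumed; see the docstrings for the actual API)
dist_result = analyze_distribution(returns)
print(dist_result)

# Tier 1 feature interactions: conditional IC and H-statistic
interaction_result = analyze_interactions(model, X)
print(interaction_result)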

Signal Analysis

from ml4t.diagnostic.evaluation import (
    SignalAnalysis,             # Single signal analysis
    MultiSignalAnalysis,        # Multi-signal comparison
    compute_ic_series,          # IC time series
)
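
MultiSignalAnalysis supports comparing several factors at once but is not demonstrated above. The constructor arguments in this sketch are assumed by analogy with SignalAnalysis.

from ml4t.diagnostic.evaluation import MultiSignalAnalysis

# Compare candidate factors on the same forward returns
# (argument names are assumptions, mirrored from SignalAnalysis)
multi = MultiSignalAnalysis(
    signals={"momentum": momentum_signal, "value": value_signal},
    returns=forward_returns,
    periods=[1, 5, 21],
)
print(multi)  # per-signal IC and ranking summary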

Trade Analysis

from ml4t.diagnostic.evaluation import (
    TradeAnalysis,              # Trade statistics
    TradeShapAnalyzer,          # SHAP-based diagnostics
)

Portfolio Analysis

from ml4t.diagnostic.evaluation import (
    PortfolioAnalysis,          # Portfolio metrics
)

Statistical Validation

from ml4t.diagnostic.evaluation import stats
from ml4t.diagnostic.splitters import (
    WalkForwardCV,        # Walk-forward with purging
    CombinatorialCV,      # CPCV
)
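
A walk-forward sketch assuming an sklearn-style split interface and pandas-style frames; the constructor parameters and split signature are assumptions, not the documented API.

from ml4t.diagnostic.splitters import WalkForwardCV

# Walk-forward splits with purging between train and test windows
# (n_splits and the split(X) signature are assumed for illustration)
cv = WalkForwardCV(n_splits=5)

for train_idx, test_idx in cv.split(X):
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    score = model.score(X.iloc[test_idx], y.iloc[test_idx])
    print(f"Fold score: {score:.3f}")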

Binary Metrics

from ml4t.diagnostic.evaluation import (
    binary_classification_report,
    precision, recall, lift, coverage,
    wilson_score_interval,
    find_optimal_threshold,
)
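
A quick sketch of the binary metrics on a long/flat entry signal; the report signature and the (successes, trials) argument order for wilson_score_interval are assumptions for illustration.

from ml4t.diagnostic.evaluation import (
    binary_classification_report,
    wilson_score_interval,
)

# Hit-rate style evaluation of a binary entry signal vs realized outcomes
report = binary_classification_report(y_true, y_pred)
print(report)

# Wilson 95% interval for the hit rate: e.g. 60 winning trades out of 100
low, high = wilson_score_interval(60, 100)
print(f"Hit rate 95% CI: [{low:.2%}, {high:.2%}]")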

Integration with ML4T Libraries

from ml4t.data import DataManager
from ml4t.engineer import compute_features
from ml4t.backtest import Engine
from ml4t.diagnostic.evaluation import TradeAnalysis, PortfolioAnalysis

# Complete workflow
data = DataManager().fetch("SPY", "2020-01-01", "2023-12-31")
features = compute_features(data, ["rsi", "macd", "atr"])
# ... train model and configure a backtest Engine ...
result = engine.run()

# Analyze trades
trade_analysis = TradeAnalysis(result.trades)
print(f"Win rate: {trade_analysis.win_rate:.1%}")

# Portfolio analysis
portfolio = PortfolioAnalysis(result.returns)
portfolio.generate_tear_sheet()

Ecosystem

  • ml4t-data: Market data acquisition and storage
  • ml4t-engineer: Feature engineering and indicators
  • ml4t-diagnostic: Statistical validation and evaluation (this library)
  • ml4t-backtest: Event-driven backtesting
  • ml4t-live: Live trading platform

Testing

# Run tests (4,887 tests)
uv run pytest tests/ -q -n auto

# Type checking
uv run ty check

# Linting
uv run ruff check src/

Development

git clone https://github.com/applied-ai/ml4t-diagnostic.git
cd ml4t-diagnostic

# Install with dev dependencies
uv sync

# Run tests
uv run pytest tests/ -q -n auto

# Type checking
uv run ty check

Optional Dependencies

  • [ml]: shap, lightgbm, xgboost (SHAP importance, tree explainers)
  • [viz]: plotly, streamlit (interactive visualizations)
  • [deep]: tensorflow (deep learning explainers)
  • [gpu]: cupy (GPU acceleration)
  • [all]: all of the above

# Check what's available
from ml4t.diagnostic.utils import get_dependency_summary
print(get_dependency_summary())

References

  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.
  • López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press.
  • Bailey, D., & López de Prado, M. (2012). "The Sharpe Ratio Efficient Frontier." Journal of Risk.
  • Bailey, D., & López de Prado, M. (2014). "The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality." Journal of Portfolio Management.

See docs/REFERENCES.md for complete academic citations.

Current Status

Version: 0.1.0a5 (Alpha)

The library is functional and tested (4,887 tests) but still in alpha; the API may change before 1.0.

License

MIT License - see LICENSE for details.
