Statistical validation for synthetic financial time series

financial-data-validation

Lightweight statistical validation for financial time series data.

Why This Exists

Need to validate synthetic market data, backtest results, or trading signals? statsmodels has everything, but it's 50+ MB with complex dependencies.

This package extracts only the diagnostic tests that matter for financial data:

  • Ljung-Box - autocorrelation in returns
  • ARCH effects - volatility clustering
  • Jarque-Bera - return distribution normality
  • Kolmogorov-Smirnov - distribution shape
  • Variance Ratio - mean reversion vs momentum
  • Runs Test - randomness of return signs

2 MB install. numpy + scipy only. Purpose-built for finance.

Installation

With uv (from a clone of the repository):

uv sync

With pip:

pip install financial-data-validation

Quick Start

import numpy as np

from financial_data_validation import validate_paths

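# Your price paths (n_paths, n_timesteps)
# Cumulate i.i.d. lognormal gross returns so that each row is a price path
gross_returns = np.random.lognormal(0, 0.02, size=(1000, 252))
paths = 100 * np.cumprod(gross_returns, axis=1)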

# Validate
report = validate_paths(paths, frequency="daily")

print(report)
# Financial Data Validation Report ✓ PASSED
# Overall Quality Score: 86.3/100
# ...

# Check individual scores
print(f"ARCH (volatility clustering): {report.arch_score:.2f}")
print(f"Passed: {report.passed}")
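
The validator takes price paths shaped (n_paths, n_timesteps), while the diagnostics themselves work on returns. Here is a minimal sketch of that conversion, assuming log returns. `to_log_returns` is a hypothetical helper written for illustration; the package's own `compute_returns` may use a different convention.

```python
import numpy as np


def to_log_returns(paths: np.ndarray) -> np.ndarray:
    """Convert price paths (n_paths, n_timesteps) to log returns.

    Hypothetical helper for illustration; the package's own
    compute_returns may use a different convention.
    """
    paths = np.asarray(paths, dtype=float)
    if paths.ndim != 2:
        raise ValueError("expected shape (n_paths, n_timesteps)")
    if np.any(paths <= 0):
        raise ValueError("prices must be strictly positive")
    # One fewer column than the input: r_t = log(P_t / P_{t-1})
    return np.diff(np.log(paths), axis=1)


# Simulate geometric random-walk prices: cumulative product of gross returns
rng = np.random.default_rng(0)
gross = rng.lognormal(mean=0.0, sigma=0.02, size=(5, 252))
prices = 100.0 * np.cumprod(gross, axis=1)

log_returns = to_log_returns(prices)
print(log_returns.shape)  # (5, 251)
```

Each path loses one observation, so 252 daily prices give 251 returns.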

What Gets Tested

Test                What It Validates             Good Data Should...
Ljung-Box           Autocorrelation in returns    Show no autocorrelation (p > 0.05)
ARCH                Volatility clustering         Show clustering (p < 0.05)
Jarque-Bera         Skewness and kurtosis         Have reasonable moments (|skew| < 1, kurt < 5)
Kolmogorov-Smirnov  Distribution shape vs normal  Fit reasonably well (D < 0.08)
Variance Ratio      Random walk behavior          Have VR ≈ 1 at multiple horizons
Runs Test           Sign randomness               Show random +/- sequencing
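
To make the Jarque-Bera row concrete, the sketch below computes sample skewness, kurtosis, and the JB statistic with numpy only. It is not the package's `jarque_bera_test`. The function name is hypothetical, and it assumes "kurt" in the table means non-excess kurtosis (3 for a normal distribution), which the source does not specify.

```python
import math

import numpy as np


def jarque_bera_sketch(returns: np.ndarray) -> dict:
    """Sample skewness, kurtosis, and Jarque-Bera statistic for a 1-D series."""
    x = np.asarray(returns, dtype=float).ravel()
    n = x.size
    centered = x - x.mean()
    m2 = np.mean(centered**2)
    m3 = np.mean(centered**3)
    m4 = np.mean(centered**4)
    skew = m3 / m2**1.5
    kurt = m4 / m2**2  # non-excess kurtosis; equals 3 for a normal distribution
    jb = n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-squared with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return {"skew": skew, "kurtosis": kurt, "jb": jb, "p_value": p_value}


rng = np.random.default_rng(42)
stats = jarque_bera_sketch(rng.normal(0.0, 0.01, size=5000))

# The moment criterion from the table above (kurtosis convention assumed)
moments_ok = abs(stats["skew"]) < 1 and stats["kurtosis"] < 5
print(moments_ok)
```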

Individual Tests

from financial_data_validation.utils import compute_returns
from financial_data_validation.diagnostics.arch import arch_test

returns = compute_returns(paths)
score, details = arch_test(returns, lags=20)

print(f"ARCH score: {score:.3f}")
print(f"Volatility clustering: {'Yes' if details['passed'] else 'No'}")
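
For readers curious what an ARCH test measures, here is a rough sketch of Engle's LM statistic: regress squared returns on their own lags and multiply R² by the number of observations. This is an assumption about the general technique, not a description of how `arch_test` is implemented or how it turns the result into a score. The function name is hypothetical.

```python
import numpy as np


def arch_lm_statistic(returns: np.ndarray, lags: int = 5) -> float:
    """Engle's LM statistic: n_obs * R^2 from regressing squared demeaned
    returns on their own first `lags` lags."""
    x = np.asarray(returns, dtype=float).ravel()
    sq = (x - x.mean()) ** 2
    n_obs = sq.size - lags
    if n_obs <= lags + 1:
        raise ValueError("series too short for the requested number of lags")
    y = sq[lags:]
    # Design matrix: intercept plus squared returns at lags 1..lags
    columns = [np.ones(n_obs)]
    columns += [sq[lags - k : sq.size - k] for k in range(1, lags + 1)]
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    r_squared = 1.0 - residuals.var() / y.var()
    return float(n_obs * r_squared)


rng = np.random.default_rng(1)
n = 2000
iid = rng.normal(0.0, 0.01, size=n)

# ARCH(1) process: today's variance depends on yesterday's squared return
arch = np.zeros(n)
for t in range(1, n):
    sigma2 = 0.5e-4 + 0.5 * arch[t - 1] ** 2
    arch[t] = rng.normal(0.0, np.sqrt(sigma2))

lm_iid = arch_lm_statistic(iid)
lm_arch = arch_lm_statistic(arch)
print(f"i.i.d.: {lm_iid:.1f}   ARCH(1): {lm_arch:.1f}")
```

Under the null of no ARCH effects the statistic is asymptotically chi-squared with `lags` degrees of freedom, so a p-value can be obtained with `scipy.stats.chi2.sf(lm, lags)`.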

Available tests:

  • ljung_box_test(returns, lags=20) - autocorrelation
  • arch_test(returns, lags=20) - volatility clustering
  • jarque_bera_test(returns) - normality
  • ks_test(returns) - distribution shape
  • variance_ratio_test(returns, lags=[2,5,10]) - random walk
  • runs_test(returns) - sign randomness
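
As an illustration of the last item, the sketch below implements a textbook Wald-Wolfowitz runs test on return signs. It is written from the standard formulas and is not taken from the package; `runs_test_sketch` and its return format are hypothetical.

```python
import math

import numpy as np


def runs_test_sketch(returns: np.ndarray) -> tuple[float, float]:
    """Wald-Wolfowitz runs test on return signs. Returns (z, two-sided p)."""
    x = np.asarray(returns, dtype=float).ravel()
    signs = x[x != 0] > 0  # drop exact zeros; True = up move, False = down move
    n_pos = int(signs.sum())
    n_neg = int(signs.size - n_pos)
    n = n_pos + n_neg
    if n_pos == 0 or n_neg == 0 or n < 3:
        raise ValueError("need at least 3 nonzero returns with both signs")
    # A new run starts wherever the sign changes
    runs = 1 + int(np.count_nonzero(signs[1:] != signs[:-1]))
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)) / (n**2 * (n - 1.0))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


# A perfectly alternating series has far too many runs to be random
alternating = np.array([0.01, -0.01] * 10)
z, p = runs_test_sketch(alternating)
print(f"z = {z:.2f}, p = {p:.5f}")
```

A large positive z means the signs flip more often than chance would suggest; a large negative z means they cluster into streaks.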

Custom Validation

# Stricter threshold
report = validate_paths(paths, threshold=85.0)

# Custom weights (emphasize volatility clustering)
weights = {
    "ljung_box": 0.15,
    "arch": 0.40,        # Increased importance
    "jarque_bera": 0.15,
    "ks": 0.10,
    "variance_ratio": 0.10,
    "runs": 0.10
}
report = validate_paths(paths, weights=weights)

# Different data frequency
report = validate_paths(paths, frequency="hourly")  # Uses 24 lags
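
One plausible reading of the weights is a weighted average of per-test scores compared against the threshold. The sketch below assumes exactly that, with per-test scores in [0, 1] and the result scaled to 0-100. The source does not document the actual aggregation rule, so `overall_score` and the sample scores are hypothetical.

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-test scores in [0, 1], scaled to 0-100.

    Illustrative only: the package's actual aggregation rule may differ.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same tests")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(scores[name] * weights[name] for name in scores)
    return 100.0 * weighted / total_weight


weights = {
    "ljung_box": 0.15, "arch": 0.40, "jarque_bera": 0.15,
    "ks": 0.10, "variance_ratio": 0.10, "runs": 0.10,
}
# Made-up per-test scores, purely for illustration
scores = {
    "ljung_box": 0.9, "arch": 0.5, "jarque_bera": 0.8,
    "ks": 1.0, "variance_ratio": 1.0, "runs": 1.0,
}

score = overall_score(scores, weights)
passed = score >= 85.0  # same threshold as the stricter example above
print(f"{score:.1f} -> {'PASSED' if passed else 'FAILED'}")
```

Under this assumption, putting 40% of the weight on ARCH means a weak volatility-clustering score pulls the overall result down sharply even when every other test scores well.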

Examples

See the examples/ directory.

Quality Score Interpretation

  • 90-100: Excellent - closely matches the statistical properties of real market returns that these tests measure
  • 80-89: Good - suitable for most applications
  • 70-79: Acceptable - passes minimum requirements
  • < 70: Poor - may produce unreliable results
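
The bands above can be turned into a small lookup for reports or CI logs. `interpret_score` is a hypothetical helper, not part of the package, and it assumes fractional scores fall into the band below the next cutoff (so 89.5 counts as "Good").

```python
def interpret_score(score: float) -> str:
    """Map a 0-100 quality score to the bands listed above."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must be between 0 and 100")
    if score >= 90.0:
        return "Excellent"
    if score >= 80.0:
        return "Good"
    if score >= 70.0:
        return "Acceptable"
    return "Poor"


print(interpret_score(86.3))  # Good
```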

When to Use This

Use for:

  • Validating synthetic market data from Monte Carlo simulations
  • Quality-checking GARCH, Heston, or other stochastic models
  • Verifying backtest input data integrity
  • Testing financial data generation pipelines

Don't use for:

  • Time series forecasting (use statsmodels instead)
  • Econometric modeling (use statsmodels instead)
  • Non-financial time series (this is finance-specific)

Performance

Vectorized operations make validation fast:

  • 10,000 paths × 252 timesteps: ~0.5 seconds
  • 50,000 paths × 252 timesteps: ~2 seconds
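
To show what vectorizing across paths can look like, the sketch below computes a variance ratio for every path in one pass, using a cumulative sum instead of a Python loop. It is an illustration of the general approach, not the package's `variance_ratio_test`, and it makes no attempt to reproduce the timings quoted above. The function name is hypothetical.

```python
import time

import numpy as np


def variance_ratio_all_paths(returns: np.ndarray, q: int) -> np.ndarray:
    """Variance ratio VR(q) for every path at once.

    returns has shape (n_paths, n_steps); the result has shape (n_paths,).
    Uses overlapping q-period sums and no small-sample bias correction.
    """
    r = np.asarray(returns, dtype=float)
    r = r - r.mean(axis=1, keepdims=True)
    var_1 = r.var(axis=1)
    # Rolling q-period sums via a cumulative sum, with no loop over paths
    csum = np.cumsum(r, axis=1)
    csum = np.concatenate([np.zeros((r.shape[0], 1)), csum], axis=1)
    q_sums = csum[:, q:] - csum[:, :-q]
    var_q = np.mean(q_sums**2, axis=1)
    return var_q / (q * var_1)


rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(10_000, 251))

start = time.perf_counter()
vr = variance_ratio_all_paths(returns, q=5)
elapsed = time.perf_counter() - start
print(f"mean VR(5) = {vr.mean():.3f}, computed in {elapsed:.3f} s")
```

For i.i.d. returns the ratio should sit close to 1; values well below 1 point to mean reversion and values well above 1 to momentum.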

Requirements

  • Python ≥ 3.12
  • numpy ≥ 2.0.0
  • scipy ≥ 1.14.0

Contributing

Contributions welcome! See CONTRIBUTING.md.

License

MIT License - see LICENSE

Built By

QPaths - We use this package to validate every synthetic dataset we generate.

Citation

If you use this package in academic research:

@software{financial_data_validation,
  title = {financial-data-validation: Statistical validation for financial time series},
  author = {QPaths},
  year = {2026},
  url = {https://github.com/qpaths/financial-data-validation}
}
