Statistical validation for synthetic financial time series
financial-data-validation
Lightweight statistical validation for financial time series data.
Why This Exists
Need to validate synthetic market data, backtest results, or trading signals? statsmodels implements all of these tests, but it is a 50+ MB install with complex dependencies.
This package extracts only the diagnostic tests that matter for financial data:
- Ljung-Box - autocorrelation in returns
- ARCH effects - volatility clustering
- Jarque-Bera - return distribution normality
- Kolmogorov-Smirnov - distribution shape
- Variance Ratio - mean reversion vs momentum
- Runs Test - randomness of return signs
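For reference, the Ljung-Box test from the list above fits in a few lines of NumPy/SciPy. The statistic is Q = n(n+2) * sum over k=1..h of rho_k^2 / (n-k), where rho_k is the lag-k sample autocorrelation, and it is compared against a chi-squared distribution with h degrees of freedom. The function `ljung_box` below is a hypothetical illustration, not this package's `ljung_box_test`. It returns the raw statistic and p-value, not the package's score.

```python
import numpy as np
from scipy import stats


def ljung_box(returns, lags=20):
    """Ljung-Box Q statistic and p-value for a single 1-D return series.

    Q = n (n + 2) * sum_{k=1..h} rho_k**2 / (n - k), where rho_k is the
    lag-k sample autocorrelation; under the null of no autocorrelation,
    Q is approximately chi-squared with h = lags degrees of freedom.
    """
    x = np.asarray(returns, dtype=float)
    n = x.size
    x = x - x.mean()
    denom = np.dot(x, x)
    ks = np.arange(1, lags + 1)
    # Sample autocorrelation at each lag 1..h
    rho = np.array([np.dot(x[k:], x[:-k]) / denom for k in ks])
    q = n * (n + 2) * np.sum(rho**2 / (n - ks))
    return float(q), float(stats.chi2.sf(q, df=lags))


# Usage: white noise vs. a strongly autocorrelated AR(1) series
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, size=1000)
ar = np.zeros(1000)
for t in range(1, ar.size):
    ar[t] = 0.9 * ar[t - 1] + rng.normal(0.0, 0.01)

q_noise, p_noise = ljung_box(noise, lags=20)
q_ar, p_ar = ljung_box(ar, lags=20)
```

A large p-value for `noise` and a tiny one for `ar` is the pattern the "Good Data Should..." guidance relies on.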
2 MB install. numpy + scipy only. Purpose-built for finance.
Installation
With uv (from a clone of the repository):
uv sync
With pip:
pip install financial-data-validation
Quick Start
```python
import numpy as np
from financial_data_validation import validate_paths

# Your price paths, shape (n_paths, n_timesteps).
# Cumulate lognormal gross returns so each row is a random-walk price path;
# independent lognormal draws on their own would not be a price path.
paths = 100 * np.cumprod(np.random.lognormal(0, 0.02, size=(1000, 252)), axis=1)

# Validate
report = validate_paths(paths, frequency="daily")
print(report)
# Example output:
# Financial Data Validation Report ✓ PASSED
# Overall Quality Score: 86.3/100
# ...

# Check individual scores
print(f"ARCH (volatility clustering): {report.arch_score:.2f}")
print(f"Passed: {report.passed}")
```
What Gets Tested
| Test | What It Validates | Good Data Should... |
|---|---|---|
| Ljung-Box | Autocorrelation in returns | Show no autocorrelation (p > 0.05) |
| ARCH | Volatility clustering | Show clustering (p < 0.05) |
| Jarque-Bera | Skewness and kurtosis | Have reasonable moments (absolute skew < 1, kurtosis < 5) |
| Kolmogorov-Smirnov | Distribution shape vs normal | Fit reasonably well (D < 0.08) |
| Variance Ratio | Random walk behavior | Have VR ≈ 1 at multiple horizons |
| Runs Test | Sign randomness | Show random +/- sequencing |
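SciPy can compute the Jarque-Bera and Kolmogorov-Smirnov rows of the table directly. The sketch below makes two assumptions. First, it reads the table's "kurtosis < 5" as excess kurtosis, which is SciPy's default Fisher definition. Second, it runs the KS test on standardized returns against a standard normal. The mean and standard deviation are estimated from the same data, so the KS p-value is not reliable here; the sketch uses only the D statistic, in line with the D < 0.08 rule. `moment_and_shape_checks` is a hypothetical helper, not part of the package.

```python
import numpy as np
from scipy import stats


def moment_and_shape_checks(returns, max_abs_skew=1.0, max_kurt=5.0, max_ks_d=0.08):
    """Hypothetical helper mirroring the Jarque-Bera and KS rows of the table."""
    x = np.asarray(returns, dtype=float)
    jb = stats.jarque_bera(x)
    skew = float(stats.skew(x))
    # Assumption: "kurtosis" means excess kurtosis (Fisher definition, normal = 0)
    kurt = float(stats.kurtosis(x, fisher=True))
    # Standardize with the sample mean/std, then compare with N(0, 1)
    z = (x - x.mean()) / x.std(ddof=1)
    d = float(stats.kstest(z, "norm").statistic)
    return {
        "jb_stat": float(jb.statistic),
        "jb_pvalue": float(jb.pvalue),
        "skew": skew,
        "excess_kurtosis": kurt,
        "ks_d": d,
        "moments_ok": abs(skew) < max_abs_skew and kurt < max_kurt,
        "shape_ok": d < max_ks_d,
    }


rng = np.random.default_rng(1)
normal_checks = moment_and_shape_checks(rng.normal(0.0, 0.01, size=5000))
# Cauchy draws are an extreme fat-tailed case that should fail both checks
cauchy_checks = moment_and_shape_checks(0.01 * rng.standard_cauchy(size=5000))
```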
Individual Tests
```python
from financial_data_validation.utils import compute_returns
from financial_data_validation.diagnostics.arch import arch_test

# `paths` is the (n_paths, n_timesteps) price array from Quick Start
returns = compute_returns(paths)
score, details = arch_test(returns, lags=20)
print(f"ARCH score: {score:.3f}")
print(f"Volatility clustering: {'Yes' if details['passed'] else 'No'}")
```
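Engle's LM test illustrates the kind of check `arch_test` performs. It regresses squared demeaned returns on a constant and their own q lags, then compares T * R^2 with a chi-squared(q) distribution. The code below is a hypothetical re-implementation for illustration, not the package's code. It returns the raw LM statistic and p-value rather than the package's score. It is applied to two inputs: a simulated GARCH(1,1) series, which should show clustering, and i.i.d. normal noise, which should not. The GARCH parameters are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def arch_lm(returns, lags=20):
    """Engle's ARCH LM test on a 1-D return series.

    Regress e_t**2 on a constant and e_{t-1}**2 ... e_{t-q}**2; under the
    null of no ARCH effects, LM = T * R**2 is approximately chi2(q).
    """
    e = np.asarray(returns, dtype=float)
    e2 = (e - e.mean()) ** 2
    y = e2[lags:]
    # Column k holds e2 lagged by k, aligned with y
    lagged = [e2[lags - k : e2.size - k] for k in range(1, lags + 1)]
    X = np.column_stack([np.ones(y.size), *lagged])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    lm = y.size * r2
    return float(lm), float(stats.chi2.sf(lm, df=lags))


# GARCH(1,1): sigma2_{t+1} = omega + alpha * r_t**2 + beta * sigma2_t
rng = np.random.default_rng(42)
n, omega, garch_alpha, garch_beta = 3000, 1e-6, 0.15, 0.80
var = omega / (1.0 - garch_alpha - garch_beta)  # start at the unconditional variance
garch = np.empty(n)
for t in range(n):
    garch[t] = np.sqrt(var) * rng.standard_normal()
    var = omega + garch_alpha * garch[t] ** 2 + garch_beta * var

lm_garch, p_garch = arch_lm(garch, lags=5)
lm_iid, p_iid = arch_lm(rng.normal(0.0, 0.01, size=n), lags=5)
```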
Available tests:
- `ljung_box_test(returns, lags=20)` - autocorrelation
- `arch_test(returns, lags=20)` - volatility clustering
- `jarque_bera_test(returns)` - normality
- `ks_test(returns)` - distribution shape
- `variance_ratio_test(returns, lags=[2,5,10])` - random walk
- `runs_test(returns)` - sign randomness
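The last two tests in that list are short enough to sketch.

- `variance_ratio` is a simplified overlapping-window version of the Lo-MacKinlay ratio, VR(q) = Var(q-period returns) / (q * Var(1-period returns)). It omits their bias corrections and z-statistic.
- `sign_runs_test` is the Wald-Wolfowitz runs test on return signs, using the normal approximation.

Both are hypothetical helpers written for illustration. Their signatures and return values are assumptions, not the package's `variance_ratio_test` or `runs_test`.

```python
import numpy as np
from scipy import stats


def variance_ratio(returns, q):
    """Simplified overlapping variance ratio VR(q); about 1 for a random walk."""
    r = np.asarray(returns, dtype=float)
    mu = r.mean()
    var_1 = np.sum((r - mu) ** 2) / (r.size - 1)
    # Overlapping q-period returns via differences of a cumulative sum
    c = np.concatenate(([0.0], np.cumsum(r)))
    r_q = c[q:] - c[:-q]
    var_q = np.sum((r_q - q * mu) ** 2) / (r_q.size - 1)
    return float(var_q / (q * var_1))


def sign_runs_test(returns):
    """Wald-Wolfowitz runs test on return signs (normal approximation)."""
    r = np.asarray(returns, dtype=float)
    signs = np.sign(r[r != 0.0])  # zero returns carry no sign and are dropped
    n_pos = int(np.sum(signs > 0))
    n_neg = int(np.sum(signs < 0))
    n = n_pos + n_neg
    runs = 1 + int(np.count_nonzero(signs[1:] != signs[:-1]))
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n**2 * (n - 1))
    z = (runs - expected) / np.sqrt(variance)
    return runs, float(z), float(2.0 * stats.norm.sf(abs(z)))


rng = np.random.default_rng(7)
iid = rng.normal(0.0, 0.01, size=10_000)
vrs = {q: variance_ratio(iid, q) for q in (2, 5, 10)}
# Perfectly alternating signs: maximal mean reversion and far too many runs
alternating = np.tile([0.01, -0.01], 50)
vr_alt = variance_ratio(alternating, 2)
runs_alt, z_alt, p_alt = sign_runs_test(alternating)
```

VR well below 1 indicates mean reversion and VR above 1 indicates momentum. A large positive runs z-score means signs flip more often than chance.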
Custom Validation
```python
# Stricter threshold
report = validate_paths(paths, threshold=85.0)

# Custom weights (emphasize volatility clustering)
weights = {
    "ljung_box": 0.15,
    "arch": 0.40,  # Increased importance
    "jarque_bera": 0.15,
    "ks": 0.10,
    "variance_ratio": 0.10,
    "runs": 0.10,
}
report = validate_paths(paths, weights=weights)

# Different data frequency
report = validate_paths(paths, frequency="hourly")  # Uses 24 lags
```
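One plausible way that per-test scores and `weights` combine into the 0-100 overall score is a weighted average compared against `threshold`. This is a hypothetical sketch; the package's actual aggregation may differ. It assumes two things: each test yields a score in [0, 1], and the weights must sum to 1. The weights in the example above satisfy that: 0.15 + 0.40 + 0.15 + 0.10 + 0.10 + 0.10 = 1.00. The per-test scores below are made up for illustration.

```python
import math
from collections.abc import Mapping


def combine_scores(
    scores: Mapping[str, float], weights: Mapping[str, float], threshold: float
) -> tuple[float, bool]:
    """Hypothetical aggregation: weighted average of per-test scores in [0, 1],
    scaled to 0-100 and compared with `threshold`."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same tests")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    overall = 100.0 * sum(weights[name] * scores[name] for name in weights)
    return overall, overall >= threshold


weights = {
    "ljung_box": 0.15, "arch": 0.40, "jarque_bera": 0.15,
    "ks": 0.10, "variance_ratio": 0.10, "runs": 0.10,
}
# Illustrative per-test scores, not real package output
scores = {
    "ljung_box": 0.90, "arch": 0.80, "jarque_bera": 0.85,
    "ks": 0.90, "variance_ratio": 0.95, "runs": 0.90,
}
overall, passed = combine_scores(scores, weights, threshold=85.0)
```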
Examples
See the `examples/` directory:
- `basic_usage.py` - Complete validation workflow
- `individual_tests.py` - Run tests independently
- `custom_validation.py` - Custom settings
- `comparing_models.py` - Compare GBM vs GARCH
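The example scripts are not reproduced here. The sketch below shows the kind of comparison `comparing_models.py` describes: it generates price paths from a GBM and from a GARCH(1,1) model, both in the `(n_paths, n_timesteps)` layout used in Quick Start. All function names and parameter values are illustrative assumptions. GBM log returns are i.i.d., so GBM paths should show no volatility clustering, while GARCH paths should. The lag-1 autocorrelation of squared returns gives a quick check of that difference.

```python
import numpy as np


def simulate_gbm(n_paths, n_steps, s0=100.0, mu=0.05, sigma=0.20, dt=1 / 252, seed=None):
    """GBM price paths, shape (n_paths, n_steps); log returns are i.i.d. normal."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_paths, n_steps))
    log_ret = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_ret, axis=1))


def simulate_garch(n_paths, n_steps, s0=100.0, omega=1e-6, alpha=0.10, beta=0.85, seed=None):
    """GARCH(1,1) price paths, shape (n_paths, n_steps), zero-mean log returns.

    sigma2_{t+1} = omega + alpha * r_t**2 + beta * sigma2_t, simulated for
    all paths at once (the Python loop runs over time steps only).
    """
    rng = np.random.default_rng(seed)
    var = np.full(n_paths, omega / (1.0 - alpha - beta))  # unconditional variance
    log_ret = np.empty((n_paths, n_steps))
    for t in range(n_steps):
        log_ret[:, t] = np.sqrt(var) * rng.standard_normal(n_paths)
        var = omega + alpha * log_ret[:, t] ** 2 + beta * var
    return s0 * np.exp(np.cumsum(log_ret, axis=1))


def mean_sq_return_autocorr(paths, lag=1):
    """Average over paths of the lag-`lag` autocorrelation of squared log returns."""
    r2 = np.diff(np.log(paths), axis=1) ** 2
    x = r2 - r2.mean(axis=1, keepdims=True)
    num = np.sum(x[:, lag:] * x[:, :-lag], axis=1)
    return float(np.mean(num / np.sum(x * x, axis=1)))


gbm_paths = simulate_gbm(500, 252, seed=0)
garch_paths = simulate_garch(500, 252, seed=0)
gbm_ac = mean_sq_return_autocorr(gbm_paths)
garch_ac = mean_sq_return_autocorr(garch_paths)
```

Either array can then be passed to `validate_paths(...)` as in Quick Start. The GARCH paths would be expected to do better on the ARCH (volatility clustering) check.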
Quality Score Interpretation
- ≥ 90: Excellent - statistically close to real market data on these diagnostics
- 80 to < 90: Good - suitable for most applications
- 70 to < 80: Acceptable - passes minimum requirements
- < 70: Poor - may produce unreliable results
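Scores are fractional (e.g. 86.3), so mapping a score to a band needs a rule for boundary values. Below is a minimal sketch. It assumes each band includes its lower bound and excludes its upper bound; `interpret_score` is a hypothetical helper, not a package function.

```python
def interpret_score(score: float) -> str:
    """Map an overall 0-100 quality score to the bands listed above.

    Assumption: each band includes its lower bound, so 89.5 counts as "Good".
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be in [0, 100], got {score}")
    if score >= 90.0:
        return "Excellent"
    if score >= 80.0:
        return "Good"
    if score >= 70.0:
        return "Acceptable"
    return "Poor"


print(interpret_score(86.3))  # Good
```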
When to Use This
Use for:
- Validating synthetic market data from Monte Carlo simulations
- Quality-checking GARCH, Heston, or other stochastic models
- Verifying backtest input data integrity
- Testing financial data generation pipelines
Don't use for:
- Time series forecasting (use `statsmodels` instead)
- Econometric modeling (use `statsmodels` instead)
- Non-financial time series (this is finance-specific)
Performance
Vectorized NumPy operations keep validation fast. Approximate timings (hardware-dependent):
- 10,000 paths × 252 timesteps: ~0.5 seconds
- 50,000 paths × 252 timesteps: ~2 seconds
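The sketch below illustrates why vectorization matters at this scale. It computes lag-1 to lag-20 autocorrelations, the ingredients of Ljung-Box, for 10,000 paths at once. Python loops only over the 20 lags, never over the paths. This shows the general technique, not the package's implementation; `autocorr_all_paths` is a hypothetical name, and the printed timing will vary by machine.

```python
import time

import numpy as np


def autocorr_all_paths(returns, max_lag):
    """Sample autocorrelations at lags 1..max_lag for every path at once.

    `returns` has shape (n_paths, n_steps); the result has shape (n_paths, max_lag).
    Only the lag loop is in Python; each lag is one vectorized pass over all paths.
    """
    x = returns - returns.mean(axis=1, keepdims=True)
    denom = np.sum(x * x, axis=1)
    out = np.empty((x.shape[0], max_lag))
    for k in range(1, max_lag + 1):
        out[:, k - 1] = np.sum(x[:, k:] * x[:, :-k], axis=1) / denom
    return out


rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(10_000, 251))
start = time.perf_counter()
acf = autocorr_all_paths(returns, max_lag=20)
elapsed = time.perf_counter() - start
print(f"autocorrelations {acf.shape} computed in {elapsed:.3f} s")
```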
Requirements
- Python ≥ 3.12
- numpy ≥ 2.0.0
- scipy ≥ 1.14.0
Contributing
Contributions welcome! See CONTRIBUTING.md.
License
MIT License - see LICENSE
Built By
QPaths - we use this package to validate every synthetic dataset we generate.
Citation
If you use this package in academic research:
```bibtex
@software{financial_data_validation,
  title = {financial-data-validation: Statistical validation for financial time series},
  author = {QPaths},
  year = {2026},
  url = {https://github.com/qpaths/financial-data-validation}
}
```
Download files
Source Distribution
Built Distribution
File details
Details for the file financial_data_validation-0.1.0.tar.gz.
File metadata
- Download URL: financial_data_validation-0.1.0.tar.gz
- Upload date:
- Size: 44.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a3b99bcd00b783e20f5a7e6b2842029dac9057af8d05015b27966f8b73cf2679 |
| MD5 | 3196991efdd5ba30090452db9333b787 |
| BLAKE2b-256 | 373c5bcb7f28b5a97cf189696e6672f92f8f086f0107fff357e77c6e7b825c91 |
File details
Details for the file financial_data_validation-0.1.0-py3-none-any.whl.
File metadata
- Download URL: financial_data_validation-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ce3bd3a6a7d4b06bdb87a3a86243b23e49735fa68801d838b81f1283a3b6fc3 |
| MD5 | 676937cd803f1b8f1a5cc21e79b3ac57 |
| BLAKE2b-256 | 6ace69fdaf2d84a0e5605a18c8f3de5d89a08886dba44fe663b6e027e1c1ec6b |