Rigorous validation for synthetic financial time series

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

teamsablier

These details have not been verified by PyPI

Project description

finval

finval is a Python library for assessing the quality of synthetic market data against real data. It was built because no existing library covers the financial stylized facts that matter: fat tails, volatility clustering, leverage effect, crash co-movement, and probabilistic forecast calibration.

finval is the scoring backend behind FinBench, the public leaderboard for multivariate financial time-series generation.

Current release: 0.5.0 (0.2.0, 0.3.0 and 0.4.0 are preserved at their tags for reproducibility). 0.5.0 adds an opt-in subwindow mode for long-horizon path validation and a per-metric effective_n. Both are purely additive: the default (subwindow=None) is byte-identical to 0.4.0, so existing FinBench scores do not move.

The 0.4.0 scoring structure is unchanged: 6 weighted lenses (marginal 0.15, dependence 0.20, temporal 0.13, joint 0.10, conditional 0.22, generative 0.20) plus 7 hard gates that fail a model outright regardless of the weighted score (memorization, tail_quantiles, tail_dependence_lower, drawdown_distribution, conditional_sensitivity, c2st, coverage_deficit).

New in 0.4.0:

a regime-stratified conditional_sensitivity axis, which catches a calibrated-but-climatological generator whose forecast barely moves across regimes
a generative lens: Naeem density/coverage scored as the delta vs a block-bootstrap replay of the real data (a generator must beat replay to justify itself)
a joint C2ST omnibus
graceful degradation: metrics are flagged non-applicable on thin-data / long-horizon inputs rather than scored as failures

The library is in active use (FinBench v2 production scoring); pin an exact version if you need score stability across releases.

Why finval?

General-purpose synthetic data libraries (sdmetrics, synthcity, tsgm) treat time series as generic sequences. They don't know what "leverage effect" is, don't check PIT uniformity, and don't compute tail dependence coefficients. For financial applications — risk management, backtesting, derivatives — you need a suite that tests the things that actually matter for market data.

We're not aware of another library that combines, for financial time series, the full stylized-fact battery (fat tails, volatility clustering, leverage effect, crash co-movement, time-irreversibility, long memory, aggregational gaussianity) with a path-law signature metric, a C2ST omnibus, proper-scoring calibration (CRPS / PIT / coverage), regime-conditional sensitivity, and memorization gating against a real-vs-real baseline — in one weighted, gated battery.

finval exposes 35 metric compute functions for synthetic financial time series. Of these, 26 carry weight across 6 lenses (marginal, dependence, temporal, joint, conditional, generative). The remainder are weight-0 diagnostic localizers, including a path-signature path-law metric: they are computed and reported so they can pinpoint where a defect is without destabilizing the headline score. In addition, 7 hard gates fail a model outright regardless of the weighted score. Every threshold is calibrated against real financial data and justified by the statistical literature. The metrics split across three entry points by input shape:

11 flat metrics — 3 distributional (marginal_ks, energy_distance, tail_quantiles) and 8 dependence (pearson_corr, spearman_corr, copula_distance, tail_dependence_upper, tail_dependence_lower, correlation_breakdown, tail_dependence_asymmetry, covariance_calibration) — run by validate(...) on 2D flat data.
7 path-level metrics (acf_returns, volatility_clustering, leverage_effect, cross_correlation, drawdown_distribution, regime_conditional, memorization) — run by validate_paths(...) on 3D sample paths, which also reshapes the paths and runs the 11 flat metrics on them (so validate_paths produces 18 scores total).
5 calibration metrics (pit_uniformity, crps, coverage_50, coverage_90, coverage_95) — run by validate_calibration(...) on per-observation forecast distributions paired with realized actuals.

Implementation note: 26 metrics carry weight in overall_score (across the 6 lenses), drawn from a larger surface of 35 compute functions. The balance are weight-0 localizers — e.g. signature_distance (level-2 truncated path-signature distance), time_reversal_asymmetry, aggregational_gaussianity, long_memory, coskewness — computed and reported, but kept out of the weighted aggregate because they are lower-power, so a noisy metric can flag a defect without swinging the score. Some compute functions emit multiple outputs (compute_tail_dependence → upper + lower; compute_coverage → three levels).
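To make the input shapes concrete, here is a small sketch of how 3D sample paths relate to the 2D flat layout. It assumes the flattening simply stacks every time step of every path as one row; the exact reshape finval applies internally is not stated here, so treat this as an illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# 3D sample paths: (n_paths, horizon, n_features)
paths = rng.standard_normal((100, 60, 3)) * 0.01

# Assumed flattening: every time step of every path becomes one row,
# giving the (n_samples, n_features) layout the flat metrics expect.
flat = paths.reshape(-1, paths.shape[-1])

print(paths.shape, "->", flat.shape)
```

Under this assumption the flat metrics see n_paths * horizon rows, while the path-level metrics still see each path as one unit.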

Installation

pip install finval

Quickstart

import numpy as np
import finval

# 2D data: (n_samples, n_features) returns
real = np.random.randn(1000, 3) * 0.01
synthetic = np.random.randn(1000, 3) * 0.01

report = finval.validate(synthetic, real)
print(report.summary())
print(f"Overall quality: {report.overall_quality}")
print(f"Pass rate: {report.pass_rate:.0%}")

# 3D data: (n_paths, horizon, n_features) for path-level validation
real_paths = np.random.randn(100, 60, 3) * 0.01
syn_paths = np.random.randn(100, 60, 3) * 0.01

report = finval.validate_paths(syn_paths, real_paths)
print(report.summary())

Metrics

As of 0.4.0 the weighted score is a six-lens mean (weights below). A set of diagnostic localizers is computed and reported at weight 0, and 7 hard gates fail a model outright. validate_full(...) runs every lens it has inputs for.
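A minimal sketch of how a six-lens weighted mean and a set of hard gates could combine. The weights are the documented ones; the function name, the input dictionaries, the higher-is-better lens scores in [0, 1], and the renormalization over available lenses are all illustrative assumptions, not finval's internals.

```python
# Lens weights as documented for finval 0.4.0 and later.
LENS_WEIGHTS = {
    "marginal": 0.15,
    "dependence": 0.20,
    "temporal": 0.13,
    "joint": 0.10,
    "conditional": 0.22,
    "generative": 0.20,
}


def aggregate(lens_scores, gate_passed):
    """Hypothetical helper: weighted mean over available lenses plus a gate check.

    lens_scores maps lens name -> score in [0, 1], or None when the lens had
    no inputs. gate_passed maps gate name -> bool.
    """
    available = {k: v for k, v in lens_scores.items() if v is not None}
    total_weight = sum(LENS_WEIGHTS[k] for k in available)
    if total_weight == 0:
        raise ValueError("no lens scores available")
    # Assumption: weights are renormalized over the lenses that actually ran.
    weighted = sum(LENS_WEIGHTS[k] * v for k, v in available.items()) / total_weight
    failed = sorted(name for name, ok in gate_passed.items() if not ok)
    # A single failed gate fails the model, whatever the weighted score is.
    return {"weighted_score": weighted, "failed_gates": failed, "passed": not failed}


lens_scores = {
    "marginal": 0.9,
    "dependence": 0.8,
    "temporal": 0.7,
    "joint": 0.6,
    "conditional": None,  # e.g. no forecast distributions were supplied
    "generative": 0.5,
}
gate_passed = {"memorization": True, "tail_quantiles": True, "c2st": False}

result = aggregate(lens_scores, gate_passed)
print(result)
```

The point of the design is visible in the example: the weighted score looks acceptable, yet the model fails because one gate failed.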

Marginal (15%)

marginal_ks — Kolmogorov-Smirnov test on each feature's marginal
energy_distance — multivariate distribution difference
tail_quantiles (gate) — 1st/5th/95th/99th percentile comparison (robust alternative to kurtosis)
tail_heaviness — tail-index fidelity
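The two simplest marginal checks can be sketched in plain NumPy. This is an illustration of the underlying statistics only: the function names are hypothetical, and finval's normalization and thresholds are not reproduced.

```python
import numpy as np


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))


def tail_quantile_gaps(real, synthetic, levels=(0.01, 0.05, 0.95, 0.99)):
    """Absolute gap between real and synthetic quantiles at each tail level."""
    q_real = np.quantile(real, levels)
    q_syn = np.quantile(synthetic, levels)
    return dict(zip(levels, np.abs(q_real - q_syn)))


rng = np.random.default_rng(0)
real = rng.standard_t(df=3, size=5000) * 0.01       # fat-tailed stand-in for real returns
gaussian = rng.standard_normal(5000) * real.std()   # same scale, thin tails

ks_real_vs_gauss = ks_statistic(real, gaussian)
print("KS statistic:", ks_real_vs_gauss)
print("tail quantile gaps:", tail_quantile_gaps(real, gaussian))
```

Quantile gaps are used here instead of kurtosis because a sample kurtosis is dominated by a handful of extreme observations, while tail quantiles are comparatively stable.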

Dependence (20%)

pearson_corr / spearman_corr — linear / rank correlation matrix error
copula_distance — Cramér-von Mises distance between empirical copulas
tail_dependence_upper (λ_U) / tail_dependence_lower (gate) (λ_L) — rally / crash co-movement
correlation_breakdown — stress vs calm regime correlation shift
tail_dependence_asymmetry — fidelity of the crash-vs-rally gap A = λ_L − λ_U (0 by construction for any elliptical/Gaussian model, so it catches what the individual λ levels miss)
covariance_calibration — variance/correlation dispersion ratio (catches a covariance right on average but wrong in spread)
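To illustrate the tail-dependence asymmetry A = λ_L − λ_U, the sketch below estimates both coefficients at a finite tail probability q from rank-transformed data. The estimator, the choice q = 0.05, and the toy crash model are assumptions made for illustration; they are not taken from finval.

```python
import numpy as np


def pseudo_obs(x):
    """Rank-transform a 1D sample to pseudo-observations in (0, 1)."""
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (x.size + 1)


def tail_dependence(x, y, q=0.05):
    """Finite-q estimates of lower and upper tail dependence."""
    u, v = pseudo_obs(x), pseudo_obs(y)
    lower = np.mean((u <= q) & (v <= q)) / q          # joint crashes
    upper = np.mean((u >= 1 - q) & (v >= 1 - q)) / q  # joint rallies
    return float(lower), float(upper)


rng = np.random.default_rng(0)
n = 20000

# Toy market: independent noise plus a rare common crash that hits both assets.
crash = rng.random(n) < 0.05
x = rng.standard_normal(n) - 4.0 * crash
y = rng.standard_normal(n) - 4.0 * crash
lam_l, lam_u = tail_dependence(x, y)
asymmetry = lam_l - lam_u

# Gaussian pair with correlation 0.6: symmetric tails, so the asymmetry is near 0.
g = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
g_l, g_u = tail_dependence(g[:, 0], g[:, 1])

print(f"crash model: lower={lam_l:.2f} upper={lam_u:.2f} A={asymmetry:.2f}")
print(f"gaussian:    lower={g_l:.2f} upper={g_u:.2f} A={g_l - g_u:.2f}")
```

A generator that matches λ_L and λ_U only roughly can still get the sign and size of the gap wrong, which is why the gap is scored separately.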

Temporal (13%)

acf_returns — autocorrelation of returns (should be ~0)
volatility_clustering — autocorrelation of squared returns
leverage_effect — corr(r_t, |r_{t+k}|) (negative for equities)
cross_correlation — contemporaneous cross-asset correlation
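The first three temporal statistics are all lagged correlations, which the sketch below computes on a simulated series. The GJR-GARCH simulator and its parameters are a hypothetical stand-in for real returns, chosen only because it produces volatility clustering and a negative leverage correlation; it is not part of finval.

```python
import numpy as np


def lagged_corr(a, b, lag):
    """Pearson correlation between a_t and b_{t+lag}, for lag >= 1."""
    return float(np.corrcoef(a[:-lag], b[lag:])[0, 1])


def simulate_gjr_garch(n, omega=1e-6, alpha=0.03, gamma=0.15, beta=0.85, seed=0):
    """Toy GJR-GARCH(1,1): negative returns raise next-step variance more."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    r = np.empty(n)
    var = omega / (1.0 - alpha - 0.5 * gamma - beta)  # unconditional variance
    for t in range(n):
        r[t] = np.sqrt(var) * z[t]
        var = omega + (alpha + gamma * (r[t] < 0)) * r[t] ** 2 + beta * var
    return r


r = simulate_gjr_garch(20000)

acf_returns = lagged_corr(r, r, 1)                  # expected near 0
volatility_clustering = lagged_corr(r**2, r**2, 1)  # expected positive
leverage_effect = lagged_corr(r, np.abs(r), 1)      # expected negative

print(f"acf_returns={acf_returns:.3f}")
print(f"volatility_clustering={volatility_clustering:.3f}")
print(f"leverage_effect={leverage_effect:.3f}")
```

An i.i.d. Gaussian generator would score near zero on all three, which is why a bootstrap or Gaussian baseline passes the first check and fails the other two.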

Joint (10%)

c2st (gate) — Classifier Two-Sample Test: train a classifier to tell synthetic from real; ~0.5 accuracy = indistinguishable. An omnibus that catches joint-distribution defects the per-axis metrics miss.
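A classifier two-sample test can be sketched with any classifier. The version below uses a k-nearest-neighbour vote purely as a dependency-free stand-in; which classifier finval trains, and how it maps accuracy to a score, are not stated here, so the function name and settings are assumptions.

```python
import numpy as np


def c2st_knn(real, synthetic, k=5, seed=0):
    """Held-out accuracy of a k-NN classifier separating real from synthetic rows."""
    rng = np.random.default_rng(seed)
    x = np.vstack([real, synthetic])
    y = np.concatenate([np.zeros(len(real)), np.ones(len(synthetic))])
    perm = rng.permutation(len(x))
    x, y = x[perm], y[perm]
    half = len(x) // 2
    x_train, y_train = x[:half], y[:half]
    x_test, y_test = x[half:], y[half:]
    # Distance from every test row to every training row.
    d = np.linalg.norm(x_test[:, None, :] - x_train[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest].mean(axis=1)  # k is odd, so no ties at 0.5
    pred = (votes > 0.5).astype(float)
    return float(np.mean(pred == y_test))


rng = np.random.default_rng(0)
real = rng.standard_normal((500, 3))
same = rng.standard_normal((500, 3))           # drawn from the same law as real
shifted = rng.standard_normal((500, 3)) + 1.5  # visibly different law

acc_same = c2st_knn(real, same)
acc_shifted = c2st_knn(real, shifted)
print(f"same law: {acc_same:.2f}, shifted law: {acc_shifted:.2f}")
```

Accuracy is measured on held-out rows only; scoring on the training rows would let the classifier memorize labels and report a misleadingly high value.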

Conditional (22%)

regime_conditional — regime-conditional distributional fidelity: paths bucketed into low/mid/high realized-vol regimes (tercile edges from the real paths) and scored on stress-path frequency + within-regime shape.
conditional_sensitivity (gate) — regime-stratified energy-distance ratio: does the forecast distribution actually move across vol/trend regimes, or is the model a calibrated climatology that ignores the conditioning? The pooled lenses are blind to this.
pit_uniformity / crps / coverage_50/90/95 (coverage_deficit is a gate) — per-observation forecast-distribution calibration.
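The calibration statistics can be computed directly from forecast samples. The sketch below assumes each observation comes with an ensemble of draws from its forecast distribution, stored as an array of shape (n_obs, n_samples); that layout and the function names are assumptions for illustration, not the validate_calibration(...) interface.

```python
import numpy as np


def pit_values(samples, actuals):
    """PIT per observation: share of forecast draws at or below the realized value."""
    return np.mean(samples <= actuals[:, None], axis=1)


def crps_ensemble(samples, actuals):
    """Sample-based CRPS, E|X - y| - 0.5 * E|X - X'|, averaged over observations."""
    term1 = np.mean(np.abs(samples - actuals[:, None]), axis=1)
    term2 = np.mean(np.abs(samples[:, :, None] - samples[:, None, :]), axis=(1, 2))
    return float(np.mean(term1 - 0.5 * term2))


def coverage(samples, actuals, level):
    """Share of actuals inside the central forecast interval of the given level."""
    lo = np.quantile(samples, (1 - level) / 2, axis=1)
    hi = np.quantile(samples, 1 - (1 - level) / 2, axis=1)
    return float(np.mean((actuals >= lo) & (actuals <= hi)))


rng = np.random.default_rng(0)
n_obs, n_samples = 500, 100
actuals = rng.standard_normal(n_obs)
calibrated = rng.standard_normal((n_obs, n_samples))           # forecast matches the truth
overconfident = 0.2 * rng.standard_normal((n_obs, n_samples))  # forecast is far too narrow

for name, fc in [("calibrated", calibrated), ("overconfident", overconfident)]:
    print(
        f"{name}: mean PIT={pit_values(fc, actuals).mean():.2f} "
        f"CRPS={crps_ensemble(fc, actuals):.3f} "
        f"coverage_90={coverage(fc, actuals, 0.90):.2f}"
    )
```

The overconfident forecast has a mean PIT close to 0.5 as well, so the mean alone is not a calibration test; the defect shows up in the PIT histogram shape, the coverage, and the CRPS.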

Generative (20%)

Naeem density / coverage — manifold realism scored as the delta vs a block-bootstrap replay of the real data (a generator must beat replay to justify itself, not merely tie it). Surfaces coverage_deficit (gate) + plausibility_deficit.
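For reference, the raw density and coverage statistics of Naeem et al. can be sketched as follows. This computes the plain values on Euclidean distances with k = 5; the delta against a block-bootstrap replay, the feature space, and the value of k that finval uses are not reproduced and the function names are hypothetical.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def density_coverage(real, fake, k=5):
    """Density and coverage built on k-NN balls around the real samples."""
    d_rr = pairwise_dist(real, real)
    # Column 0 of each sorted row is the point itself (distance 0),
    # so column k is the distance to the k-th nearest other real point.
    radii = np.sort(d_rr, axis=1)[:, k]
    inside = pairwise_dist(real, fake) <= radii[:, None]  # (n_real, n_fake)
    density = inside.sum() / (k * fake.shape[0])
    coverage = inside.any(axis=1).mean()
    return float(density), float(coverage)


rng = np.random.default_rng(0)
real = rng.standard_normal((500, 2))
good = rng.standard_normal((500, 2))             # same law as real
collapsed = 0.1 * rng.standard_normal((500, 2))  # mode collapse near the origin

dens_good, cov_good = density_coverage(real, good)
dens_collapsed, cov_collapsed = density_coverage(real, collapsed)
print(f"good:      density={dens_good:.2f} coverage={cov_good:.2f}")
print(f"collapsed: density={dens_collapsed:.2f} coverage={cov_collapsed:.2f}")
```

The collapsed generator produces plausible points but leaves most of the real data uncovered, which is the kind of defect a coverage deficit is meant to expose.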

Hard gates (fail outright, regardless of the weighted score)

memorization (data-copying: synth→real vs real→real NN distances — pass the training set as real), tail_quantiles, tail_dependence_lower, drawdown_distribution, conditional_sensitivity, c2st, coverage_deficit.
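The memorization idea, comparing synth→real nearest-neighbour distances against a real→real baseline, can be sketched as below. The ratio-of-medians statistic and the function names are assumptions made for illustration; finval's actual statistic and threshold are not stated here.

```python
import numpy as np


def nn_distances(queries, reference, exclude_self=False):
    """Distance from each query row to its nearest row in reference."""
    d = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=-1)
    if exclude_self:
        # queries and reference are the same set: ignore the zero self-distance.
        np.fill_diagonal(d, np.inf)
    return d.min(axis=1)


def memorization_ratio(synthetic, real):
    """Hypothetical statistic: median synth->real NN distance over median real->real."""
    synth_to_real = np.median(nn_distances(synthetic, real))
    real_to_real = np.median(nn_distances(real, real, exclude_self=True))
    return float(synth_to_real / real_to_real)


rng = np.random.default_rng(0)
real = rng.standard_normal((300, 5))                    # stands in for the training set
fresh = rng.standard_normal((300, 5))                   # genuinely new samples
copied = real + 1e-3 * rng.standard_normal((300, 5))    # training rows plus tiny noise

ratio_fresh = memorization_ratio(fresh, real)
ratio_copied = memorization_ratio(copied, real)
print(f"fresh: {ratio_fresh:.2f}, copied: {ratio_copied:.4f}")
```

A ratio near 1 means synthetic samples sit about as far from the training data as training samples sit from each other; a ratio far below 1 indicates copying. This is also why the training set, not a held-out set, has to be passed as real.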

Baselines

Compare your model against simple reference generators to calibrate what "good" means for your data:

from finval.baselines import gaussian_baseline, historical_bootstrap, block_bootstrap

# Gaussian: matches mean+cov, no temporal structure
gauss = gaussian_baseline(real, n_samples=1000)

# i.i.d. bootstrap: matches joint distribution exactly, zero temporal
boot = historical_bootstrap(real, n_samples=1000)

# Block bootstrap: preserves short-range temporal structure
blocks = block_bootstrap(real, n_paths=100, path_length=60, block_size=20)

# Validate each
for name, syn in [("gaussian", gauss), ("iid", boot)]:
    r = finval.validate(syn, real)
    print(f"{name}: {r.overall_quality} ({r.overall_score:.0%})")
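To show what "preserves short-range temporal structure" means mechanically, here is an illustrative moving-block bootstrap. It is a sketch written for this explanation, not the implementation behind finval.baselines.block_bootstrap; only the argument names mirror the call above.

```python
import numpy as np


def block_bootstrap_sketch(series, n_paths, path_length, block_size, seed=None):
    """Build paths by concatenating randomly chosen contiguous blocks of the series."""
    rng = np.random.default_rng(seed)
    n_obs = series.shape[0]
    n_blocks = -(-path_length // block_size)  # ceiling division
    # Random start index for every block of every path.
    starts = rng.integers(0, n_obs - block_size + 1, size=(n_paths, n_blocks))
    offsets = np.arange(block_size)
    # Expand each start into a run of consecutive indices, then trim to path_length.
    idx = (starts[:, :, None] + offsets).reshape(n_paths, -1)[:, :path_length]
    return series[idx]  # (n_paths, path_length, n_features)


rng = np.random.default_rng(0)
real = rng.standard_normal((1000, 3)) * 0.01
paths = block_bootstrap_sketch(real, n_paths=100, path_length=60, block_size=20, seed=1)
print(paths.shape)
```

Within a block the original ordering is kept, so dependence at lags shorter than block_size survives; dependence across block boundaries is broken.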

Design principles

Reliable over comprehensive. Each metric is chosen because it's robust and informative, not because it's impressive.
Mean over max for pairwise metrics. Max over n(n-1)/2 feature pairs is dominated by sampling noise. finval uses mean error, which is harder to fool and more stable run-to-run.
Lower is always better. Every metric is normalized so that zero is perfect and higher is worse. No flipped signs to remember.
Financial stylized facts first. Leverage effect, vol clustering, fat tails, crash co-movement — these aren't optional for financial data.
Proper scoring rules. CRPS is a proper scoring rule, and PIT uniformity and interval coverage are the calibration checks that accompany it, rather than mere rank-order checks. Your model is evaluated with the tools the statistics literature actually endorses.
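The "mean over max" principle can be checked numerically. The sketch below compares correlation matrices of two independent samples drawn from the same distribution, so every pairwise error is pure sampling noise; the sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def pairwise_corr_errors(n_features, n_samples=500):
    """|corr_a - corr_b| per feature pair, for two independent draws from one law."""
    a = rng.standard_normal((n_samples, n_features))
    b = rng.standard_normal((n_samples, n_features))
    diff = np.abs(np.corrcoef(a, rowvar=False) - np.corrcoef(b, rowvar=False))
    return diff[np.triu_indices(n_features, k=1)]  # upper triangle: n(n-1)/2 pairs


small = pairwise_corr_errors(3)   # 3 pairs
large = pairwise_corr_errors(30)  # 435 pairs

print(f" 3 features: mean={small.mean():.3f} max={small.max():.3f}")
print(f"30 features: mean={large.mean():.3f} max={large.max():.3f}")
```

Although both samples come from the same law, the maximum error grows with the number of pairs while the mean stays put, so a max-based score would penalize wider universes for noise alone.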

Changelog

0.5.0: Opt-in sub-window validation and a per-metric effective_n. Both are purely additive: the default (subwindow=None) is byte-identical to 0.4.0, so existing scores do not move. validate_paths(...) and validate_full(...) gain a keyword-only subwindow=W argument. When it is set (and path_length > W), the short/medium-scale path metrics are scored on non-overlapping W-length sub-windows of the long paths ((n, H, f) → (n*(H//W), W, f), dropping the H % W remainder). This gives many more independent samples, and so more statistical power, at long horizons (e.g. H=252) where a full-horizon path functional otherwise has only a handful of independent episodes. Genuinely full-horizon metrics (long_memory, variance_term_structure, signature_distance, memorization) stay on the full paths; flat metrics are unchanged (flattening is invariant to sub-windowing). Every MetricResult now carries effective_n, the number of independent real (sub)windows it was computed on: n_real_paths for full-horizon/default path metrics, n_real_paths * (H // W) for sub-windowed ones, and the flattened real row count for flat metrics. This is the reference sample size that bounds the metric's power. Note that effective_n is the count finval was given; if the caller passes overlapping real windows, the truly independent N is lower (independence is the caller's responsibility).
0.4.0: Scoring reorganized into 6 weighted lenses (marginal 0.15, dependence 0.20, temporal 0.13, joint 0.10, conditional 0.22, generative 0.20) plus 7 hard gates. New: conditional_sensitivity (regime-stratified; catches a climatological generator whose forecast ignores the conditioning), a generative lens (Naeem density/coverage vs a block-bootstrap replay baseline), a joint c2st omnibus, and graceful degradation (metrics flagged non-applicable on thin-data / long-horizon inputs instead of scored as failures). validate_full(...) is the all-lenses entry point.
0.3.0: Two new dependence metrics: tail_dependence_asymmetry (scores whether synthetic paths reproduce the real lower-vs-upper tail-dependence asymmetry A = λ_L − λ_U, which elliptical/Gaussian baselines get as 0) and covariance_calibration (scores the variance/correlation dispersion ratios of synthetic vs real, catching a covariance that is right on average but wrong in spread). Two new category axes: regime_conditional (12%; regime-conditional fidelity, the measured option-pricing gap that no pooled metric sees) and memorization (5%; nearest-neighbor data-copying diagnostic). Scoring is now de-quantized (continuous bands rather than discrete tiers). Category weights rebalanced: distribution 0.15→0.20, temporal 0.20→0.15, calibration 0.30→0.15, path 0.10→0.08, with the new conditional/memorization axes carved out. The dependence and path metrics run under validate_paths(...); the two dependence metrics also run under validate(...).
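The sub-window reshape described in the 0.5.0 entry, (n, H, f) → (n*(H//W), W, f) with the H % W remainder dropped, can be sketched in NumPy as follows. The helper name is hypothetical and this is not finval's code; it only reproduces the documented shape arithmetic and the resulting effective_n.

```python
import numpy as np


def to_subwindows(paths, window):
    """Cut (n, H, f) paths into non-overlapping windows: (n * (H // W), W, f)."""
    n, horizon, f = paths.shape
    if horizon <= window:
        return paths  # mirrors the documented "path_length > W" condition
    k = horizon // window
    # Drop the trailing H % W steps, then split each path into k windows.
    return paths[:, : k * window, :].reshape(n * k, window, f)


rng = np.random.default_rng(0)
real_paths = rng.standard_normal((50, 252, 3)) * 0.01  # one trading year per path

sub = to_subwindows(real_paths, window=60)
effective_n = sub.shape[0]  # n_real_paths * (H // W) = 50 * 4

print(real_paths.shape, "->", sub.shape, "effective_n =", effective_n)
```

With H=252 and W=60 the last 12 steps of every path are discarded, and the number of windows available to the short-scale metrics rises from 50 to 200.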

License

MIT

Release history

0.6.1 (this version): Jul 8, 2026
0.5.0: Jun 29, 2026
0.4.0: Jun 25, 2026
0.3.0: Jun 17, 2026
0.2.0: Jun 2, 2026
0.1.0: May 28, 2026

Download files

Source Distribution

finval-0.6.1.tar.gz (98.0 kB)

Uploaded Jul 8, 2026 Source

Built Distribution

finval-0.6.1-py3-none-any.whl (80.6 kB)

Uploaded Jul 8, 2026 Python 3

File details

Details for the file finval-0.6.1.tar.gz.

File metadata

Download URL: finval-0.6.1.tar.gz
Upload date: Jul 8, 2026
Size: 98.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for finval-0.6.1.tar.gz
Algorithm	Hash digest
SHA256	`351a18251bfc0fd0f13b7ab22fa055a26c1b1597648d06764ff219ffceeed4c7`
MD5	`601a8e49d6744c56c1aa3be584c7f4ff`
BLAKE2b-256	`872ed2e68774932b8b1eb57ceb2a6a68d3619223339f45f06481e432a1e3d0dc`

Provenance

The following attestation bundles were made for finval-0.6.1.tar.gz:

Publisher: release.yml on sablier-ai/finval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: finval-0.6.1.tar.gz
- Subject digest: 351a18251bfc0fd0f13b7ab22fa055a26c1b1597648d06764ff219ffceeed4c7
- Sigstore transparency entry: 2115264132
- Sigstore integration time: Jul 8, 2026
Source repository:
- Permalink: sablier-ai/finval@28637239fc00283d20068a104a0c018c1d48dea2
- Branch / Tag: refs/tags/v0.6.1
- Owner: https://github.com/sablier-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@28637239fc00283d20068a104a0c018c1d48dea2
- Trigger Event: push

File details

Details for the file finval-0.6.1-py3-none-any.whl.

File metadata

Download URL: finval-0.6.1-py3-none-any.whl
Upload date: Jul 8, 2026
Size: 80.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for finval-0.6.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`063ca79383cae0b4ecaed2c061ced462d1a796a75be2b3901e0f230c36dc5d50`
MD5	`5f0707d837bdc25dcb201b641bb1b8ff`
BLAKE2b-256	`1d28a4f88f13c78038a9fd3b97b82b9efd7d6c9073d85cc1a95b6d381a84ff3c`

Provenance

The following attestation bundles were made for finval-0.6.1-py3-none-any.whl:

Publisher: release.yml on sablier-ai/finval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: finval-0.6.1-py3-none-any.whl
- Subject digest: 063ca79383cae0b4ecaed2c061ced462d1a796a75be2b3901e0f230c36dc5d50
- Sigstore transparency entry: 2115264185
- Sigstore integration time: Jul 8, 2026
Source repository:
- Permalink: sablier-ai/finval@28637239fc00283d20068a104a0c018c1d48dea2
- Branch / Tag: refs/tags/v0.6.1
- Owner: https://github.com/sablier-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@28637239fc00283d20068a104a0c018c1d48dea2
- Trigger Event: push
