
Synthetic alternative-history generation for backtest overfitting detection


sablier-flow

Stop shipping overfit backtests. Run your strategy on N alternative versions of history that share your data's statistical fingerprint. If the strategy only works on the one specific past that happened, that's a problem you can now measure.


What you get in 30 lines

import pandas as pd
import sablier_flow as sf

# 0. Auth — set SABLIER_FLOW_API_KEY env var, or pass api_key="sk_live_..." per call.
# 1. Load your data — any DataFrame with a DatetimeIndex + numeric columns.
real = pd.read_parquet("my_universe.parquet")
backtest_window = real.loc["2023-01-01":"2024-01-01"]    # the slice you'll evaluate

# 2. Fit (one-time, several minutes). 80/20 train/OOS split + 21-bar embargo by
#    default. The held-out OOS slice is kept encrypted next to the model so
#    sf.validate() picks it up automatically.
fit = sf.fit(real, features=list(real.columns), horizon=252,
             train_split=0.8, embargo_days=21, seed=42)

# 3. Validate the model on the held-out OOS slice — full structural metric
#    suite (calibration, dependence, tails, dynamics, memorization).
report = sf.validate(fit.model_id)
print(report.overall)                  # 'pass' | 'warn' | 'fail'
print(report.memorization_risk)        # 'low' | 'medium' | 'high'

# 4. Generate N synthetic alternative-history paths shaped like your backtest
#    window. ``like=`` derives length + dates + price anchor from the window.
paths = sf.generate(fit.model_id, n_paths=1000, like=backtest_window, seed=42)
synth_dfs = paths.as_dataframes()      # list[pd.DataFrame], one per path

# 5. Run *your existing* backtest on the real window AND on each synthetic.
real_result   = my_backtest(backtest_window)
synth_results = [my_backtest(df) for df in synth_dfs]

# 6. The smoking gun.
verdict = sf.robustness(real_result, synth_results, primary_metric="sharpe")
print(verdict.verdict)                 # 'robust' | 'borderline' | 'overfit' | 'highly_overfit'
print(verdict.overfit_score)           # 0.85 → real beat 85% of synthetic → overfit
print(verdict.summary())               # one-line English summary you can paste anywhere
verdict.to_html("audit.html")          # shareable single-file report

Two-step fit + generate (instead of one shot) means you fit once and generate as many windows as you want from the same model_id — cheap iteration on your strategy without paying to retrain.
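The 80/20 split with a 21-bar embargo mentioned in step 2 can be pictured with a small local sketch. The SDK performs its own split when fitting, and its exact logic is not shown here; `split_with_embargo` and `embargo_bars` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

def split_with_embargo(df, train_split=0.8, embargo_bars=21):
    """Chronological train/OOS split that discards `embargo_bars` rows
    between the two slices so they do not share overlapping information."""
    n_train = int(len(df) * train_split)
    oos_start = n_train + embargo_bars
    if oos_start >= len(df):
        raise ValueError("not enough rows left for an OOS slice after the embargo")
    return df.iloc[:n_train], df.iloc[oos_start:]

# Illustrative data only: 1000 business days of random returns.
idx = pd.bdate_range("2020-01-01", periods=1000)
demo = pd.DataFrame(
    {"ret": np.random.default_rng(42).normal(0.0, 0.01, len(idx))}, index=idx
)
train, oos = split_with_embargo(demo)
print(len(train), len(oos))  # 800 179
```

The embargo rows are dropped rather than assigned to either slice, which is why the two lengths do not add up to 1000.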

See examples/00_getting_started.ipynb for the step-by-step walkthrough.
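To make the step 6 comment concrete ("real beat 85% of synthetic"), here is a minimal sketch of that fraction computed by hand. It is an assumption that `overfit_score` is exactly this quantity; the function below is a hypothetical stand-in, not the SDK's code, and the thresholds behind the `verdict` labels are not reproduced because the text above does not state them.

```python
import numpy as np

def overfit_score(real_metric, synth_metrics):
    """Fraction of synthetic-path results that the real-history result beats."""
    synth = np.asarray(synth_metrics, dtype=float)
    return float(np.mean(real_metric > synth))

# Illustrative numbers only: 1000 made-up synthetic Sharpe ratios.
rng = np.random.default_rng(0)
synth_sharpes = rng.normal(0.3, 0.5, size=1000)
score = overfit_score(1.1, synth_sharpes)
print(f"real Sharpe beat {score:.0%} of synthetic paths")
```

A score near 1.0 says the real-history result sits in the far right tail of what the same strategy achieves on statistically similar histories.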


Install

# In a fresh venv (recommended — pip --upgrade in a shared kernel is risky):
python -m venv .venv && source .venv/bin/activate

pip install sablier-flow                           # thin client (~30 MB, no GPU deps)
pip install 'sablier-flow[adapters-backtrader]'    # + backtrader integration
pip install 'sablier-flow[adapters-vectorbt]'      # + vectorbt integration

Pin to an exact version (e.g. sablier-flow==1.0.0) rather than a range when publishing a backtest audit so the analysis re-runs identically months later. Bump the pin explicitly when you want a newer build.

Transitive deps: pandas, numpy, pyarrow, httpx, cryptography, pydantic — installed automatically with sablier-flow. No vendor data libraries (yfinance etc.); bundled demo datasets ship inside the wheel.

Get your API key at https://sablier.ai → Settings → API Keys (new accounts get free credits to cover several full cycles). Then:

export SABLIER_FLOW_API_KEY=sk_live_<your-token>

That's the whole setup. The default endpoint is https://flow.sablier.ai/v1 over standard TLS: no gcloud, no cert pinning, no extra steps. Credit usage is shown live on the dashboard; a full fit + validate + generate cycle on the bundled demo dataset uses a small fraction of the free starter balance.
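If you want a script to fail early when the key is missing, a small pre-flight check along these lines may help. The `sk_live_` prefix is taken from the example token above and is an assumption about the key format; `read_api_key` is a hypothetical helper, not part of the SDK.

```python
import os

ENV_VAR = "SABLIER_FLOW_API_KEY"
KEY_PREFIX = "sk_live_"  # assumption: inferred from the example token format

def read_api_key(env=None):
    """Return the API key from the environment, or raise with a clear message."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before calling the SDK")
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError(f"{ENV_VAR} does not start with {KEY_PREFIX!r}")
    return key
```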


What ships

| Capability | API |
| --- | --- |
| Fit a flow model on your history (joint over all your features at any granularity: daily, weekly, monthly, or intraday at any bar period; auto-detected from your DatetimeIndex) | sf.fit(df, features=[...], horizon=..., train_split=..., embargo_days=...) |
| Generate N synthetic alternative-history paths, anchored at any window | sf.generate(model_id, n_paths=..., like=window) |
| Run the full structural validation suite on the held-out OOS slice | sf.validate(model_id); returns ValidationReport with overall, memorization_risk, and ~20 per-metric entries |
| Score a backtest's overfit (real vs synthetic distribution) | sf.robustness(real_result, synth_results); returns RobustnessReport with overfit_score, verdict, and synthetic_* percentiles |
| Deflated Sharpe Ratio under two nulls (analytical Bailey-LdP + empirical synthetic-best-of-N) | sf.deflated_sharpe(...) or verdict.deflated_sharpe(n_trials=N) |
| Evaluate a family of strategy variants (CSCV PBO + family-best DSR) | sf.evaluate_family({"name": fn, ...}, real_data, n_paths=...) |
| Live drift monitoring once a strategy is deployed | sf.consistency_check(realized_metric, baseline=robustness_report) |
| List / inspect / delete your fitted models | sf.list_models(), sf.get_model(model_id), sf.delete_model(model_id) |
| Bundled demo datasets so you can try it with zero data setup | sf.demo_data() for daily SPY/QQQ/IWM/TLT + macro series; sf.demo_data('us_equities_macro_5min_3mo') for 5-min intraday |
| Engine adapters | result.as_dataframes() for pandas; from sablier_flow.adapters import as_backtrader_feeds, as_vectorbt_panel, write_lean_csv_universe |

Full API reference: sablier.ai/flow/docs.
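For readers unfamiliar with the analytical (Bailey and López de Prado) Deflated Sharpe Ratio named in the table, the textbook formula can be sketched with the standard library alone. This is an illustration of the published formula, not the code behind `sf.deflated_sharpe`; the function and parameter names are hypothetical, and the Sharpe ratio is per-period (not annualized).

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()

def expected_max_sharpe(n_trials, sr_variance):
    """Expected best Sharpe among `n_trials` skill-less trials whose
    Sharpe estimates have variance `sr_variance`."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    a = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)

def deflated_sharpe(sr, n_obs, n_trials, sr_variance, skew=0.0, kurt=3.0):
    """Probability that the observed Sharpe exceeds the best-of-N null,
    adjusted for sample length and non-normal returns."""
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr**2)
    return _STD_NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)

# Illustrative inputs: same strategy, judged against 10 vs 1000 trials.
few_trials = deflated_sharpe(sr=0.1, n_obs=1000, n_trials=10, sr_variance=0.001)
many_trials = deflated_sharpe(sr=0.1, n_obs=1000, n_trials=1000, sr_variance=0.001)
print(few_trials, many_trials)
```

The same observed Sharpe is less convincing the more variants were tried, which is why the second value comes out lower than the first.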


What happens under the hood

your laptop ──HTTPS──> Sablier API (Cloud Run) ──Cloud Tasks──> GPU worker (Cloud Run + L4)
     │                                                                    │
     │   1. POST /v1/jobs                                                  │
     │   ◄── 2. ephemeral X25519 pubkey + image digest                     │
     │                                                                     │
     │   3. envelope-encrypt your DataFrame (X25519 + AES-256-GCM)         │
     │   ──> PUT /v1/jobs/{id}/data ──────────────────────────────────────►│
     │                                                                     │
     │                                          4. decrypt in worker RAM,  │
     │                                             train the flow model,   │
     │                                             generate N paths,       │
     │                                             AES-GCM-encrypt back    │
     │                                                                     │
     │   5. GET /v1/jobs/{id}/result ◄─────────────────────────────────────│
     │   6. decrypt locally; result.paths_returns is yours                 │
     ▼
backtester (pandas / backtrader / vectorbt / LEAN / your own)
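Step 3 of the diagram (X25519 + AES-256-GCM envelope encryption) can be sketched with the `cryptography` package, which is already among the listed dependencies. The SDK's actual wire format, key-derivation step, and nonce handling are not described above, so the HKDF-SHA256 derivation, the `info` label, and the dictionary layout below are assumptions made for illustration.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import (
    X25519PrivateKey,
    X25519PublicKey,
)
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def _derive_wrapping_key(shared_secret: bytes) -> bytes:
    # Assumption: HKDF-SHA256 turns the raw X25519 secret into a 256-bit key.
    return HKDF(
        algorithm=hashes.SHA256(), length=32, salt=None, info=b"demo-envelope"
    ).derive(shared_secret)

def envelope_encrypt(payload: bytes, worker_pub: X25519PublicKey) -> dict:
    client_priv = X25519PrivateKey.generate()          # ephemeral client key
    wrapping_key = _derive_wrapping_key(client_priv.exchange(worker_pub))
    data_key = AESGCM.generate_key(bit_length=256)     # one-shot key for this job
    key_nonce, data_nonce = os.urandom(12), os.urandom(12)
    return {
        "client_pub": client_priv.public_key(),
        "wrapped_key": (key_nonce, AESGCM(wrapping_key).encrypt(key_nonce, data_key, None)),
        "ciphertext": (data_nonce, AESGCM(data_key).encrypt(data_nonce, payload, None)),
    }

def envelope_decrypt(envelope: dict, worker_priv: X25519PrivateKey) -> bytes:
    wrapping_key = _derive_wrapping_key(worker_priv.exchange(envelope["client_pub"]))
    key_nonce, wrapped = envelope["wrapped_key"]
    data_key = AESGCM(wrapping_key).decrypt(key_nonce, wrapped, None)
    data_nonce, ciphertext = envelope["ciphertext"]
    return AESGCM(data_key).decrypt(data_nonce, ciphertext, None)

# Round trip with a locally generated stand-in for the worker's ephemeral key.
worker_priv = X25519PrivateKey.generate()
message = b"serialized DataFrame bytes"
envelope = envelope_encrypt(message, worker_priv.public_key())
recovered = envelope_decrypt(envelope, worker_priv)
```

Only the holder of the worker's private key can unwrap the data key, and AES-GCM authentication makes any tampering with the ciphertext fail at decryption time.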

Security posture today (alpha)

Honest picture of what the SDK actually guarantees right now:

| Layer | Status |
| --- | --- |
| TLS 1.3 in transit (client ↔ API ↔ worker) | |
| One-shot AES-256-GCM symmetric key per job (wrapped in X25519 envelope to the worker's ephemeral pubkey; never re-used, never persisted) | |
| GCS at-rest encryption with Cloud KMS-managed keys (checkpoints + OOS holdouts + result blobs) | |
| Customer data isolation: each job runs in its own Cloud Run instance, scaled to zero between jobs | |
| Image-digest pinning: the SDK ships a pinned digest of the worker image; mismatched server image is rejected before any data is sent | |
| AMD SEV-SNP CPU memory encryption: encrypted RAM, so even a privileged host OS or GCP operator cannot inspect plaintext during training | 🚧 Not yet. Cloud Run L4 is not a confidential VM. Plaintext customer data exists in worker RAM during the ~minutes-long training job. |
| NVIDIA H100 CC mode (GPU memory encryption) + NRAS attestation chain | 🚧 Awaiting H100 quota |
| Cryptographic attestation verified against AMD / NVIDIA root keys before the customer's encryption key is released | 🚧 Same gate. The SDK's AttestationVerifier exists and runs the protocol, but the digest pinned today corresponds to a regular Cloud Run image, not a measured-boot enclave |

What this means concretely: today the SDK delivers strong network-layer + storage-layer + key-lifecycle protection that's meaningfully better than most quant data SaaS offerings. It does not yet deliver memory-encryption-grade protection against a privileged GCP operator. A CISO evaluating us before SEV-SNP + H100 CC ship needs to see and accept that trade-off.

The full SEV-SNP + H100 CC + NRAS attestation deploys with v0.6, which lands when GCP releases our H100 confidential-compute quota. The wire protocol the SDK already speaks is the same one we'll use post-rollout, so customer code does not change.
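The image-digest pinning row boils down to a compare-before-send check. A minimal sketch of that idea follows; the digest value, `digest_matches`, and `upload_if_trusted` are hypothetical, and the SDK's real check (and how it obtains the server's digest) may differ.

```python
import hmac

# Hypothetical pinned value; a real pin would be shipped inside the SDK.
PINNED_DIGEST = "sha256:" + "ab" * 32

def digest_matches(reported: str, pinned: str = PINNED_DIGEST) -> bool:
    """Constant-time comparison of the server-reported digest with the pin."""
    return hmac.compare_digest(
        reported.strip().lower().encode(), pinned.lower().encode()
    )

def upload_if_trusted(reported_digest: str, send):
    """Call `send()` only when the reported digest equals the pinned one."""
    if not digest_matches(reported_digest):
        raise RuntimeError("worker image digest mismatch; refusing to send data")
    return send()

result = upload_if_trusted(PINNED_DIGEST, lambda: "uploaded")
```

As the table notes, a matching digest today identifies a regular Cloud Run image; it is not a substitute for hardware attestation.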


Try it in 20 minutes — no setup beyond pip install

pip install sablier-flow matplotlib

Open examples/00_getting_started.ipynb and run the cells top-to-bottom. The notebook uses the bundled demo dataset (sf.demo_data() for daily SPY/QQQ/IWM/TLT + VIX/TNX/DXY macro series; sf.demo_data('us_equities_macro_5min_3mo') for 5-min intraday), so it works with zero data setup and zero external network beyond the SDK's hosted endpoint.


Engine integrations

sablier-flow is a data layer, not a backtest engine. We integrate with whatever you already use:

| Engine | How |
| --- | --- |
| Raw pandas / numpy | result.as_dataframes(index=...) → list[pd.DataFrame]; from sablier_flow.adapters import as_array → np.ndarray |
| backtrader | from sablier_flow.adapters import as_backtrader_feeds → list[bt.feeds.PandasData] |
| vectorbt | from sablier_flow.adapters import as_vectorbt_panel → wide pd.DataFrame |
| LEAN / QuantConnect | from sablier_flow.adapters import write_lean_csv_universe → per-path CSV directories |
| In-house C++ / KDB / proprietary | result.paths_prices is plain np.float32[n_paths, horizon, n_features]; shim ≤ 50 lines |
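For the in-house case, a shim over the raw array might look roughly like this. The array here is random stand-in data with the documented `[n_paths, horizon, n_features]` shape and `float32` dtype; `paths_to_frames`, the column names, and the dates are hypothetical.

```python
import numpy as np
import pandas as pd

def paths_to_frames(paths_prices, index, columns):
    """Split a [n_paths, horizon, n_features] array into one DataFrame per path."""
    n_paths, horizon, n_features = paths_prices.shape
    if horizon != len(index) or n_features != len(columns):
        raise ValueError("index/columns do not match the array's horizon/features")
    return [
        pd.DataFrame(paths_prices[i], index=index, columns=columns)
        for i in range(n_paths)
    ]

# Stand-in for result.paths_prices: 4 paths, 60 bars, 2 features.
rng = np.random.default_rng(0)
log_returns = rng.normal(0.0, 0.01, size=(4, 60, 2))
fake_paths = (100.0 * np.exp(np.cumsum(log_returns, axis=1))).astype(np.float32)

index = pd.bdate_range("2023-01-02", periods=60)
frames = paths_to_frames(fake_paths, index, ["SPY", "TLT"])
print(len(frames), frames[0].shape)  # 4 (60, 2)
```

The same loop is the natural place to write each path out in whatever binary or columnar layout a proprietary engine expects.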

Why "in-sample is correct" — short version

The model is trained on the same history your backtest will run on. If that triggers your overfit alarm, the answer is in the memorization metric: every fit ships a memorization_risk flag computed via Carlini-style nearest-neighbor-distance ratio against the training set. If the model is regurgitating samples, the SDK marks it high and the overfit verdict on top of it is explicitly not to be trusted. When it's low or medium, the synthetic distribution lives in the data-generating process's neighborhood, not on top of the training points themselves — which is precisely what you want to stress-test a strategy against. See docs/concepts/in-sample-is-correct.md for the long form.
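The nearest-neighbor-distance ratio behind `memorization_risk` is not specified in detail here, so the following is one common variant offered as an assumption: for each synthetic sample, divide the distance to its nearest training sample by the distance to its second-nearest. A ratio near 0 means the sample sits almost on top of a single training point. Rows are assumed to be flattened windows, and `nn_distance_ratio` is a hypothetical name.

```python
import numpy as np

def nn_distance_ratio(synthetic, training):
    """Per-sample ratio d1/d2 of nearest to second-nearest training distance.
    Needs at least two training rows."""
    diffs = synthetic[:, None, :] - training[None, :, :]
    dists = np.sqrt((diffs**2).sum(axis=-1))          # shape (n_synth, n_train)
    nearest_two = np.partition(dists, 1, axis=1)[:, :2]
    d1, d2 = nearest_two[:, 0], nearest_two[:, 1]
    return d1 / np.maximum(d2, 1e-12)

# Illustrative data: 200 training windows flattened to 8 numbers each.
rng = np.random.default_rng(0)
training = rng.normal(size=(200, 8))
copied = training[:5].copy()            # a "regurgitating" model
fresh = rng.normal(size=(5, 8))         # draws from the same distribution

copied_ratio = nn_distance_ratio(copied, training)
fresh_ratio = nn_distance_ratio(fresh, training)
print(copied_ratio.max(), fresh_ratio.min())
```

Exact copies score 0, while fresh draws from the same distribution score well above it; how the SDK maps such ratios to low / medium / high is not stated above.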


License

sablier-flow (this package) — Apache-2.0.



Download files


Source Distribution

sablier_flow-1.0.0.tar.gz (525.1 kB view details)

Uploaded Source

Built Distribution


sablier_flow-1.0.0-py3-none-any.whl (531.3 kB view details)

Uploaded Python 3

File details

Details for the file sablier_flow-1.0.0.tar.gz.

File metadata

  • Download URL: sablier_flow-1.0.0.tar.gz
  • Upload date:
  • Size: 525.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for sablier_flow-1.0.0.tar.gz
Algorithm Hash digest
SHA256 57ca00f5f65ef818187582cddb8b8010813f8bdcc4b90022275521f206a0c97c
MD5 d948c6c291355faa4d189513f628f261
BLAKE2b-256 8e7ed5be303eedfcbd9e98ccb391751a624fe3454cd049e80ac12ca74332b8b8


File details

Details for the file sablier_flow-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: sablier_flow-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 531.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for sablier_flow-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 96588595860146c6f133e184196b97abbbae2a20d5e17aac63a9dbae40a63d3f
MD5 8f464b5821392317fcaa6f535c0074fc
BLAKE2b-256 62efc3959e3c2e85627056d91992de14733053a41dd0a20e7771320985db08be

