
Valgrind for Time-Series ML: automatically detect look-ahead bias in data science pipelines.


๐Ÿ•ต๏ธ Temporal Leaks: Valgrind for Time-Series ML


Look-ahead bias is the silent killer of quant strategies and forecasting models.
Your backtest shows 40% annual returns. You deploy. You lose money.
Somewhere in your feature pipeline, a rolling average peeked at tomorrow's prices.

temporal-leaks catches this automatically, before it costs you.


The Problem: Future Data in Your Past Features

In time-series machine learning, look-ahead bias (also called data leakage or future leakage) occurs when a feature computed for timestamp t inadvertently uses data from timestamps t+1, t+2, ..., t+n.

This is devastatingly easy to introduce:

# BUG: center=True means the window is centred: it looks forward AND backward
df["roll_mean"] = df["price"].rolling(window=5, center=True).mean()

# BUG: shift(-1) reads the NEXT row's value
df["next_return"] = df["return"].shift(-1)

# BUG: global z-score uses future data to compute mean/std
df["znorm"] = (df["price"] - df["price"].mean()) / df["price"].std()

None of these will raise an error.
Your tests will pass.
Your backtests will look amazing.
And then reality hits.
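
For contrast, here is a rough sketch of causal counterparts to the three snippets above, using plain pandas. The function name `causal_features` and the sample data are illustrative assumptions, and the expanding z-score is just one possible replacement for the global one.

```python
import numpy as np
import pandas as pd

def causal_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Trailing window (center=False is the default): uses rows t-4 .. t only
    out["roll_mean"] = out["price"].rolling(window=5).mean()
    # shift(1) reads the PREVIOUS row's value
    out["prev_return"] = out["return"].shift(1)
    # Expanding z-score: mean/std come from rows up to and including t
    exp = out["price"].expanding(min_periods=2)
    out["znorm"] = (out["price"] - exp.mean()) / exp.std()
    return out

# Illustrative data only
rng = np.random.default_rng(0)
df = pd.DataFrame({"price": rng.normal(100, 5, size=50)})
df["return"] = df["price"].pct_change()

features = causal_features(df)
```

Each feature at row t depends only on rows at or before t, so overwriting later rows leaves earlier feature values untouched.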


How It Works: The Temporal Perturbation Test

  Timeline:   ──────────────────────────────────────────────────▶
                              T (midpoint)
                              │
  Past ◀──────────────────────┤──────────────────────────────▶ Future
                              │
  Step 1:  Run pipeline on original data
           baseline_features = pipeline(df)

  Step 2:  MUTATE the future
           df_perturbed[t > T] = 🔥 (noise / sign flip / NaN)

  Step 3:  Re-run pipeline on perturbed data
           perturbed_features = pipeline(df_perturbed)

  Step 4:  Compare features for PAST rows only (t ≤ T)
           If baseline_features[t≤T] ≠ perturbed_features[t≤T]
           then the past features DEPEND on future data → LEAK! 🚨

The key insight: if your past features are truly causal, mutating the future should not change them. If they change, future data crept in.
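
The four steps can be sketched in a few lines of plain pandas. This is not the library's implementation: `perturbation_check` is a hypothetical helper, it only nullifies the future, it takes the median timestamp as the midpoint, and it assumes the pipeline returns numeric columns with the same rows in the same order.

```python
import numpy as np
import pandas as pd

def perturbation_check(df, timestamp_col, pipeline_fn, delta_threshold=1e-8):
    """Return the output columns whose PAST values change when the future is nullified."""
    cutoff = df[timestamp_col].median()           # T: midpoint of the timeline (assumption)
    future = df[timestamp_col] > cutoff
    value_cols = [c for c in df.columns if c != timestamp_col]

    baseline = pipeline_fn(df)                    # Step 1: run on original data

    perturbed_input = df.copy()                   # Step 2: mutate the future
    perturbed_input.loc[future, value_cols] = np.nan
    perturbed = pipeline_fn(perturbed_input)      # Step 3: re-run on perturbed data

    past = ~future.to_numpy()                     # Step 4: compare past rows only
    leaky_cols = []
    for col in baseline.columns:
        a = baseline[col].to_numpy(dtype=float)[past]
        b = perturbed[col].to_numpy(dtype=float)[past]
        both_nan = np.isnan(a) & np.isnan(b)
        # NaN on one side only makes the comparison False, so it counts as changed
        changed = ~both_nan & ~(np.abs(a - b) <= delta_threshold)
        if changed.any():
            leaky_cols.append(col)
    return leaky_cols

df = pd.DataFrame({"ts": np.arange(100),
                   "price": np.random.default_rng(0).normal(100, 5, size=100)})

def leaky(d):
    out = d.copy()
    out["centred"] = out["price"].rolling(5, center=True).mean()
    return out

def causal(d):
    out = d.copy()
    out["trailing"] = out["price"].rolling(5).mean()
    return out

print(perturbation_check(df, "ts", leaky))   # ['centred']
print(perturbation_check(df, "ts", causal))  # []
```

The centred window at the last two past rows reaches into the nullified region, so those rows become NaN in the second run and the column is flagged.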


Installation

pip install temporal-leaks

Or from source:

git clone https://github.com/temporal-leaks/temporal-leaks
cd temporal-leaks
pip install -e ".[dev]"

Quick Start

import pandas as pd
import numpy as np
from temporal_leaks import TemporalAudit, TemporalLeakageError

# Build a sample time-series dataset
df = pd.DataFrame({
    "ts":    np.arange(500),
    "price": np.random.default_rng(42).normal(100, 5, size=500),
})

# ─── ✓ CLEAN PIPELINE ──────────────────────────────────────────────────────
def causal_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Expanding window only looks at the past: safe!
    out["expanding_mean"] = out["price"].expanding(min_periods=1).mean()
    # shift(+1) looks at the previous row: safe!
    out["lag1"] = out["price"].shift(1)
    return out

auditor = TemporalAudit(mode="nullify", random_seed=42)
report  = auditor.check(df, timestamp_col="ts", pipeline_fn=causal_features)
print(report)
# ✓  CLEAN - leakage_score=0.0000


# ─── ✗ LEAKING PIPELINE ────────────────────────────────────────────────────
def leaking_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # center=True peeks at future rows: LEAKS!
    out["centred_roll"] = out["price"].rolling(11, center=True, min_periods=1).mean()
    return out

try:
    auditor.check(df, timestamp_col="ts", pipeline_fn=leaking_features)
except TemporalLeakageError as exc:
    print(exc)
    # TemporalLeakageError: leakage_score=0.4812
    #   Breached columns (1):
    #     • [HIGH] column='centred_roll' effect_size=0.4812 ...

Decorator API

from temporal_leaks import temporal_audit

@temporal_audit(timestamp_col="ts", mode="noise", random_seed=42)
def build_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["expanding_mean"] = df["price"].expanding(min_periods=1).mean()
    return df

# The audit runs automatically on every call.
# TemporalLeakageError is raised if leakage is detected.
result = build_features(df)
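
To give a feel for what a decorator like this does on each call, here is a minimal sketch built from `functools.wraps`. It is not the library's code: `audit_on_call` and `LeakDetected` are hypothetical names, the check only nullifies rows after the median timestamp, and it compares the past rows exactly.

```python
import functools
import numpy as np
import pandas as pd

class LeakDetected(Exception):
    """Hypothetical stand-in for the library's TemporalLeakageError."""

def audit_on_call(timestamp_col: str):
    """Hypothetical decorator: re-runs the pipeline with future rows nullified."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(df: pd.DataFrame) -> pd.DataFrame:
            result = fn(df)
            # Past rows are those at or before the median timestamp (assumption)
            past = (df[timestamp_col] <= df[timestamp_col].median()).to_numpy()
            value_cols = [c for c in df.columns if c != timestamp_col]
            mutated = df.copy()
            mutated.loc[~past, value_cols] = np.nan
            rerun = fn(mutated)
            # DataFrame.equals treats NaNs in the same position as equal
            if not result[past].equals(rerun[past]):
                raise LeakDetected(f"{fn.__name__} reads future rows")
            return result
        return wrapper
    return decorator

@audit_on_call(timestamp_col="ts")
def build_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["expanding_mean"] = out["price"].expanding(min_periods=1).mean()
    return out

df = pd.DataFrame({"ts": np.arange(20), "price": np.linspace(1.0, 20.0, 20)})
result = build_features(df)  # returns normally: the expanding mean is causal
```

Note the cost of this pattern: the wrapped pipeline runs twice per call, which may matter for expensive feature builds.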

HTML Audit Reports

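# leakage_threshold=1.1 can never be exceeded, so check() returns the report
# instead of raising TemporalLeakageError
report_auditor = TemporalAudit(mode="nullify", random_seed=42, leakage_threshold=1.1)
report = report_auditor.check(df, "ts", leaking_features)

# Write a standalone HTML report
with open("audit_report.html", "w", encoding="utf-8") as f:
    f.write(report.to_html())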

The HTML report includes:

  • Leakage score with a visual progress bar
  • Per-column severity badges (LOW / MEDIUM / HIGH / CRITICAL)
  • Effect size, mean |Δ|, max |Δ|, % rows changed
  • First timestamp where each leak was observed
  • Provenance hints describing likely causes

API Reference

TemporalAudit

TemporalAudit(
    mode: Literal["noise", "sign_flip", "nullify"] = "noise",
    random_seed: int = 42,
    delta_threshold: float = 1e-8,
    leakage_threshold: float = 0.0,
    ignore_columns: list[str] | None = None,
)
Parameter          Description
mode               Perturbation strategy: noise adds Gaussian noise, sign_flip multiplies by -1, nullify sets NaN
random_seed        Integer seed; results are fully deterministic and reproducible across runs
delta_threshold    Minimum cell-level change to count as "different" (suppresses float noise)
leakage_threshold  Raise TemporalLeakageError if leakage_score > leakage_threshold. Set to 1.1 (above the maximum score of 1.0) to always return the report
ignore_columns     List of output columns to skip during comparison
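
A small NumPy illustration of why a tolerance like delta_threshold is useful: two mathematically equal values can differ in the last bit, and an exact comparison would report that as a change. This shows the general idea, not the library's internals; the variable names are illustrative.

```python
import numpy as np

delta_threshold = 1e-8

a = np.array([0.1 + 0.2, 1.0])   # 0.1 + 0.2 is 0.30000000000000004 in floating point
b = np.array([0.3, 1.0])

exact_mismatch = bool((a != b).any())
tolerant_mismatch = bool((np.abs(a - b) > delta_threshold).any())

print(exact_mismatch)     # True: exact comparison flags rounding error
print(tolerant_mismatch)  # False: the difference is far below the threshold
```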

AuditReport

@dataclass
class AuditReport:
    leakage_score:     float          # 0.0 = clean, 1.0 = fully compromised
    breached_columns:  list[ColumnLeakMeta]
    clean_columns:     list[str]
    perturbation_mode: str
    evaluation_time:   Any
    random_seed:       int
    provenance_hints:  dict[str, str]

    def to_html(self) -> str: ...     # standalone HTML report

ColumnLeakMeta

@dataclass(frozen=True)
class ColumnLeakMeta:
    column_name:          str
    first_leaky_timestamp: Any
    mean_absolute_delta:  float
    max_delta:            float
    pct_rows_changed:     float
    effect_size:          float    # normalised, 0–1
    severity:             str      # LOW | MEDIUM | HIGH | CRITICAL
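
As a rough guide to what the delta fields describe, the sketch below computes them for one output column from a baseline run and a perturbed run. `column_deltas` is a hypothetical helper. It assumes pct_rows_changed is a percentage from 0 to 100 and that the mean is taken over all comparable rows. effect_size is left out because its normalisation is not documented here.

```python
import numpy as np
import pandas as pd

def column_deltas(timestamps: pd.Series, baseline: pd.Series, perturbed: pd.Series,
                  delta_threshold: float = 1e-8) -> dict:
    a = baseline.to_numpy(dtype=float)
    b = perturbed.to_numpy(dtype=float)
    delta = np.abs(a - b)
    # A value appearing or disappearing (NaN on one side only) counts as a change
    nan_flip = np.isnan(a) ^ np.isnan(b)
    changed = nan_flip | (np.nan_to_num(delta, nan=0.0) > delta_threshold)
    finite = delta[np.isfinite(delta)]
    return {
        "first_leaky_timestamp": timestamps.to_numpy()[changed][0] if changed.any() else None,
        "mean_absolute_delta": float(finite.mean()) if finite.size else 0.0,
        "max_delta": float(finite.max()) if finite.size else 0.0,
        "pct_rows_changed": 100.0 * float(changed.mean()),
    }

ts = pd.Series([10, 11, 12, 13])
baseline = pd.Series([1.0, 2.0, 3.0, 4.0])
perturbed = pd.Series([1.0, 2.0, 3.5, 5.0])

metrics = column_deltas(ts, baseline, perturbed)
print(metrics["first_leaky_timestamp"])  # 12
print(metrics["mean_absolute_delta"])    # 0.375
print(metrics["max_delta"])              # 1.0
print(metrics["pct_rows_changed"])       # 50.0
```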

Severity Classification

Severity     Effect size
🟦 LOW       effect_size < 0.15
🟨 MEDIUM    0.15 ≤ effect_size < 0.40
🟧 HIGH      0.40 ≤ effect_size < 0.75
🟥 CRITICAL  effect_size ≥ 0.75
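
The thresholds in the table translate directly into a small lookup. `classify_severity` is a hypothetical helper written from the table, not a function the package is known to export.

```python
def classify_severity(effect_size: float) -> str:
    """Map a normalised effect size (0 to 1) to a severity label per the table."""
    if effect_size < 0.15:
        return "LOW"
    if effect_size < 0.40:
        return "MEDIUM"
    if effect_size < 0.75:
        return "HIGH"
    return "CRITICAL"

print(classify_severity(0.4812))  # HIGH
```

The 0.4812 here is the effect size from the Quick Start output, which is reported there as [HIGH].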

Perturbation Modes

┌────────────────┬──────────────────────────────────────────────────────┐
│ Mode           │ What it does to future rows                          │
├────────────────┼──────────────────────────────────────────────────────┤
│ noise          │ Adds Gaussian noise: μ=0, σ=2×column_std             │
│ sign_flip      │ Multiplies all numeric values by −1                  │
│ nullify        │ Replaces all values with NaN / null                  │
└────────────────┴──────────────────────────────────────────────────────┘

Use nullify for the strictest test.
Use noise for pipelines that handle NaN gracefully (e.g., imputers).
Use sign_flip to test pipelines sensitive to sign changes (e.g., momentum factors).
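
The three modes can be approximated in pandas as shown below. This is a sketch under assumptions, not the package's code: `perturb_future` is a hypothetical helper, only numeric columns are touched, and column_std is taken over the whole column.

```python
import numpy as np
import pandas as pd

def perturb_future(df: pd.DataFrame, timestamp_col: str, cutoff,
                   mode: str = "noise", random_seed: int = 42) -> pd.DataFrame:
    """Return a copy of df with rows after `cutoff` perturbed; past rows are untouched."""
    if mode not in ("noise", "sign_flip", "nullify"):
        raise ValueError(f"unknown mode: {mode!r}")
    out = df.copy()
    future = out[timestamp_col] > cutoff
    rng = np.random.default_rng(random_seed)  # seeded, so runs are reproducible
    numeric_cols = out.select_dtypes(include="number").columns.drop(
        timestamp_col, errors="ignore")
    for col in numeric_cols:
        out[col] = out[col].astype(float)  # so NaN and noise can be stored
        if mode == "noise":
            sigma = 2.0 * out[col].std()   # sigma = 2 x column_std
            out.loc[future, col] += rng.normal(0.0, sigma, size=int(future.sum()))
        elif mode == "sign_flip":
            out.loc[future, col] *= -1
        else:  # nullify
            out.loc[future, col] = np.nan
    return out

df = pd.DataFrame({"ts": np.arange(10), "price": np.arange(10, dtype=float) + 1.0})
nullified = perturb_future(df, "ts", cutoff=4, mode="nullify")
flipped = perturb_future(df, "ts", cutoff=4, mode="sign_flip")
noisy = perturb_future(df, "ts", cutoff=4, mode="noise", random_seed=1)
```

In all three cases rows with ts ≤ 4 are identical to the input, which is what makes the later past-rows comparison meaningful.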


Polars Support

import polars as pl
from temporal_leaks import TemporalAudit

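df = pl.DataFrame({"ts": range(200), "value": [float(i) for i in range(200)]})

def my_polars_pipeline(df: pl.DataFrame) -> pl.DataFrame:
    # Example causal pipeline: lag the value by one row
    return df.with_columns(pl.col("value").shift(1).alias("lag1"))

auditor = TemporalAudit(mode="nullify", random_seed=42)
report  = auditor.check(df, "ts", my_polars_pipeline)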

temporal-leaks handles Polars DataFrames transparently: pass them in, get results back in the same type.


Benchmarks

Dataset           Rows        Columns  Backend  Mode     Time
Synthetic prices  1,000,000   5        Polars   nullify  ~1.1 s
Synthetic prices  10,000,000  5        Polars   nullify  ~3.2 s
Equity features   500,000     20       Pandas   noise    ~2.8 s

Benchmarks were run on an Apple M2 Pro with 16 GB RAM. The Polars backend is strongly recommended for large frames.


Running Tests

# Install dev extras
pip install -e ".[dev]"

# Run the full suite
pytest tests/ -v

# With coverage
pytest tests/ --cov=temporal_leaks --cov-report=term-missing

Contributing

Pull requests are welcome. For major changes, please open an issue first.

  1. Fork the repo
  2. Create your feature branch: git checkout -b feat/my-feature
  3. Commit your changes: git commit -m 'feat: add my feature'
  4. Push and open a PR

Please make sure ruff check . and mypy temporal_leaks/ pass before submitting.


License

MIT © temporal-leaks contributors
