timefence

Temporal correctness layer for ML training data

Maintainers

gauthierpiarrette


Timefence

Your ML model may be trained on the future. Find out in one command.

Website · Docs · Changelog · Contributing

Timefence finds and fixes temporal data leakage in ML training sets. No infrastructure required — runs locally, reads Parquet/CSV, and finishes in seconds.

If you build training data by joining features to labels, your model may be training on the future. A plain LEFT JOIN, or a carelessly configured merge_asof, gives each label the latest feature row, including data recorded after the event you're predicting. Offline metrics look great and production doesn't match, yet nothing flags the problem: no error, no warning, no way to tell from the output alone.
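To make the failure mode concrete, here is a minimal pandas sketch (not Timefence code). The toy tables and column names are assumptions chosen for illustration. It contrasts a "latest row per user" join with a point-in-time join that only accepts feature rows strictly before `label_time`.

```python
import pandas as pd

# Toy data (hypothetical): two labels, four feature snapshots.
labels = pd.DataFrame({
    "user_id": [1, 2],
    "label_time": pd.to_datetime(["2024-01-10", "2024-01-10"]),
    "churned": [0, 1],
})
features = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "feature_time": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-01-08", "2024-01-09"]
    ),
    "spend_30d": [10.0, 99.0, 5.0, 7.0],
})

# Naive join: latest feature row per user, ignoring label_time entirely.
latest = (
    features.sort_values("feature_time")
    .groupby("user_id", as_index=False)
    .last()
)
naive = labels.merge(latest, on="user_id", how="left")
leaky_rows = int((naive["feature_time"] >= naive["label_time"]).sum())

# Point-in-time join: most recent feature row strictly before label_time.
pit = pd.merge_asof(
    labels.sort_values("label_time"),
    features.sort_values("feature_time"),
    left_on="label_time",
    right_on="feature_time",
    by="user_id",
    direction="backward",
    allow_exact_matches=False,  # enforce feature_time < label_time
)

print(f"naive join: {leaky_rows} leaky row(s)")
print(pit[["user_id", "label_time", "feature_time", "spend_30d"]])
```

In the naive result, user 1 receives the snapshot from 2024-01-20, ten days after the label. The point-in-time join falls back to the 2024-01-05 snapshot.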

pip install timefence

Try It in 60 Seconds

timefence quickstart churn-example && cd churn-example
timefence audit data/train_LEAKY.parquet

TEMPORAL AUDIT REPORT
Scanned 5,000 rows

WARNING  LEAKAGE DETECTED in 3 of 4 features

  LEAK  rolling_spend_30d
        1,520 rows (30.4%) use feature data from the future
        Severity: HIGH

  LEAK  days_since_login
        4,909 rows (98.2%) use feature data from the future
        Severity: HIGH

  OK    user_country - clean (5,000 rows)
  OK    account_age_days - clean (5,000 rows)

Rebuild it with temporal correctness:

timefence build --labels data/labels.parquet --features features.py --output train_CLEAN.parquet

Building training set...

  Labels     5,000 rows from data/labels.parquet
  Features   4 features

  Joining with point-in-time correctness (feature_time < label_time):

  OK  user_country         5,000 / 5,000 matched
  OK  account_age_days     5,000 / 5,000 matched
  OK  rolling_spend_30d    5,000 / 5,000 matched
  OK  days_since_login     5,000 / 5,000 matched

  Written   train_CLEAN.parquet (5,000 rows, 7 cols)

Verify:

timefence audit train_CLEAN.parquet
# ALL CLEAN - no temporal leakage detected

Audit Your Existing Data

You don't need to change your pipeline. Point Timefence at any training set you already have:

timefence audit your_training_set.parquet --features features.py --keys user_id --label-time label_time

If it's clean, you'll know. If it's not, you'll see exactly which features leak, how many rows, and the severity. Takes seconds.
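As a rough illustration of what a per-feature leak count involves, the sketch below assumes the training set already carries one timestamp column per feature. The column names and the `leak_report` helper are hypothetical. Timefence's own audit may work differently, for example by rebuilding from sources and comparing.

```python
import pandas as pd

def leak_report(df, label_time, feature_times):
    """Count rows whose feature timestamp is at or after the label timestamp.

    feature_times maps a feature name to the (hypothetical) column holding
    the timestamp of the source row that produced that feature's value.
    """
    report = {}
    for feature, ts_col in feature_times.items():
        leaked = int((df[ts_col] >= df[label_time]).sum())
        report[feature] = {
            "rows": leaked,
            "pct": round(100 * leaked / len(df), 1),
        }
    return report

# Toy training set: spend timestamps sometimes fall after the label.
train = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "label_time": pd.to_datetime(["2024-03-01"] * 4),
    "spend_time": pd.to_datetime(
        ["2024-02-20", "2024-03-05", "2024-02-28", "2024-03-02"]
    ),
    "country_time": pd.to_datetime(["2024-01-01"] * 4),
})

report = leak_report(
    train,
    "label_time",
    {"rolling_spend_30d": "spend_time", "user_country": "country_time"},
)
for feature, stats in report.items():
    status = "LEAK" if stats["rows"] else "OK"
    print(status, feature, stats)
```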

Python API

Audit any existing dataset — no sources or feature definitions needed:

import timefence

report = timefence.audit("train.parquet", keys=["user_id"], label_time="label_time")
report.assert_clean()  # raises if leakage found

Or define sources and features to build a correct dataset from scratch:

users = timefence.Source(path="data/users.parquet", keys=["user_id"], timestamp="updated_at")
txns  = timefence.Source(path="data/txns.parquet", keys=["user_id"], timestamp="created_at")

country = timefence.Feature(source=users, columns=["country"])
spend   = timefence.Feature(source=txns, embargo="1d", name="spend_30d", sql="""
    SELECT user_id, created_at AS feature_time,
           SUM(amount) OVER (PARTITION BY user_id ORDER BY created_at
               RANGE BETWEEN INTERVAL 30 DAY PRECEDING AND CURRENT ROW) AS spend_30d
    FROM {source}
""")

labels = timefence.Labels(
    path="data/labels.parquet", keys=["user_id"],
    label_time="label_time", target=["churned"],
)

result = timefence.build(labels=labels, features=[country, spend], output="train.parquet")
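For readers more at home in pandas than in SQL window functions, the 30-day rolling sum in the `spend` feature can be approximated as below. This is only a sketch on made-up transactions, not what Timefence executes. `closed="both"` is chosen on the assumption that the SQL `RANGE ... PRECEDING AND CURRENT ROW` frame includes a row exactly 30 days old.

```python
import pandas as pd

# Toy transactions (hypothetical values).
txns = pd.DataFrame({
    "user_id": [1, 1, 1],
    "created_at": pd.to_datetime(["2024-01-01", "2024-01-15", "2024-02-20"]),
    "amount": [10.0, 5.0, 2.0],
}).sort_values(["user_id", "created_at"])

# Per-user sum of amounts over the trailing 30 days, inclusive of both ends.
spend = (
    txns.set_index("created_at")
    .groupby("user_id")["amount"]
    .rolling("30D", closed="both")
    .sum()
    .reset_index(name="spend_30d")
    .rename(columns={"created_at": "feature_time"})
)
print(spend)
```

Each output row is stamped with the transaction time as `feature_time`, which is what a point-in-time join later compares against `label_time`.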

Add to CI

Stop leakage before it reaches production:

- run: pip install timefence && timefence audit data/train.parquet --features features.py --strict

--strict exits with code 1 on leakage. Your pipeline fails before a leaky model ever trains.
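If your pipeline is driven from Python rather than a YAML step, the same gate can be sketched with `subprocess`. It relies only on the exit-code behavior described above. The `gate` helper is hypothetical, and stand-in commands are used so the sketch runs without Timefence installed.

```python
import subprocess
import sys

def gate(cmd):
    """Run an audit command; return True only when it exits with code 0."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tool's output so the CI log shows why the gate failed.
        print(result.stdout + result.stderr, file=sys.stderr)
    return result.returncode == 0

# In CI the command would be the one from the step above, e.g.:
#   gate(["timefence", "audit", "data/train.parquet",
#         "--features", "features.py", "--strict"])
# Stand-in commands that exit 0 and 1 respectively:
passed = gate([sys.executable, "-c", "raise SystemExit(0)"])
failed = gate([sys.executable, "-c", "raise SystemExit(1)"])
print("clean run passes:", passed, "| leaky run passes:", failed)
```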

Performance

Built on DuckDB's columnar engine. Median of 3 runs after warmup (Intel i7, 16 GB):

| Scenario | Label rows | Features | Build time | Audit time |
|---|---|---|---|---|
| Small project | 100K | 1 | 0.5s | 0.3s |
| Typical project | 100K | 10 | 1.9s | 1.7s |
| Large project | 1M | 1 | 3.0s | 2.0s |
| Large + many features | 1M | 10 | 12s | 8.5s |

Adding embargo, staleness, and splits costs seconds, not minutes.

Run benchmarks yourself

uv run python benchmarks/bench.py --quick
uv run python benchmarks/bench.py --quick --include-pandas

How It Works

Timefence generates SQL (ASOF JOIN or ROW_NUMBER) and runs it in an embedded DuckDB. No server, no JVM, no Spark. It enforces one rule — feature_time < label_time - embargo — for every row, every feature, every build. Every query is inspectable via timefence -v build or timefence explain.
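The ROW_NUMBER strategy can be sketched as follows. This is an illustrative query, not the SQL Timefence generates. SQLite from the Python standard library stands in for DuckDB so the example has no dependencies, and the table names, toy rows, and 1-day embargo are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE labels (user_id INTEGER, label_time TEXT, churned INTEGER);
CREATE TABLE feats  (user_id INTEGER, feature_time TEXT, spend_30d REAL);
INSERT INTO labels VALUES
  (1, '2024-01-10 00:00:00', 0),
  (2, '2024-01-10 00:00:00', 1);
INSERT INTO feats VALUES
  (1, '2024-01-05 00:00:00', 10.0),  -- eligible
  (1, '2024-01-09 12:00:00', 20.0),  -- inside the 1-day embargo
  (1, '2024-01-20 00:00:00', 99.0),  -- after the label: the future
  (2, '2024-01-08 00:00:00', 5.0);   -- eligible
""")

# Keep only feature rows with feature_time < label_time - embargo,
# then pick the most recent eligible row per label.
query = """
WITH ranked AS (
  SELECT l.user_id, l.label_time, f.feature_time, f.spend_30d,
         ROW_NUMBER() OVER (
           PARTITION BY l.user_id, l.label_time
           ORDER BY f.feature_time DESC
         ) AS rn
  FROM labels AS l
  LEFT JOIN feats AS f
    ON f.user_id = l.user_id
   AND f.feature_time < datetime(l.label_time, '-1 day')
)
SELECT user_id, label_time, feature_time, spend_30d
FROM ranked
WHERE rn = 1
ORDER BY user_id
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

User 1 ends up with the 2024-01-05 snapshot: the 2024-01-09 12:00 row is rejected by the embargo and the 2024-01-20 row by the label time. In DuckDB, an ASOF JOIN expresses the same "latest row before" lookup more directly.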

All Features


| Area | Details |
|---|---|
| Joins | Point-in-time correct. ASOF JOIN fast path, ROW_NUMBER fallback |
| Guardrails | Embargo, max lookback, max staleness, all configurable |
| Inputs | Parquet, CSV, SQL query, DataFrame |
| Feature modes | Column selection, SQL, Python transform |
| Splitting | Time-based train / validation / test splits |
| Caching | Feature-level cache with content-hash keys |
| Audit | Full rebuild-and-compare or lightweight temporal check |
| Reports | Severity classification. JSON manifest, HTML report, Rich terminal |
| CLI | `quickstart` `build` `audit` `explain` `diff` `inspect` `catalog` `doctor` |
| Flags | `-v` verbose · `--debug` · `--strict` CI gate · `--json` · `--html` |
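Time-based splitting, listed above, means cutting the dataset at timestamps instead of sampling rows at random, so that validation and test rows are always later than training rows. A minimal pandas sketch of the idea follows. The `time_split` helper and its cutoff arguments are hypothetical and do not reflect Timefence's API.

```python
import pandas as pd

def time_split(df, time_col, train_end, valid_end):
    """Split rows into train / validation / test by timestamp cutoffs.

    train:      time <  train_end
    validation: train_end <= time < valid_end
    test:       time >= valid_end
    """
    train_end, valid_end = pd.Timestamp(train_end), pd.Timestamp(valid_end)
    train = df[df[time_col] < train_end]
    valid = df[(df[time_col] >= train_end) & (df[time_col] < valid_end)]
    test = df[df[time_col] >= valid_end]
    return train, valid, test

# Six labels, one per month from January to June 2024.
df = pd.DataFrame({
    "user_id": range(6),
    "label_time": pd.date_range("2024-01-01", periods=6, freq="MS"),
})
train, valid, test = time_split(df, "label_time", "2024-04-01", "2024-06-01")
print(len(train), len(valid), len(test))
```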

What Timefence Is NOT

| Not This | Why | Use Instead |
|---|---|---|
| Feature store | No server, no online serving | Tecton, Feast |
| Data orchestrator | No scheduling, no DAGs | Airflow, Dagster |
| Data quality framework | Temporal correctness only | Great Expectations |
| ML pipeline framework | Produces training data only | MLflow, Metaflow |

One tool. One job. Temporal correctness for ML training data.


MIT License


Release history

0.9.1 (this version): Feb 10, 2026
0.9.0: Feb 6, 2026

Download files

Source Distribution

timefence-0.9.1.tar.gz (512.8 kB), uploaded Feb 10, 2026, Source

Built Distribution

timefence-0.9.1-py3-none-any.whl (44.0 kB), uploaded Feb 10, 2026, Python 3

File details

Details for the file timefence-0.9.1.tar.gz.

File metadata

Download URL: timefence-0.9.1.tar.gz
Upload date: Feb 10, 2026
Size: 512.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for timefence-0.9.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `76d9cdf437d11f7248bf56fad7339f882286efc1b768191ae203f823e8b8bf21` |
| MD5 | `a1ddeebc50d53347cd5151d778e38a97` |
| BLAKE2b-256 | `5799d09137eba2f7c9f9dcc55261eb7fc9ad5e8b657986d13da31f4e9cf61b1e` |


Provenance

The following attestation bundles were made for timefence-0.9.1.tar.gz:

Publisher: release.yml on gauthierpiarrette/timefence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: timefence-0.9.1.tar.gz
- Subject digest: 76d9cdf437d11f7248bf56fad7339f882286efc1b768191ae203f823e8b8bf21
- Sigstore transparency entry: 937970928
- Sigstore integration time: Feb 10, 2026
Source repository:
- Permalink: gauthierpiarrette/timefence@59e31b122dd6422b09353728a22efbb8cc376d31
- Branch / Tag: refs/tags/v0.9.1
- Owner: https://github.com/gauthierpiarrette
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@59e31b122dd6422b09353728a22efbb8cc376d31
- Trigger Event: push

File details

Details for the file timefence-0.9.1-py3-none-any.whl.

File metadata

Download URL: timefence-0.9.1-py3-none-any.whl
Upload date: Feb 10, 2026
Size: 44.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for timefence-0.9.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `bbc60841e16a8105c43b508a809a108df4eaf694051c8af5f0d9e30dcf0b7717` |
| MD5 | `6f578bb0f0264971d949b4cbff004660` |
| BLAKE2b-256 | `2d86efb16373e12e32d56c5f421fa7131db04b06f107ab48ab0ecfb0590b6bac` |


Provenance

The following attestation bundles were made for timefence-0.9.1-py3-none-any.whl:

Publisher: release.yml on gauthierpiarrette/timefence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: timefence-0.9.1-py3-none-any.whl
- Subject digest: bbc60841e16a8105c43b508a809a108df4eaf694051c8af5f0d9e30dcf0b7717
- Sigstore transparency entry: 937970952
- Sigstore integration time: Feb 10, 2026
Source repository:
- Permalink: gauthierpiarrette/timefence@59e31b122dd6422b09353728a22efbb8cc376d31
- Branch / Tag: refs/tags/v0.9.1
- Owner: https://github.com/gauthierpiarrette
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@59e31b122dd6422b09353728a22efbb8cc376d31
- Trigger Event: push
