Temporal correctness layer for ML training data
Timefence
Your ML model may be trained on the future. Find out in one command.
Website · Docs · Changelog · Contributing
Timefence finds and fixes temporal data leakage in ML training sets. No infrastructure required — runs locally, reads Parquet/CSV, and finishes in seconds.
If you build training data by joining features to labels, your model may be training on the future. A plain LEFT JOIN on the entity key, or a merge_asof on the wrong timestamp or direction, can attach a feature row recorded after the event you're predicting. Offline metrics look great. Production doesn't match. No error, no warning, no way to tell from the output alone.
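To make the failure mode concrete, here is a minimal sketch in plain Python (no Timefence involved). The toy rows, column order, and function names are assumptions made up for illustration. It contrasts a "latest row per user" join with a join that only accepts rows where feature_time < label_time.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical toy data: one user, three feature snapshots, one label.
features = [  # (user_id, feature_time, rolling_spend)
    ("u1", datetime(2024, 1, 1), 10.0),
    ("u1", datetime(2024, 2, 1), 25.0),
    ("u1", datetime(2024, 3, 1), 90.0),  # recorded after the label event
]
labels = [("u1", datetime(2024, 2, 15), 1)]  # (user_id, label_time, churned)


def naive_latest_join(labels, features):
    """Attach the latest feature row per user, ignoring label_time (leaky)."""
    latest = {}
    for user_id, feature_time, value in sorted(features, key=lambda r: r[1]):
        latest[user_id] = (feature_time, value)  # later rows overwrite earlier ones
    return [(u, t, y, *latest.get(u, (None, None))) for u, t, y in labels]


def point_in_time_join(labels, features):
    """Attach the latest feature row with feature_time < label_time."""
    by_user = {}
    for user_id, feature_time, value in sorted(features, key=lambda r: r[1]):
        by_user.setdefault(user_id, []).append((feature_time, value))
    out = []
    for u, t, y in labels:
        rows = by_user.get(u, [])
        times = [ft for ft, _ in rows]
        i = bisect_left(times, t)  # count of rows strictly before label_time
        out.append((u, t, y, *(rows[i - 1] if i else (None, None))))
    return out


leaky = naive_latest_join(labels, features)
clean = point_in_time_join(labels, features)
print("naive:", leaky[0][3:], "point-in-time:", clean[0][3:])
```

The naive join picks the March snapshot for a label dated mid-February, while the point-in-time join falls back to the February 1 snapshot.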
pip install timefence
Try It in 60 Seconds
timefence quickstart churn-example && cd churn-example
timefence audit data/train_LEAKY.parquet
TEMPORAL AUDIT REPORT
Scanned 5,000 rows
WARNING LEAKAGE DETECTED in 2 of 4 features
LEAK rolling_spend_30d
1,520 rows (30.4%) use feature data from the future
Severity: HIGH
LEAK days_since_login
4,909 rows (98.2%) use feature data from the future
Severity: HIGH
OK user_country - clean (5,000 rows)
OK account_age_days - clean (5,000 rows)
Rebuild it with temporal correctness:
timefence build --labels data/labels.parquet --features features.py --output train_CLEAN.parquet
Building training set...
Labels 5,000 rows from data/labels.parquet
Features 4 features
Joining with point-in-time correctness (feature_time < label_time):
OK user_country 5,000 / 5,000 matched
OK account_age_days 5,000 / 5,000 matched
OK rolling_spend_30d 5,000 / 5,000 matched
OK days_since_login 5,000 / 5,000 matched
Written train_CLEAN.parquet (5,000 rows, 7 cols)
Verify:
timefence audit train_CLEAN.parquet
# ALL CLEAN - no temporal leakage detected
Audit Your Existing Data
You don't need to change your pipeline. Point Timefence at any training set you already have:
timefence audit your_training_set.parquet --features features.py --keys user_id --label-time label_time
If it's clean, you'll know. If it's not, you'll see exactly which features leak, how many rows, and the severity. Takes seconds.
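As a rough illustration of what a per-feature leak count looks like, the sketch below assumes a training set that still carries one timestamp column per feature next to label_time. That layout, the column names, and the count_leaks helper are hypothetical; this is not a description of how Timefence detects leakage internally.

```python
from datetime import datetime


def count_leaks(rows, label_time_col, feature_time_cols):
    """Return {feature: (leaky_rows, percent)}.

    A row counts as leaky for a feature when feature_time >= label_time,
    i.e. when it violates feature_time < label_time.
    """
    total = len(rows)
    report = {}
    for feature, time_col in feature_time_cols.items():
        leaky = sum(
            1
            for row in rows
            if row[time_col] is not None and row[time_col] >= row[label_time_col]
        )
        report[feature] = (leaky, 100.0 * leaky / total if total else 0.0)
    return report


# Hypothetical rows: the first one uses spend data from after its label.
rows = [
    {
        "label_time": datetime(2024, 2, 15),
        "spend_time": datetime(2024, 3, 1),
        "country_time": datetime(2023, 6, 1),
    },
    {
        "label_time": datetime(2024, 2, 20),
        "spend_time": datetime(2024, 2, 1),
        "country_time": datetime(2023, 6, 1),
    },
]

report = count_leaks(
    rows, "label_time", {"spend_30d": "spend_time", "country": "country_time"}
)
for feature, (n, pct) in report.items():
    status = "LEAK" if n else "OK"
    print(f"{status} {feature}: {n} rows ({pct:.1f}%)")
```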
Python API
Audit any existing dataset — no sources or feature definitions needed:
import timefence
report = timefence.audit("train.parquet", keys=["user_id"], label_time="label_time")
report.assert_clean() # raises if leakage found
Or define sources and features to build a correct dataset from scratch:
users = timefence.Source(path="data/users.parquet", keys=["user_id"], timestamp="updated_at")
txns = timefence.Source(path="data/txns.parquet", keys=["user_id"], timestamp="created_at")

country = timefence.Feature(source=users, columns=["country"])
spend = timefence.Feature(source=txns, embargo="1d", name="spend_30d", sql="""
    SELECT user_id, created_at AS feature_time,
           SUM(amount) OVER (PARTITION BY user_id ORDER BY created_at
               RANGE BETWEEN INTERVAL 30 DAY PRECEDING AND CURRENT ROW) AS spend_30d
    FROM {source}
""")

labels = timefence.Labels(
    path="data/labels.parquet", keys=["user_id"],
    label_time="label_time", target=["churned"],
)

result = timefence.build(labels=labels, features=[country, spend], output="train.parquet")
Add to CI
Stop leakage before it reaches production:
- run: pip install timefence && timefence audit data/train.parquet --features features.py --strict
--strict exits with code 1 on leakage. Your pipeline fails before a leaky model ever trains.
Performance
Built on DuckDB's columnar engine. Median of 3 runs after warmup (Intel i7, 16 GB):
| Scenario | Labels | Features | Build | Audit |
|---|---|---|---|---|
| Small project | 100K | 1 | 0.5s | 0.3s |
| Typical project | 100K | 10 | 1.9s | 1.7s |
| Large project | 1M | 1 | 3.0s | 2.0s |
| Large + many features | 1M | 10 | 12s | 8.5s |
Adding embargo, staleness, and splits costs seconds, not minutes.
Run benchmarks yourself
uv run python benchmarks/bench.py --quick
uv run python benchmarks/bench.py --quick --include-pandas
How It Works
Timefence generates SQL (ASOF JOIN or ROW_NUMBER) and runs it in an embedded DuckDB. No server, no JVM, no Spark. It enforces one rule — feature_time < label_time - embargo — for every row, every feature, every build. Every query is inspectable via timefence -v build or timefence explain.
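The ROW_NUMBER form of that rule can be sketched as follows. This is an illustration under stated assumptions, not the SQL Timefence actually generates: it uses Python's built-in sqlite3 as a stand-in for DuckDB, stores timestamps as integer Unix seconds, and the table names, column names, and build helper are made up for the example.

```python
import sqlite3
from datetime import datetime, timezone


def ts(*args):
    """UTC datetime components -> integer Unix seconds."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp())


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE labels (user_id TEXT, label_time INTEGER, churned INTEGER)")
con.execute("CREATE TABLE spend (user_id TEXT, feature_time INTEGER, spend_30d REAL)")
con.execute("INSERT INTO labels VALUES ('u1', ?, 1)", (ts(2024, 2, 15),))
con.executemany(
    "INSERT INTO spend VALUES ('u1', ?, ?)",
    [
        (ts(2024, 1, 1), 10.0),
        (ts(2024, 2, 14, 12), 25.0),  # before the label, but inside a 1-day embargo
        (ts(2024, 3, 1), 90.0),       # after the label event
    ],
)

# Keep only feature rows with feature_time < label_time - embargo,
# then take the most recent surviving row per label.
PIT_SQL = """
WITH ranked AS (
    SELECT l.user_id, l.label_time, l.churned, f.spend_30d,
           ROW_NUMBER() OVER (
               PARTITION BY l.user_id, l.label_time
               ORDER BY f.feature_time DESC
           ) AS rn
    FROM labels AS l
    LEFT JOIN spend AS f
      ON f.user_id = l.user_id
     AND f.feature_time < l.label_time - :embargo_s
)
SELECT user_id, churned, spend_30d FROM ranked WHERE rn = 1
"""


def build(embargo_s):
    """Run the point-in-time join with an embargo given in seconds."""
    return con.execute(PIT_SQL, {"embargo_s": embargo_s}).fetchall()


no_embargo = build(0)
one_day = build(24 * 60 * 60)
print("no embargo:", no_embargo)
print("1-day embargo:", one_day)
```

With no embargo the February 14 row is the latest eligible one; with a one-day embargo it is excluded and the join falls back to the January row. The March row is never eligible.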
All Features
| Area | Details |
|---|---|
| Joins | Point-in-time correct. ASOF JOIN fast path, ROW_NUMBER fallback |
| Guardrails | Embargo, max lookback, max staleness (all configurable) |
| Inputs | Parquet, CSV, SQL query, DataFrame |
| Feature modes | Column selection, SQL, Python transform |
| Splitting | Time-based train / validation / test splits |
| Caching | Feature-level cache with content-hash keys |
| Audit | Full rebuild-and-compare or lightweight temporal check |
| Reports | Severity classification. JSON manifest, HTML report, Rich terminal |
| CLI | quickstart build audit explain diff inspect catalog doctor |
| Flags | -v verbose · --debug · --strict CI gate · --json · --html |
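For readers unfamiliar with the "Splitting" row, a time-based split assigns each row to train, validation, or test by comparing its label time to fixed cutoffs, so evaluation data is always later than training data. The sketch below is a generic illustration; the time_split helper, its parameters, and the toy rows are assumptions, not Timefence's API.

```python
from datetime import datetime


def time_split(rows, time_key, valid_start, test_start):
    """Split rows into train / validation / test by label-time cutoffs."""
    splits = {"train": [], "validation": [], "test": []}
    for row in rows:
        t = row[time_key]
        if t < valid_start:
            splits["train"].append(row)
        elif t < test_start:
            splits["validation"].append(row)
        else:
            splits["test"].append(row)
    return splits


# Hypothetical labels: one per month of 2024.
rows = [
    {"user_id": f"u{i}", "label_time": datetime(2024, month, 1)}
    for i, month in enumerate(range(1, 13))
]
splits = time_split(rows, "label_time", datetime(2024, 9, 1), datetime(2024, 11, 1))
print({name: len(part) for name, part in splits.items()})
```

Unlike a random split, every training row here predates every validation row, and every validation row predates every test row.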
What Timefence Is NOT
| Not This | Why | Use Instead |
|---|---|---|
| Feature store | No server, no online serving | Tecton, Feast |
| Data orchestrator | No scheduling, no DAGs | Airflow, Dagster |
| Data quality framework | Temporal correctness only | Great Expectations |
| ML pipeline framework | Produces training data only | MLflow, Metaflow |
One tool. One job. Temporal correctness for ML training data.
Documentation · Contributing · Changelog
MIT License