Temporal correctness layer for ML training data

Maintainers

gauthierpiarrette

Timefence

Temporal correctness layer for ML training data.

Timefence guarantees no future data leakage, audits existing pipelines, and builds point-in-time correct datasets — locally, with zero infrastructure, in seconds.

From pip install to "I found leakage in my pipeline" in under 3 minutes.

Install

pip install timefence

Three runtime dependencies: duckdb, click, rich. Python 3.9+.

Quick Start

timefence quickstart churn-example
cd churn-example

This generates a self-contained project with synthetic data and planted leakage:

churn-example/
  timefence.yaml            # Project config
  features.py               # 4 feature definitions
  data/
    users.parquet           # Synthetic user data (10K users, 30K rows)
    transactions.parquet    # Synthetic transactions (200K rows)
    labels.parquet          # Churn labels (5K rows)
    train_LEAKY.parquet     # Pre-built dataset WITH planted leakage
  README.md

Step 1: Find the leakage.

timefence audit data/train_LEAKY.parquet

TEMPORAL AUDIT REPORT
Scanned 5,000 rows

WARNING  LEAKAGE DETECTED in 2 of 4 features

  LEAK  rolling_spend_30d
        1,520 rows (30.4%) use feature data from the future
        Severity: HIGH

  LEAK  days_since_login
        4,909 rows (98.2%) use feature data from the future
        Severity: HIGH

  OK    user_country - clean (5,000 rows)

  OK    account_age_days - clean (5,000 rows)

Step 2: Fix it.

timefence build --labels data/labels.parquet --features features.py --output data/train_CLEAN.parquet

Building training set...

  Labels     5,000 rows from data/labels.parquet
  Features   4 features

  Joining with point-in-time correctness (feature_time < label_time):

  OK  user_country         5,000 / 5,000 matched
  OK  account_age_days     5,000 / 5,000 matched
  OK  rolling_spend_30d    5,000 / 5,000 matched
  OK  days_since_login     5,000 / 5,000 matched

  Written   data/train_CLEAN.parquet (5,000 rows, 7 cols)
  Manifest  .timefence/builds/20260205T143022Z/build.json

Step 3: Verify.

timefence audit data/train_CLEAN.parquet

ALL CLEAN - no temporal leakage detected

Core Concepts

Timefence has 6 user-facing concepts:

Concept	Definition
Source	A table of historical data with timestamps
Feature	A named column derived from a source
Labels	Prediction targets with entity keys and event times
Build	Constructing a point-in-time correct dataset
Audit	Checking any dataset for temporal leakage
Store	A local directory that tracks builds and manifests (optional)

The Core Invariant

For every row in a Timefence-built training set:

feature_time < label_time - embargo

The comparison is strict: no feature value used in training may have been recorded at or after the label time minus that feature's embargo.
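
As a minimal sketch of the invariant for a single row (plain Python, not Timefence's own implementation), the check could look like this. The function name `satisfies_invariant` is hypothetical and exists only for illustration.

```python
from datetime import datetime, timedelta


def satisfies_invariant(feature_time: datetime, label_time: datetime,
                        embargo: timedelta = timedelta(0)) -> bool:
    """Return True when feature_time < label_time - embargo (strict)."""
    return feature_time < label_time - embargo


label_time = datetime(2024, 3, 10)
one_day = timedelta(days=1)

# Recorded two days before the label: allowed with a 1-day embargo.
print(satisfies_invariant(datetime(2024, 3, 8), label_time, one_day))   # True
# Recorded exactly at label_time - embargo: rejected, the comparison is strict.
print(satisfies_invariant(datetime(2024, 3, 9), label_time, one_day))   # False
# Recorded after the label event: rejected.
print(satisfies_invariant(datetime(2024, 3, 11), label_time, one_day))  # False
```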

Python API

Source

Declare where historical data lives and how to interpret it temporally.

import timefence

users = timefence.Source(
    path="data/users.parquet",
    keys=["user_id"],
    timestamp="updated_at",
)

# CSV source
events = timefence.CSVSource(
    path="data/events.csv",
    keys=["user_id"],
    timestamp="event_time",
    delimiter="|",
)

# SQL source
transactions = timefence.SQLSource(
    query="SELECT * FROM transactions WHERE amount > 0",
    keys=["user_id"],
    timestamp="created_at",
    name="transactions",
)

Keys and timestamp are always required. Timefence never infers them.

Feature

One class. Three modes. Exactly one of columns, sql, or transform must be provided.

Mode 1: Column Selection (~70% of features)

user_country = timefence.Feature(
    source=users,
    columns=["country"],
)

# Multiple columns
user_profile = timefence.Feature(
    source=users,
    columns=["country", "signup_platform", "account_tier"],
)

# Column rename (source_col -> feature_col)
user_region = timefence.Feature(
    source=users,
    columns={"region_code": "region"},
)

Mode 2: SQL (~25% of features)

rolling_spend = timefence.Feature(
    source=transactions,
    sql="""
        SELECT
            user_id,
            created_at AS feature_time,
            SUM(amount) OVER (
                PARTITION BY user_id
                ORDER BY created_at
                RANGE BETWEEN INTERVAL 30 DAY PRECEDING AND CURRENT ROW
            ) AS spend_30d
        FROM {source}
    """,
    name="rolling_spend_30d",
    embargo="1d",
)

# Or from a .sql file (recommended for production)
from pathlib import Path

rolling_spend = timefence.Feature(
    source=transactions,
    sql=Path("features/rolling_spend.sql"),
    embargo="1d",
)

Mode 3: Python Transform (~5% of features)

def compute_complex_feature(conn, source_table):
    conn.create_function("my_udf", lambda x: x * 2.5, [float], float)
    return conn.sql(f"""
        SELECT user_id, created_at AS feature_time,
               my_udf(raw_score) AS adjusted_score
        FROM {source_table}
    """)

complex_feature = timefence.Feature(
    source=transactions,
    transform=compute_complex_feature,
)

Feature options (apply to all modes):

timefence.Feature(
    source=...,
    columns=... | sql=... | transform=...,
    name="rolling_spend_30d",        # Auto-derived when possible
    embargo="1d",                    # Computation lag buffer (default: "0d")
    key_mapping={"user_id": "customer_id"},  # When source uses different key names
    on_duplicate="error",            # "error" (default) or "keep_any"
)

Labels

labels = timefence.Labels(
    path="data/labels.parquet",
    keys=["user_id"],
    label_time="label_time",
    target=["churned"],
)

# From a DataFrame already in memory
labels = timefence.Labels(
    df=my_dataframe,
    keys=["user_id"],
    label_time="label_time",
    target=["churned"],
)

Build

result = timefence.build(
    labels=labels,
    features=[user_country, rolling_spend, complex_feature],
    output="train.parquet",

    # Temporal controls
    max_lookback="365d",       # Ignore features older than this
    max_staleness="30d",       # If best feature is older, treat as missing
    join="strict",             # "strict" (default, <) or "inclusive" (<=)
    on_missing="null",         # "null" (keep row) or "skip" (drop row)

    # Time-based splits
    splits={
        "train": ("2023-01-01", "2024-01-01"),
        "valid": ("2024-01-01", "2024-07-01"),
        "test":  ("2024-07-01", "2025-01-01"),
    },

    # Reproducibility
    store=timefence.Store(".timefence"),
)

# Inspect the result
print(result)               # Pretty summary
result.output_path           # "train.parquet"
result.manifest              # Full build manifest (dict)
result.stats                 # Row counts, feature stats, timing
result.splits                # {"train": Path, "valid": Path, "test": Path}
result.sql                   # The exact SQL executed
result.validate()            # Re-check audit passed
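
To make the time-based splits concrete, here is a small sketch of how a label row could be assigned to a split by its label time. It assumes half-open ranges `[start, end)`, which is a guess based on the shared boundary dates in the example above; the README does not state how Timefence treats boundaries. `assign_split` is a hypothetical helper, not part of the Timefence API.

```python
from datetime import date
from typing import Optional

SPLITS = {
    "train": ("2023-01-01", "2024-01-01"),
    "valid": ("2024-01-01", "2024-07-01"),
    "test": ("2024-07-01", "2025-01-01"),
}


def assign_split(label_time: date, splits: dict) -> Optional[str]:
    """Return the split whose [start, end) range contains label_time, else None."""
    for name, (start, end) in splits.items():
        # Assumption: start is inclusive, end is exclusive.
        if date.fromisoformat(start) <= label_time < date.fromisoformat(end):
            return name
    return None


print(assign_split(date(2023, 6, 15), SPLITS))  # train
print(assign_split(date(2024, 1, 1), SPLITS))   # valid (boundary goes to the later split)
print(assign_split(date(2025, 3, 1), SPLITS))   # None (outside every range)
```

Under this assumption no row can land in two splits, since each boundary date belongs to exactly one range.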

Audit

# Rebuild-and-compare mode (full audit)
report = timefence.audit(
    data="existing_training.parquet",
    features=[user_country, rolling_spend],
    keys=["user_id"],
    label_time="label_time",
)

# Temporal check mode (lightweight, no source data needed)
report = timefence.audit.temporal(
    data="existing_training.parquet",
    feature_time_columns={
        "spend_30d": "spend_computed_at",
        "country": "country_updated_at",
    },
    label_time="label_time",
)

# Use the report
report.has_leakage           # bool
report.clean_features        # ["user_country"]
report.leaky_features        # ["rolling_spend_30d"]
report["rolling_spend_30d"]  # FeatureAuditDetail

# Export
report.to_json("report.json")
report.to_html("report.html")

# CI integration
report.assert_clean()        # Raises TimefenceLeakageError if leakage found
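
The idea behind the lightweight temporal check can be sketched without Timefence: for each feature, count the rows whose feature timestamp is not strictly before the label time. This is only an illustration under stated assumptions (no embargo, strict comparison, rows held as plain dicts); `count_leaky_rows` is a hypothetical name and the data is made up.

```python
from datetime import datetime

# Toy training rows: each carries the label time plus one timestamp per feature.
rows = [
    {"label_time": datetime(2024, 1, 10), "spend_computed_at": datetime(2024, 1, 9),
     "country_updated_at": datetime(2023, 12, 1)},
    {"label_time": datetime(2024, 1, 10), "spend_computed_at": datetime(2024, 1, 12),
     "country_updated_at": datetime(2023, 12, 1)},
    {"label_time": datetime(2024, 2, 1), "spend_computed_at": datetime(2024, 2, 1),
     "country_updated_at": datetime(2024, 1, 15)},
    {"label_time": datetime(2024, 2, 5), "spend_computed_at": datetime(2024, 2, 4),
     "country_updated_at": datetime(2024, 1, 15)},
]

feature_time_columns = {
    "spend_30d": "spend_computed_at",
    "country": "country_updated_at",
}


def count_leaky_rows(rows, feature_time_columns, label_time="label_time"):
    """Count, per feature, rows whose feature timestamp is at or after the label time."""
    return {
        feature: sum(1 for row in rows if row[time_col] >= row[label_time])
        for feature, time_col in feature_time_columns.items()
    }


counts = count_leaky_rows(rows, feature_time_columns)
for feature, n in counts.items():
    print(f"{feature}: {n} of {len(rows)} rows ({n / len(rows):.1%})")
# spend_30d: 2 of 4 rows (50.0%)
# country: 0 of 4 rows (0.0%)
```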

Explain

Preview join logic without executing:

plan = timefence.explain(
    labels=labels,
    features=[user_country, rolling_spend],
)
print(plan)

Every query is copy-pasteable for manual verification.

Diff

Compare two training datasets:

diff = timefence.diff(
    old="train_v1.parquet",
    new="train_v2.parquet",
    keys=["user_id"],
    label_time="label_time",
    atol=1e-10,   # Absolute tolerance for numeric comparison
    rtol=1e-7,    # Relative tolerance for numeric comparison
)
print(diff)
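
How `atol` and `rtol` combine is not spelled out here. A common convention, used by NumPy's `isclose`, is `|old - new| <= atol + rtol * |new|`. The sketch below assumes that convention, which may differ from what Timefence actually does; `values_match` is a hypothetical helper and its defaults simply reuse the values from the call above.

```python
def values_match(old: float, new: float, atol: float = 1e-10, rtol: float = 1e-7) -> bool:
    """NumPy-style closeness test: |old - new| <= atol + rtol * |new|."""
    return abs(old - new) <= atol + rtol * abs(new)


# Difference of about 1e-6 against a threshold of about 1e-5: treated as equal.
print(values_match(100.0, 100.000001))  # True
# Difference of 0.01 is far above the threshold: treated as changed.
print(values_match(100.0, 100.01))      # False
# Near zero the relative term vanishes, so atol decides.
print(values_match(0.0, 1e-12))         # True
print(values_match(0.0, 1e-9))          # False
```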

FeatureSet

Group features for reuse:

user_features = timefence.FeatureSet(
    name="user_features",
    features=[user_country, account_age, user_tier],
)

result = timefence.build(
    labels=labels,
    features=[user_features, rolling_spend],  # Mix FeatureSets and Features
    output="train.parquet",
)

Store

Track builds for reproducibility:

store = timefence.Store(".timefence")
result = timefence.build(labels=labels, features=features, output="train.parquet", store=store)

# Later
builds = store.list_builds()          # All builds, newest first
manifest = store.get_build(build_id)  # Specific build manifest

CLI Reference

`timefence quickstart`

Generate a self-contained example project.

timefence quickstart [project-name]    # default: churn-example
timefence quickstart myproject --minimal

`timefence inspect`

Suggest keys and timestamps for a data file.

timefence inspect data/users.parquet

`timefence audit`

Audit any dataset for temporal leakage.

# With timefence.yaml config (flags inferred)
timefence audit data/train.parquet

# Explicit flags
timefence audit data/train.parquet \
  --features features.py \
  --keys user_id \
  --label-time label_time

# CI mode (exit 1 if leakage)
timefence audit data/train.parquet --strict

# Export
timefence audit data/train.parquet --json
timefence audit data/train.parquet --html report.html

`timefence build`

Build a point-in-time correct training set.

timefence build \
  --labels data/labels.parquet \
  --features features.py \
  --output train.parquet

# With options
timefence build \
  --labels data/labels.parquet \
  --features features.py \
  --output train.parquet \
  --max-lookback 365d \
  --max-staleness 30d \
  --on-missing null \
  --join-mode strict

# Time-based splits
timefence build \
  --labels data/labels.parquet \
  --features features.py \
  --output train.parquet \
  --split train:2023-01-01:2024-01-01 \
  --split test:2024-01-01:2025-01-01

# Dry run (show plan only)
timefence build --labels data/labels.parquet --features features.py --output train.parquet --dry-run

`timefence explain`

Preview join logic without executing.

timefence explain --labels data/labels.parquet --features features.py

# Single feature
timefence explain --features features.py:rolling_spend_30d

`timefence diff`

Compare two training datasets.

timefence diff train_v1.parquet train_v2.parquet --keys user_id --label-time label_time

# Custom numeric tolerance
timefence diff v1.parquet v2.parquet --keys user_id --label-time label_time --atol 0.01 --rtol 0.001

`timefence catalog`

List all features defined in the project.

timefence catalog --features features.py

`timefence doctor`

Diagnose project setup and common issues.

timefence doctor

`timefence init`

Initialize a project with a timefence.yaml config file.

timefence init

Configuration

timefence.yaml is optional. Every setting can be passed via CLI flags or the Python API.

name: churn-model
version: "1.0"

features:
  - features.py

labels:
  path: data/labels.parquet
  keys: [user_id]
  label_time: label_time
  target: [churned]

defaults:
  max_lookback: 365d
  join: strict
  on_missing: "null"

store: .timefence/

output:
  dir: artifacts/

Precedence: CLI flags > Python API arguments > timefence.yaml > built-in defaults.
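
The precedence chain can be pictured as a layered lookup in which the first layer that defines a setting wins. The sketch below uses `collections.ChainMap` from the standard library; all dictionaries and their values are illustrative and are not Timefence's actual defaults or internals.

```python
from collections import ChainMap

# Illustrative layers, highest precedence first.
cli_flags = {"max_lookback": "90d"}
api_arguments = {"on_missing": "skip"}
yaml_config = {"max_lookback": "180d", "join": "strict"}
builtin_defaults = {"max_lookback": "365d", "join": "strict", "on_missing": "null"}

# ChainMap searches the mappings left to right and returns the first hit.
resolved = ChainMap(cli_flags, api_arguments, yaml_config, builtin_defaults)

print(resolved["max_lookback"])  # 90d    (CLI flag beats timefence.yaml and defaults)
print(resolved["on_missing"])    # skip   (API argument beats the default)
print(resolved["join"])          # strict (falls through to timefence.yaml)
```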

The Join Algebra

Given a label row (K, T) and a feature with embargo E, max lookback L, and optional max staleness S:

candidate_rows = { f : f.key = K  AND  f.feature_time in [T - L,  T - E) }
selected       = latest feature_time from candidate_rows
if S is set and selected.feature_time < T - S: treat as missing

T - L                    T - S                    T - E          T
 |                        |                        |             |
 |     stale (miss)       |     fresh (usable)     |  embargo    |  future
 |                        |                        |  (blocked)  |  (blocked)

Parameter constraints: L > E, and if S is set: L >= S > E.
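
The selection rule above can be sketched in plain Python for a single label row. This is not Timefence's implementation (which runs as SQL), only a direct transcription of the three lines of the algebra. `select_feature_time` is a hypothetical name, and the input is assumed to be already filtered to the label's key K.

```python
from datetime import datetime, timedelta


def select_feature_time(feature_times, label_time, embargo, lookback, staleness=None):
    """Return the feature timestamp selected for one label row, or None if missing."""
    window_start = label_time - lookback   # T - L (inclusive)
    window_end = label_time - embargo      # T - E (exclusive)
    candidates = [t for t in feature_times if window_start <= t < window_end]
    if not candidates:
        return None
    selected = max(candidates)             # latest feature_time in the window
    if staleness is not None and selected < label_time - staleness:
        return None                        # older than T - S: treat as missing
    return selected


T = datetime(2024, 6, 30)
E, L, S = timedelta(days=1), timedelta(days=365), timedelta(days=30)

times = [
    datetime(2023, 1, 1),       # older than T - L: outside the lookback window
    datetime(2024, 6, 10),      # fresh and usable
    datetime(2024, 6, 29, 12),  # inside the embargo: blocked
    datetime(2024, 7, 2),       # after the label: blocked
]
print(select_feature_time(times, T, E, L, S))                    # 2024-06-10 00:00:00
print(select_feature_time([datetime(2024, 3, 1)], T, E, L, S))   # None (stale)
print(select_feature_time([datetime(2024, 3, 1)], T, E, L))      # 2024-03-01 00:00:00
```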

CI/CD Integration

# GitHub Actions
- name: Audit training data
  run: |
    pip install timefence
    timefence audit data/train.parquet \
      --features features.py \
      --strict    # Exit code 1 if leakage found

Error Messages

Timefence errors follow a consistent structure: what happened, why it matters, where (specific data), and how to fix it.

TimefenceSchemaError: Feature 'clicks_7d' is missing required key column 'user_id'.

  Point-in-time joins require matching keys between labels and features.

  Expected keys: ['user_id']
  Actual columns: ['customer_id', 'feature_time', 'clicks_7d']
                   ^^^^^^^^^^^^ similar to 'user_id' - possible rename?

  Fix: Add key_mapping to your feature definition:
    timefence.Feature(..., key_mapping={"user_id": "customer_id"})
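
A "similar to" hint like the one in this message can be produced with fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the standard library. Whether Timefence uses this approach is not stated; the snippet only shows one way such a suggestion could be derived from the expected key and the actual columns.

```python
import difflib

expected_key = "user_id"
actual_columns = ["customer_id", "feature_time", "clicks_7d"]

# Keep at most one candidate whose similarity ratio is at least 0.6.
suggestions = difflib.get_close_matches(expected_key, actual_columns, n=1, cutoff=0.6)
print(suggestions)  # ['customer_id']

if suggestions:
    # Build the same kind of fix hint the error message shows.
    print(f'Fix: key_mapping={{"{expected_key}": "{suggestions[0]}"}}')
```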

What Timefence Is NOT

Not This	Why	Use Instead
Feature store platform	No server, no online serving	Tecton, Feast
Data orchestrator	No scheduling	Airflow, Dagster
Data quality framework	Temporal correctness only	Great Expectations
DataFrame library	Not general-purpose	Polars, Pandas, DuckDB
ML pipeline framework	Produces training data only	MLflow, Metaflow

Timefence is a single-purpose tool: temporal correctness for ML training data.

License

MIT

Release history

0.9.1 (Feb 10, 2026)
0.9.0 (Feb 6, 2026), this version

Download files

Source Distribution

timefence-0.9.0.tar.gz (54.5 kB), uploaded Feb 6, 2026

Built Distribution

timefence-0.9.0-py3-none-any.whl (43.7 kB), uploaded Feb 6, 2026

File details

Details for the file timefence-0.9.0.tar.gz.

File metadata

Download URL: timefence-0.9.0.tar.gz
Upload date: Feb 6, 2026
Size: 54.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for timefence-0.9.0.tar.gz
Algorithm	Hash digest
SHA256	`138004a51915d60a04fc71af9dce7fcc7c1bf5e0b5740423b37281bafd3d9d08`
MD5	`b4f57a9bb788a4c12c78d98d04bbbdd5`
BLAKE2b-256	`1a5dac6e127f73befe45cd36330e2ec9a3c65e80115b5252e676c585ecce2649`

Provenance

The following attestation bundles were made for timefence-0.9.0.tar.gz:

Publisher: release.yml on gauthierpiarrette/timefence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: timefence-0.9.0.tar.gz
- Subject digest: 138004a51915d60a04fc71af9dce7fcc7c1bf5e0b5740423b37281bafd3d9d08
- Sigstore transparency entry: 925604183
- Sigstore integration time: Feb 6, 2026
Source repository:
- Permalink: gauthierpiarrette/timefence@c5fc4219726bf651660ec4d8deea1ab791d1e974
- Branch / Tag: refs/tags/v0.9.0
- Owner: https://github.com/gauthierpiarrette
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c5fc4219726bf651660ec4d8deea1ab791d1e974
- Trigger Event: push

File details

Details for the file timefence-0.9.0-py3-none-any.whl.

File metadata

Download URL: timefence-0.9.0-py3-none-any.whl
Upload date: Feb 6, 2026
Size: 43.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for timefence-0.9.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`961e24941a1330bbf93f7571b4378f5b0bafc9c17ef86832f523522bb063be01`
MD5	`37a9b6e7e2de043640f26f98ce52e1d4`
BLAKE2b-256	`db552be6c0d18c5337c20b79afdc2e9300a130e8a0237175d32e8e94e8626b20`

Provenance

The following attestation bundles were made for timefence-0.9.0-py3-none-any.whl:

Publisher: release.yml on gauthierpiarrette/timefence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: timefence-0.9.0-py3-none-any.whl
- Subject digest: 961e24941a1330bbf93f7571b4378f5b0bafc9c17ef86832f523522bb063be01
- Sigstore transparency entry: 925604186
- Sigstore integration time: Feb 6, 2026
Source repository:
- Permalink: gauthierpiarrette/timefence@c5fc4219726bf651660ec4d8deea1ab791d1e974
- Branch / Tag: refs/tags/v0.9.0
- Owner: https://github.com/gauthierpiarrette
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c5fc4219726bf651660ec4d8deea1ab791d1e974
- Trigger Event: push
