safefeat
Leakage-safe, point-in-time feature engineering for event logs.
safefeat builds ML features from event data using only information available at prediction time — no future data, no silent leakage, no surprises in production.
The Problem
When you compute features like "total purchases in the last 30 days" without anchoring to a cutoff time, you accidentally include future events. Your model looks great in training — then falls apart in production.
# ❌ Leaky: uses ALL events, including future ones
features = events.groupby("user_id")["amount"].sum()
df = spine.merge(features, on="user_id")
# ✅ Safe: only uses events with event_time <= cutoff_time
X = build_features(spine, tables, spec, event_time_cols={"events": "event_time"})
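To make the leak concrete, here is a minimal pandas-only sketch that does not use safefeat. The data is invented for illustration: one user with a purchase that happens after the cutoff. The cutoff-anchored version pairs events with spine rows and keeps only those with event_time <= cutoff_time.

```python
import pandas as pd

# Invented toy data: the 2024-02-15 purchase happens AFTER the cutoff.
events = pd.DataFrame({
    "user_id": ["u1", "u1", "u1"],
    "event_time": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-15"]),
    "amount": [10.0, 20.0, 40.0],
})
spine = pd.DataFrame({
    "user_id": ["u1"],
    "cutoff_time": pd.to_datetime(["2024-01-31"]),
})

# Leaky: the total ignores cutoff_time, so the February purchase leaks in.
leaky = events.groupby("user_id")["amount"].sum().rename("total_amount").reset_index()
leaky_total = spine.merge(leaky, on="user_id")["total_amount"].iloc[0]

# Point-in-time: pair each spine row with its events, then drop future events.
pairs = spine.merge(events, on="user_id", how="left")
visible = pairs[pairs["event_time"] <= pairs["cutoff_time"]]
safe_total = visible.groupby(["user_id", "cutoff_time"])["amount"].sum().iloc[0]

print(leaky_total)  # 70.0
print(safe_total)   # 30.0
```

The 40.0 gap between the two totals is information the model would never have at prediction time.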
Install
pip install safefeat
How It Works
safefeat works with three components:
| Component | Description |
|---|---|
| Spine | When to make predictions — one row per (entity_id, cutoff_time) |
| Events | Historical time-series data tied to each entity |
| Spec | Declarative definition of what features to compute |
For each row in the spine, safefeat joins only events where event_time <= cutoff_time, then computes your features. Future events are excluded.
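The spine in the table above is an ordinary DataFrame, so it can be assembled with plain pandas. The sketch below is one possible way to do it, assuming a hypothetical month-end scoring schedule; the entity IDs and dates are made up, and only the column names entity_id and cutoff_time come from this README.

```python
import pandas as pd

entity_ids = ["u1", "u2"]
# Hypothetical schedule: score every entity at three month-end cutoffs.
cutoffs = pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-31"])

# Cartesian product of entities and cutoffs: one row per (entity_id, cutoff_time).
spine = pd.MultiIndex.from_product(
    [entity_ids, cutoffs], names=["entity_id", "cutoff_time"]
).to_frame(index=False)

# Sanity check: no duplicated prediction points.
assert not spine.duplicated(["entity_id", "cutoff_time"]).any()

print(len(spine))  # 6
```

The same entity can appear with several cutoffs; each row is then a separate prediction point with its own view of the event history.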
Quick Start
Window aggregations
import pandas as pd
from safefeat import build_features, WindowAgg
spine = pd.DataFrame({
    "entity_id": ["u1", "u2"],
    "cutoff_time": ["2024-01-10", "2024-01-31"],
})
events = pd.DataFrame({
    "entity_id": ["u1", "u1", "u2", "u2"],
    "event_time": ["2024-01-05", "2024-01-06", "2024-01-10", "2024-01-30"],
    "amount": [10.0, 20.0, 5.0, 25.0],
    "event_type": ["click", "purchase", "purchase", "click"],
})
spec = [
    WindowAgg(
        table="events",
        windows=["7D", "30D"],
        metrics={
            "*": ["count"],              # total events
            "amount": ["sum", "mean"],   # numeric aggregations
            "event_type": ["nunique"],   # distinct event types
        },
    )
]
X = build_features(
    spine=spine,
    tables={"events": events},
    spec=spec,
    event_time_cols={"events": "event_time"},
    allowed_lag="0s",
)
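Output columns follow the pattern {table}__{column}__{agg}__{window}, with the window suffix in lowercase. The "*" count is the exception: it is named {table}__n_events__{window}. For example: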
events__n_events__7d
events__amount__sum__7d
events__amount__mean__30d
events__event_type__nunique__30d
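To sanity-check these columns by hand, the aggregates can be recomputed with plain pandas. This is a rough cross-check, not safefeat's implementation: manual_window_agg is a hypothetical helper, and the window boundary (cutoff_time - window, cutoff_time] is an assumption, since this README does not state how the lower edge is treated. With the Quick Start data the result is the same whether the lower edge is inclusive or exclusive.

```python
import pandas as pd

spine = pd.DataFrame({
    "entity_id": ["u1", "u2"],
    "cutoff_time": pd.to_datetime(["2024-01-10", "2024-01-31"]),
})
events = pd.DataFrame({
    "entity_id": ["u1", "u1", "u2", "u2"],
    "event_time": pd.to_datetime(
        ["2024-01-05", "2024-01-06", "2024-01-10", "2024-01-30"]
    ),
    "amount": [10.0, 20.0, 5.0, 25.0],
    "event_type": ["click", "purchase", "purchase", "click"],
})


def manual_window_agg(spine, events, window, table="events"):
    """Hypothetical hand-rolled cross-check; not part of safefeat."""
    suffix = window.lower()
    pairs = spine.merge(events, on="entity_id", how="left")
    # Assumed window: cutoff_time - window < event_time <= cutoff_time.
    in_window = (pairs["event_time"] <= pairs["cutoff_time"]) & (
        pairs["event_time"] > pairs["cutoff_time"] - pd.Timedelta(window)
    )
    grouped = pairs[in_window].groupby(["entity_id", "cutoff_time"])
    out = grouped.agg(**{
        f"{table}__n_events__{suffix}": ("event_time", "count"),
        f"{table}__amount__sum__{suffix}": ("amount", "sum"),
        f"{table}__amount__mean__{suffix}": ("amount", "mean"),
        f"{table}__event_type__nunique__{suffix}": ("event_type", "nunique"),
    }).reset_index()
    # Left-merge back so spine rows with no events in the window survive (as NaN).
    return spine.merge(out, on=["entity_id", "cutoff_time"], how="left")


check = manual_window_agg(spine, events, "7D").merge(
    manual_window_agg(spine, events, "30D"), on=["entity_id", "cutoff_time"]
)
print(check.T)
```

For u2, only the 2024-01-30 event falls inside the 7-day window before the 2024-01-31 cutoff, so the 7d count is 1 while the 30d count is 2.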
Recency features
Time since the most recent event before each cutoff — useful for churn, fraud, and behavioural modelling:
from safefeat import RecencyBlock
spec = [RecencyBlock(table="events")]
X = build_features(
    spine=spine,
    tables={"events": events},
    spec=spec,
    event_time_cols={"events": "event_time"},
)
# Adds: events__recency (days since last event before cutoff_time)
Filter by event type:
spec = [
    RecencyBlock(
        table="events",
        filter_col="event_type",
        filter_value="purchase",
    )
]
# Adds: events__recency__event_type_purchase
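The recency features above can be approximated in plain pandas to see what the numbers mean. The sketch below is an illustration under assumptions, not safefeat's code: manual_recency and the recency_days column are hypothetical names, and events at exactly cutoff_time are treated as visible (event_time <= cutoff_time).

```python
import pandas as pd

spine = pd.DataFrame({
    "entity_id": ["u1", "u2"],
    "cutoff_time": pd.to_datetime(["2024-01-10", "2024-01-31"]),
})
events = pd.DataFrame({
    "entity_id": ["u1", "u1", "u2", "u2"],
    "event_time": pd.to_datetime(
        ["2024-01-05", "2024-01-06", "2024-01-10", "2024-01-30"]
    ),
    "event_type": ["click", "purchase", "purchase", "click"],
})


def manual_recency(spine, events, filter_col=None, filter_value=None):
    """Hypothetical helper: days from the last visible event to cutoff_time."""
    if filter_col is not None:
        events = events[events[filter_col] == filter_value]
    pairs = spine.merge(events, on="entity_id", how="left")
    # Assumption: an event at exactly cutoff_time is still visible.
    visible = pairs[pairs["event_time"] <= pairs["cutoff_time"]]
    last_seen = (
        visible.groupby(["entity_id", "cutoff_time"])["event_time"]
        .max()
        .rename("last_event_time")
        .reset_index()
    )
    # Left-merge so entities with no visible event get a missing recency.
    out = spine.merge(last_seen, on=["entity_id", "cutoff_time"], how="left")
    out["recency_days"] = (out["cutoff_time"] - out["last_event_time"]).dt.days
    return out


print(manual_recency(spine, events))
print(manual_recency(spine, events, "event_type", "purchase"))
```

With the filter applied, u2's recency grows from 1 day to 21 days, because the most recent purchase (2024-01-10) is much older than the most recent event of any type (2024-01-30).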
Audit report
Pass return_report=True to get an audit report alongside the features. For each table, it counts how many event-cutoff pairs were considered, kept, and dropped as future events:
X, audit = build_features(
    spine=spine,
    tables={"events": events},
    spec=spec,
    event_time_cols={"events": "event_time"},
    return_report=True,
)
events_audit = audit.tables.get("events")
print(events_audit.total_joined_pairs) # total event-cutoff pairs considered
print(events_audit.kept_pairs) # events before cutoff (used)
print(events_audit.dropped_future_pairs) # events after cutoff (excluded)
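One way to read these counters is as a partition of the entity-level join between spine and events. The pandas sketch below reproduces that interpretation by hand; it is an assumption about what the fields mean, based only on the comments above, and safefeat may count pairs differently (for example when allowed_lag is non-zero). The extra 2024-01-15 event for u1 is invented so that one pair lands after the cutoff.

```python
import pandas as pd

spine = pd.DataFrame({
    "entity_id": ["u1", "u2"],
    "cutoff_time": pd.to_datetime(["2024-01-10", "2024-01-31"]),
})
events = pd.DataFrame({
    "entity_id": ["u1", "u1", "u1", "u2", "u2"],
    # The 2024-01-15 event is invented: it falls after u1's cutoff.
    "event_time": pd.to_datetime(
        ["2024-01-05", "2024-01-06", "2024-01-15", "2024-01-10", "2024-01-30"]
    ),
})

# Every (spine row, event) pair for the same entity.
pairs = spine.merge(events, on="entity_id", how="inner")
is_future = pairs["event_time"] > pairs["cutoff_time"]

total_joined_pairs = len(pairs)
kept_pairs = int((~is_future).sum())
dropped_future_pairs = int(is_future.sum())

print(total_joined_pairs, kept_pairs, dropped_future_pairs)  # 5 4 1
```

Under this reading, kept and dropped pairs add up to the total, which makes a quick consistency check when comparing against the library's report.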
Development
pip install -e ".[dev]"
pytest -q
ruff check .
Documentation
Full documentation, concepts, and API reference: 👉 https://alishaang.github.io/safefeat/
File details
Details for the file safefeat-0.1.2.tar.gz.
File metadata
- Download URL: safefeat-0.1.2.tar.gz
- Upload date:
- Size: 22.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eccf503fb7264379b4026385ff99d520a52e2ee0b9f8dbcbdcd00e9676a0d62b |
| MD5 | 7379d8dd829ab94cafcdda5a13814902 |
| BLAKE2b-256 | 55dd607b1b08cb1a09e5d17b5ea21c16bea2b9904bd7251b2cdf813a6e9b14f6 |
File details
Details for the file safefeat-0.1.2-py3-none-any.whl.
File metadata
- Download URL: safefeat-0.1.2-py3-none-any.whl
- Upload date:
- Size: 10.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 45b7e94ce063a3b41ea6e4906156d521ef873229fcfe4a0d8b15d8e63cfdc81b |
| MD5 | dc791af78a3af838684c7a0660150f61 |
| BLAKE2b-256 | 8eeee57c246d71f3b21e835f451527bcf00ccb22a4014bf27e836d8a0c43c7ab |