Point-in-time data operations for Polars — prevent look-ahead bias in quantitative research
ere
Point-in-time data operations for Polars. Prevent look-ahead bias in quantitative research and backtesting.
The problem
When you join quarterly earnings onto daily prices with a naive merge, every price row gets the final reported value, including restated figures that had not been published on that date. Your backtest sees the future, and the results are meaningless.
Standard tools have no concept of when a data point became known. ere fixes this by tracking two dates per row:
- ref_date: when the event occurred (e.g. fiscal quarter end)
- knowledge_date: when the value became known to you (e.g. SEC filing date)
After any as_of() call, every row satisfies knowledge_date <= query_date. No future information leaks.
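To make the leak concrete, here is a minimal sketch in plain Python (no ere or Polars involved). The rows are made up: the 2025-01-30 original filing date and the helper names `naive_eps` and `point_in_time_eps` are assumptions for illustration, not part of ere's API. It looks up one quarter of earnings for three price dates, once naively and once filtered by knowledge_date.

```python
from datetime import date

# Two versions of the same quarter: the original filing and a later restatement.
# The 2025-01-30 filing date is illustrative.
earnings = [
    {"ticker": "AAPL", "ref_date": date(2024, 12, 31),
     "knowledge_date": date(2025, 1, 30), "eps": 1.50},
    {"ticker": "AAPL", "ref_date": date(2024, 12, 31),
     "knowledge_date": date(2025, 2, 20), "eps": 1.42},
]
price_dates = [date(2025, 2, 3), date(2025, 2, 19), date(2025, 2, 21)]


def naive_eps(rows):
    """Naive merge: always use the most recently filed value, ignoring the query date."""
    return max(rows, key=lambda r: r["knowledge_date"])["eps"]


def point_in_time_eps(rows, query_date):
    """Keep only rows with knowledge_date <= query_date, then take the newest of those."""
    known = [r for r in rows if r["knowledge_date"] <= query_date]
    if not known:
        return None  # nothing had been published yet
    return max(known, key=lambda r: r["knowledge_date"])["eps"]


naive = {d: naive_eps(earnings) for d in price_dates}
pit = {d: point_in_time_eps(earnings, d) for d in price_dates}

for d in price_dates:
    print(d, "naive:", naive[d], "point-in-time:", pit[d])
```

The naive lookup reports the restated 1.42 for every date, including the two dates before the restatement was filed. The filtered lookup switches from 1.50 to 1.42 only once the restatement's knowledge_date has passed.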
Install
pip install ere
Requires Python >= 3.12 and Polars >= 1.0.
Quick start
from datetime import date

import polars as pl
import ere

# prices_df and earnings_df are Polars DataFrames you have already loaded.
PRICES = ere.TemporalSpec(ref_date="date", knowledge_date="date", entity="ticker")
EARNINGS = ere.TemporalSpec(
    ref_date="fiscal_quarter_end",
    knowledge_date="filing_date",
    entity="ticker",
)

ere.validate(prices_df, spec=PRICES)
ere.validate(earnings_df, spec=EARNINGS)

# What did we know on Feb 1?
snap = ere.as_of(earnings_df, query_date="2025-02-01", spec=EARNINGS)
# AAPL shows the original filing (1.50), not the restatement filed Feb 20 (1.42)

snap_later = ere.as_of(earnings_df, query_date="2025-03-01", spec=EARNINGS)
# Now AAPL shows 1.42, the restated value

# Multi-source snapshot at one date
combined = ere.as_of(
    [(prices_df, PRICES), (earnings_df, EARNINGS)],
    query_date="2025-02-15",
)

# Multi-date snapshots for backtesting
rebalance_dates = [date(2025, 1, 31), date(2025, 2, 15), date(2025, 2, 28)]
snapshots = ere.panel(prices_df, spec=PRICES, dates=rebalance_dates, lookback=252)

# Post-hoc audit of an earnings snapshot taken on Feb 15
earn_snap = ere.as_of(earnings_df, query_date="2025-02-15", spec=EARNINGS)
ere.audit(earn_snap, as_of_date="2025-02-15", knowledge_date_col="filing_date")
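The invariant that a post-hoc audit asserts can be written down in a few lines. The following is a sketch of that check in plain Python over a list of dicts. It is not ere's implementation of audit(), and the helper names `find_lookahead` and `check_no_lookahead` and the sample rows are hypothetical.

```python
from datetime import date


def find_lookahead(rows, as_of_date, knowledge_date_col):
    """Return the rows whose knowledge date falls after as_of_date."""
    return [r for r in rows if r[knowledge_date_col] > as_of_date]


def check_no_lookahead(rows, as_of_date, knowledge_date_col):
    """Raise ValueError if any row was not yet known at as_of_date."""
    leaked = find_lookahead(rows, as_of_date, knowledge_date_col)
    if leaked:
        raise ValueError(f"{len(leaked)} row(s) known only after {as_of_date}")


# Illustrative rows: one filing before the audit date, one after it.
clean = [{"ticker": "AAPL", "filing_date": date(2025, 1, 30), "eps": 1.50}]
leaky = clean + [{"ticker": "AAPL", "filing_date": date(2025, 2, 20), "eps": 1.42}]

check_no_lookahead(clean, date(2025, 2, 15), "filing_date")  # passes silently

try:
    check_no_lookahead(leaky, date(2025, 2, 15), "filing_date")
except ValueError as exc:
    print("leak detected:", exc)
```

Running a check like this on the frame that actually feeds your model is a cheap guard against a later join or concat reintroducing future rows.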
API
| Function | Purpose |
|---|---|
| as_of(df, query_date, spec) * | Point-in-time snapshot, filtered to what was known at query_date |
| panel(df, spec, dates) * | Multi-date snapshots; returns {date: DataFrame} |
| panel_iter(df, spec, dates) * | Iterator over panel(), yielding (date, DataFrame) |
| panel_lazy(df, spec, dates) * | Fully lazy version. Single inequality join under the hood, tagged by query date |
| panel_map(df, spec, dates, fn) * | Apply a Python function to each snapshot (for logic that doesn't fit as Polars expressions) |
| validate(df, spec) | Check temporal structure (columns, dtypes, no nulls, no time travel) |
| audit(df, as_of_date) | Assert no look-ahead leakage in a result frame |
| tag_knowledge_date(df, ...) | Add a knowledge_date column from a fixed lag |
| deduplicate(df, spec) | Remove duplicate versions |
| align(sources, date) | Per-source PIT snapshots at one date |

* Also accepts a list/dict of (frame, spec) sources; see Multi-source snapshots below.
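The "single inequality join, tagged by query date" idea behind the lazy panel can be illustrated without Polars. The sketch below pairs every query date with every row already known on that date and tags the result. It shows only the join-and-tag step, leaves out the selection of the latest version per entity, and is not ere's actual code; the sample rows are made up.

```python
from datetime import date

# Illustrative rows: an original filing and a later restatement.
rows = [
    {"ticker": "AAPL", "filing_date": date(2025, 1, 30), "eps": 1.50},
    {"ticker": "AAPL", "filing_date": date(2025, 2, 20), "eps": 1.42},
]
query_dates = [date(2025, 1, 31), date(2025, 2, 15), date(2025, 2, 28)]

# Inequality join: keep each (query date, row) pair where the row was
# already known, and tag the row with the query date it belongs to.
tagged = [
    {"query_date": q, **r}
    for q in query_dates
    for r in rows
    if r["filing_date"] <= q
]

# Splitting on the tag recovers a {date: rows} mapping.
by_date = {q: [t for t in tagged if t["query_date"] == q] for q in query_dates}

for q, snap in by_date.items():
    print(q, [t["eps"] for t in snap])
```

Because all dates are handled in one pass and distinguished only by the tag, the whole panel can stay a single long table until you need to split it.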
Key concepts
TemporalSpec binds your column names to temporal roles. Define it once per dataset:
SPEC = ere.TemporalSpec(
    ref_date="fiscal_quarter_end",  # when the event happened
    knowledge_date="filing_date",   # when it became known
    entity="ticker",                # optional grouping key
)
Restatements are handled automatically. If a data point is published multiple times (same entity + ref_date, different knowledge_dates), as_of() returns the latest version that was known at the query date.
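The selection rule described here can be sketched in plain Python: group by (entity, ref_date), drop versions published after the query date, and keep the newest remaining version. The function name `latest_known` and the sample rows (including the 2024-09-30 quarter and its values) are hypothetical; this illustrates the rule rather than ere's implementation.

```python
from datetime import date


def latest_known(rows, query_date, entity, ref_date, knowledge_date):
    """One row per (entity, ref_date): the newest version known at query_date."""
    best = {}
    for r in rows:
        if r[knowledge_date] > query_date:
            continue  # not yet published at query_date
        key = (r[entity], r[ref_date])
        if key not in best or r[knowledge_date] > best[key][knowledge_date]:
            best[key] = r
    return list(best.values())


# Illustrative data: Q3 filed once, Q4 filed and later restated.
earnings = [
    {"ticker": "AAPL", "fiscal_quarter_end": date(2024, 9, 30),
     "filing_date": date(2024, 11, 1), "eps": 1.30},
    {"ticker": "AAPL", "fiscal_quarter_end": date(2024, 12, 31),
     "filing_date": date(2025, 1, 30), "eps": 1.50},
    {"ticker": "AAPL", "fiscal_quarter_end": date(2024, 12, 31),
     "filing_date": date(2025, 2, 20), "eps": 1.42},
]
cols = dict(entity="ticker", ref_date="fiscal_quarter_end", knowledge_date="filing_date")

feb = latest_known(earnings, date(2025, 2, 1), **cols)
mar = latest_known(earnings, date(2025, 3, 1), **cols)

print(sorted(r["eps"] for r in feb))
print(sorted(r["eps"] for r in mar))
```

Both snapshots contain one row per quarter. Only the Q4 row differs: the February snapshot carries the original filing, the March snapshot the restatement.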
Multi-source snapshots. Pass a list (or dict) of (frame, spec) pairs anywhere a single (frame, spec) goes. Each source is PIT-aligned independently, then asof-joined into one frame:
sources = [(prices_df, PRICES), (earnings_df, EARNINGS)]
snapshot = ere.as_of(sources, "2025-02-15") # joined point-in-time view
panels = ere.panel(sources, dates=rebalance_dates) # multi-date snapshots
# Entity-less specs (no `entity=`) broadcast across entities automatically:
MACRO = ere.TemporalSpec(ref_date="release_period", knowledge_date="release_date")
with_macro = ere.as_of([*sources, (macro_df, MACRO)], "2025-02-15")
# Want them separate instead of merged? Use align:
per_source = ere.align(sources, "2025-02-15") # -> [prices_snap, earn_snap]
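As a rough mental model of the merged view, the sketch below aligns three sources independently at one date and then joins them, broadcasting the entity-less source to every ticker. It is plain Python with made-up numbers. The helper `latest_per_entity`, the `cpi_yoy` column, the left join on prices, and the use of None for a missing source are all assumptions for illustration, and the per-ref_date handling is simplified away.

```python
from datetime import date


def latest_per_entity(rows, query_date, knowledge_date, entity=None):
    """Newest row known at query_date, keyed by entity (key None if entity-less)."""
    best = {}
    for r in rows:
        if r[knowledge_date] > query_date:
            continue  # not yet known at query_date
        key = r[entity] if entity else None
        if key not in best or r[knowledge_date] > best[key][knowledge_date]:
            best[key] = r
    return best


# Illustrative data; prices are known on their own date.
prices = [
    {"ticker": "AAPL", "date": date(2025, 2, 14), "close": 100.0},
    {"ticker": "MSFT", "date": date(2025, 2, 14), "close": 200.0},
]
earnings = [
    {"ticker": "AAPL", "filing_date": date(2025, 1, 30), "eps": 1.50},
    {"ticker": "AAPL", "filing_date": date(2025, 2, 20), "eps": 1.42},
]
macro = [{"release_date": date(2025, 2, 12), "cpi_yoy": 3.0}]  # no entity column

q = date(2025, 2, 15)
px = latest_per_entity(prices, q, "date", entity="ticker")
er = latest_per_entity(earnings, q, "filing_date", entity="ticker")
mc = latest_per_entity(macro, q, "release_date")  # entity-less source

combined = []
for ticker, p in px.items():
    row = {"ticker": ticker, "close": p["close"]}
    row["eps"] = er[ticker]["eps"] if ticker in er else None
    row["cpi_yoy"] = mc[None]["cpi_yoy"] if None in mc else None  # broadcast
    combined.append(row)

for row in combined:
    print(row)
```

Each source is filtered against its own knowledge-date column before the join, so a late restatement in one source cannot leak in through another.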
Examples
Runnable end-to-end scripts live in examples/:
- quickstart.py: end-to-end backtest walkthrough covering restatements, naive-join leakage, and multi-date rebalancing.
- multi_source.py: list/dict sources, entity-less broadcasting, panel / panel_iter / align.
- panel_lazy.py: benchmarks an as_of() loop vs panel_lazy / panel_map on second-granularity orderbook data.
Run any of them with uv run python examples/<name>.py.
License
MIT