Point-in-time data operations for Polars — prevent look-ahead bias in quantitative research

ere

Point-in-time data operations for Polars. Prevent look-ahead bias in quantitative research and backtesting.

The problem

When you join quarterly earnings onto daily prices with a naive merge, every row gets the final value, including restated figures that had not yet been published on that row's date. Your backtest sees the future, and its results are meaningless.

Standard tools have no concept of when a data point became known. ere fixes this by tracking two dates per row:

  • ref_date: when the event occurred (e.g. fiscal quarter end)
  • knowledge_date: when the data actually became available to you (e.g. SEC filing date)

After any as_of() call, every row satisfies knowledge_date <= query_date. No future information leaks.

Install

pip install ere

Requires Python >= 3.12 and Polars >= 1.0.

Quick start

from datetime import date
import polars as pl
import ere

# prices_df and earnings_df: Polars DataFrames with the columns named in each spec
PRICES = ere.TemporalSpec(ref_date="date", knowledge_date="date", entity="ticker")
EARNINGS = ere.TemporalSpec(
    ref_date="fiscal_quarter_end",
    knowledge_date="filing_date",
    entity="ticker",
)

ere.validate(prices_df, spec=PRICES)
ere.validate(earnings_df, spec=EARNINGS)

# What did we know on Feb 1?
snap = ere.as_of(earnings_df, query_date="2025-02-01", spec=EARNINGS)
# AAPL shows original filing (1.50), not the restatement filed Feb 20 (1.42)

snap_later = ere.as_of(earnings_df, query_date="2025-03-01", spec=EARNINGS)
# now AAPL shows 1.42, the restated value

# Multi-source snapshot at one date
combined = ere.as_of(
    [(prices_df, PRICES), (earnings_df, EARNINGS)],
    query_date="2025-02-15",
)

# Multi-date snapshots for backtesting
rebalance_dates = [date(2025, 1, 31), date(2025, 2, 15), date(2025, 2, 28)]
snapshots = ere.panel(prices_df, spec=PRICES, dates=rebalance_dates, lookback=252)

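# Post-hoc audit: assert that no row in a snapshot was known after its as-of date
earn_snap = ere.as_of(earnings_df, query_date="2025-02-15", spec=EARNINGS)
ere.audit(earn_snap, as_of_date="2025-02-15", knowledge_date_col="filing_date")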

API

Function                         Purpose
as_of(df, query_date, spec)*     Point-in-time snapshot, filtered to what was known at query_date
panel(df, spec, dates)*          Multi-date snapshots; returns {date: DataFrame}
panel_iter(df, spec, dates)*     Iterator form of panel(), yielding (date, DataFrame) pairs
panel_lazy(df, spec, dates)*     Fully lazy version of panel(): a single inequality join under the hood, with rows tagged by query date
panel_map(df, spec, dates, fn)*  Apply a Python function to each snapshot (for logic that can't be written as Polars expressions)
validate(df, spec)               Check temporal structure (columns, dtypes, no nulls, no time travel)
audit(df, as_of_date)            Assert that a result frame contains no look-ahead leakage
tag_knowledge_date(df, ...)      Add a knowledge_date column derived from a fixed lag
deduplicate(df, spec)            Remove duplicate versions
align(sources, date)             Per-source point-in-time snapshots at one date

* Also accepts a list or dict of (frame, spec) sources; see Multi-source snapshots below.

Key concepts

TemporalSpec binds your column names to temporal roles. Define it once per dataset:

SPEC = ere.TemporalSpec(
    ref_date="fiscal_quarter_end",  # when the event happened
    knowledge_date="filing_date",   # when it became known
    entity="ticker",                # optional grouping key
)

Restatements are handled automatically. If a data point is published multiple times (same entity + ref_date, different knowledge_dates), as_of() returns the latest version that was known at the query date.

Multi-source snapshots. Pass a list (or dict) of (frame, spec) pairs anywhere a single (frame, spec) goes. Each source is aligned point-in-time (PIT) independently, and the results are then asof-joined into one frame:

sources = [(prices_df, PRICES), (earnings_df, EARNINGS)]

snapshot = ere.as_of(sources, "2025-02-15")            # joined point-in-time view
panels   = ere.panel(sources, dates=rebalance_dates)   # multi-date snapshots

# Entity-less specs (no `entity=`) broadcast across entities automatically:
MACRO = ere.TemporalSpec(ref_date="release_period", knowledge_date="release_date")
with_macro = ere.as_of([*sources, (macro_df, MACRO)], "2025-02-15")

# Want them separate instead of merged? Use align:
per_source = ere.align(sources, "2025-02-15")          # -> [prices_snap, earn_snap]

Examples

Runnable end-to-end scripts live in examples/:

  • quickstart.py — end-to-end backtest walkthrough: restatements, naive-join leakage, multi-date rebalancing.
  • multi_source.py — list/dict sources, entity-less broadcasting, panel/panel_iter/align.
  • panel_lazy.py — benchmarks as_of() loop vs panel_lazy/panel_map on second-granularity orderbook data.

Run any of them with uv run python examples/<name>.py.

License

MIT

Download files

Source Distribution

ere-0.2.0.tar.gz (11.9 kB view details)

Uploaded Source

Built Distribution

ere-0.2.0-py3-none-any.whl (17.2 kB view details)

Uploaded Python 3

File details

Details for the file ere-0.2.0.tar.gz.

File metadata

  • Download URL: ere-0.2.0.tar.gz
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ere-0.2.0.tar.gz
Algorithm Hash digest
SHA256 263ba1d02ae3e3a0f50209d1d50ef05950896bdd149158530ebbd8c425d69c34
MD5 d966570f9bc02bc9f2acbd246d2bffea
BLAKE2b-256 17f839ca45d27e426bf09683e0876c6f5a296672d96cec70d1f7c224fb583afd

Provenance

The following attestation bundles were made for ere-0.2.0.tar.gz:

Publisher: publish.yml on Gilvir/ere

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ere-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: ere-0.2.0-py3-none-any.whl
  • Size: 17.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ere-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3549b429e484c4435eb11e5ce8852ccb4d035e59c9205800f21b68b1ea25f893
MD5 dc783debedba18872d3efd486d18d701
BLAKE2b-256 e2af25627204d302fbf65b4dafa102a163b1609046eb6423bc9a329c283de7ce

Provenance

The following attestation bundles were made for ere-0.2.0-py3-none-any.whl:

Publisher: publish.yml on Gilvir/ere

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
