ere

Point-in-time data operations for Polars. Prevent look-ahead bias in quantitative research and backtesting.

The problem

When you join quarterly earnings onto daily prices using a naive merge, every row gets the final value, including restated figures that weren't published yet. Your backtest sees the future and the results are meaningless.

Standard tools have no concept of when a data point became known. ere fixes this by tracking two dates per row:

  • ref_date: when the event occurred (e.g. fiscal quarter end)
  • knowledge_date: when you actually had the data (e.g. SEC filing date)

After any as_of() call, every row satisfies knowledge_date <= query_date. No future information leaks.
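The invariant can be illustrated without any library at all. The following is a minimal plain-Python sketch, not ere's implementation: the helper names `known_at` and `has_leak`, the `eps` field, and the January 30 original filing date are all hypothetical, chosen only to make the rule concrete.

```python
from datetime import date

# Hypothetical rows: two versions of the same quarter, published on different days.
rows = [
    {"ticker": "AAPL", "ref_date": date(2024, 12, 31),
     "knowledge_date": date(2025, 1, 30), "eps": 1.50},   # original filing (assumed date)
    {"ticker": "AAPL", "ref_date": date(2024, 12, 31),
     "knowledge_date": date(2025, 2, 20), "eps": 1.42},   # restatement
]


def known_at(rows, query_date):
    """Keep only rows that had been published on or before query_date."""
    return [r for r in rows if r["knowledge_date"] <= query_date]


def has_leak(rows, query_date):
    """True if any row became known only after query_date."""
    return any(r["knowledge_date"] > query_date for r in rows)


query = date(2025, 2, 1)
visible = known_at(rows, query)
```

Here `has_leak(rows, query)` is true for the raw data, because the restatement was filed after February 1, and false for `visible`, which is the property the text above describes.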

Install

pip install ere

Requires Python >= 3.12 and Polars >= 1.0.

Quick start

from datetime import date
import polars as pl
import ere

PRICES = ere.TemporalSpec(ref_date="date", knowledge_date="date", entity="ticker")
EARNINGS = ere.TemporalSpec(
    ref_date="fiscal_quarter_end",
    knowledge_date="filing_date",
    entity="ticker",
)

# prices_df and earnings_df are Polars DataFrames you have already loaded
ere.validate(prices_df, spec=PRICES)
ere.validate(earnings_df, spec=EARNINGS)

# What did we know on Feb 1?
snap = ere.as_of(earnings_df, query_date="2025-02-01", spec=EARNINGS)
# AAPL shows original filing (1.50), not the restatement filed Feb 20 (1.42)

snap_later = ere.as_of(earnings_df, query_date="2025-03-01", spec=EARNINGS)
# now AAPL shows 1.42, the restated value

# Multi-source snapshot at one date
combined = ere.as_of(
    [(prices_df, PRICES), (earnings_df, EARNINGS)],
    query_date="2025-02-15",
)

# Multi-date snapshots for backtesting
rebalance_dates = [date(2025, 1, 31), date(2025, 2, 15), date(2025, 2, 28)]
snapshots = ere.panel(prices_df, spec=PRICES, dates=rebalance_dates, lookback=252)

# Post-hoc audit of the Feb 1 earnings snapshot
ere.audit(snap, as_of_date="2025-02-01", knowledge_date_col="filing_date")

API

| Function | Purpose |
| --- | --- |
| as_of(df, query_date, spec) * | Point-in-time snapshot, filtered to what was known at query_date |
| panel(df, spec, dates) * | Multi-date snapshots; returns {date: DataFrame} |
| panel_iter(df, spec, dates) * | Iterator over panel(), yielding (date, DataFrame) |
| panel_lazy(df, spec, dates) * | Fully lazy version. Single inequality join under the hood, tagged by query date |
| panel_map(df, spec, dates, fn) * | Apply a Python function to each snapshot (for logic that doesn't fit as Polars expressions) |
| validate(df, spec) | Check temporal structure (columns, dtypes, no nulls, no time travel) |
| audit(df, as_of_date) | Assert no look-ahead leakage in a result frame |
| tag_knowledge_date(df, ...) | Add a knowledge_date column from a fixed lag |
| deduplicate(df, spec) | Remove duplicate versions |
| align(sources, date) | Per-source PIT snapshots at one date |

* Also accepts a list/dict of (frame, spec) sources; see Multi-source snapshots below.
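To make the "multi-date snapshots" and "inequality join tagged by query date" descriptions concrete, here is a plain-Python sketch of the idea. It is an illustration under assumptions, not ere's code: `panel_sketch` is a hypothetical name, the price values are made up, and lookback windows and restatement handling are deliberately left out.

```python
from datetime import date


def panel_sketch(rows, dates, knowledge_col="knowledge_date"):
    """Return {query_date: rows known on or before that date}."""
    # Inequality join: pair each row with every query date at which it was
    # already known, tagging the pair with that query date.
    tagged = [(d, r) for d in dates for r in rows if r[knowledge_col] <= d]
    # Group the tagged pairs back into one snapshot per query date.
    panel = {d: [] for d in dates}
    for d, r in tagged:
        panel[d].append(r)
    return panel


# Hypothetical daily prices; a price is known on the day it is observed.
prices = [
    {"ticker": "AAPL", "date": date(2025, 1, 30), "close": 100.0},
    {"ticker": "AAPL", "date": date(2025, 2, 14), "close": 101.0},
    {"ticker": "AAPL", "date": date(2025, 2, 27), "close": 102.0},
]
rebalance_dates = [date(2025, 1, 31), date(2025, 2, 15), date(2025, 2, 28)]
snapshots = panel_sketch(prices, rebalance_dates, knowledge_col="date")
```

Each later rebalance date sees one more price row than the one before it, and no snapshot contains a row dated after its own key.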

Key concepts

TemporalSpec binds your column names to temporal roles. Define it once per dataset:

SPEC = ere.TemporalSpec(
    ref_date="fiscal_quarter_end",  # when the event happened
    knowledge_date="filing_date",   # when it became known
    entity="ticker",                # optional grouping key
)

Restatements are handled automatically. If a data point is published multiple times (same entity + ref_date, different knowledge_dates), as_of() returns the latest version that was known at the query date.
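The selection rule can be sketched in a few lines of plain Python. This is only an illustration of the semantics described above, not the library's implementation; `as_of_sketch`, the `eps` column, and the January 30 original filing date are assumptions made for the example.

```python
from datetime import date


def as_of_sketch(rows, query_date):
    """Latest version per (ticker, fiscal_quarter_end) known at query_date."""
    latest = {}
    for r in rows:
        if r["filing_date"] > query_date:
            continue  # not yet published at query_date
        key = (r["ticker"], r["fiscal_quarter_end"])
        best = latest.get(key)
        if best is None or r["filing_date"] > best["filing_date"]:
            latest[key] = r  # a newer version that was already known wins
    return list(latest.values())


# Hypothetical earnings history: one original filing and one restatement.
earnings = [
    {"ticker": "AAPL", "fiscal_quarter_end": date(2024, 12, 31),
     "filing_date": date(2025, 1, 30), "eps": 1.50},
    {"ticker": "AAPL", "fiscal_quarter_end": date(2024, 12, 31),
     "filing_date": date(2025, 2, 20), "eps": 1.42},
]

snap_feb = as_of_sketch(earnings, date(2025, 2, 1))
snap_mar = as_of_sketch(earnings, date(2025, 3, 1))
```

With this data the February 1 snapshot holds the original 1.50 and the March 1 snapshot holds the restated 1.42, each as a single row for the quarter.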

Multi-source snapshots. Pass a list (or dict) of (frame, spec) pairs anywhere a single (frame, spec) goes. Each source is PIT-aligned independently, then asof-joined into one frame:

sources = [(prices_df, PRICES), (earnings_df, EARNINGS)]

snapshot = ere.as_of(sources, "2025-02-15")            # joined point-in-time view
panels   = ere.panel(sources, dates=rebalance_dates)   # multi-date snapshots

# Entity-less specs (no `entity=`) broadcast across entities automatically:
MACRO = ere.TemporalSpec(ref_date="release_period", knowledge_date="release_date")
with_macro = ere.as_of([*sources, (macro_df, MACRO)], "2025-02-15")

# Want them separate instead of merged? Use align:
per_source = ere.align(sources, "2025-02-15")          # -> [prices_snap, earn_snap]
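For intuition, the "align each source, then asof-join" flow might look roughly like the plain-Python sketch below. Everything here is an assumption for illustration: the helpers `pit` and `asof_attach` are hypothetical, the join key (latest right-hand ref date on or before the left-hand date) is a guess at a reasonable convention rather than documented behaviour, the data values are invented, and restatement selection is omitted.

```python
from datetime import date


def pit(rows, knowledge_col, query_date):
    """Point-in-time filter: keep rows known on or before query_date."""
    return [r for r in rows if r[knowledge_col] <= query_date]


def asof_attach(left, right, left_on, right_on, by=None):
    """For each left row, merge in the right row with the greatest
    right_on <= left_on. With by=None the right side has no entity
    and is broadcast across all left rows."""
    out = []
    for l in left:
        candidates = [r for r in right
                      if r[right_on] <= l[left_on]
                      and (by is None or r[by] == l[by])]
        match = max(candidates, key=lambda r: r[right_on], default={})
        out.append({**match, **l})  # left-hand values win on name clashes
    return out


prices = [
    {"ticker": "AAPL", "date": date(2025, 2, 14), "close": 100.0},
    {"ticker": "MSFT", "date": date(2025, 2, 14), "close": 200.0},
]
earnings = [
    {"ticker": "AAPL", "fiscal_quarter_end": date(2024, 12, 31),
     "filing_date": date(2025, 1, 30), "eps": 1.50},
    {"ticker": "AAPL", "fiscal_quarter_end": date(2024, 12, 31),
     "filing_date": date(2025, 2, 20), "eps": 1.42},  # not yet known on Feb 15
]
macro = [{"release_period": date(2024, 12, 31),
          "release_date": date(2025, 1, 15), "cpi": 3.0}]

query = date(2025, 2, 15)
joined = asof_attach(pit(prices, "date", query),
                     pit(earnings, "filing_date", query),
                     "date", "fiscal_quarter_end", by="ticker")
joined = asof_attach(joined, pit(macro, "release_date", query),
                     "date", "release_period")  # entity-less: broadcast
```

In this sketch the AAPL row picks up the 1.50 figure that was known on February 15, the MSFT row gets no earnings fields because none exist for it, and both rows receive the macro value.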

Examples

Runnable end-to-end scripts live in examples/:

  • quickstart.py — end-to-end backtest walkthrough: restatements, naive-join leakage, multi-date rebalancing.
  • multi_source.py — list/dict sources, entity-less broadcasting, panel/panel_iter/align.
  • panel_lazy.py — benchmarks as_of() loop vs panel_lazy/panel_map on second-granularity orderbook data.

Run any of them with uv run python examples/<name>.py.

License

MIT
