ere

Point-in-time data operations for Polars. Prevent look-ahead bias in quantitative research and backtesting.

The problem

When you join quarterly earnings onto daily prices using a naive merge, every row gets the final value, including restated figures that weren't published yet. Your backtest sees the future and the results are meaningless.

Standard tools have no concept of when a data point became known. ere fixes this by tracking two dates per row:

  • ref_date: when the event occurred (e.g. fiscal quarter end)
  • knowledge_date: when the value became available to you (e.g. SEC filing date)

After any as_of() call, every row satisfies knowledge_date <= query_date. No future information leaks.
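The invariant above can be checked by hand. The following is a minimal plain-Python sketch of that check, not ere's implementation: the helper name `leaked_rows` is hypothetical, and the January 30 filing date for the original figure is an assumed value chosen for illustration (the restatement on Feb 20 comes from the quick start).

```python
from datetime import date

def leaked_rows(rows, query_date):
    """Return rows whose knowledge_date is later than query_date (look-ahead)."""
    return [r for r in rows if r["knowledge_date"] > query_date]

# What a naive merge hands you: only the final, restated figure.
naive = [{"ticker": "AAPL", "knowledge_date": date(2025, 2, 20), "eps": 1.42}]

# What had actually been published by the query date.
# The Jan 30 filing date is an assumption made for this example.
point_in_time = [{"ticker": "AAPL", "knowledge_date": date(2025, 1, 30), "eps": 1.50}]

query_date = date(2025, 2, 1)
naive_leaks = leaked_rows(naive, query_date)
pit_leaks = leaked_rows(point_in_time, query_date)

print(len(naive_leaks), len(pit_leaks))
```

A frame passes the invariant exactly when this kind of check returns no rows.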

Install

pip install ere

Requires Python >= 3.12 and Polars >= 1.0.

Quick start

from datetime import date
import polars as pl
import ere

PRICES = ere.TemporalSpec(ref_date="date", entity="ticker")
EARNINGS = ere.TemporalSpec(
    ref_date="fiscal_quarter_end",
    knowledge_date="filing_date",
    entity="ticker",
)

# prices_df and earnings_df are Polars DataFrames you have already loaded
ere.validate(prices_df, spec=PRICES)
ere.validate(earnings_df, spec=EARNINGS)

# What did we know on Feb 1?
snap = ere.as_of(earnings_df, query_date="2025-02-01", spec=EARNINGS)
# AAPL shows original filing (1.50), not the restatement filed Feb 20 (1.42)

snap_later = ere.as_of(earnings_df, query_date="2025-03-01", spec=EARNINGS)
# now AAPL shows 1.42, the restated value

# Multi-source snapshot at one date
combined = ere.as_of(
    [(prices_df, PRICES), (earnings_df, EARNINGS)],
    query_date="2025-02-15",
)

# Multi-date snapshots for backtesting
rebalance_dates = [date(2025, 1, 31), date(2025, 2, 15), date(2025, 2, 28)]
snapshots = ere.panel(prices_df, spec=PRICES, dates=rebalance_dates, lookback=252)

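# Post-hoc audit of a single-source snapshot
earn_snap = ere.as_of(earnings_df, query_date="2025-02-15", spec=EARNINGS)
ere.audit(earn_snap, as_of_date="2025-02-15", knowledge_date_col="filing_date")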
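To make the multi-date idea concrete, here is a plain-Python sketch of building one snapshot per rebalance date, next to a flat form where each row is tagged with the query date it belongs to. It is an illustration under assumptions, not ere's code: `snapshot`, `panel_dict` and `panel_flat` are hypothetical names, the row values are made up, and restatement handling and `lookback` are deliberately left out.

```python
from datetime import date

# Hypothetical rows; only knowledge_date matters for this sketch.
rows = [
    {"id": "a", "knowledge_date": date(2025, 1, 30)},
    {"id": "b", "knowledge_date": date(2025, 2, 14)},
    {"id": "c", "knowledge_date": date(2025, 2, 20)},
]
rebalance_dates = [date(2025, 1, 31), date(2025, 2, 15), date(2025, 2, 28)]

def snapshot(rows, query_date):
    """Rows that were known on or before query_date."""
    return [r for r in rows if r["knowledge_date"] <= query_date]

def panel_dict(rows, dates):
    """One snapshot per date, keyed by date."""
    return {d: snapshot(rows, d) for d in dates}

def panel_flat(rows, dates):
    """Same content as one flat table: every (date, row) pair that
    satisfies knowledge_date <= date, tagged with the query date."""
    return [
        {"query_date": d, **r}
        for d in dates
        for r in rows
        if r["knowledge_date"] <= d
    ]

panels = panel_dict(rows, rebalance_dates)
flat = panel_flat(rows, rebalance_dates)
print([len(panels[d]) for d in rebalance_dates], len(flat))
```

The flat form is convenient when per-date logic can be written as a group-by on the tag column; the dict form is convenient when each snapshot is processed separately.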

API

| Function | Purpose |
| --- | --- |
| as_of(df, query_date, spec) * | Point-in-time snapshot, filtered to what was known at query_date |
| panel(df, spec, dates) * | Multi-date snapshots; returns {date: DataFrame} |
| panel_iter(df, spec, dates) * | Iterator over panel(), yielding (date, DataFrame) |
| panel_lazy(df, spec, dates) * | Fully lazy version. Single inequality join under the hood, tagged by query date |
| panel_map(df, spec, dates, fn) * | Apply a Python function to each snapshot (for logic that doesn't fit as Polars expressions) |
| validate(df, spec) | Check temporal structure (columns, dtypes, no nulls, no time travel) |
| audit(df, as_of_date) | Assert no look-ahead leakage in a result frame |
| tag_knowledge_date(df, ...) | Add a knowledge_date column from a fixed lag |
| deduplicate(df, spec) | Remove duplicate versions |
| align(sources, date) | Per-source PIT snapshots at one date |

* Also accepts a list/dict of (frame, spec) sources; see Multi-source snapshots below.
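For datasets that carry no publication date, the table mentions deriving knowledge_date from a fixed lag. The sketch below shows that idea in plain Python. It does not reproduce the signature of tag_knowledge_date, which is not spelled out here: the helper `with_fixed_lag`, its `lag_days` parameter, the use of calendar days, and the sample value are all assumptions.

```python
from datetime import date, timedelta

def with_fixed_lag(rows, ref_col, lag_days):
    """Return copies of rows with knowledge_date = ref date + lag_days.

    Calendar days are used here; a business-day lag would need a calendar.
    """
    lag = timedelta(days=lag_days)
    return [{**r, "knowledge_date": r[ref_col] + lag} for r in rows]

# Hypothetical macro series published 30 days after the period it covers.
macro = [{"release_period": date(2024, 12, 31), "value": 3.1}]
tagged = with_fixed_lag(macro, "release_period", lag_days=30)

print(tagged[0]["knowledge_date"])
```

A conservative lag (longer than the typical publication delay) errs on the side of hiding data slightly too long rather than leaking it early.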

Key concepts

TemporalSpec binds your column names to temporal roles. Define it once per dataset:

SPEC = ere.TemporalSpec(
    ref_date="fiscal_quarter_end",  # when the event happened
    knowledge_date="filing_date",   # when it became known
    entity="ticker",                # optional grouping key
)

Restatements are handled automatically. If a data point is published multiple times (same entity + ref_date, different knowledge_dates), as_of() returns the latest version that was known at the query date.
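The selection rule can be written out in a few lines of plain Python. This is a sketch of the semantics described above, not ere's implementation: `latest_known` is a hypothetical helper, and the quarter end and the original Jan 30 filing date are assumed values (the 1.50 and 1.42 figures and the Feb 20 restatement follow the quick start).

```python
from datetime import date

def latest_known(rows, query_date):
    """For each (entity, ref_date), keep the version with the latest
    knowledge_date that is still <= query_date."""
    best = {}
    for r in rows:
        if r["knowledge_date"] > query_date:
            continue  # not yet published at query_date
        key = (r["entity"], r["ref_date"])
        if key not in best or r["knowledge_date"] > best[key]["knowledge_date"]:
            best[key] = r
    return list(best.values())

quarter = date(2024, 12, 31)  # assumed fiscal quarter end
versions = [
    {"entity": "AAPL", "ref_date": quarter, "knowledge_date": date(2025, 1, 30), "eps": 1.50},
    {"entity": "AAPL", "ref_date": quarter, "knowledge_date": date(2025, 2, 20), "eps": 1.42},
]

feb = latest_known(versions, date(2025, 2, 1))
mar = latest_known(versions, date(2025, 3, 1))
print(feb[0]["eps"], mar[0]["eps"])
```

Before the first filing the result is empty, which is the correct answer: nothing was known yet.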

Multi-source snapshots. Pass a list (or dict) of (frame, spec) pairs anywhere a single (frame, spec) goes. Each source is PIT-aligned independently, then asof-joined into one frame:

sources = [(prices_df, PRICES), (earnings_df, EARNINGS)]

snapshot = ere.as_of(sources, "2025-02-15")            # joined point-in-time view
panels   = ere.panel(sources, dates=rebalance_dates)   # multi-date snapshots

# Entity-less specs (no `entity=`) broadcast across entities automatically:
MACRO = ere.TemporalSpec(ref_date="release_period", knowledge_date="release_date")
with_macro = ere.as_of([*sources, (macro_df, MACRO)], "2025-02-15")

# Want them separate instead of merged? Use align:
per_source = ere.align(sources, "2025-02-15")          # -> [prices_snap, earn_snap]
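As a rough mental model of the merge step, the sketch below attaches to each price row the most recent row from another source dated on or before it, matching on ticker when the source has an entity and broadcasting when it does not. The join keys ere actually uses are not documented here, so treat this as one plausible reading: `asof_attach`, the MSFT row, and all values are hypothetical, and the inputs are assumed to be point-in-time filtered already.

```python
from datetime import date

def asof_attach(left, right, left_date, right_date, by=None):
    """For each left row, merge in the latest right row whose date is
    <= the left row's date. With by=None the right side has no entity
    and is broadcast to every left row."""
    out = []
    for l in left:
        candidates = [
            r for r in right
            if r[right_date] <= l[left_date] and (by is None or r[by] == l[by])
        ]
        match = max(candidates, key=lambda r: r[right_date], default=None)
        extra = {k: v for k, v in (match or {}).items() if k != by}
        out.append({**l, **extra})
    return out

prices = [
    {"ticker": "AAPL", "date": date(2025, 2, 14), "close": 100.0},
    {"ticker": "MSFT", "date": date(2025, 2, 14), "close": 200.0},
]
earnings = [
    {"ticker": "AAPL", "fiscal_quarter_end": date(2024, 12, 31), "eps": 1.50},
]
macro = [{"release_period": date(2024, 12, 31), "rate": 4.5}]

with_earnings = asof_attach(prices, earnings, "date", "fiscal_quarter_end", by="ticker")
joined = asof_attach(with_earnings, macro, "date", "release_period")
by_ticker = {r["ticker"]: r for r in joined}
print(sorted(by_ticker["AAPL"]))
```

Rows with no match simply lack the extra columns in this sketch; a DataFrame join would fill them with nulls instead.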

Examples

Runnable end-to-end scripts live in examples/:

  • quickstart.py — end-to-end backtest walkthrough: restatements, naive-join leakage, multi-date rebalancing.
  • multi_source.py — list/dict sources, entity-less broadcasting, panel/panel_iter/align.
  • panel_lazy.py — benchmarks as_of() loop vs panel_lazy/panel_map on second-granularity orderbook data.

Run any of them with uv run python examples/<name>.py.

License

MIT
