
frontierlag

Audit the capability gap between frontier AI and the models tested in academic papers.

Paste a DOI. Get a report: what model the paper tested, where it sat relative to the frontier at evaluation date, what configuration the paper disclosed, and whether the paper fails all three audit dimensions at the pre-registered thresholds from the companion study.

$ pip install frontierlag
$ frontierlag check 10.1038/s41591-024-03425-5

This package is the software companion to Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation (Gringras and Salahshoor, 2026). The audit dataset embedded here is the frozen snapshot used in that paper; updates ship as point releases.


What it does

frontierlag classifies published AI-capability evaluations against the pre-registered audit dimensions from Gringras (2026): three primary dimensions (the H5 compound-failure outcome), one secondary magnitude (capability-elicitation shortfall), and one tertiary transparency vector (temporal, tier, elicitation).

  • Capability failure — eci_gap ≥ 12 ECI, anchored to the mean observed within-family major-generation jump on the frozen April-2026 Epoch snapshot.
  • Elicitation failure — OR-of-three: reasoning-mode undisclosed for a reasoning-capable model, OR tool-use undisclosed for a tool-capable model, OR scaffolding undisclosed where a scaffolded baseline existed at evaluation date. AND-of-three is reported alongside as a strict-conjunction sensitivity.
  • Interpretive failure — AND-of-two (pre-registered primary): no human comparator AND conclusion_framing = ai_generic. OR-of-two is reported alongside as the inclusive sensitivity. Admissibility filter: tasks with machine-verifiable references (oracle code tests, MATH, exact-match QA) have the comparator signal suppressed.

A paper flagged on all three at the pre-registered thresholds is a compound failure (pre-reg §2.2 H5).
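As an illustration only (not the package's internals; every function and field name below is hypothetical), the three-dimension logic and the H5 conjunction described above could be sketched as:

```python
def elicitation_failure(disclosures, model_caps):
    """OR-of-three: any relevant configuration axis left undisclosed (None)."""
    return any([
        model_caps.get("reasoning_capable") and disclosures.get("reasoning_mode") is None,
        model_caps.get("tool_capable") and disclosures.get("tool_use") is None,
        model_caps.get("scaffolded_baseline_existed") and disclosures.get("scaffolding") is None,
    ])

def interpretive_failure(human_comparator_present, conclusion_framing,
                         comparator_admissible=True):
    """AND-of-two primary; the comparator leg is suppressed for
    machine-verifiable tasks (comparator_admissible=False)."""
    no_comparator = comparator_admissible and not human_comparator_present
    return no_comparator and conclusion_framing == "ai_generic"

def compound_failure(eci_gap, disclosures, model_caps,
                     human_comparator_present, conclusion_framing,
                     comparator_admissible=True):
    """All three dimensions at the pre-registered thresholds (pre-reg §2.2 H5)."""
    capability = eci_gap >= 12
    return (capability
            and elicitation_failure(disclosures, model_caps)
            and interpretive_failure(human_comparator_present,
                                     conclusion_framing,
                                     comparator_admissible))
```

Note how the admissibility filter works in this sketch: with `comparator_admissible=False`, the missing-comparator leg can never fire, so the AND-of-two primary cannot flag the paper.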

The package also returns:

  • capability_elicitation_shortfall — the secondary magnitude eci_gap × (1 - config_elicitation_index), capturing the interaction between capability distance and configuration under-disclosure.
  • Three-component vector (temporal_gap_months, tier_gap_count, elicitation_gap_fraction) — readers do their own weighting.
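The secondary magnitude is a simple product; a minimal sketch (the function name is hypothetical, and the real computation lives inside the package):

```python
def capability_elicitation_shortfall(eci_gap, config_elicitation_index):
    """Secondary magnitude: capability distance scaled by configuration
    under-disclosure. config_elicitation_index is the disclosed fraction
    of relevant configuration axes, in [0, 1]."""
    if not 0.0 <= config_elicitation_index <= 1.0:
        raise ValueError("config_elicitation_index must lie in [0, 1]")
    return eci_gap * (1.0 - config_elicitation_index)
```

For example, a paper 20 ECI behind the frontier that disclosed a quarter of the relevant configuration axes gets 20 × 0.75 = 15, while full disclosure zeroes the shortfall regardless of the gap.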

The package does not estimate counterfactual capability; it does not claim "the paper's conclusion would have been X if they had used Y." Descriptive, not normative: the audit documents structural lag; it does not rank authors or score papers as "bad research."


Quick start

import frontierlag as fl

# By DOI (hits the frozen corpus if the paper is in the audit; otherwise
# resolves publication date via CrossRef and leaves you to supply the model).
report = fl.check("10.1038/s41591-024-03425-5")
print(report.to_text())

# Override / supply fields for a paper not in the frozen corpus.
report = fl.check(
    "10.1000/your-doi",
    primary_model="GPT-4",
    evaluation_date="2024-06-01",
    configuration_disclosures={
        "model_version_exact": True,
        "access_date": True,
        "reasoning_mode": None,
        "tool_use": False,
    },
)

# Audit already-extracted metadata.
from frontierlag import audit, PaperMetadata
m = PaperMetadata(
    primary_model="GPT-3.5",
    publication_date="2025-07-01",
    evaluation_date="2025-05-01",
    configuration_disclosures={"reasoning_mode": False, "tool_use": False},
    human_comparator_present=False,
    conclusion_framing="ai_generic",
    task_admissibility="expected",
    domain="medicine",
)
report = audit(m)  # default: AND-of-two pre-registered primary
print(report.compound_failure)                 # pre-registered binary
print(report.capability_elicitation_shortfall) # secondary magnitude
print((report.temporal_gap_months, report.tier_gap_count, report.elicitation_gap_fraction))

# Provenance for false-positive diagnosis.
diag = audit(m, return_provenance=True).provenance
print(diag["classifications"]["compound_failure_prereg"])
print(diag["inputs"])

# Individual lookups.
fl.lookup_model("claude-3.5-sonnet")
fl.get_frontier_at_date("2025-06-01")
fl.list_known_models()

CLI

frontierlag check <DOI>               audit a paper
frontierlag lookup <MODEL>            single-model metadata
frontierlag frontier <YYYY-MM-DD>     frontier at a date
frontierlag models                    list known canonical names
frontierlag info                      version + data-freeze date

Every command accepts --json for machine-readable output. frontierlag check accepts --model, --eval-date, and --config-file to override or supply fields a paper does not otherwise provide.


Data freeze

The embedded dataset is frozen at FREEZE_DATE = 2026-04-01. Every report prints this at the top so readers know how stale the comparison is. Updates ship as point releases.
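A reader can gauge that staleness by hand; a minimal stdlib-only sketch (no frontierlag API assumed), counting whole calendar months since the freeze:

```python
from datetime import date

FREEZE_DATE = date(2026, 4, 1)  # the data-freeze date stated above

def months_since_freeze(on: date) -> int:
    """Whole calendar months between the freeze date and a later date."""
    return (on.year - FREEZE_DATE.year) * 12 + (on.month - FREEZE_DATE.month)
```

Reading a report in October 2026, for instance, means the embedded frontier comparison is six months old.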

  • data/eci_scores.csv — Epoch AI Capabilities Index snapshot (Epoch AI, 2026)
  • data/monthly_frontier_trajectory.csv — derived from ECI + model release dates
  • data/model_version_lookup.json — maintainer-curated, cross-checked against the Epoch AI model tracker
  • data/frozen_audit.json — audit-dataset DOI lookup index

Install

pip install frontierlag

Requires Python ≥ 3.9. Runtime dependencies are requests and pyyaml; no heavy scientific stack.


Companion artefacts

  • Empirical audit paper — Frontier Lag (Gringras and Salahshoor, 2026).
  • Reporting checklist — VERSIO-AI v1.2.
  • Pre-registration — Open Science Framework, 10.17605/OSF.IO/7XM3D.
  • Live web tool — https://frontierlag.org.

Citation

@software{gringras2026frontierlag,
  author  = {Gringras, David and Salahshoor, Misha},
  title   = {frontierlag: A {Python} package for auditing the capability gap of published {AI} evaluations},
  year    = {2026},
  version = {1.0.0},
  url     = {https://frontierlag.org}
}

License

MIT. See LICENSE.
