# frontierlag

Audit the capability gap between frontier AI and the models tested in academic papers.
Paste a DOI. Get a report: what model the paper tested, where it sat relative to the frontier at evaluation date, what configuration the paper disclosed, and whether the paper fails all three audit dimensions at the pre-registered thresholds from the companion study.
```console
$ pip install frontierlag
$ frontierlag check 10.1038/s41591-024-03425-5
```
This package is a companion to the paper *Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation* (Gringras, 2026; arXiv:TBD). The audit dataset embedded here is the frozen snapshot used in that paper; quarterly refreshes are shipped as point releases.
## What it does

`frontierlag` measures three dimensions of the gap between what published AI evaluations test and what the frontier can do at the same moment:
| Dimension | What it captures |
|---|---|
| Capability gap | ECI points and calendar months between the tested model and the frontier at evaluation date. |
| Tier gap | Number of same-family siblings with higher ECI that were already available at evaluation date. |
| Configuration | Fraction of reasoning mode, tools, scaffolding, and sampling items the paper discloses (items match the VERSIO-AI v1 checklist). |
A paper that fails all three at the pre-registered thresholds is flagged as a **compound failure**. See `frontierlag/config.yaml` for the thresholds (they mirror the paper's pre-registration).
The package does not estimate counterfactual capability — it does not claim "the paper's conclusion would have been X if they had used Y." That move is absent from the companion paper by design, and it is absent here too.
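To make the three-way conjunction concrete, here is a minimal sketch of the compound-failure rule. The threshold values below are placeholders, not the pre-registered ones, and the `compound_failure` helper is illustrative rather than the package's implementation:

```python
# Placeholder thresholds; the real pre-registered values live in
# frontierlag/config.yaml.
CAPABILITY_GAP_ECI = 30.0  # fail if gap >= this many ECI points
TIER_GAP_SIBLINGS = 1      # fail if >= this many stronger siblings existed
CONFIG_DISCLOSURE = 0.5    # fail if < this fraction of items disclosed

def compound_failure(capability_gap_eci: float,
                     stronger_siblings: int,
                     disclosure_fraction: float) -> bool:
    """A paper is flagged only when it fails all three dimensions at once."""
    fails_capability = capability_gap_eci >= CAPABILITY_GAP_ECI
    fails_tier = stronger_siblings >= TIER_GAP_SIBLINGS
    fails_config = disclosure_fraction < CONFIG_DISCLOSURE
    return fails_capability and fails_tier and fails_config

compound_failure(45.0, 2, 0.3)  # fails all three → True
compound_failure(45.0, 0, 0.3)  # tier dimension passes → False
```

Failing one or two dimensions is not enough; the flag is deliberately conjunctive.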
## Quick start

```python
import frontierlag as fl

# By DOI (hits the frozen corpus if the paper is in the audit; otherwise
# resolves publication date via CrossRef and leaves you to supply the model).
report = fl.check("10.1038/s41591-024-03425-5")
print(report.to_text())

# Override / supply fields for a paper not in the frozen corpus.
report = fl.check(
    "10.1000/your-doi",
    primary_model="GPT-4",
    evaluation_date="2024-06-01",
    configuration_disclosures={
        "model_version_exact": True,
        "access_date": True,
        "reasoning_mode": None,  # not applicable to GPT-4
        "tool_use": False,
        # ... other items default to "not reported"
    },
)

# Or audit already-extracted metadata directly.
from frontierlag import audit, PaperMetadata

m = PaperMetadata(primary_model="GPT-3.5", publication_date="2024-07-01")
print(audit(m).to_text())

# Individual lookups.
fl.lookup_model("claude-3.5-sonnet")   # → ModelRecord
fl.get_frontier_at_date("2025-06-01")  # → FrontierSnapshot
fl.list_known_models()                 # → list[str]
```
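The calendar-month component of the capability gap can be computed from release dates alone. An illustrative sketch (this `month_lag` helper is not part of the package's API):

```python
from datetime import date

def month_lag(tested_release: date, frontier_release: date) -> int:
    """Whole calendar months between the tested model's release and the
    frontier model's release (illustrative only)."""
    return (frontier_release.year - tested_release.year) * 12 + (
        frontier_release.month - tested_release.month
    )

# A paper testing a March 2023 model against a May 2024 frontier:
month_lag(date(2023, 3, 15), date(2024, 5, 13))  # → 14
```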
## CLI

```console
frontierlag check <DOI>            audit a paper
frontierlag lookup <MODEL>         single-model metadata
frontierlag frontier <YYYY-MM-DD>  frontier at a date
frontierlag models                 list known canonical names
frontierlag info                   version + data-freeze date
```

Every command accepts `--json` for machine-readable output. `frontierlag check` accepts `--model`, `--eval-date`, and `--config-file` to override or supply fields a paper does not otherwise provide.
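A sketch of what a `--config-file` payload might look like. The on-disk schema is an assumption here (JSON with the same keys as the Python `configuration_disclosures` mapping); check the package docs for the authoritative format:

```python
import json

# Hypothetical disclosure payload; keys mirror the Python API's
# configuration_disclosures mapping, values True/False/None.
disclosures = {
    "model_version_exact": True,
    "access_date": True,
    "reasoning_mode": None,  # not applicable
    "tool_use": False,
}

# Written to disk for `frontierlag check ... --config-file disclosures.json`.
with open("disclosures.json", "w") as fh:
    json.dump(disclosures, fh, indent=2)
```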
## Example output

```console
$ frontierlag check 10.1038/s41746-023-00961-1 --model GPT-4 --eval-date 2023-03-20

frontierlag audit (data freeze: 2026-04-01)
========================================================================
Paper: ChatGPT performance on USMLE-style medical examinations
DOI:   10.1038/s41746-023-00961-1
Evaluation date: 2023-03-20

Primary model tested
  input  : 'GPT-4' → canonical: GPT-4 (Mar 2023)
  release: 2023-03-15   ECI: +126.2

Frontier at evaluation date
  GPT-4 (Mar 2023) (released 2023-03-15, ECI +126.2)

Audit dimensions
  Capability gap : +0.0 ECI pts (+0 months)
  Tier gap       : 0 stronger same-family sibling(s) available
  Configuration  : — of applicable items disclosed

Compound failure: undetermined (insufficient structured metadata).
```

(A fully-extracted audit with configuration disclosures returns a clean PASS/FAIL verdict.)
## Data freeze

The embedded dataset is frozen at `FREEZE_DATE = 2026-04-01`. Every report prints this at the top so readers know how stale the comparison is. Quarterly updates ship as `frontierlag >= 1.0.X`; a banner on the static site tracks the current freeze.

| File | Source |
|---|---|
| `data/eci_scores.csv` | Epoch AI Capabilities Index snapshot (Epoch AI, 2026) |
| `data/monthly_frontier_trajectory.csv` | Derived from ECI + model release dates |
| `data/model_version_lookup.json` | Maintainer-curated, cross-checked against Epoch AI model tracker |
| `data/frozen_audit.json` | The companion paper's extracted audit (empty until production extraction completes) |

All dataset files are plain text and diffable; the freeze history is visible in `git log`.
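Given the printed freeze date, a reader can gauge staleness directly. A minimal sketch (the `months_stale` helper is hypothetical, not part of the package):

```python
from datetime import date, datetime

def months_stale(freeze: str, today: date) -> int:
    """Whole calendar months between the data freeze and a given date."""
    f = datetime.strptime(freeze, "%Y-%m-%d").date()
    return max(0, (today.year - f.year) * 12 + (today.month - f.month))

# Reading a report in September 2026 against the 2026-04-01 freeze:
months_stale("2026-04-01", date(2026, 9, 15))  # → 5
```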
## Install

```console
pip install frontierlag
```

From source:

```console
git clone https://github.com/davidgringras/frontierlag.git
cd frontierlag
pip install -e '.[test]'
pytest
```

Requires Python ≥ 3.9. Runtime dependencies are `requests` and `pyyaml`; no heavy scientific stack.
## Citation

```bibtex
@misc{gringras2026frontierlag,
  author       = {Gringras, David},
  title        = {Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic {AI} Evaluation},
  year         = {2026},
  eprint       = {TBD},
  archivePrefix= {arXiv},
  primaryClass = {cs.AI},
  note         = {Companion package: \url{https://github.com/davidgringras/frontierlag}}
}
```
## Contributing

Two things the package needs from the community and will welcome pull requests for:

- **Model aliases.** Every paper spells model names differently. `config.yaml::aliases` is the single file to extend. PRs that add an alias mapping without touching code are the fastest path to review.
- **Frontier trajectory updates.** When a new model ships, add a row to `data/monthly_frontier_trajectory.csv` and bump `_version.py::FREEZE_DATE`. The package has a quarterly release cadence; out-of-cycle PRs are welcome for newly released frontier models.

Code changes should include tests and pass `pytest`. See `tests/` for conventions.
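An alias table of this kind is typically applied as a case-insensitive lookup. A sketch of the idea (the table entries and the `canonicalize` helper below are illustrative, not the package's actual code or data):

```python
# Example alias entries of the kind config.yaml::aliases might hold;
# these specific mappings are assumptions for illustration.
ALIASES = {
    "gpt4": "GPT-4",
    "gpt-4 (march 2023)": "GPT-4 (Mar 2023)",
    "claude 3.5 sonnet": "claude-3.5-sonnet",
}

def canonicalize(name: str) -> str:
    """Map a paper's free-text model name to a canonical name, if known;
    unknown names pass through unchanged."""
    key = name.strip().lower()
    return ALIASES.get(key, name)

canonicalize("GPT4")  # → "GPT-4"
```

A PR adding an alias only touches the mapping, which is why such PRs are fast to review.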
## License

MIT. See `LICENSE`.