
frontierlag

Audit the capability gap between frontier AI and the models tested in academic papers.

Paste a DOI. Get a report: what model the paper tested, where it sat relative to the frontier at evaluation date, what configuration the paper disclosed, and whether the paper fails all three audit dimensions at the pre-registered thresholds from the companion study.

$ pip install frontierlag
$ frontierlag check 10.1038/s41591-024-03425-5

This package is a companion to the paper Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation (Gringras, 2026; arXiv:TBD). The audit dataset embedded here is the frozen snapshot used in that paper; quarterly refreshes are shipped as point releases.


What it does

frontierlag measures three dimensions of the gap between what published AI evaluations test and what the frontier can do at the same moment:

Dimension        What it captures
Capability gap   ECI points and calendar months between the tested model and the frontier at evaluation date.
Tier gap         Number of same-family siblings with higher ECI already available at evaluation date.
Configuration    Fraction of reasoning-mode, tools, scaffolding, and sampling items the paper discloses (items match the VERSIO-AI v1 checklist).

A paper that fails all three at the pre-registered thresholds is flagged as a compound failure. See frontierlag/config.yaml for the thresholds (they mirror the paper's pre-registration).
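As a rough illustration of the rule (not the package's internals), the compound-failure flag amounts to three independent threshold checks. The threshold values below are placeholders; the real, pre-registered ones live in frontierlag/config.yaml:

```python
# Hypothetical sketch of the compound-failure rule; the threshold values here
# are placeholders, not the pre-registered ones shipped in config.yaml.
THRESHOLDS = {
    "capability_gap_eci": 10.0,  # ECI points behind the frontier
    "tier_gap_siblings": 1,      # stronger same-family siblings available
    "config_disclosure": 0.5,    # fraction of applicable items disclosed
}

def is_compound_failure(capability_gap: float, tier_gap: int,
                        disclosure_fraction: float) -> bool:
    """A paper is flagged only if it fails all three dimensions at once."""
    return (
        capability_gap >= THRESHOLDS["capability_gap_eci"]
        and tier_gap >= THRESHOLDS["tier_gap_siblings"]
        and disclosure_fraction < THRESHOLDS["config_disclosure"]
    )

print(is_compound_failure(15.0, 2, 0.3))  # fails all three -> True
print(is_compound_failure(15.0, 2, 0.8))  # discloses enough -> False
```

Failing one or two dimensions is not enough; the conjunction is what makes the flag conservative.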

The package does not estimate counterfactual capability — it does not claim "the paper's conclusion would have been X if they had used Y." That move is absent from the companion paper by design, and it is absent here too.


Quick start

import frontierlag as fl

# By DOI (hits the frozen corpus if the paper is in the audit; otherwise
# resolves publication date via CrossRef and leaves you to supply the model).
report = fl.check("10.1038/s41591-024-03425-5")
print(report.to_text())

# Override / supply fields for a paper not in the frozen corpus.
report = fl.check(
    "10.1000/your-doi",
    primary_model="GPT-4",
    evaluation_date="2024-06-01",
    configuration_disclosures={
        "model_version_exact": True,
        "access_date": True,
        "reasoning_mode": None,  # not applicable to GPT-4
        "tool_use": False,
        # ... other items default to "not reported"
    },
)

# Or audit already-extracted metadata directly.
from frontierlag import audit, PaperMetadata
m = PaperMetadata(primary_model="GPT-3.5", publication_date="2024-07-01")
print(audit(m).to_text())

# Individual lookups.
fl.lookup_model("claude-3.5-sonnet")          # → ModelRecord
fl.get_frontier_at_date("2025-06-01")          # → FrontierSnapshot
fl.list_known_models()                         # → list[str]

CLI

frontierlag check <DOI>               audit a paper
frontierlag lookup <MODEL>            single-model metadata
frontierlag frontier <YYYY-MM-DD>     frontier at a date
frontierlag models                    list known canonical names
frontierlag info                      version + data-freeze date

Every command accepts --json for machine-readable output. frontierlag check accepts --model, --eval-date, and --config-file to override or supply fields a paper does not otherwise provide.


Example output

$ frontierlag check 10.1038/s41746-023-00961-1 --model GPT-4 --eval-date 2023-03-20
frontierlag audit (data freeze: 2026-04-01)
========================================================================
Paper:  ChatGPT performance on USMLE-style medical examinations
DOI:    10.1038/s41746-023-00961-1
Evaluation date: 2023-03-20

Primary model tested
  input  : 'GPT-4' → canonical: GPT-4 (Mar 2023)
  release: 2023-03-15     ECI: +126.2

Frontier at evaluation date
  GPT-4 (Mar 2023) (released 2023-03-15, ECI +126.2)

Audit dimensions
  Capability gap : +0.0 ECI pts   (+0 months)
  Tier gap       : 0 stronger same-family sibling(s) available
  Configuration  : —  of applicable items disclosed

Compound failure: undetermined (insufficient structured metadata).

(A fully-extracted audit with configuration disclosures returns a clean PASS/FAIL verdict.)


Data freeze

The embedded dataset is frozen at FREEZE_DATE = 2026-04-01. Every report prints this at the top so readers know how stale the comparison is. Quarterly updates ship as frontierlag >= 1.0.X; a banner on the static site tracks the current freeze.
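The "months" figures in reports (both capability gap and staleness relative to the freeze) are plain calendar-month differences. A minimal sketch of that arithmetic, assuming ISO date strings as in the API above:

```python
from datetime import date

def month_gap(earlier: str, later: str) -> int:
    """Whole calendar months between two ISO dates (day-of-month ignored)."""
    a = date.fromisoformat(earlier)
    b = date.fromisoformat(later)
    return (b.year - a.year) * 12 + (b.month - a.month)

# GPT-4 released 2023-03-15, evaluated 2023-03-20 -> 0 months, matching the
# example report above; a freeze of 2026-04-01 read in November 2026 -> 7.
print(month_gap("2023-03-15", "2023-03-20"))  # 0
print(month_gap("2026-04-01", "2026-11-15"))  # 7
```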

File                                   Source
data/eci_scores.csv                    Epoch AI Capabilities Index snapshot (Epoch AI, 2026)
data/monthly_frontier_trajectory.csv   Derived from ECI + model release dates
data/model_version_lookup.json         Maintainer-curated, cross-checked against the Epoch AI model tracker
data/frozen_audit.json                 The companion paper's extracted audit (empty until production extraction completes)

All dataset files are plain text and diffable; the freeze history is visible in git log.
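Because the trajectory file is plain CSV, a "frontier at date" lookup is a one-pass scan. A sketch under assumed column names (month, model, eci; check the actual header before relying on them) and with illustrative rows standing in for the real file (only the GPT-4 figure comes from this page):

```python
import csv, io

# Inline stand-in for data/monthly_frontier_trajectory.csv. Column names and
# the post-2023 ECI values are assumptions for illustration, not real data.
SAMPLE = """month,model,eci
2023-03,GPT-4 (Mar 2023),126.2
2024-06,GPT-4o,140.0
2025-02,o3,155.0
"""

def frontier_at(month: str, rows: list[dict]) -> dict:
    """Latest row whose month is <= the query month (rows sorted ascending)."""
    best = None
    for row in rows:
        if row["month"] <= month:  # ISO YYYY-MM strings sort chronologically
            best = row
    if best is None:
        raise ValueError(f"no frontier data at or before {month}")
    return best

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(frontier_at("2024-12", rows)["model"])  # GPT-4o
```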


Install

pip install frontierlag

From source:

git clone https://github.com/davidgringras/frontierlag.git
cd frontierlag
pip install -e '.[test]'
pytest

Requires Python ≥ 3.9. Runtime dependencies are requests and pyyaml; no heavy scientific stack.


Citation

@misc{gringras2026frontierlag,
  author       = {Gringras, David},
  title        = {Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic {AI} Evaluation},
  year         = {2026},
  eprint       = {TBD},
  archivePrefix= {arXiv},
  primaryClass = {cs.AI},
  note         = {Companion package: \url{https://github.com/davidgringras/frontierlag}}
}

Contributing

Two things the package needs from the community; pull requests for either are welcome:

  1. Model aliases. Every paper spells model names differently. config.yaml::aliases is the single file to extend. PRs that add an alias mapping without touching code are the fastest path to review.
  2. Frontier trajectory updates. When a new model ships, add a row to data/monthly_frontier_trajectory.csv and bump _version.py::FREEZE_DATE. The package has a quarterly release cadence; out-of-cycle PRs are welcome for newly released frontier models.
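Alias resolution is just a case-insensitive dictionary lookup, which is why alias PRs are cheap to review. A sketch of what a config.yaml::aliases entry buys you (the mapping below is illustrative, not the shipped table):

```python
# Illustrative alias table; the real one lives in config.yaml under `aliases`.
ALIASES = {
    "gpt-4": "GPT-4 (Mar 2023)",
    "gpt4": "GPT-4 (Mar 2023)",
    "chatgpt-4": "GPT-4 (Mar 2023)",
    "claude 3.5 sonnet": "claude-3.5-sonnet",
}

def canonicalize(name: str) -> str:
    """Map a paper's spelling of a model name to its canonical record key."""
    key = name.strip().lower()
    return ALIASES.get(key, name)

print(canonicalize("GPT4"))       # GPT-4 (Mar 2023)
print(canonicalize("UnknownLM"))  # unmatched names fall through unchanged
```

Adding one line to the table makes every spelling variant of a model name resolve without touching code.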

Code changes should include tests and run pytest. See tests/ for conventions.

License

MIT. See LICENSE.
