Skip to main content

Public-slice harness for the CONJURE transformative-creativity benchmark.

Project description

conjure-eval

Public-slice harness for the CONJURE transformative-creativity benchmark. Ships the 393-instance public split (70 percent of the 560-instance Phase 4.8 frozen corpus: 510 closed-problem instances across 17 Lakatos families plus the 50-instance C4-OPEN axis of formalised open mathematical conjectures, SHA-256 c1f32624c1a698ef30d3c0a7151d69d4706c413dc910abb9f0812434cfa625c7) so frontier-model developers can self-evaluate locally before submitting to the hidden split.

This package contains:

  • The frozen public-slice corpus JSON (conjure_eval.data.public_corpus).
  • A CLI for inspecting the corpus, driving a model pass, and checking submission files before they are sent to the hidden-split adjudicator.
  • The deterministic split provenance, so any third party can re-derive the public/hidden split byte-for-byte from the source corpus.

What this package is and isn't

conjure-eval is a self-service developer convenience: it lets a model team inspect the public contracts, run their model against the public slice, and smoke-test their submission format before sending results to the benchmark author. It does not ship the hidden split, and it does not run the kernel-verified tight-mode adjudicator that produces the headline accept rate. Those live in the private blanc repository and are operated by the benchmark author against frozen model snapshots; the headline number reported in the brief is the hidden-split rate.

Install

pip install conjure-eval

Usage

# List all 393 public-slice instance IDs
conjure-eval list-public

# Inspect a single instance
conjure-eval show C1-bv-001

# Drive a model pass (OpenAI-compatible endpoint)
conjure-eval run \
    --base-url https://your-endpoint/v1 \
    --api-key-env MY_API_KEY \
    --model your-model-name \
    --out submissions.jsonl

# Check submission file well-formedness before sending
conjure-eval verify-submission submissions.jsonl

# Print corpus provenance fields
conjure-eval provenance

Provenance

The public corpus is a deterministic 70/30 axis-stratified slice of the 560-instance Phase 4.8 frozen corpus maintained in the private blanc repository (510 closed-problem instances + 50 C4-OPEN instances). Seed: 4317. Anyone with the source corpus can reproduce both slices via scripts/build_conjure_split.py. Every C4-OPEN instance carries a snapshot-pinned open-status certificate generated by scripts/certify_open_status.py; certificate JSONs live under instances/open_status_certificates/.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

conjure_eval-0.3.0.tar.gz (57.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

conjure_eval-0.3.0-py3-none-any.whl (57.8 kB view details)

Uploaded Python 3

File details

Details for the file conjure_eval-0.3.0.tar.gz.

File metadata

  • Download URL: conjure_eval-0.3.0.tar.gz
  • Upload date:
  • Size: 57.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for conjure_eval-0.3.0.tar.gz
Algorithm Hash digest
SHA256 a6dbbb41ae34ec5bcb5b951ff7ac75bf7849c467f30efe81f24caacb7b470178
MD5 cd021f4678f13f10e260503ff521c499
BLAKE2b-256 96a81b4b453adaca7d4c4aae5f29910c8e5b7f3699b17842b5452c270c2314fd

See more details on using hashes here.

File details

Details for the file conjure_eval-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: conjure_eval-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 57.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for conjure_eval-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b71d97945d4c16bb7b4ceeecba0da864700b9aca5c462aee6553544571e08d96
MD5 535e007f3ad251f9b41f6b653347e89b
BLAKE2b-256 1dfa12f47f8776effed441a8e47f0fcf1f85507a8be8b0a3a493ae517eb39109

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page