Public-slice harness for the CONJURE transformative-creativity benchmark.
Project description
conjure-eval
Ships the 393-instance public split (about 70 percent of the 560-instance Phase 4.8
frozen corpus: 510 closed-problem instances across 17 Lakatos families plus
the 50-instance C4-OPEN axis of formalised open mathematical conjectures,
SHA-256 a8c9842ea4d59072802689603b1e38c679fd1695194aa1cf73f81c076903daf6)
so frontier-model developers can self-evaluate locally before submitting to
the hidden split.
This package contains:
- The frozen public-slice corpus JSON (conjure_eval.data.public_corpus).
- A CLI for inspecting the corpus, driving a model pass, and checking submission files before they are sent to the hidden-split adjudicator.
- The deterministic split provenance, so any third party can re-derive the public/hidden split byte-for-byte from the source corpus.
What this package is and isn't
conjure-eval is a self-service developer convenience: it lets a model team
inspect the public contracts, run their model against the public slice, and
smoke-test their submission format before sending results to the benchmark
author. It does not ship the hidden split, and it does not run the
kernel-verified tight-mode adjudicator that produces the headline accept rate.
Those live in the private blanc repository and are operated by the benchmark
author against frozen model snapshots; the headline number reported in the
brief is the hidden-split rate.
Install
pip install conjure-eval
Usage
# List all 393 public-slice instance IDs
conjure-eval list-public
# Inspect a single instance
conjure-eval show C1-bv-001
# Drive a model pass (OpenAI-compatible endpoint)
conjure-eval run \
    --base-url https://your-endpoint/v1 \
    --api-key-env MY_API_KEY \
    --model your-model-name \
    --out submissions.jsonl
# Check submission file well-formedness before sending
conjure-eval verify-submission submissions.jsonl
# Print corpus provenance fields
conjure-eval provenance
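Before running verify-submission, it can help to confirm that the output file is at least valid JSON Lines. The sketch below is an illustration only: it assumes that submissions.jsonl holds one JSON object per line, which the .jsonl extension suggests but this page does not state, and it checks no field names because the submission schema is not documented here. The function name check_jsonl is hypothetical and not part of conjure-eval.

```python
import json
from pathlib import Path


def check_jsonl(path):
    """Return (line_number, problem) pairs; an empty list means every line parsed.

    Assumption: one JSON object per line. The real schema is enforced by
    `conjure-eval verify-submission`, not by this pre-check.
    """
    problems = []
    with Path(path).open(encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                problems.append((number, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((number, "record is not a JSON object"))
    return problems
```

Calling check_jsonl("submissions.jsonl") and getting an empty list only means the file parses; the CLI check remains the authoritative one.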
Provenance
The public corpus is a deterministic 70/30 axis-stratified slice of the
560-instance Phase 4.8 frozen corpus maintained in the private blanc
repository (510 closed-problem instances + 50 C4-OPEN instances). Seed:
4317. Anyone with the source corpus can reproduce both slices via
scripts/build_conjure_split.py. Every C4-OPEN instance carries a
snapshot-pinned open-status certificate generated by
scripts/certify_open_status.py; certificate JSONs live under
instances/open_status_certificates/.
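To make "deterministic, axis-stratified" concrete, here is a minimal sketch of one way such a split can be built. It is not the logic of scripts/build_conjure_split.py, which is not shown on this page. The input shape (a mapping from instance ID to axis label), the seeded SHA-256 ranking, and the rounding rule are all assumptions chosen for illustration, so this sketch will not reproduce the actual public/hidden assignment.

```python
import hashlib
from collections import defaultdict


def stratified_split(instance_axes, seed=4317, public_fraction=0.7):
    """Split instance IDs into (public, hidden), stratified by axis.

    instance_axes: dict mapping instance ID -> axis label (hypothetical shape).
    """
    by_axis = defaultdict(list)
    for instance_id, axis in instance_axes.items():
        by_axis[axis].append(instance_id)

    public, hidden = [], []
    for axis in sorted(by_axis):
        # Rank IDs by a seeded SHA-256 digest. The order depends only on the
        # seed and the ID bytes, not on input order or on any PRNG internals.
        ranked = sorted(
            by_axis[axis],
            key=lambda iid: hashlib.sha256(f"{seed}:{iid}".encode("utf-8")).hexdigest(),
        )
        # Assumed rounding rule; the real script may cut each axis differently.
        cut = round(public_fraction * len(ranked))
        public.extend(ranked[:cut])
        hidden.extend(ranked[cut:])
    return sorted(public), sorted(hidden)
```

Ranking by digest rather than shuffling with a seeded generator is a design choice in this sketch: it keeps the result independent of the order in which instances are read and of the language runtime, which is one way to support byte-for-byte re-derivation.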
File details
Details for the file conjure_eval-0.3.1.tar.gz.
File metadata
- Download URL: conjure_eval-0.3.1.tar.gz
- Size: 58.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6bf9cdbbb6550067abb777c340d371f4f73128310513670835ef8203e43b1090 |
| MD5 | 608e3fc8d1fd0081c51b67a162eaf6aa |
| BLAKE2b-256 | 7db8b3623f93bf235ed1c08dd7cbcf42562fbc8fdc714517d719cdf6a8263f0b |
File details
Details for the file conjure_eval-0.3.1-py3-none-any.whl.
File metadata
- Download URL: conjure_eval-0.3.1-py3-none-any.whl
- Size: 58.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2eb895b3e4023516db083367fb2cb996ae4d6d29ecbfee31e4baab657faa4e76 |
| MD5 | 79e369f99d1f6f390ebc1c0fff1bf1d1 |
| BLAKE2b-256 | 1792137fed2f772d5e4ab4e5548cbbb6bc2b00fd846ec5e7f8e46aac8e8ba8ac |
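A downloaded file can be checked against the SHA256 digests listed above with the Python standard library. The following is a small sketch: the helper names sha256_of_file and matches_published are hypothetical, and the local path passed in is whatever location the file was saved to.

```python
import hashlib

# SHA256 digests copied from the file details above.
PUBLISHED_SHA256 = {
    "conjure_eval-0.3.1.tar.gz":
        "6bf9cdbbb6550067abb777c340d371f4f73128310513670835ef8203e43b1090",
    "conjure_eval-0.3.1-py3-none-any.whl":
        "2eb895b3e4023516db083367fb2cb996ae4d6d29ecbfee31e4baab657faa4e76",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, filename):
    """True if the file at `path` has the digest published for `filename`."""
    return sha256_of_file(path) == PUBLISHED_SHA256[filename]
```

For example, matches_published("conjure_eval-0.3.1.tar.gz", "conjure_eval-0.3.1.tar.gz") should return True for an intact copy of the source distribution saved in the current directory.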