conjure-eval

Public-slice harness for the CONJURE transformative-creativity benchmark. It ships the 358-instance public split (roughly 70 percent of the 510-instance Phase 4.6 frozen corpus across 17 Lakatos families, SHA-256 33e9daebbfc1382b08c4b518f6bc9b30e62c13cc9d7e178327675929ebd74cc9) so that frontier-model developers can evaluate their models locally before submitting to the hidden split.
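
A published digest like the one above can be checked locally with Python's standard hashlib module. This is a minimal sketch: the page does not say exactly which file the SHA-256 covers (the full frozen corpus or a particular serialization of it), so the sketch assumes it is computed over the raw bytes of a single JSON file. The helper names sha256_of_file and matches_published are hypothetical, not part of conjure-eval.

```python
import hashlib
from pathlib import Path

# Digest copied from the project description. ASSUMPTION: it is the SHA-256
# of the raw bytes of one corpus JSON file; the page does not state this.
PUBLISHED_SHA256 = "33e9daebbfc1382b08c4b518f6bc9b30e62c13cc9d7e178327675929ebd74cc9"


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected: str = PUBLISHED_SHA256) -> bool:
    """Compare a file's digest with an expected hex digest, ignoring case."""
    return sha256_of_file(path) == expected.lower()
```

A mismatch only says the bytes differ: re-serializing the same JSON with different key order or whitespace changes the digest even when the content is equivalent.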

This package contains:

  • The frozen public-slice corpus JSON (conjure_eval.data.public_corpus).
  • A CLI for inspecting the corpus, driving a model pass, and checking submission files before they are sent to the hidden-split adjudicator.
  • The deterministic split provenance, so that any third party with access to the source corpus can re-derive the public/hidden split byte-for-byte.

What this package is and isn't

conjure-eval is a self-service developer convenience: it lets a model team inspect the public contracts, run their model against the public slice, and smoke-test their submission format before sending results to the benchmark author. It does not ship the hidden split, and it does not run the kernel-verified tight-mode adjudicator that produces the headline accept rate. Those live in the private blanc repository and are operated by the benchmark author against frozen model snapshots; the headline number reported in the brief is the hidden-split rate.

Install

pip install conjure-eval

Usage

# List all 358 public-slice instance IDs
conjure-eval list-public

# Inspect a single instance
conjure-eval show C1-bv-001

# Drive a model pass (OpenAI-compatible endpoint)
conjure-eval run \
    --base-url https://your-endpoint/v1 \
    --api-key-env MY_API_KEY \
    --model your-model-name \
    --out submissions.jsonl

# Check submission file well-formedness before sending
conjure-eval verify-submission submissions.jsonl

# Print corpus provenance fields
conjure-eval provenance
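
This page does not document which checks verify-submission performs. As a rough illustration of what a pre-flight check on a JSONL submission can look like, the sketch below validates the file line by line. The field names instance_id and response, and the rules (every line is a JSON object, no duplicate IDs, IDs optionally drawn from a known set), are hypothetical and may not match the real submission schema.

```python
import json
from typing import Iterable, List, Optional, Set

# HYPOTHETICAL schema: one JSON object per line with a string "instance_id"
# and a string "response". The real format enforced by
# `conjure-eval verify-submission` may differ.
REQUIRED_FIELDS = ("instance_id", "response")


def check_submission_lines(
    lines: Iterable[str], known_ids: Optional[Set[str]] = None
) -> List[str]:
    """Return human-readable problems; an empty list means no problems found."""
    problems: List[str] = []
    seen: Set[str] = set()
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # tolerate blank lines, such as a trailing newline
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            if not isinstance(record.get(field), str):
                problems.append(f"line {lineno}: missing or non-string field {field!r}")
        iid = record.get("instance_id")
        if isinstance(iid, str):
            if iid in seen:
                problems.append(f"line {lineno}: duplicate instance_id {iid!r}")
            seen.add(iid)
            # Only check membership when a set of valid IDs was supplied.
            if known_ids is not None and iid not in known_ids:
                problems.append(f"line {lineno}: unknown instance_id {iid!r}")
    return problems
```

An open text-mode file handle for submissions.jsonl can be passed directly as lines, since iterating a text file yields one line at a time.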

Provenance

The public corpus is a deterministic 70/30 axis-stratified slice of the 510-instance Phase 4.6 frozen corpus maintained in the private blanc repository. Seed: 4317. Anyone with the source corpus can reproduce both slices via scripts/build_conjure_split.py.
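
The general idea of a seeded, stratified 70/30 split can be illustrated with a short sketch. It is not scripts/build_conjure_split.py: the stratum labels, the per-stratum rounding with Python's round(), and the use of random.Random are assumptions chosen for illustration, so this sketch will not reproduce the published slices. Only the seed 4317 and the 70 percent public fraction come from this page.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SEED = 4317            # seed stated in the Provenance section
PUBLIC_FRACTION = 0.7  # 70/30 public/hidden split


def stratified_split(
    items: Iterable[Tuple[str, str]],
    seed: int = SEED,
    public_fraction: float = PUBLIC_FRACTION,
) -> Tuple[List[str], List[str]]:
    """Split (instance_id, stratum) pairs into sorted (public, hidden) ID lists.

    ASSUMPTIONS: the stratum key and the rounding rule are illustrative and
    are not taken from the benchmark's own split script.
    """
    by_stratum: Dict[str, List[str]] = defaultdict(list)
    for instance_id, stratum in items:
        by_stratum[stratum].append(instance_id)

    rng = random.Random(seed)  # one seeded generator for the whole split
    public: List[str] = []
    hidden: List[str] = []
    # Visit strata and IDs in sorted order so the input order cannot
    # change which IDs the seeded shuffle sends to each side.
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])
        rng.shuffle(ids)
        # Python's round() rounds exact halves to the nearest even integer.
        cut = round(len(ids) * public_fraction)
        public.extend(ids[:cut])
        hidden.extend(ids[cut:])
    return sorted(public), sorted(hidden)
```

Because every step is either sorted or driven by the fixed seed, re-running on the same input yields the same split. Stratifying per axis also keeps each stratum's public share close to 70 percent instead of leaving it to chance.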



Download files


Source Distribution

conjure_eval-0.2.0.tar.gz (51.4 kB)

Built Distribution

conjure_eval-0.2.0-py3-none-any.whl (51.0 kB)

File details

Details for the file conjure_eval-0.2.0.tar.gz.

File metadata

  • Download URL: conjure_eval-0.2.0.tar.gz
  • Upload date:
  • Size: 51.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for conjure_eval-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        31bec9ac699218486d3c95e3457eb0fd11b3aab5c496c665e866436d6496c8e2
MD5           f40685cc236bea9ca73f72f0fd12969c
BLAKE2b-256   65f19d0d1dbadc5f5fc867c338a84296604bbb9e87d7272b9ace7adb726eef53


File details

Details for the file conjure_eval-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: conjure_eval-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 51.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for conjure_eval-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        363ba65f92a8196375ff46484663b8f21c69b9be0949ffc60b469e46a7a3e024
MD5           ecaf6396ec5516521a1f99225dc324db
BLAKE2b-256   24f4ffb5ea3d90a91051d55fce09cd99e12aff6b7dc433e0a2c34a2cb68d9245

