Public-slice harness for the CONJURE transformative-creativity benchmark.

Project description

conjure-eval

Public-slice harness for the CONJURE transformative-creativity benchmark. It ships the 358-instance public split (roughly 70 percent of the 510-instance Phase 4.6 frozen corpus, which spans 17 Lakatos families; SHA-256 33e9daebbfc1382b08c4b518f6bc9b30e62c13cc9d7e178327675929ebd74cc9) so that frontier-model developers can self-evaluate locally before submitting to the hidden split.

This package contains:

  • The frozen public-slice corpus JSON (conjure_eval.data.public_corpus).
  • A CLI for inspecting the corpus, driving a model pass, and checking submission files before they are sent to the hidden-split adjudicator.
  • The deterministic split provenance, so any third party can re-derive the public/hidden split byte-for-byte from the source corpus.
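
The SHA-256 digest quoted in the description can be checked against a local copy of the corpus. The following is a minimal sketch, not part of the package: it assumes the digest covers the raw bytes of the corpus JSON file (the description does not say exactly what is hashed), and the file path is whatever location your installation uses.

```python
import hashlib
from pathlib import Path

# Digest quoted in the project description.
EXPECTED_SHA256 = "33e9daebbfc1382b08c4b518f6bc9b30e62c13cc9d7e178327675929ebd74cc9"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex digest.

    Assumption: the published digest is taken over the file's raw bytes.
    """
    return sha256_of_file(path) == expected.lower()
```

If the comparison fails, the mismatch may only mean that the published digest was computed over a different serialization, so treat a failure as a prompt to check `conjure-eval provenance` rather than as proof of tampering.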

What this package is and isn't

conjure-eval is a self-service developer convenience: it lets a model team inspect the public contracts, run their model against the public slice, and smoke-test their submission format before sending results to the benchmark author. It does not ship the hidden split, and it does not run the kernel-verified tight-mode adjudicator that produces the headline accept rate. Those live in the private blanc repository and are operated by the benchmark author against frozen model snapshots; the headline number reported in the brief is the hidden-split rate.

Install

pip install conjure-eval

Usage

# List all 358 public-slice instance IDs
conjure-eval list-public

# Inspect a single instance
conjure-eval show C1-bv-001

# Drive a model pass (OpenAI-compatible endpoint)
conjure-eval run \
    --base-url https://your-endpoint/v1 \
    --api-key-env MY_API_KEY \
    --model your-model-name \
    --out submissions.jsonl

# Check submission file well-formedness before sending
conjure-eval verify-submission submissions.jsonl

# Print corpus provenance fields
conjure-eval provenance
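
Before running verify-submission, a generic pre-check of the output file can catch truncated or malformed lines. The sketch below assumes only that submissions.jsonl is a JSON Lines file with one JSON object per line; the actual schema enforced by verify-submission is not documented here, so `required_keys` is a hypothetical parameter that you would fill in yourself.

```python
import json


def check_jsonl(lines, required_keys=()):
    """Return a list of (line_number, problem) tuples for a JSON Lines input.

    required_keys is hypothetical: it is not the package's real schema.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        missing = [key for key in required_keys if key not in record]
        if missing:
            problems.append((number, "missing keys: " + ", ".join(missing)))
    return problems
```

A typical call would be `check_jsonl(open("submissions.jsonl", encoding="utf-8"))`; an empty list means every non-blank line parsed as a JSON object. This does not replace verify-submission, which remains the authoritative format check.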

Provenance

The public corpus is a deterministic 70/30 axis-stratified slice of the 510-instance Phase 4.6 frozen corpus maintained in the private blanc repository. Seed: 4317. Anyone with the source corpus can reproduce both slices via scripts/build_conjure_split.py.
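
To illustrate what a deterministic, stratified 70/30 split looks like in general, here is a minimal sketch. It is not the algorithm in scripts/build_conjure_split.py, which is not shown here: the stratification key, the per-stratum rounding rule, and the use of Python's random module are all assumptions, and the instance IDs in the usage note are toy values.

```python
import random
from collections import defaultdict


def stratified_split(instance_ids, key, public_fraction=0.7, seed=4317):
    """Split IDs into (public, hidden), stratified by key(instance_id).

    Illustrative only: the real split script may stratify and round differently.
    """
    rng = random.Random(seed)  # fixed seed makes the shuffle reproducible
    groups = defaultdict(list)
    for instance_id in instance_ids:
        groups[key(instance_id)].append(instance_id)

    public, hidden = [], []
    for stratum in sorted(groups):          # fixed stratum order
        members = sorted(groups[stratum])   # fixed order before shuffling
        rng.shuffle(members)
        cut = round(len(members) * public_fraction)
        public.extend(members[:cut])
        hidden.extend(members[cut:])
    return sorted(public), sorted(hidden)
```

For example, `stratified_split(ids, key=lambda s: s.split("-")[0])` groups toy IDs such as "A-001" by their prefix. Sorting both the strata and their members before shuffling is what makes the result independent of input order, so the same seed yields the same two slices on every run.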

Download files

Source Distribution

conjure_eval-0.1.0.tar.gz (48.9 kB view details)

Uploaded Source

Built Distribution

conjure_eval-0.1.0-py3-none-any.whl (48.6 kB view details)

Uploaded Python 3

File details

Details for the file conjure_eval-0.1.0.tar.gz.

File metadata

  • Download URL: conjure_eval-0.1.0.tar.gz
  • Upload date:
  • Size: 48.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for conjure_eval-0.1.0.tar.gz:

  • SHA256: 683a382eaa41a3f6c1d35a850dfdbe107840d3126d72426e2c94d9a9f8c64416
  • MD5: 42ab913abf654ca591e59cbfc7d1f68c
  • BLAKE2b-256: 97545289db117c88c390e3ebb4b5ae19f29054ec022a3b4d6e3a678d36893730

File details

Details for the file conjure_eval-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: conjure_eval-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 48.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for conjure_eval-0.1.0-py3-none-any.whl:

  • SHA256: 9e0d87a18645142e6bcd8bf64d4683d225b0e3a1e48098992fbe54f702e999d2
  • MD5: a0849b71de64f3f128683f5a49335c4a
  • BLAKE2b-256: 28be5c20af27b9848ac168d0d32f1900fcf9da7727fcca61980d635d348bf4d3
