
Search for compact executable verifier sets for LLM outputs.

Project description

AutoPyVerifier: Learning Compact Executable Verifiers for Large Language Model Outputs

This repository is the implementation for the paper "AutoPyVerifier: Learning Compact Executable Verifiers for Large Language Model Outputs".

AutoPyVerifier

AutoPyVerifier is a pipeline for searching over deterministic Python verifier bundles for labeled LLM outputs.

Given a development set of (query, model_output, objective) examples and a task description, the system uses an LLM to iteratively:

  1. propose initial verifier bundles,
  2. critique their failures,
  3. refine them into new candidates,
  4. execute each bundle in a restricted sandbox,
  5. search over the DAG, and
  6. select a compact verifier set that best balances score, feasibility, exploration, and bundle size.
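
The loop above can be sketched in a self-contained toy form. Everything here is an illustrative stand-in, not the autopyverifier API: real runs use an LLM for the propose/critique/refine steps and a restricted sandbox for execution (see `search.py` and `execution.py`).

```python
# Toy, self-contained sketch of the propose -> critique -> refine -> select loop.
# All helpers are illustrative stand-ins, not the actual autopyverifier API.

def execute_bundle(bundle, devset):
    """Step 4 stand-in: 'run' a bundle and score agreement with the labels."""
    hits = sum(bundle["predict"](ex["output"]) == ex["objective"] for ex in devset)
    return hits / len(devset)

def toy_search(devset, budget=3):
    # Step 1 stand-in: a trivial seed bundle that accepts every output.
    frontier = [{"name": "seed", "predict": lambda out: 1, "size": 1}]
    graph = []
    for _ in range(budget):
        bundle = frontier.pop(0)                # step 5: traverse candidates
        score = execute_bundle(bundle, devset)  # step 4: sandboxed execution
        graph.append((bundle, score))
        # Steps 2-3 stand-in: "refine" by requiring a keyword seen in positives.
        refined = {
            "name": bundle["name"] + "+kw",
            "predict": lambda out: int("x=2" in out.replace(" ", "")),
            "size": bundle["size"] + 1,
        }
        frontier.append(refined)
    # Step 6 stand-in: highest score, smallest bundle on ties.
    best, _ = max(graph, key=lambda item: (item[1], -item[0]["size"]))
    return best

devset = [
    {"output": "x^2-5x+6=(x-2)(x-3), so x=2 or x=3.", "objective": 1},
    {"output": "The roots are 1 and 6.", "objective": 0},
]
best = toy_search(devset)
```

The real search differs in every interesting way (LLM-generated candidates, a DAG rather than a queue, the feasibility/exploration/size terms below), but the control flow is the same.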

📂 Repository structure

.
├── src/
│   └── autopyverifier/
│       ├── cli.py              # CLI entry point
│       ├── config.py           # search and model configuration dataclasses
│       ├── data.py             # JSONL devset loading utilities
│       ├── execution.py        # verifier parsing, sandboxing, execution
│       ├── metrics.py          # scoring and feasibility metrics
│       ├── models.py           # shared dataclasses
│       ├── prompts.py          # seed / critic / refine / context prompts
│       ├── search.py           # main single-DAG search loop
│       └── llm/
│           ├── base.py
│           ├── mock.py
│           ├── openai_llms.py
│           ├── gemini_llms.py
│           └── claude_llms.py
└── data/
    └── toy/
        ├── devset.jsonl
        └── task_description.txt

โš™๏ธ Requirements

Use Python >= 3.10.18.

Install dependencies:

pip install -r requirements.txt

🔑 API keys

Set only the key for the backend you plan to use. For example:

export OPENAI_API_KEY="..."

📥 Input format

The development set is a JSONL file. Each line should have:

  • id: example identifier
  • query: task input
  • output: model output to verify
  • objective: 1 if the output satisfies the target objective, else 0
  • metadata (optional): extra per-example metadata

Example:

{"id": "m1", "query": "Solve x^2 - 5x + 6 = 0.", "output": "x^2 - 5x + 6 = (x-2)(x-3), so x=2 or x=3.", "objective": 1}
{"id": "m2", "query": "Solve x^2 - 5x + 6 = 0.", "output": "The roots are 1 and 6.", "objective": 0}

The task description is a plain-text file describing:

  • what query and output represent,
  • what objective labels 1 and 0 mean,
  • what kinds of verifier logic are allowed or desired, and
  • what the search should optimize for.
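
To make "verifier logic" concrete, here is the kind of deterministic, pure-Python check a bundle might contain for the toy quadratic task. The function signature and logic are illustrative assumptions; the actual bundle interface is defined by the pipeline (see `execution.py`).

```python
import re

def verify(query: str, output: str) -> int:
    """Toy deterministic verifier: accept the output only if it mentions
    both roots of x^2 - 5x + 6 = 0. Illustrative only; the real bundle
    interface is whatever execution.py parses and runs."""
    numbers = {int(n) for n in re.findall(r"\d+", output)}
    return int({2, 3} <= numbers)
```

A verifier like this is deterministic and cheap to execute, which is what lets the search run every candidate against the whole devset inside a sandbox.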

🚀 Quick Start

To run the toy example, from the project root:

pip install -e .

python -m autopyverifier.cli search \
  --devset data/toy/devset.jsonl \
  --task_description_file data/toy/task_description.txt \
  --llm_backend openai \
  --seed_model gpt-5.4 \
  --critic_model gpt-5.4 \
  --refine_model gpt-5.4 \
  --context_model gpt-5.4 \
  --budget 20 \
  --feasible_coef 0.1 \
  --explore_coef 0.1 \
  --size_coef 0.1 \
  --out_dir results/toy/gpt54

Useful optional flags:

  • --budget: number of search iterations
  • --temperature: sampling temperature passed to the backend
  • --max_output_tokens: output token budget for model calls
  • --beta_pp: feasibility threshold for lower-confidence acceptance precision
  • --beta_np: feasibility threshold for lower-confidence rejection precision
  • --timeout_seconds: per-bundle execution timeout
  • --out_dir: where to write search artifacts
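
One way to picture how `--feasible_coef`, `--explore_coef`, and `--size_coef` interact is as a weighted node utility. This is a hypothetical illustration only; the actual objective and feasibility thresholds are implemented in `metrics.py` and `search.py`.

```python
def node_utility(score, feasibility, novelty, size,
                 feasible_coef=0.1, explore_coef=0.1, size_coef=0.1):
    """Hypothetical combination of the search's competing terms:
    reward accuracy, feasibility, and exploration; penalize bundle size.
    Not the actual objective used by autopyverifier."""
    return (score
            + feasible_coef * feasibility
            + explore_coef * novelty
            - size_coef * size)
```

Under this reading, raising `--size_coef` pushes the search toward more compact verifier sets, while raising `--explore_coef` favors visiting less-explored parts of the DAG.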

๐Ÿ—‚๏ธ What gets written to out_dir

When --out_dir is provided, the search writes:

  • selected_verifier.py: source code for the chosen verifier bundle
  • selected_verifier.json: summary of the chosen verifier
  • graph.json: metadata for all explored nodes
  • summary.json: high-level search summary
  • nodes/*.py: source code for each explored verifier bundle
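
For downstream analysis, the JSON artifacts can be loaded back with a small helper. The file names come from the list above; the fields inside each file depend on the run, so this sketch decodes them without assuming any schema (the demo's placeholder contents are invented for illustration).

```python
import json
import tempfile
from pathlib import Path

def load_search_artifacts(out_dir):
    """Read the artifacts a search run writes to --out_dir.
    Decodes the JSON files as-is and returns the selected bundle's source."""
    out = Path(out_dir)
    artifacts = {
        name: json.loads((out / f"{name}.json").read_text(encoding="utf-8"))
        for name in ("selected_verifier", "graph", "summary")
    }
    artifacts["selected_source"] = (out / "selected_verifier.py").read_text(encoding="utf-8")
    return artifacts

# Demo against a dummy out_dir with placeholder contents.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for name in ("selected_verifier", "graph", "summary"):
        (out / f"{name}.json").write_text('{"dummy": true}', encoding="utf-8")
    (out / "selected_verifier.py").write_text(
        "def verify(query, output): return 1\n", encoding="utf-8")
    artifacts = load_search_artifacts(out)
```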

โญ Citation

If you would like to cite our work, the BibTeX entry is:

@article{pezeshkpour2026autopyverifier,
  title={AutoPyVerifier: Learning Compact Executable Verifiers for Large Language Model Outputs},
  author={Pezeshkpour, Pouya and Hruschka, Estevam},
  year={2026}
}

📜 Disclosure

Embedded in, or bundled with, this product are open source software (OSS) components, datasets and other third party components identified below. The license terms respectively governing the datasets and third-party components continue to govern those portions, and you agree to those license terms, which, when applicable, specifically limit any distribution. You may receive a copy of, distribute and/or modify any open source code for the OSS component under the terms of their respective licenses, which may be CC license and Apache 2.0 license. In the event of conflicts between Megagon Labs, Inc., license conditions and the Open Source Software license conditions, the Open Source Software conditions shall prevail with respect to the Open Source Software portions of the software. You agree not to, and are not permitted to, distribute actual datasets used with the OSS components listed below. You agree and are limited to distribute only links to datasets from known sources by listing them in the datasets overview table below. You are permitted to distribute derived datasets of data sets from known sources by including links to original dataset source in the datasets overview table below. You agree that any right to modify datasets originating from parties other than Megagon Labs, Inc. are governed by the respective third party's license conditions. All OSS components and datasets are distributed WITHOUT ANY WARRANTY, without even implied warranty such as for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, and without any liability to or claim against any Megagon Labs, Inc. entity other than as explicitly documented in this README document. You agree to cease using any part of the provided materials if you do not agree with the terms or the lack of any warranty herein. While Megagon Labs, Inc., makes commercially reasonable efforts to ensure that citations in this document are complete and accurate, errors may occur.
If you see any error or omission, please help us improve this document by sending information to contact_oss@megagon.ai.
