Search for compact executable verifier sets for LLM outputs.
AutoPyVerifier: Learning Compact Executable Verifiers for Large Language Model Outputs
This repository contains the implementation of the paper "AutoPyVerifier: Learning Compact Executable Verifiers for Large Language Model Outputs".
AutoPyVerifier
AutoPyVerifier is a pipeline for searching over deterministic Python verifier bundles for labeled LLM outputs.
Given a development set of (query, model_output, objective) examples and a task description, the system uses an LLM to iteratively:
- propose initial verifier bundles,
- critique their failures,
- refine them into new candidates,
- execute each bundle in a restricted sandbox,
- search over the DAG of candidate bundles, and
- select a compact verifier set that best balances the score, exploration, feasibility, and size.
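The trade-off in the last step can be pictured with a small sketch. The formula below is an assumption made for illustration, not the scoring rule used by the package: `Candidate`, `priority`, and the visit-count exploration bonus are hypothetical names and choices, and only the three coefficient names mirror the CLI flags shown in the Quick Start.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one verifier bundle (one node in the search graph)."""
    name: str
    score: float      # dev-set quality of the bundle, in [0, 1]
    feasible: bool    # whether the bundle meets the feasibility thresholds
    size: int         # number of verifier functions in the bundle
    visits: int       # how many times this node has already been expanded


def priority(c, total_visits, feasible_coef=0.1, explore_coef=0.1, size_coef=0.1):
    """Assumed trade-off: reward score, feasibility and novelty; penalise size."""
    explore_bonus = math.sqrt(math.log(total_visits + 1) / (c.visits + 1))
    return (c.score
            + feasible_coef * (1.0 if c.feasible else 0.0)
            + explore_coef * explore_bonus
            - size_coef * c.size)


candidates = [
    Candidate("a", score=0.90, feasible=True, size=4, visits=3),
    Candidate("b", score=0.88, feasible=True, size=2, visits=3),
    Candidate("c", score=0.95, feasible=False, size=6, visits=3),
]
total = sum(c.visits for c in candidates)
best = max(candidates, key=lambda c: priority(c, total))
print(best.name)
```

Under these assumed weights the smaller feasible bundle wins over a slightly more accurate but larger one, which is the kind of compactness preference the description refers to.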
Repository structure
.
├── src
├── autopyverifier/
│   ├── cli.py            # CLI entry point
│   ├── config.py         # search and model configuration dataclasses
│   ├── data.py           # JSONL devset loading utilities
│   ├── execution.py      # verifier parsing, sandboxing, execution
│   ├── metrics.py        # scoring and feasibility metrics
│   ├── models.py         # shared dataclasses
│   ├── prompts.py        # seed / critic / refine / context prompts
│   ├── search.py         # main single-DAG search loop
│   └── llm/
│       ├── base.py
│       ├── mock.py
│       ├── openai_llms.py
│       ├── gemini_llms.py
│       └── claude_llms.py
└── data/
    └── toy/
        ├── devset.jsonl
        └── task_description.txt
Requirements
Use Python >= 3.10.18.
Install dependencies:
pip install -r requirements.txt
API keys
Set only the key for the backend you plan to use. For example:
export OPENAI_API_KEY="..."
Input format
The development set is a JSONL file. Each line should have:
- id: example identifier
- query: task input
- output: model output to verify
- objective: 1 if the output satisfies the target objective, else 0
- metadata (optional): extra per-example metadata
Example:
{"id": "m1", "query": "Solve x^2 - 5x + 6 = 0.", "output": "x^2 - 5x + 6 = (x-2)(x-3), so x=2 or x=3.", "objective": 1}
{"id": "m2", "query": "Solve x^2 - 5x + 6 = 0.", "output": "The roots are 1 and 6.", "objective": 0}
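Before starting a search it can help to check that every line of the devset carries the required fields. The following is a minimal sketch using only the standard library; `parse_devset` and `REQUIRED_FIELDS` are hypothetical names for illustration and are not the loader in `data.py`.

```python
import json

# Field names taken from the input format described above.
REQUIRED_FIELDS = ("id", "query", "output", "objective")


def parse_devset(lines):
    """Parse JSONL lines into dicts, checking required fields and labels."""
    examples = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        if record["objective"] not in (0, 1):
            raise ValueError(f"line {line_no}: objective must be 0 or 1")
        record.setdefault("metadata", {})  # metadata is optional
        examples.append(record)
    return examples


sample = [
    '{"id": "m1", "query": "Solve x^2 - 5x + 6 = 0.", "output": "x^2 - 5x + 6 = (x-2)(x-3), so x=2 or x=3.", "objective": 1}',
    '{"id": "m2", "query": "Solve x^2 - 5x + 6 = 0.", "output": "The roots are 1 and 6.", "objective": 0}',
]
examples = parse_devset(sample)
print(len(examples), [e["objective"] for e in examples])
```

To check a file on disk, pass an open file handle, for example `parse_devset(open("data/toy/devset.jsonl", encoding="utf-8"))`.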
The task description is a plain-text file describing:
- what query and output represent,
- what objective labels 1 and 0 mean,
- what kinds of verifier logic are allowed or desired, and
- what the search should optimize for.
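To make "verifier logic" concrete, here is a sketch of a deterministic check for the quadratic example above. It is an illustration only: the `verify(query, output)` signature, the helper name, and the hard-coded coefficients are assumptions, since the interface that generated bundles must follow is not described here.

```python
import re


def extract_claimed_roots(output):
    """Collect integers the output presents as solutions (two simple patterns only)."""
    roots = {int(m) for m in re.findall(r"x\s*=\s*(-?\d+)", output)}
    pair = re.search(r"roots are (-?\d+) and (-?\d+)", output)
    if pair:
        roots.update(int(g) for g in pair.groups())
    return roots


def verify(query, output, coeffs=(1, -5, 6)):
    """Accept only if a root is claimed and every claimed root satisfies the quadratic.

    The coefficients are hard-coded for the toy query x^2 - 5x + 6 = 0;
    a real verifier would presumably derive them from `query`.
    """
    a, b, c = coeffs
    roots = extract_claimed_roots(output)
    return bool(roots) and all(a * r * r + b * r + c == 0 for r in roots)


query = "Solve x^2 - 5x + 6 = 0."
print(verify(query, "x^2 - 5x + 6 = (x-2)(x-3), so x=2 or x=3."))
print(verify(query, "The roots are 1 and 6."))
```

The check substitutes each claimed root back into the equation rather than comparing against a stored answer, so it stays deterministic and needs no model call.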
Quick Start
To run the toy example, from the project root:
pip install -e .
python -m autopyverifier.cli search \
--devset data/toy/devset.jsonl \
--task_description_file data/toy/task_description.txt \
--llm_backend openai \
--seed_model gpt-5.4 \
--critic_model gpt-5.4 \
--refine_model gpt-5.4 \
--context_model gpt-5.4 \
--budget 20 \
--feasible_coef 0.1 \
--explore_coef 0.1 \
--size_coef 0.1 \
--out_dir results/toy/gpt54
Useful optional flags:
- --budget: number of search iterations
- --temperature: sampling temperature passed to the backend
- --max_output_tokens: output token budget for model calls
- --beta_pp: feasibility threshold for lower-confidence acceptance precision
- --beta_np: feasibility threshold for lower-confidence rejection precision
- --timeout_seconds: per-bundle execution timeout
- --out_dir: where to write search artifacts
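One way to read the two feasibility thresholds is as floors on a lower confidence bound of precision. The sketch below assumes a Wilson score interval and assumes that acceptance precision means the fraction of accepted outputs labeled 1 and rejection precision the fraction of rejected outputs labeled 0. Neither assumption is confirmed by this README, and `wilson_lower_bound` and `is_feasible` are hypothetical names.

```python
import math


def wilson_lower_bound(successes, trials, z=1.96):
    """Lower end of the Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0  # no evidence, so no guaranteed precision
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


def is_feasible(accept_correct, accept_total, reject_correct, reject_total,
                beta_pp, beta_np):
    """A bundle is feasible if both lower bounds clear their thresholds."""
    return (wilson_lower_bound(accept_correct, accept_total) >= beta_pp
            and wilson_lower_bound(reject_correct, reject_total) >= beta_np)


# Ten out of ten correct still gives a lower bound well below 1.0 on a small dev set.
print(round(wilson_lower_bound(10, 10), 4))
print(is_feasible(10, 10, 10, 10, beta_pp=0.7, beta_np=0.7))
```

Using a lower bound instead of the raw precision penalizes bundles that look perfect only because they were evaluated on very few examples.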
What gets written to out_dir
When --out_dir is provided, the search writes:
- selected_verifier.py: source code for the chosen verifier bundle
- selected_verifier.json: summary of the chosen verifier
- graph.json: metadata for all explored nodes
- summary.json: high-level search summary
- nodes/*.py: source code for each explored verifier bundle
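After a run, a quick check that the listed artifacts exist can catch an interrupted search. This is a small sketch based only on the file names above; `missing_artifacts` is a hypothetical helper, and the contents of the JSON files are not inspected because their schema is not documented here.

```python
from pathlib import Path

# File names as listed in this README.
EXPECTED_FILES = (
    "selected_verifier.py",
    "selected_verifier.json",
    "graph.json",
    "summary.json",
)


def missing_artifacts(out_dir):
    """Return the expected artifacts that are absent from out_dir."""
    root = Path(out_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # At least one explored bundle should have been written under nodes/.
    if not any((root / "nodes").glob("*.py")):
        missing.append("nodes/*.py")
    return missing
```

For the Quick Start run, `missing_artifacts("results/toy/gpt54")` would return an empty list if every artifact was written.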
Citation
If you would like to cite our work, the BibTeX entry is:
@article{pezeshkpour2026autopyverifier,
title={AutoPyVerifier: Learning Compact Executable Verifiers for Large Language Model Outputs},
author={Pezeshkpour, Pouya and Hruschka, Estevam},
year={2026}
}
Disclosure
Embedded in, or bundled with, this product are open source software (OSS) components, datasets and other third party components identified below. The license terms respectively governing the datasets and third-party components continue to govern those portions, and you agree to those license terms, which, when applicable, specifically limit any distribution. You may receive a copy of, distribute and/or modify any open source code for the OSS component under the terms of their respective licenses, which may be CC license and Apache 2.0 license. In the event of conflicts between Megagon Labs, Inc., license conditions and the Open Source Software license conditions, the Open Source Software conditions shall prevail with respect to the Open Source Software portions of the software. You agree not to, and are not permitted to, distribute actual datasets used with the OSS components listed below. You agree and are limited to distribute only links to datasets from known sources by listing them in the datasets overview table below. You are permitted to distribute derived datasets of data sets from known sources by including links to original dataset source in the datasets overview table below. You agree that any right to modify datasets originating from parties other than Megagon Labs, Inc. are governed by the respective third party's license conditions. All OSS components and datasets are distributed WITHOUT ANY WARRANTY, without even implied warranty such as for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, and without any liability to or claim against any Megagon Labs, Inc. entity other than as explicitly documented in this README document. You agree to cease using any part of the provided materials if you do not agree with the terms or the lack of any warranty herein. While Megagon Labs, Inc., makes commercially reasonable efforts to ensure that citations in this document are complete and accurate, errors may occur.
If you see any error or omission, please help us improve this document by sending information to contact_oss@megagon.ai.