# AutoChecklist

A library of composable pipelines for generating and scoring checklist criteria for LLM evaluation.
A checklist is a list of yes/no questions used to evaluate an LLM's output. autochecklist provides five generator abstractions, each representing a different reasoning approach to producing evaluation criteria, along with a configurable `ChecklistScorer` that consolidates three scoring strategies from the literature. You can mix, extend, and customize all components.
## Terminology

- **input**: The instruction, query, or task given to the LLM being evaluated (e.g., "Write a haiku about autumn").
- **target**: The output being evaluated against the checklist (e.g., the haiku the LLM produced).
- **reference**: An optional gold-standard response used by some methods to improve checklist generation.
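For concreteness, a single evaluation instance can be sketched as a plain dict (an illustration only; the library's own data structures may differ):

```python
# One evaluation instance, expressed as a plain dict.
instance = {
    # The task given to the LLM under evaluation.
    "input": "Write a haiku about autumn.",
    # The output being judged against the checklist.
    "target": "Leaves fall gently down...",
    # Optional gold-standard answer some generators can use.
    "reference": None,
}
```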
## Generator Abstractions

The core of the library is five generator classes, each implementing a distinct approach to producing checklists:
| Level | Generator | Approach | Analogy |
|---|---|---|---|
| Instance | DirectGenerator | Prompt → checklist | Direct inference |
| Instance | ContrastiveGenerator | Candidates → checklist | Counterfactual reasoning |
| Corpus | InductiveGenerator | Observations → criteria | Inductive reasoning (bottom-up) |
| Corpus | DeductiveGenerator | Dimensions → criteria | Deductive reasoning (top-down) |
| Corpus | InteractiveGenerator | Eval sessions → criteria | Protocol analysis |
Instance-level generators produce one checklist per input — criteria are tailored to each specific task. Corpus-level generators produce one checklist for an entire dataset — criteria capture general quality patterns derived from higher-level signals.
Each generator is customizable via prompt templates (.md files with {input}, {target} placeholders). You can use the built-in paper implementations, write your own prompts, or chain generators with different refiners and scorers to build custom evaluation pipelines.
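The templates are ordinary text with named placeholders. A minimal sketch of how an `{input}`/`{target}` template could be filled (the library's own template loading and rendering may differ):

```python
# A minimal instance-level prompt template using the documented
# {input} and {target} placeholders.
TEMPLATE = (
    "Generate a checklist of yes/no questions to evaluate the response.\n"
    "Task: {input}\n"
    "Response: {target}\n"
)

def render(template: str, input: str, target: str) -> str:
    """Fill the placeholders via str.format-style substitution."""
    return template.format(input=input, target=target)

prompt = render(
    TEMPLATE,
    input="Write a haiku about autumn.",
    target="Leaves fall gently down...",
)
```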
## Built-in Pipelines
The library includes built-in pipelines implementing methods from research papers (TICK, RocketEval, RLCF, OpenRubrics, CheckEval, InteractEval, and more). See Supported Pipelines for the full list and configuration details.
## Scoring

A single configurable `ChecklistScorer` class supports all scoring modes:
| Config | Description |
|---|---|
| `mode="batch"` | All items in one LLM call (efficient) |
| `mode="batch", capture_reasoning=True` | Batch with per-item explanations |
| `mode="item"` | One item per call |
| `mode="item", capture_reasoning=True` | One item per call with reasoning |
| `mode="item", primary_metric="weighted"` | Item weights (0-100) for importance |
| `mode="item", use_logprobs=True` | Logprob confidence calibration |
## Refiners

Refiners are pipeline stages that clean up raw checklists before scoring. Corpus-level generators use them internally, and they can also be composed into custom pipelines:
- `Deduplicator` — merges semantically similar items via embeddings
- `Tagger` — filters by applicability and specificity
- `UnitTester` — validates that items are enforceable
- `Selector` — picks a diverse subset via beam search
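The deduplication idea can be sketched with cosine similarity over item embeddings: greedily keep an item only if it is not too similar to anything already kept. The toy 2-D vectors below stand in for real sentence embeddings, and the threshold value is illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def deduplicate(items, embeddings, threshold=0.9):
    """Greedy dedup: drop items whose embedding is near a kept item."""
    kept, kept_vecs = [], []
    for item, vec in zip(items, embeddings):
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(item)
            kept_vecs.append(vec)
    return kept

items = [
    "Is it a haiku?",
    "Does it follow the haiku form?",   # near-duplicate of the first
    "Is it about autumn?",
]
vecs = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0)]  # toy "embeddings"
print(deduplicate(items, vecs))
# ['Is it a haiku?', 'Is it about autumn?']
```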
## Installation

```bash
pip install autochecklist
```
### Optional extras

```bash
# ML dependencies for corpus-level refiners (embeddings, deduplication)
pip install "autochecklist[ml]"

# vLLM for offline GPU inference (no server needed)
pip install "autochecklist[vllm]"

# Everything
pip install "autochecklist[all]"
```
For development installation from source, see the GitHub repository.
## Quick Start

```python
from autochecklist import pipeline

pipe = pipeline("tick", generator_model="openai/gpt-5-mini", scorer_model="openai/gpt-5-mini")
result = pipe(input="Write a haiku about autumn.", target="Leaves fall gently down...")
print(f"Pass rate: {result.pass_rate:.0%}")
```
See the Quick Start guide for custom prompts, batch evaluation, and more.
## CLI

```bash
autochecklist run --pipeline tick --data eval_data.jsonl -o results.jsonl \
    --generator-model openai/gpt-4o-mini --scorer-model openai/gpt-4o-mini
```
See the CLI guide for all commands.
## Links
- Documentation
- GitHub Repository — contributing, UI, dev setup
- Bug Tracker
## Citation

TBA

## License

Apache-2.0 (see LICENSE)