A library of checklist generation and scoring methods for LLM evaluation

Project description

AutoChecklist

A library of composable pipelines for generating and scoring checklist criteria.

A checklist is a list of yes/no questions used to evaluate an LLM response. autochecklist provides five generator abstractions, each representing a different reasoning approach to producing evaluation criteria, along with a configurable ChecklistScorer that consolidates three scoring strategies from the literature. You can mix, extend, and customize all components.

Terminology

  • input: The instruction, query, or task given to the LLM being evaluated (e.g., "Write a haiku about autumn").
  • target: The output being evaluated against the checklist (e.g., the haiku the LLM produced).
  • reference: An optional gold-standard response used by some methods to improve checklist generation.
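The three roles can be illustrated with a plain Python record. The field names follow the terminology above; this is illustrative data, not a library API:

```python
# One evaluation record: the task, the model output under evaluation,
# and an optional gold-standard reference.
record = {
    "input": "Write a haiku about autumn.",         # task given to the LLM
    "target": "Leaves fall gently down...",         # output being judged
    "reference": "Crisp leaves drift earthward...", # optional gold answer
}

# A checklist is just a list of yes/no questions about the target.
checklist = [
    "Does the response have three lines?",
    "Does the response follow a 5-7-5 syllable pattern?",
    "Is the poem about autumn?",
]
```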

Generator Abstractions

The core of the library is five generator classes, each implementing a distinct approach to producing checklists:

| Level | Generator | Approach | Analogy |
| --- | --- | --- | --- |
| Instance | DirectGenerator | Prompt → checklist | Direct inference |
| Instance | ContrastiveGenerator | Candidates → checklist | Counterfactual reasoning |
| Corpus | InductiveGenerator | Observations → criteria | Inductive reasoning (bottom-up) |
| Corpus | DeductiveGenerator | Dimensions → criteria | Deductive reasoning (top-down) |
| Corpus | InteractiveGenerator | Eval sessions → criteria | Protocol analysis |

Instance-level generators produce one checklist per input — criteria are tailored to each specific task. Corpus-level generators produce one checklist for an entire dataset — criteria capture general quality patterns derived from higher-level signals.

Each generator is customizable via prompt templates (.md files with {input}, {target} placeholders). You can use the built-in paper implementations, write your own prompts, or chain generators with different refiners and scorers to build custom evaluation pipelines.
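A minimal sketch of how such a template works: a .md file's {input} and {target} placeholders are filled before the LLM call. The template text here is illustrative, not one of the built-in prompts:

```python
# Illustrative prompt template in the style described above; in practice
# this text would live in a .md file loaded by a generator.
template = """Given the task and response below, write yes/no checklist questions.

Task: {input}
Response: {target}
"""

# Placeholders are substituted with the evaluation record's fields.
prompt = template.format(
    input="Write a haiku about autumn.",
    target="Leaves fall gently down...",
)
```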

Built-in Pipelines

The library includes built-in pipelines implementing methods from research papers (TICK, RocketEval, RLCF, OpenRubrics, CheckEval, InteractEval, and more). See Supported Pipelines for the full list and configuration details.

Scoring

A single configurable ChecklistScorer class supports all scoring modes:

| Config | Description |
| --- | --- |
| mode="batch" | All items in one LLM call (efficient) |
| mode="batch", capture_reasoning=True | Batch with per-item explanations |
| mode="item" | One item per call |
| mode="item", capture_reasoning=True | One item per call with reasoning |
| mode="item", primary_metric="weighted" | Item weights (0-100) for importance |
| mode="item", use_logprobs=True | Logprob confidence calibration |

Refiners

Refiners are pipeline stages that clean up raw checklists before scoring. They're used by corpus-level generators internally, and can also be composed into custom pipelines:

  • Deduplicator — merges semantically similar items via embeddings
  • Tagger — filters by applicability and specificity
  • UnitTester — validates that items are enforceable
  • Selector — picks a diverse subset via beam search
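To make the Deduplicator's role concrete, here is a crude stand-in that merges near-duplicate items by word overlap (Jaccard similarity) instead of embeddings; the real refiner uses embedding similarity, and this function is purely illustrative:

```python
def dedup(items, threshold=0.6):
    """Keep only checklist items whose word-level Jaccard similarity with
    every already-kept item is below threshold. A crude stand-in for the
    embedding-based similarity a real deduplicator would use."""
    kept = []
    for item in items:
        words = set(item.lower().split())
        similar = any(
            len(words & set(k.lower().split())) / len(words | set(k.lower().split()))
            >= threshold
            for k in kept
        )
        if not similar:
            kept.append(item)
    return kept

items = [
    "Does the response have three lines?",
    "Does the response have exactly three lines?",  # near-duplicate of the first
    "Is the poem about autumn?",
]
deduped = dedup(items)
```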

Installation

pip install autochecklist

Optional extras

# ML dependencies for corpus-level refiners (embeddings, deduplication)
pip install "autochecklist[ml]"

# vLLM for offline GPU inference (no server needed)
pip install "autochecklist[vllm]"

# Everything
pip install "autochecklist[all]"

For development installation from source, see the GitHub repository.

Quick Start

from autochecklist import pipeline

pipe = pipeline("tick", generator_model="openai/gpt-5-mini", scorer_model="openai/gpt-5-mini")
result = pipe(input="Write a haiku about autumn.", target="Leaves fall gently down...")
print(f"Pass rate: {result.pass_rate:.0%}")

See the Quick Start guide for custom prompts, batch evaluation, and more.

CLI

autochecklist run --pipeline tick --data eval_data.jsonl -o results.jsonl \
  --generator-model openai/gpt-4o-mini --scorer-model openai/gpt-4o-mini
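The data file is JSON Lines, one record per line. The field names below follow the terminology above ("input", "target"); the exact schema the CLI expects is an assumption here — check the CLI guide:

```python
import json

# Two illustrative records for eval_data.jsonl.
records = [
    {"input": "Write a haiku about autumn.", "target": "Leaves fall gently down..."},
    {"input": "Summarize the article in one sentence.", "target": "The study finds..."},
]

# JSONL: one JSON object per line.
with open("eval_data.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```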

See the CLI guide for all commands.

Citation

TBA

License

Apache-2.0 (see LICENSE)

Download files

  • Source distribution: autochecklist-0.2.1.tar.gz (4.9 MB)
  • Built distribution: autochecklist-0.2.1-py3-none-any.whl (122.7 kB)

File details: autochecklist-0.2.1.tar.gz

  • Size: 4.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | adee2c635ad16d48a7090d66dfc329dcd7b88e505de04bfd24ebc24743f75679 |
| MD5 | cb324dc38d903154023e3ebd12b5f2af |
| BLAKE2b-256 | ebc0bfa1fd52df7a211cd8f52ef8191d2b76a69b2ae82f7614cc5556dfce9d70 |

Provenance

Attestation bundles for autochecklist-0.2.1.tar.gz were published by publish.yml on ChicagoHAI/AutoChecklist.

File details: autochecklist-0.2.1-py3-none-any.whl

  • Size: 122.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 0d8fb766138393c2444d3fd0a0e21b59b4782d32b8a9fb202896401c083d9fda |
| MD5 | 7769c328fcae122913a659a58e867ad8 |
| BLAKE2b-256 | bcaa9016b19dbaeb4379da23d329826571f1700299cb35f7117497f6f1a93a15 |

Provenance

Attestation bundles for autochecklist-0.2.1-py3-none-any.whl were published by publish.yml on ChicagoHAI/AutoChecklist.
