CogBench — Verifiable Cognitive Constraint Benchmark for LLM Question Generation

CogBench evaluates whether large language models can generate educationally appropriate questions at specific Bloom's Taxonomy cognitive levels while satisfying 28 deterministic constraints.

Features

6 Bloom's Taxonomy Levels: Remember, Understand, Apply, Analyze, Evaluate, Create
8 Academic Subjects: Biology, Chemistry, Physics, Mathematics, Psychology, Economics, History, Computer Science
28 Deterministic Constraints: No LLM-as-judge — every constraint is verifiable and reproducible
Adversarial Mode: Tests robustness by using mismatched cognitive-level vocabulary
120 Passages: From OpenStax open-access textbooks

Installation

pip install cogbench

For NLP features (key concept extraction with spaCy):

pip install "cogbench[nlp]"
python -m spacy download en_core_web_sm
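
To confirm that the optional NLP pieces are importable before running anything that depends on them, a small check like the one below may help. It is a minimal sketch using only the standard library; the helper name `nlp_extras_available` is hypothetical and not part of CogBench. It assumes the spaCy model installs as an importable package named `en_core_web_sm`, which is how `spacy download` normally installs it.

```python
import importlib.util


def nlp_extras_available(model: str = "en_core_web_sm") -> bool:
    """Return True if both spaCy and the named model package can be imported.

    Hypothetical helper for illustration; not part of the cogbench package.
    """
    # find_spec returns None for a missing top-level package instead of raising.
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("spacy", model)
    )


if __name__ == "__main__":
    print("NLP extras installed:", nlp_extras_available())
```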

Quick Start

# Show benchmark configuration
cogbench info

# Run benchmark for a single model (requires Ollama)
cogbench run --model qwen2.5:14b --mode standard

# Run all local models in both modes
cogbench run --local-all --mode both

# Re-evaluate existing results with updated constraints
cogbench evaluate

# Generate leaderboard data
cogbench leaderboard
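
When several models need to be benchmarked from a script, the `cogbench run` invocation shown above can be wrapped in Python. The following is a sketch, not an official API: the function names are hypothetical, and it uses only the `--model` and `--mode` flags that appear in this README. It defaults to a dry run that prints the commands instead of executing them.

```python
import shlex
import subprocess
from typing import List


def build_run_command(model: str, mode: str = "standard") -> List[str]:
    """Build the argv list for `cogbench run` using flags shown in the README."""
    return ["cogbench", "run", "--model", model, "--mode", mode]


def run_models(models: List[str], mode: str = "standard", dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run=True, only print) one `cogbench run` per model."""
    commands = [build_run_command(model, mode) for model in models]
    for cmd in commands:
        if dry_run:
            # Print a shell-quoted version of the command without executing it.
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if the CLI exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_models(["qwen2.5:14b"], mode="standard")
```

Passing `dry_run=False` executes the commands and therefore needs the `cogbench` CLI on the PATH, plus Ollama for local models.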

CLI Commands

Command	Description
`cogbench info`	Show benchmark config, models, and passage stats
`cogbench run`	Generate questions and evaluate models
`cogbench evaluate`	Re-evaluate existing generation files
`cogbench leaderboard`	Populate leaderboard data.json from results
`cogbench scrape`	Scrape passages from OpenStax textbooks
`cogbench submit`	Submit results to the CogBench leaderboard via GitHub

Submitting Results

After running the benchmark, submit your results to the public leaderboard:

# Submit all completed models
cogbench submit --name "Your Name"

# Submit a specific model
cogbench submit --model qwen2.5:14b --name "Your Name"

The command requires the GitHub CLI (gh). It creates a GitHub issue containing your results; the maintainers review the issue and add the results to the leaderboard.
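
Because submission depends on external executables, it can be convenient to check for them before calling `cogbench submit`. The sketch below is illustrative only: `missing_tools` is a hypothetical helper, and it checks nothing beyond whether the named programs are on the PATH (it does not verify that `gh` is authenticated).

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the subset of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["cogbench", "gh"])
    if missing:
        print("Install before submitting:", ", ".join(missing))
    else:
        print("cogbench and gh were both found on the PATH.")
```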

Python API

# The import package is written as cogbenchv2 here, unlike the pip distribution name (cogbench)
from cogbenchv2.passages.processor import load_all_passages
from cogbenchv2.evaluation.evaluate import evaluate_generations
from cogbenchv2.evaluation.metrics import compute_metrics

# Load the 120 bundled passages
passages = load_all_passages()

# Evaluate a generation file
evaluations = evaluate_generations("gen_qwen2_5_14b_standard.json", passages)

# Compute metrics
metrics = compute_metrics(evaluations)
print(f"Prompt-level strict: {metrics['prompt_level_strict']['rate']:.1%}")
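
The README does not define how the `prompt_level_strict` rate is computed. One common reading of a "strict" prompt-level score is that a generated question counts as passing only if every one of its constraints passes. The sketch below illustrates that reading with made-up data; the function, the constraint names, and the input shape are all assumptions for illustration, not CogBench's actual implementation or output format.

```python
from typing import Dict, List


def prompt_level_strict_rate(results: List[Dict[str, bool]]) -> float:
    """Fraction of prompts for which every constraint check passed.

    `results` holds one dict per prompt, mapping a constraint name to
    whether it passed. This shape is assumed for illustration only.
    """
    if not results:
        return 0.0
    passed = sum(1 for checks in results if all(checks.values()))
    return passed / len(results)


# Hypothetical constraint names and outcomes, invented for this example.
sample = [
    {"ends_with_question_mark": True, "uses_level_verb": True},
    {"ends_with_question_mark": True, "uses_level_verb": False},
]
print(f"Prompt-level strict: {prompt_level_strict_rate(sample):.1%}")  # 50.0%
```

Under this reading a single failed constraint fails the whole prompt, so the strict rate can never exceed the pass rate of any individual constraint.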

Environment Variables

Variable	Default	Description
`COGBENCH_RESULTS_DIR`	`./data/results`	Where benchmark results are saved
`OPENAI_API_KEY`	—	For OpenAI API models
`GOOGLE_API_KEY`	—	For Google Gemini models
`TOGETHER_API_KEY`	—	For Together.ai models
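
A script that wraps CogBench may want to resolve these settings the same way the table describes. The following is a minimal sketch using the variable names and the default from the table; the helper functions are hypothetical, and how CogBench itself reads these variables is not shown in this README.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

API_KEY_VARS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "TOGETHER_API_KEY")


def results_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return COGBENCH_RESULTS_DIR, falling back to the documented default."""
    env = os.environ if env is None else env
    return Path(env.get("COGBENCH_RESULTS_DIR", "./data/results"))


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List the API key variables that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in API_KEY_VARS if env.get(name)]


if __name__ == "__main__":
    print("Results directory:", results_dir())
    print("API keys present:", configured_providers() or "none")
```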

License

Apache 2.0

Release history

1.1.0 (Feb 28, 2026)
1.0.0 (Feb 28, 2026), this version

Download files

Source Distribution

cogbench-1.0.0.tar.gz (183.5 kB)

Uploaded Feb 28, 2026 Source

Built Distribution

cogbench-1.0.0-py3-none-any.whl (190.4 kB)

Uploaded Feb 28, 2026 Python 3

File details

Details for the file cogbench-1.0.0.tar.gz.

File metadata

Download URL: cogbench-1.0.0.tar.gz
Upload date: Feb 28, 2026
Size: 183.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for cogbench-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`a8fc3a3f0f8339b104dc1e54fe07c9516d604684f42f218c684e70efae400085`
MD5	`e67a72d5081344d4585183e1267b1d3f`
BLAKE2b-256	`fbd9d2a997893eda6c8599953a0c33dbddc01ad7fa21dc3f6ad96a7193db6ba2`

File details

Details for the file cogbench-1.0.0-py3-none-any.whl.

File metadata

Download URL: cogbench-1.0.0-py3-none-any.whl
Upload date: Feb 28, 2026
Size: 190.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for cogbench-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8565dad7fe491956dd95e1f07a32717d8d090785e6c7ccc2855c5c39cadd4d18`
MD5	`af9aa6196ecba019b718764f46047014`
BLAKE2b-256	`b06e90e17c8c6db8d20c414773771395971a6f36be3c7ddafdfa2b7c2631278e`
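
The published digests can be used to check a downloaded file before installing it. The sketch below uses Python's standard `hashlib` and the two SHA256 values listed above; the function names are hypothetical, and the files are assumed to have been downloaded under their original names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file listings above.
EXPECTED_SHA256 = {
    "cogbench-1.0.0.tar.gz": "a8fc3a3f0f8339b104dc1e54fe07c9516d604684f42f218c684e70efae400085",
    "cogbench-1.0.0-py3-none-any.whl": "8565dad7fe491956dd95e1f07a32717d8d090785e6c7ccc2855c5c39cadd4d18",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    target = Path("cogbench-1.0.0.tar.gz")
    if target.exists():
        print(target.name, "OK" if verify(target) else "MISMATCH")
    else:
        print(target.name, "not found in the current directory")
```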

