LLM-based evaluation of multiple-choice items against item-writing guidelines
itemwise
LLM-based evaluation of multiple-choice items against the 43 item-writing rules from Haladyna & Downing (1989). Works with any LLM provider supported by litellm.
Installation
```bash
pip install itemwise
```
Requires Python 3.12+.
Quick Start
```python
from itemwise import evaluate

result = evaluate(
    item={
        "stem": "Which of the following is NOT a characteristic of mammals?",
        "options": [
            "They are warm-blooded",
            "They lay eggs",
            "They have hair or fur",
            "They produce milk",
        ],
        "correct": 1,  # zero-based index of the keyed answer ("They lay eggs")
    },
    model="azure/gpt-5.1-chat",
)

print(result.score)       # fraction of rules passed
print(result.violations)  # list of RuleResult objects for the failed rules
print(result.usage.cost)  # LLM cost in USD
```
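`result.score` is the fraction of evaluated rules that passed, and `result.violations` holds the failed results. The sketch below shows that relationship using a hypothetical `RuleResult` dataclass with `rule` and `passed` fields. These field names are assumptions for illustration; itemwise's own `RuleResult` may be shaped differently.

```python
from dataclasses import dataclass


# Hypothetical stand-in for itemwise's RuleResult; the real class may
# carry different or additional fields.
@dataclass
class RuleResult:
    rule: int      # rule number, 1-43
    passed: bool   # whether the item satisfied the rule


def summarize(results: list[RuleResult]) -> tuple[float, list[RuleResult]]:
    """Return (score, violations): the fraction of rules passed and the failed results."""
    if not results:
        raise ValueError("no rules were evaluated")
    violations = [r for r in results if not r.passed]
    score = (len(results) - len(violations)) / len(results)
    return score, violations


results = [
    RuleResult(22, False),
    RuleResult(28, True),
    RuleResult(37, True),
    RuleResult(40, True),
]
score, violations = summarize(results)
print(score)                         # 0.75
print([r.rule for r in violations])  # [22]
```

Under this reading, an item that passes every selected rule scores 1.0 and has an empty violations list.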
Usage
```python
import asyncio

from itemwise import evaluate, evaluate_batch, async_evaluate_batch

# `item` is a dict as in Quick Start; `items` is a list of such dicts.

# Select specific rules
evaluate(item=item, model="azure/gpt-5.1-chat", rules=[22, 28, 37])

# Batch with progress bar (disable via progress=False)
evaluate_batch(items=items, model="azure/gpt-5.1-chat")

# Async / parallel (inside a running event loop, `await` it directly instead)
asyncio.run(async_evaluate_batch(items=items, model="azure/gpt-5.1-chat"))

# Extra kwargs are forwarded to litellm
evaluate(item=item, model="azure/gpt-5.1-chat", reasoning_effort="low")
```
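How `async_evaluate_batch` schedules its requests is not documented here. A common pattern for running LLM calls in parallel while capping the number of in-flight requests is `asyncio.gather` combined with an `asyncio.Semaphore`. The sketch below uses a hypothetical `evaluate_one` coroutine in place of a real model call, and `max_concurrency` is an illustrative parameter, not an itemwise option.

```python
import asyncio


# Hypothetical stand-in for a single LLM evaluation call; it only
# simulates latency and does not reflect itemwise's implementation.
async def evaluate_one(item: dict) -> dict:
    await asyncio.sleep(0.01)
    return {"stem": item["stem"], "score": 1.0}


async def evaluate_all(items: list[dict], max_concurrency: int = 8) -> list[dict]:
    """Evaluate items concurrently, allowing at most max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(item: dict) -> dict:
        async with semaphore:
            return await evaluate_one(item)

    # gather returns results in the same order as the input items
    return await asyncio.gather(*(bounded(item) for item in items))


items = [{"stem": f"Question {i}"} for i in range(20)]
results = asyncio.run(evaluate_all(items, max_concurrency=5))
print(len(results), results[0]["stem"])  # 20 Question 0
```

Bounding concurrency this way helps stay under provider rate limits while still overlapping network latency.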
CLI
```bash
itemwise evaluate questions.json --model azure/gpt-5.1-chat
itemwise evaluate questions.json --model azure/gpt-5.1-chat --rules 22,28,37 --param reasoning_effort=low
```
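The CLI's internal argument handling is not shown here. One plausible reading of the flags, sketched with `argparse`, assumes that `--rules` is a comma-separated list of rule numbers and that `--param` is a repeatable `KEY=VALUE` pair passed on like the extra litellm kwargs in the Python API. `parse_rules` and `parse_param` are hypothetical helpers, the `evaluate` subcommand is omitted, and values are kept as strings (the real CLI may convert types).

```python
import argparse


def parse_param(text: str) -> tuple[str, str]:
    """Split a KEY=VALUE string on the first '='."""
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected KEY=VALUE, got {text!r}")
    return key, value


def parse_rules(text: str) -> list[int]:
    """Turn '22,28,37' into [22, 28, 37]."""
    return [int(part) for part in text.split(",") if part.strip()]


parser = argparse.ArgumentParser(prog="itemwise-sketch")
parser.add_argument("path")
parser.add_argument("--model", required=True)
parser.add_argument("--rules", type=parse_rules)
parser.add_argument("--param", type=parse_param, action="append", default=[])

args = parser.parse_args(
    ["questions.json", "--model", "azure/gpt-5.1-chat",
     "--rules", "22,28,37", "--param", "reasoning_effort=low"]
)
extra_kwargs = dict(args.param)  # e.g. forwarded as **extra_kwargs
print(args.rules, extra_kwargs)  # [22, 28, 37] {'reasoning_effort': 'low'}
```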
Input JSON format:

```json
[{"stem": "...", "options": ["A", "B", "C", "D"], "correct": 0}]
```
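A minimal sketch for writing `questions.json` from Python with a basic shape check before calling the CLI. It treats `correct` as a zero-based index into `options`, which matches the Quick Start example; the specific checks in the hypothetical `item_errors` helper (non-empty stem, list of string options) are assumptions about what a well-formed item looks like, not validation rules documented by itemwise.

```python
import json
from pathlib import Path


def item_errors(item: dict) -> list[str]:
    """Return a list of problems with one item (empty if it looks valid)."""
    errors = []
    if not isinstance(item.get("stem"), str) or not item["stem"].strip():
        errors.append("'stem' must be a non-empty string")
    options = item.get("options")
    if not isinstance(options, list) or not all(isinstance(o, str) for o in options):
        errors.append("'options' must be a list of strings")
        options = []
    correct = item.get("correct")
    # bool is a subclass of int, so reject it explicitly
    if isinstance(correct, bool) or not isinstance(correct, int) or not 0 <= correct < len(options):
        errors.append("'correct' must be a zero-based index into 'options'")
    return errors


items = [
    {"stem": "What is 2 + 2?", "options": ["3", "4", "5", "22"], "correct": 1},
]
problems = {i: errs for i, item in enumerate(items) if (errs := item_errors(item))}
if problems:
    raise SystemExit(f"invalid items: {problems}")

Path("questions.json").write_text(json.dumps(items, indent=2), encoding="utf-8")
```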
LLM Configuration
Model names and parameters follow litellm conventions. For Azure OpenAI:
```bash
export AZURE_API_KEY=...
export AZURE_API_BASE=https://your-resource.cognitiveservices.azure.com/
export AZURE_API_VERSION=2024-12-01-preview
```
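If a missing variable only surfaces as a provider error partway through a batch, a pre-flight check can fail earlier. A minimal sketch, where `missing_env` is a hypothetical helper (not part of itemwise or litellm) that checks the three Azure variables listed above:

```python
import os

# The Azure OpenAI variables listed above.
AZURE_VARS = ("AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION")


def missing_env(names: tuple[str, ...] = AZURE_VARS) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]


missing = missing_env()
if missing:
    print("Set these before running itemwise:", ", ".join(missing))
```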
Item-Writing Rules
43 rules from Haladyna & Downing (1989) across 6 categories:
| Category | Rules | Description |
|---|---|---|
| General (Procedural) | 1-7 | Format, grammar, readability |
| General (Content) | 8-17 | Objectives, vocabulary, higher-order thinking |
| Stem Construction | 18-23 | Clarity, positive wording |
| General Option | 24-35 | Count, order, homogeneity, length |
| Correct Option | 36-37 | Position distribution, uniqueness |
| Distractor | 38-43 | Plausibility, common errors |
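To group violations by category, the rule ranges in the table can be turned into a lookup. This is a hypothetical helper built directly from the table above; itemwise is not documented to export anything like it.

```python
# Rule-number ranges copied from the table above (range end is exclusive).
CATEGORIES = [
    ("General (Procedural)", range(1, 8)),
    ("General (Content)", range(8, 18)),
    ("Stem Construction", range(18, 24)),
    ("General Option", range(24, 36)),
    ("Correct Option", range(36, 38)),
    ("Distractor", range(38, 44)),
]


def category_of(rule: int) -> str:
    """Map a rule number (1-43) to its category name."""
    for name, numbers in CATEGORIES:
        if rule in numbers:
            return name
    raise ValueError(f"rule {rule} is outside 1-43")


print(category_of(22))                     # Stem Construction
print(sum(len(r) for _, r in CATEGORIES))  # 43
```

The final check confirms the six ranges cover exactly the 43 rules.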
Rules 11 (item independence) and 36 (answer position distribution) require cross-item analysis and are excluded by default. To include them, pass them explicitly through the `rules` argument, e.g. `rules=[11, 36]`.
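Since `rules` selects which rules run, `rules=[11, 36]` on its own would evaluate only those two. A minimal sketch for building the full set instead, assuming the default is every rule from 1 to 43 except the two cross-item rules:

```python
ALL_RULES = range(1, 44)       # rules 1-43
CROSS_ITEM_RULES = {11, 36}    # excluded by default

default_rules = [r for r in ALL_RULES if r not in CROSS_ITEM_RULES]
print(len(default_rules))  # 41

# Add the cross-item rules to the default set rather than passing them alone.
all_rules = sorted(set(default_rules) | CROSS_ITEM_RULES)
print(len(all_rules))  # 43
```

The resulting list could then be passed as `rules=all_rules`; that `evaluate_batch` accepts `rules` the same way `evaluate` does is an assumption here.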
References
- Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.
- Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-333.
License
MIT