LLM-based evaluation of multiple-choice items against item-writing guidelines

itemwise

LLM-based evaluation of multiple-choice items against the 43 item-writing rules from Haladyna & Downing (1989). Works with any LLM provider via litellm.

Installation

pip install itemwise

Requires Python 3.12+.

Quick Start

from itemwise import evaluate

result = evaluate(
    item={
        "stem": "Which of the following is NOT a characteristic of mammals?",
        "options": [
            "They are warm-blooded",
            "They lay eggs",
            "They have hair or fur",
            "They produce milk",
        ],
        "correct": 1,
    },
    model="azure/gpt-5.1-chat",
)

print(result.score)            # fraction of rules passed
print(result.violations)       # list of failed RuleResult
print(result.usage.cost)       # LLM cost in USD

Usage

from itemwise import evaluate, evaluate_batch, async_evaluate_batch

# Select specific rules
evaluate(item=item, model="azure/gpt-5.1-chat", rules=[22, 28, 37])

# Batch with progress bar (disable via progress=False)
evaluate_batch(items=items, model="azure/gpt-5.1-chat")

# Async / parallel (await must run inside a coroutine, e.g. one driven by asyncio.run)
await async_evaluate_batch(items=items, model="azure/gpt-5.1-chat")

# Extra kwargs are forwarded to litellm
evaluate(item=item, model="azure/gpt-5.1-chat", reasoning_effort="low")
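
Because `await` is only valid inside a coroutine, a plain script has to wrap the async call and drive it with `asyncio.run`. The sketch below shows that pattern. `run_batch` and `_stub_batch` are hypothetical names, and the stub stands in for `async_evaluate_batch` so the example runs without an API key; its return value is made up and does not reflect itemwise's result type.

```python
import asyncio


async def _stub_batch(items, model):
    # Hypothetical stand-in for itemwise.async_evaluate_batch so this runs offline.
    await asyncio.sleep(0)
    return [{"stem": item["stem"], "model": model} for item in items]


async def run_batch(batch_fn, items, model):
    # Await any coroutine function that accepts items= and model= keywords.
    return await batch_fn(items=items, model=model)


items = [{"stem": "2 + 2 = ?", "options": ["3", "4"], "correct": 1}]
results = asyncio.run(run_batch(_stub_batch, items, "azure/gpt-5.1-chat"))
print(len(results))
```

In real use, pass `async_evaluate_batch` imported from `itemwise` in place of `_stub_batch`.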

CLI

itemwise evaluate questions.json --model azure/gpt-5.1-chat
itemwise evaluate questions.json --model azure/gpt-5.1-chat --rules 22,28,37 --param reasoning_effort=low
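
The flags in the second command appear to mirror the Python keyword arguments shown under Usage. As a rough illustration of that correspondence, the sketch below turns the flag values into keyword arguments. `cli_to_kwargs` is a hypothetical helper, not part of itemwise, and the real CLI may parse values differently (for example, by coercing numbers).

```python
def cli_to_kwargs(rules_arg=None, params=()):
    """Translate --rules / --param strings into evaluate()-style keyword arguments."""
    kwargs = {}
    if rules_arg:
        # "22,28,37" -> [22, 28, 37]
        kwargs["rules"] = [int(part) for part in rules_arg.split(",") if part.strip()]
    for param in params:
        # "reasoning_effort=low" -> {"reasoning_effort": "low"}
        key, sep, value = param.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {param!r}")
        kwargs[key] = value  # values stay strings in this sketch
    return kwargs


kwargs = cli_to_kwargs("22,28,37", ["reasoning_effort=low"])
print(kwargs)
```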

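Input JSON format (an array of items; judging by the Quick Start example, "correct" is the zero-based index of the keyed option):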

[{"stem": "...", "options": ["A", "B", "C", "D"], "correct": 0}]
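
Checking a file's structure before spending LLM calls on it can save a failed run. The sketch below validates items against the shape shown above. `validate_items` is a hypothetical helper, and the constraints it enforces (at least two options, a zero-based `correct` index) are assumptions inferred from the examples in this README rather than documented itemwise behavior.

```python
import json


def validate_items(items):
    """Return a list of problems; an empty list means the structure looks valid."""
    if not isinstance(items, list):
        return ["top level must be a JSON array of items"]
    problems = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i}: must be a JSON object")
            continue
        stem = item.get("stem")
        options = item.get("options")
        correct = item.get("correct")
        if not isinstance(stem, str) or not stem.strip():
            problems.append(f"item {i}: 'stem' must be a non-empty string")
        if (
            not isinstance(options, list)
            or len(options) < 2
            or not all(isinstance(o, str) for o in options)
        ):
            problems.append(f"item {i}: 'options' must be a list of at least two strings")
            continue
        # bool is a subclass of int, so reject it explicitly.
        if (
            isinstance(correct, bool)
            or not isinstance(correct, int)
            or not 0 <= correct < len(options)
        ):
            problems.append(f"item {i}: 'correct' must be an index into 'options'")
    return problems


text = '[{"stem": "...", "options": ["A", "B", "C", "D"], "correct": 0}]'
print(validate_items(json.loads(text)))
```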

LLM Configuration

Model names and parameters follow litellm conventions. For Azure OpenAI:

export AZURE_API_KEY=...
export AZURE_API_BASE=https://your-resource.cognitiveservices.azure.com/
export AZURE_API_VERSION=2024-12-01-preview
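
A missing variable usually surfaces only as an authentication error from the provider, so it can help to check up front. The sketch below reports which of the three variables above are unset. `missing_azure_vars` is a hypothetical helper, not part of itemwise or litellm.

```python
import os

REQUIRED_VARS = ("AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION")


def missing_azure_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_azure_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```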

Item-Writing Rules

43 rules from Haladyna & Downing (1989) across 6 categories:

Category              Rules  Description
General (Procedural)  1-7    Format, grammar, readability
General (Content)     8-17   Objectives, vocabulary, higher-order thinking
Stem Construction     18-23  Clarity, positive wording
General Option        24-35  Count, order, homogeneity, length
Correct Option        36-37  Position distribution, uniqueness
Distractor            38-43  Plausibility, common errors
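
When reading results, it can be handy to map a rule number back to its category. The sketch below encodes the ranges from the table. `RULE_CATEGORIES` and `category_of` are hypothetical names for illustration; itemwise may expose this information differently.

```python
# Rule ranges copied from the category table (upper bounds are inclusive there,
# so each range() stops one past the last rule).
RULE_CATEGORIES = [
    (range(1, 8), "General (Procedural)"),
    (range(8, 18), "General (Content)"),
    (range(18, 24), "Stem Construction"),
    (range(24, 36), "General Option"),
    (range(36, 38), "Correct Option"),
    (range(38, 44), "Distractor"),
]


def category_of(rule):
    """Return the category name for a rule number between 1 and 43."""
    for rule_range, name in RULE_CATEGORIES:
        if rule in rule_range:
            return name
    raise ValueError(f"unknown rule number: {rule}")


for rule in (22, 28, 37):
    print(rule, category_of(rule))
```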

Rules 11 (item independence) and 36 (answer position distribution) require cross-item analysis and are excluded by default. Pass them explicitly via rules=[11, 36] to include them.
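
The default selection described above can be sketched as follows. This mirrors the documented behavior only; `select_rules` and the constants are hypothetical names, not itemwise's implementation.

```python
ALL_RULES = range(1, 44)      # rules 1 through 43
CROSS_ITEM_RULES = {11, 36}   # need cross-item analysis, so excluded by default


def select_rules(requested=None):
    """Return the rules to evaluate: everything except 11 and 36 unless asked."""
    if requested is None:
        return [r for r in ALL_RULES if r not in CROSS_ITEM_RULES]
    unknown = [r for r in requested if r not in ALL_RULES]
    if unknown:
        raise ValueError(f"unknown rule numbers: {unknown}")
    return list(requested)


print(len(select_rules()))
print(select_rules([11, 36]))
```

Under this reading, a default run covers 41 of the 43 rules.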

References

  • Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.
  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-333.

License

MIT
