Bayesian updating and Item Response Theory (MCQ 3PL, graded/continuous QA) for candidate skill assessment

hb-irt

Bayesian ability estimation and Item Response Theory (IRT) for skill assessment

hb-irt converts candidate responses (multiple-choice questions and open-ended questions graded on a 0-10 or 0-100 scale) into a single latent ability estimate with a calibrated uncertainty range, and supports adaptive, multi-stage testing on top of that estimate.

Features
Installation
Quickstart
Concepts
Usage guide
Package layout
Development
License

Features


Capability	Details
Item models	3PL (multiple-choice), GRM (0-10 graded), CRM (0-100 continuous)
Ability estimation	EAP (Gauss-Hermite quadrature) and MAP, with posterior variance and 95% credible intervals
Sequential updating	Each test stage's posterior becomes the next stage's prior — no re-scoring from scratch
Item calibration	Marginal Maximum Likelihood (MMLE/EM) fit from raw response data
Score aggregation	Precision-weighted combination of per-topic estimates into a 0-100 score ± margin of error
Adaptive testing	Information-maximizing module selection with exposure control, and configurable stopping rules

Responses from any mix of item types combine into a single ability estimate:

Response type	Model	`value` represents
Multiple-choice	3PL	`0` / `1`
Open-ended, scored 0-10	GRM	integer category `0`–`10`
Open-ended, scored 0-100	CRM	continuous score `0`–`100`

Installation

pip install hb-irt

or, with uv:

uv add hb-irt

Requires Python 3.12+. Depends on numpy and scipy.

Quickstart

from hb_irt.types import Item
from hb_irt.models.threepl import ThreePLModel
from hb_irt.bayes.estimation import eap_estimate
from hb_irt.bayes.sequential import sequential_update
from hb_irt.scoring import build_subskill_score

# Define a small item bank (discrimination, difficulty, guessing)
items = [Item(item_id=f"q{i}", a=1.2, b=b, c=0.2)
         for i, b in enumerate([-1.0, -0.3, 0.4, 1.0, 1.6])]
models = [ThreePLModel(item) for item in items]

# Stage 1: estimate ability from a prior N(0, 1)
posterior = eap_estimate(list(zip(models, [1, 1, 0, 1, 0])), prior_mu=0.0, prior_sigma=1.0)

# Stage 2: the previous posterior becomes the new prior
posterior = sequential_update(posterior, list(zip(models, [1, 1, 1, 0, 1])))

# Rescale to a 0-100 score with a 95% margin of error
score = build_subskill_score(
    "python_debugging", posterior, items_administered=10, modules_completed=2
)
print(f"{score.score_0_100:.1f} ± {score.margin_error_95:.1f}")

Concepts

Ability (θ) is represented on a logit scale internally, typically in the range [-4, 4]. Use hb_irt.scoring.rescale_0_100 to convert a posterior to a 0-100 score with a margin of error whenever you need to display it.
Posterior(mu, variance) is the shared representation of a belief about a candidate's ability throughout the library. It exposes .sem (standard error) and .credible_interval(level).
Item models (3PL, GRM, CRM) share a minimal common interface: loglik(value, theta) and info(theta) — which is what lets responses of every type combine into a single ability estimate.
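
To make the Posterior idea concrete, here is a minimal sketch of a Gaussian belief with a standard error and a central credible interval. `PosteriorSketch` is a hypothetical stand-in written for illustration, not hb-irt's own class, and it assumes the interval comes from a normal approximation to the posterior.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass(frozen=True)
class PosteriorSketch:
    """Illustrative Gaussian belief about ability (not hb-irt's Posterior)."""

    mu: float
    variance: float

    @property
    def sem(self) -> float:
        # Standard error of measurement: the posterior standard deviation.
        return sqrt(self.variance)

    def credible_interval(self, level: float = 0.95) -> tuple[float, float]:
        # Central interval under a normal approximation (an assumption here).
        z = NormalDist().inv_cdf(0.5 + level / 2.0)
        return (self.mu - z * self.sem, self.mu + z * self.sem)


belief = PosteriorSketch(mu=0.2, variance=0.6)
lower, upper = belief.credible_interval(0.95)
print(f"sem={belief.sem:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

A wider interval signals that more items are needed before the estimate is reported.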

Usage guide

Multiple-choice items (3PL)

from hb_irt.types import Item
from hb_irt.models.threepl import ThreePLModel

item = ThreePLModel(Item(item_id="q1", a=1.2, b=0.3, c=0.2))
item.probability(theta=0.3)   # probability of a correct response at theta=0.3
item.loglik(value=1, theta=0.3)
item.info(theta=0.3)          # Fisher information at theta=0.3
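
For reference, the textbook 3PL response function and its Fisher information can be sketched as below. This assumes the plain logistic metric (no 1.7 scaling constant); whether hb-irt uses that exact parameterization is not stated here, and the function names are illustrative.

```python
import math


def three_pl_probability(theta: float, a: float, b: float, c: float) -> float:
    """P(correct | theta) = c + (1 - c) * logistic(a * (theta - b))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def three_pl_information(theta: float, a: float, b: float, c: float) -> float:
    """Fisher information of a 3PL item at theta."""
    p = three_pl_probability(theta, a, b, c)
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


# Same parameters as the item above, evaluated where theta equals the difficulty.
p = three_pl_probability(theta=0.3, a=1.2, b=0.3, c=0.2)
info = three_pl_information(theta=0.3, a=1.2, b=0.3, c=0.2)
print(f"P(correct)={p:.3f}, information={info:.3f}")
```

At theta equal to b the logistic term is one half, so the probability sits midway between the guessing floor c and 1.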

Open-ended answers scored 0-10 (Graded Response Model)

from hb_irt.models.grm import GRMItem, GRMModel

# 10 boundaries define 11 ordered categories (scores 0..10)
item = GRMModel(GRMItem(item_id="qa1", a=1.0, boundaries=(-2, -1, 0, 1, 2, 3, 4, 5, 6, 7)))
item.category_probabilities(theta=0.5)   # probability of each of the 11 scores
item.loglik(value=7, theta=0.5)          # value is the observed 0-10 score
item.info(theta=0.5)
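
The way 10 boundaries yield 11 category probabilities can be sketched with the standard Graded Response Model construction: one cumulative logistic curve per boundary, differenced to get each category. The function name is illustrative and the logistic metric is an assumption; hb-irt's internals may differ.

```python
import numpy as np


def grm_category_probabilities(theta, a, boundaries):
    """Probability of each ordered category under a textbook GRM."""
    b = np.asarray(boundaries, dtype=float)
    # P(score >= k) for k = 1..K, one logistic curve per boundary.
    cumulative = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with P(score >= 0) = 1 and P(score >= K + 1) = 0, then difference.
    padded = np.concatenate(([1.0], cumulative, [0.0]))
    return padded[:-1] - padded[1:]


probs = grm_category_probabilities(
    theta=0.5, a=1.0, boundaries=(-2, -1, 0, 1, 2, 3, 4, 5, 6, 7)
)
print(len(probs), probs.round(3))
```

Strictly increasing boundaries keep every category probability positive, and the differences telescope so the probabilities sum to 1.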

Open-ended answers scored 0-100 (Continuous Response Model)

from hb_irt.models.crm import CRMItem, CRMModel

item = CRMModel(CRMItem(item_id="qa2", a=1.0, b=0.0, max_score=100.0))
item.loglik(value=72.0, theta=0.4)
item.info(theta=0.4)

Ability estimation (EAP / MAP) across mixed item types

from hb_irt.bayes.estimation import eap_estimate, map_estimate

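# Any mix of item models works, since each just contributes a scalar
# loglik(value, theta): MCQ, graded, and continuous responses combine freely.
# mcq_model, grm_model, and crm_model are ThreePLModel, GRMModel, and CRMModel
# instances built as in the sections above.
responses = [(mcq_model, 1), (grm_model, 8), (crm_model, 80.0)]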

posterior = eap_estimate(responses, prior_mu=0.0, prior_sigma=1.0)
# posterior.mu, posterior.variance, posterior.sem, posterior.credible_interval(0.95)

theta_map = map_estimate(responses, prior_mu=0.0, prior_sigma=1.0)
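
A rough sketch of EAP estimation by Gauss-Hermite quadrature, under the assumption that each response contributes a log-likelihood as a function of theta. `eap_sketch` and `mcq_loglik` are hypothetical helpers written for this example, not hb-irt functions; only `numpy.polynomial.hermite_e.hermegauss` is a real library call.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def eap_sketch(loglik_fns, prior_mu=0.0, prior_sigma=1.0, n_nodes=41):
    """Posterior mean and variance of theta under a normal prior."""
    nodes, weights = hermegauss(n_nodes)     # weight function exp(-x^2 / 2)
    thetas = prior_mu + prior_sigma * nodes  # nodes mapped onto the prior
    log_post = np.log(weights)
    for loglik in loglik_fns:
        log_post = log_post + np.array([loglik(t) for t in thetas])
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    post /= post.sum()
    mean = float(post @ thetas)
    variance = float(post @ (thetas - mean) ** 2)
    return mean, variance


def mcq_loglik(value, a, b, c):
    """Log-likelihood of a 0/1 response under a textbook 3PL curve."""
    def loglik(theta):
        p = c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))
        return np.log(p) if value == 1 else np.log1p(-p)
    return loglik


prior_only = eap_sketch([])
after_correct = eap_sketch([mcq_loglik(1, a=1.2, b=0.3, c=0.2)])
print(prior_only, after_correct)
```

With no responses the sketch returns the prior's own mean and variance, which is a quick sanity check on the quadrature; a correct answer then pulls the mean upward.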

Sequential updating across test stages

from hb_irt.types import Posterior
from hb_irt.bayes.sequential import sequential_update, sequential_update_all

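prior = Posterior(mu=0.0, variance=1.0)
# stage_1_responses and stage_2_responses are lists of (model, value) pairs,
# in the same form that eap_estimate accepts.
stage_1_posterior = sequential_update(prior, stage_1_responses)
stage_2_posterior = sequential_update(stage_1_posterior, stage_2_responses)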

# or in one call, given an ordered list of each stage's responses:
history = sequential_update_all(prior, [stage_1_responses, stage_2_responses])

Posterior variance is guaranteed never to increase across stages (assuming non-degenerate item information), so an estimate never becomes less precise as a candidate answers more items.
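
The "posterior becomes the next prior" pattern can be illustrated with a toy conjugate model, where each observation is a noisy reading of theta with known noise variance. This is deliberately not an IRT likelihood and not hb-irt's update rule; `normal_update` is a hypothetical helper chosen because its result can be checked by hand.

```python
from itertools import accumulate


def normal_update(belief, observations, noise_variance=1.0):
    """Conjugate update of a N(mu, variance) belief from noisy readings of theta."""
    mu, variance = belief
    precision = 1.0 / variance + len(observations) / noise_variance
    new_mu = (mu / variance + sum(observations) / noise_variance) / precision
    return (new_mu, 1.0 / precision)


prior = (0.0, 1.0)
stages = [[0.8, 0.4], [0.5, 0.9, 0.7]]

# accumulate feeds each stage's posterior into the next update as its prior.
history = list(accumulate(stages, normal_update, initial=prior))
sequential = history[-1]
batch = normal_update(prior, stages[0] + stages[1])
print(history)
print(sequential, batch)
```

In this conjugate case the staged result matches a single batch update exactly, and the variance shrinks at every step. With IRT likelihoods summarized as a Gaussian between stages, the match would only be approximate.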

Item calibration (MMLE/EM)

Fit 3PL item parameters from a batch of raw response data:

import numpy as np
from hb_irt.calibration import calibrate_3pl

responses = np.array(...)  # shape (n_examinees, n_items), binary 0/1
result = calibrate_3pl(responses, item_ids=["q1", "q2", "q3"], fixed_c=0.2)
result.items          # tuple of Item, with fitted discrimination/difficulty (and fixed guessing)
result.converged      # bool
result.n_iterations   # int

Pass fixed_c=<value> when calibrating with fewer than ~500 responses per item, since the guessing parameter is poorly identified in small samples; otherwise omit it to estimate a guessing parameter freely for each item.
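
To try calibration without real data, a response matrix of the expected shape can be simulated from known 3PL parameters. The sketch below uses only NumPy; the parameter values are made up for illustration, and the logistic 3PL form is an assumption about the data-generating model.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_examinees = 1000
a = np.array([1.0, 1.2, 0.8])    # discrimination per item (illustrative)
b = np.array([-0.5, 0.0, 0.7])   # difficulty per item (illustrative)
c = 0.2                          # shared guessing floor

# Draw abilities from N(0, 1), then Bernoulli responses from the 3PL curve.
theta = rng.standard_normal(n_examinees)
p_correct = c + (1.0 - c) / (1.0 + np.exp(-a * (theta[:, None] - b)))
responses = (rng.random(p_correct.shape) < p_correct).astype(int)

print(responses.shape)           # (n_examinees, n_items)
print(responses.mean(axis=0))    # proportion correct per item
```

Comparing fitted parameters against the values used for simulation is a simple way to check that a calibration run recovers them.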

Difficulty mapping by cognitive level

from hb_irt.bloom import difficulty_anchor, shrink_difficulty

difficulty_anchor("L4")  # -> 1.2  (an "Analysis"-level item's typical difficulty)

# Pull a noisy raw calibration estimate toward its level's typical difficulty,
# weighted by how confident each estimate is.
shrunk_b = shrink_difficulty(raw_difficulty=1.9, raw_variance=0.3, level="L4", sigma_b=0.4)
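
One common way to do this kind of shrinkage is a precision-weighted average of the raw estimate and the anchor. The sketch below assumes that rule, and assumes sigma_b is the standard deviation of difficulties around the anchor; neither is confirmed as what shrink_difficulty does. The function name is hypothetical.

```python
def shrink_toward_anchor(raw_difficulty, raw_variance, anchor, sigma_b):
    """Precision-weighted average of a raw estimate and its level anchor."""
    raw_precision = 1.0 / raw_variance
    anchor_precision = 1.0 / sigma_b**2
    weighted_sum = raw_precision * raw_difficulty + anchor_precision * anchor
    return weighted_sum / (raw_precision + anchor_precision)


# Same numbers as above, with 1.2 as the "L4" anchor.
shrunk = shrink_toward_anchor(1.9, 0.3, anchor=1.2, sigma_b=0.4)
print(f"{shrunk:.3f}")  # about 1.443, between the anchor and the raw estimate
```

Because the raw estimate here is noisier than the anchor spread (variance 0.3 versus 0.16), the result lands closer to the anchor than to the raw value.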

Score rescaling and topic aggregation

from hb_irt.scoring import aggregate_levels, rescale_0_100, build_subskill_score

# Combine estimates from several cognitive levels into one topic-level posterior
level_posterior = aggregate_levels(
    level_thetas={"L1": 0.4, "L2": 0.6, "L3": 0.5},
    level_variances={"L1": 0.05, "L2": 0.08, "L3": 0.06},
)

score, margin, ci_lower, ci_upper = rescale_0_100(level_posterior)

subskill_score = build_subskill_score(
    subskill_id="python_debugging",
    posterior=level_posterior,
    items_administered=42,
    modules_completed=4,
    level_thetas={"L1": 0.4, "L2": 0.6, "L3": 0.5},
)
# subskill_score.score_0_100, .margin_error_95, .ci_lower_95, .ci_upper_95, ...
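
Precision-weighted combination is usually inverse-variance weighting: each level's estimate counts in proportion to 1/variance, and the combined variance is the reciprocal of the summed precisions. The sketch below assumes that rule and independent level estimates; `aggregate_inverse_variance` is a hypothetical name, not the library's function.

```python
def aggregate_inverse_variance(level_thetas, level_variances):
    """Combine per-level estimates, weighting each by 1 / variance."""
    precisions = {level: 1.0 / level_variances[level] for level in level_thetas}
    total_precision = sum(precisions.values())
    mu = sum(precisions[lv] * level_thetas[lv] for lv in level_thetas) / total_precision
    return mu, 1.0 / total_precision


mu, variance = aggregate_inverse_variance(
    {"L1": 0.4, "L2": 0.6, "L3": 0.5},
    {"L1": 0.05, "L2": 0.08, "L3": 0.06},
)
print(f"mu={mu:.4f}, variance={variance:.4f}")
```

The combined variance is smaller than any single level's variance, which is why pooling levels tightens the margin of error.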

Adaptive testing (MSAT): module bank, selection, stopping

from hb_irt.types import Posterior
from hb_irt.msat.module_bank import ModuleBank
from hb_irt.msat.selection import select_next_module
from hb_irt.msat.stopping import StoppingConfig, evaluate_stopping

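# easy_module, medium_module, hard_module, and challenge_module are module
# objects defined elsewhere (hb_irt.types exports a TestModule type).
bank = ModuleBank(modules=(easy_module, medium_module, hard_module, challenge_module))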

current_posterior = Posterior(mu=0.2, variance=0.6)
administered = ["easy_1"]

next_module = select_next_module(bank, current_posterior, administered_ids=administered)

decision = evaluate_stopping(
    posterior=current_posterior,
    previous_posterior=prior_posterior,   # or None on the first module
    n_modules=2,
    n_items=15,
    config=StoppingConfig(),  # sigma_min=0.3, max_modules=8, min_items=20, delta_saturation=0.01
)
if decision.should_stop:
    print("stopping:", decision.reasons)  # e.g. ("precision_threshold",)

ModuleBank.available(administered_ids) returns modules not yet given to a candidate. select_next_module picks the module that maximizes expected information gain at the candidate's current ability estimate, with an exposure-control bonus that favors less-used modules.
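
The selection rule described above can be sketched as "module information at the current estimate, plus a bonus for low exposure". Everything in this block is hypothetical: the `SketchModule` structure, the `exposure_rate` field, the additive bonus, and the `exposure_weight` value are assumptions made for illustration, and the item information uses the textbook 3PL formula.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchModule:
    module_id: str
    items: tuple[tuple[float, float, float], ...]  # (a, b, c) per item
    exposure_rate: float  # share of past candidates who saw this module


def item_information(theta, a, b, c):
    """Textbook 3PL Fisher information at theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def module_information(module, theta):
    return sum(item_information(theta, a, b, c) for a, b, c in module.items)


def pick_next(modules, theta, administered_ids, exposure_weight=0.1):
    """Unseen module with the highest information plus exposure bonus."""
    candidates = [m for m in modules if m.module_id not in administered_ids]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda m: module_information(m, theta)
        + exposure_weight * (1.0 - m.exposure_rate),
    )


modules = [
    SketchModule("easy_1", ((1.2, -1.2, 0.2), (1.2, -1.0, 0.2), (1.2, -0.8, 0.2)), 0.6),
    SketchModule("medium_1", ((1.2, 0.0, 0.2), (1.2, 0.2, 0.2), (1.2, 0.4, 0.2)), 0.5),
    SketchModule("hard_1", ((1.2, 1.2, 0.2), (1.2, 1.5, 0.2), (1.2, 1.8, 0.2)), 0.1),
]
chosen = pick_next(modules, theta=0.2, administered_ids={"easy_1"})
print(chosen.module_id)
```

With a small exposure weight, information dominates and the bonus mostly breaks ties between modules of similar difficulty; a larger weight spreads usage more evenly at some cost in precision.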

Package layout

Module	Provides
`hb_irt.types`	Core data types: `Item`, `Response`, `Posterior`, `TestModule`, `SubskillScore`
`hb_irt.models.threepl`	3PL model for multiple-choice items
`hb_irt.models.grm`	Graded Response Model for 0-10 scored answers
`hb_irt.models.crm`	Continuous Response Model for 0-100 scored answers
`hb_irt.bayes.estimation`	EAP and MAP ability estimation
`hb_irt.bayes.sequential`	Sequential Bayesian updating across test stages
`hb_irt.information`	Fisher test information and standard error of measurement
`hb_irt.calibration`	MMLE/EM calibration of 3PL item parameters
`hb_irt.bloom`	Cognitive-level difficulty anchors and shrinkage
`hb_irt.scoring`	0-100 rescaling and precision-weighted level aggregation
`hb_irt.msat`	Adaptive module bank, selection, and stopping rules

Import directly from the submodule you need, e.g. from hb_irt.models.threepl import ThreePLModel.

Development

For contribution guidelines, architecture notes, and project conventions, see CLAUDE.md.

uv sync
uv run pytest

License

MIT — see LICENSE.

Release history

0.1.2 (this version): Jul 4, 2026
0.1.1: Jul 4, 2026
0.1.0: Jul 4, 2026

