Bayesian evaluation and ranking toolkit
scorio
scorio implements the Bayes@N framework introduced in "Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation".
Installation
pip install scorio
Requires Python 3.9–3.13 and NumPy.
Data and shape conventions
- Categories: encode each trial's outcome as an integer in {0, ..., C}.
- Weights: choose rubric weights w of length C+1 (e.g., [0, 1] for binary R).
- Shapes: R is M x N and R0 is M x D (if provided); both must share the same M and category set.
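A minimal NumPy sketch of checking these conventions before scoring. check_inputs is a hypothetical helper written for illustration, not part of scorio's API; it only enforces the shape and category rules listed above and makes no claim about how scorio validates its own inputs.

```python
import numpy as np


def check_inputs(R, w, R0=None):
    """Hypothetical helper (not part of scorio): check R, w, R0 against the conventions.

    Returns (M, N, C) on success and raises ValueError otherwise.
    """
    R = np.asarray(R)
    w = np.asarray(w, dtype=float)
    if w.ndim != 1:
        raise ValueError("w must be a 1-D array of length C+1")
    C = w.size - 1  # w has length C+1, so categories are 0, ..., C
    if R.ndim != 2 or R.size == 0:
        raise ValueError("R must be a non-empty M x N array")
    if not np.issubdtype(R.dtype, np.integer):
        raise ValueError("R must hold integer category codes")
    if R.min() < 0 or R.max() > C:
        raise ValueError(f"R entries must lie in {{0, ..., {C}}}")
    if R0 is not None:
        R0 = np.asarray(R0)
        # R0 must have the same number of rows (items) M as R
        if R0.ndim != 2 or R0.shape[0] != R.shape[0]:
            raise ValueError("R0 must be M x D with the same M as R")
        if R0.size and not np.issubdtype(R0.dtype, np.integer):
            raise ValueError("R0 must hold integer category codes")
        if R0.size and (R0.min() < 0 or R0.max() > C):
            raise ValueError(f"R0 entries must lie in {{0, ..., {C}}}")
    M, N = R.shape
    return M, N, C


# The inputs from the How to use example pass the checks.
print(check_inputs([[0, 1, 2, 2, 1], [1, 1, 0, 2, 2]], [0.0, 0.5, 1.0], [[0, 2], [1, 2]]))  # (2, 5, 2)
```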
APIs
- scorio.eval.bayes(R, w, R0=None) -> (mu: float, sigma: float)
  - R: M x N int array with entries in {0, ..., C}
  - w: length C+1 float array of rubric weights
  - R0 (optional): M x D int array of prior outcomes (same category set as R)
  - Returns the posterior estimate mu of the rubric-weighted performance and its uncertainty sigma.
- scorio.eval.avg(R) -> float
  - Returns the naive mean of the elements of R. For binary accuracy, encode incorrect=0, correct=1.
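For reference, one formulation consistent with the expected values in the How to use example is a per-item Dirichlet posterior with a uniform prior (one pseudo-count per category). This is a reconstruction checked against those example numbers, not quoted from scorio's source; consult the paper for the authoritative definitions. Here n_{αk} and n^0_{αk} count how often category k appears in row α of R and R0:

```latex
\nu_{\alpha k} = 1 + n_{\alpha k} + n^{0}_{\alpha k}, \qquad
T = \sum_{k=0}^{C} \nu_{\alpha k} = 1 + C + D + N

\mu = w_0 + \frac{1}{M T} \sum_{\alpha=1}^{M} \sum_{k=0}^{C} \nu_{\alpha k}\,(w_k - w_0)

\sigma^2 = \frac{1}{M^2 (T+1)} \sum_{\alpha=1}^{M} \left[ \sum_{k=0}^{C} \frac{\nu_{\alpha k}}{T}\,(w_k - w_0)^2
  - \left( \sum_{k=0}^{C} \frac{\nu_{\alpha k}}{T}\,(w_k - w_0) \right)^2 \right]
```

With the example's inputs (M=2, N=5, D=2, C=2), T = 10, the two items have posterior means 0.55 and 0.60, and mu = (0.55 + 0.60)/2 = 0.575.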
How to use
import numpy as np
from scorio.eval import bayes
# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = np.array([
[0, 1, 2, 2, 1], # Item 1, N=5 trials
[1, 1, 0, 2, 2], # Item 2, N=5 trials
])
# Rubric weights w: length C+1. Here: 0=incorrect, 1=partial(0.5), 2=correct(1.0)
w = np.array([0.0, 0.5, 1.0])
# Optional prior outcomes R0: shape (M, D). If omitted, D=0.
R0 = np.array([
[0, 2],
[1, 2],
])
# With prior (D=2, so T = 1 + C + D + N = 10)
mu, sigma = bayes(R, w, R0)
print(mu, sigma) # expected ~ (0.575, 0.084275)
# Without prior (D=0, so T = 1 + C + D + N = 8)
mu2, sigma2 = bayes(R, w)
print(mu2, sigma2) # expected ~ (0.5625, 0.091998)
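The expected outputs above can be checked without scorio installed. The sketch below is a NumPy-only reconstruction, assuming a uniform Dirichlet prior with pseudo-count total T = 1 + C + D + N (consistent with the T values in the comments). bayes_sketch is a hypothetical name, and scorio's actual implementation may differ in details such as validation or numerical handling.

```python
import numpy as np


def bayes_sketch(R, w, R0=None):
    """Hypothetical NumPy reconstruction of a Dirichlet posterior (not scorio's code)."""
    R = np.asarray(R, dtype=int)
    w = np.asarray(w, dtype=float)
    M, N = R.shape
    C = w.size - 1
    if R0 is None:
        R0 = np.zeros((M, 0), dtype=int)  # D = 0: no prior outcomes
    R0 = np.asarray(R0, dtype=int)
    D = R0.shape[1]
    T = 1 + C + D + N  # total Dirichlet pseudo-count per item

    # nu[a, k] = 1 (uniform prior) + occurrences of category k in row a of R and R0
    both = np.concatenate([R, R0], axis=1)
    nu = 1 + np.stack([(both == k).sum(axis=1) for k in range(C + 1)], axis=1)

    p = nu / T  # posterior mean category probabilities, one row per item
    item_mean = p @ w  # posterior mean of the rubric-weighted score per item
    item_var = (p @ w**2 - item_mean**2) / (T + 1)  # Dirichlet variance of that score

    mu = item_mean.mean()  # average over the M items
    sigma = np.sqrt(item_var.sum()) / M  # std of the mean of M independent items
    return float(mu), float(sigma)


R = np.array([[0, 1, 2, 2, 1], [1, 1, 0, 2, 2]])
w = np.array([0.0, 0.5, 1.0])
R0 = np.array([[0, 2], [1, 2]])
print(bayes_sketch(R, w, R0))  # approximately (0.575, 0.084275)
print(bayes_sketch(R, w))  # approximately (0.5625, 0.091998)
```

For comparison, the naive mean np.mean(R) of these raw category codes is 1.2, which is not on the rubric scale; this is why avg is only directly interpretable as accuracy for binary 0/1 encodings.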
Citing
If you use scorio or Bayes@N, please cite:
@article{hariri2025dontpassk,
title = {Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation},
author = {Hariri, Mohsen and Samandar, Amirhossein and Hinczewski, Michael and Chaudhary, Vipin},
  journal = {arXiv preprint arXiv:2510.04265},
year = {2025},
url = {https://scorio.readthedocs.io/}
}
License
MIT License. See the LICENSE file for details.
Support
- Documentation: https://scorio.readthedocs.io/
- Issues and feature requests: https://github.com/mohsenhariri/scorio/issues
Download files
Source Distribution: scorio-0.1.0.tar.gz
Built Distribution: scorio-0.1.0-py3-none-any.whl
File details
Details for the file scorio-0.1.0.tar.gz.
File metadata
- Download URL: scorio-0.1.0.tar.gz
- Upload date:
- Size: 12.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 418b1e040a9c73b968aa991aa243174a266a42c6843abc2f754fefcafe3fc4a6 |
| MD5 | bda26f5ad406fcbf31763c161eaa694f |
| BLAKE2b-256 | e83eaf52d1ade7c8def033f454b44735f20ad5bcc32e7ded85501efaaa48e53f |
File details
Details for the file scorio-0.1.0-py3-none-any.whl.
File metadata
- Download URL: scorio-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 80e184daf998ea5bc4fdc1eca1f609167db18605f5a86f03f72beade9cbc47d4 |
| MD5 | 17c2be1026fa20cefe8aeec8fce64ff3 |
| BLAKE2b-256 | cd6371e25e09304588ac142ff437d248a358bb3d3447605311eaca0b4dbb83dd |