Bayesian evaluation and ranking toolkit
Scorio
scorio implements the Bayes@N framework introduced in the paper "Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation".
News
- February 2026: New paper released: "Ranking Reasoning LLMs under Test-Time Scaling"
- February 2026: Our paper "Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation" has been accepted to ICLR 2026.
- February 2026: Reasoning traces will be released in about 2 weeks.
Installation
# Install from PyPI
pip install scorio
# Install latest from GitHub
pip install "git+https://github.com/mohsenhariri/scorio.git"
# Install a specific tag
pip install "git+https://github.com/mohsenhariri/scorio.git@v0.2.0"
# Install from local repository
pip install -e .
Requires Python 3.10+, NumPy, and SciPy.
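After installing, it can help to confirm that the interpreter meets the stated Python 3.10+ requirement and that the package is visible to it. The sketch below uses only the standard library. The helper names `python_ok` and `installed_version` are illustrative and are not part of scorio.

```python
import sys
from importlib import metadata


def python_ok(minimum=(3, 10)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def installed_version(dist_name="scorio"):
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python OK:", python_ok())
print("scorio version:", installed_version())
```

If the second line prints `None`, the package is not installed in the environment that this interpreter uses.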
Data and shape conventions
- Categories: encode outcomes per trial as integers in {0, ..., C}.
- Weights: choose rubric weights w of length C+1 (e.g., [0, 1] for binary outcomes).
- Shapes: R is M x N, R0 is M x D (if provided); both must share the same M and category set.
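The conventions above can be checked before calling the library. The following is a minimal sketch that works on nested Python lists. `check_inputs` is a hypothetical helper written for illustration, not a scorio function, and it only enforces the rules listed in this section.

```python
def check_inputs(R, w, R0=None):
    """Validate the shape and category conventions; return (M, N, D, C).

    Hypothetical helper for illustration, not part of scorio.
    R and R0 are nested lists (rows = questions, columns = trials).
    """
    C = len(w) - 1                  # categories are 0..C, so w has C+1 entries
    M = len(R)
    if M == 0:
        raise ValueError("R must have at least one row")
    N = len(R[0])
    D = 0
    if R0 is not None:
        if len(R0) != M:
            raise ValueError("R and R0 must have the same number of rows M")
        D = len(R0[0])
    prior_rows = [] if R0 is None else R0
    for name, rows, width in (("R", R, N), ("R0", prior_rows, D)):
        for row in rows:
            if len(row) != width:
                raise ValueError(f"{name} must be rectangular")
            for x in row:
                if not (isinstance(x, int) and 0 <= x <= C):
                    raise ValueError(f"{name} entries must be integers in 0..{C}")
    return M, N, D, C


print(check_inputs([[0, 1, 2, 2, 1], [1, 1, 0, 2, 2]],
                   [0.0, 0.5, 1.0],
                   [[0, 2], [1, 2]]))  # (2, 5, 2, 2)
```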
APIs
scorio.eval.bayes(R, w, R0=None) -> (mu: float, sigma: float)
- R: M x N int array with entries in {0, ..., C}
- w: length C+1 float array of rubric weights
- R0 (optional): M x D int array of prior outcomes (same category set as R)
- Returns posterior estimate mu of the rubric-weighted performance and uncertainty sigma.
scorio.eval.avg(R) -> float
- Returns the naive mean of elements in R. For binary accuracy, encode incorrect=0 and correct=1.
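To make the "naive mean" concrete, here is a small sketch of that computation on binary outcomes. It uses plain Python lists and a hypothetical `naive_mean` function. It illustrates the documented behavior and is not scorio's own code.

```python
def naive_mean(R):
    """Mean of all entries of a nested list R (rows of trial outcomes)."""
    values = [x for row in R for x in row]
    return sum(values) / len(values)


# Binary outcomes: 0 = incorrect, 1 = correct
R_binary = [[1, 0, 1, 1], [0, 0, 1, 1]]
print(naive_mean(R_binary))  # 5 correct out of 8 trials -> 0.625
```

With the 0/1 encoding, the mean equals the fraction of correct trials. With more than two categories it is only the mean of the category codes.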
How to use
import numpy as np
from scorio import eval
# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = np.array([[0, 1, 2, 2, 1], [1, 1, 0, 2, 2]])
# Rubric weights w: length C+1
# Here: 0=incorrect(0.0), 1=partial(0.5), 2=correct(1.0)
w = np.array([0.0, 0.5, 1.0])
# Optional prior outcomes R0: shape (M, D)
R0 = np.array([[0, 2], [1, 2]])
# Bayesian evaluation with prior
mu, sigma = eval.bayes(R, w, R0)
print(f"mu = {mu:.6f}, sigma = {sigma:.6f}")
# Expected: mu ~ 0.575, sigma ~ 0.084275
# Bayesian evaluation without prior
mu2, sigma2 = eval.bayes(R, w)
print(f"mu = {mu2:.6f}, sigma = {sigma2:.6f}")
# Expected: mu ~ 0.5625, sigma ~ 0.091998
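# Simple average: the naive mean of the category codes in R
# (encode outcomes as 0/1 if you want to read it as accuracy)
mean_score = eval.avg(R)
print(f"Average: {mean_score:.6f}")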
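The expected values in the example are consistent with a simple Dirichlet-multinomial model. The sketch below assumes that each question gets an independent Dirichlet posterior, built from a uniform prior (one pseudo-count per category) plus the counts observed in R and R0. This is an assumption made for illustration, and `bayes_sketch` is a hypothetical function, not scorio's implementation.

```python
import math


def bayes_sketch(R, w, R0=None):
    """Posterior mean and std of rubric-weighted performance.

    Assumption (not taken from scorio's source): independent Dirichlet
    posterior per question, uniform prior with one pseudo-count per
    category, plus counts observed in R and (optionally) R0.
    """
    n_cat = len(w)  # C + 1 categories
    M = len(R)
    total_mean = 0.0
    total_var = 0.0
    for i in range(M):
        counts = [1] * n_cat  # uniform prior pseudo-counts
        observed = list(R[i]) + (list(R0[i]) if R0 is not None else [])
        for outcome in observed:
            counts[outcome] += 1
        T = sum(counts)  # equals 1 + C + D + N
        p = [c / T for c in counts]  # posterior mean of category probabilities
        m1 = sum(pj * wj for pj, wj in zip(p, w))
        m2 = sum(pj * wj * wj for pj, wj in zip(p, w))
        total_mean += m1
        total_var += (m2 - m1 * m1) / (T + 1)  # variance of w . p under a Dirichlet
    return total_mean / M, math.sqrt(total_var) / M


R = [[0, 1, 2, 2, 1], [1, 1, 0, 2, 2]]
w = [0.0, 0.5, 1.0]
R0 = [[0, 2], [1, 2]]

mu, sigma = bayes_sketch(R, w, R0)
print(f"mu = {mu:.6f}, sigma = {sigma:.6f}")    # mu = 0.575000, sigma = 0.084275
mu2, sigma2 = bayes_sketch(R, w)
print(f"mu = {mu2:.6f}, sigma = {sigma2:.6f}")  # mu = 0.562500, sigma = 0.091998
```

Under this assumption, the prior outcomes in R0 act as extra observations: they shift mu toward the prior data and shrink sigma because the effective sample size grows.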
Citing
If you use scorio in your research, please cite:
@inproceedings{hariri2026don,
title={Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation},
author={Hariri, Mohsen and Samandar, Amirhossein and Hinczewski, Michael and Chaudhary, Vipin},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://arxiv.org/abs/2510.04265}
}
@article{hariri2026ranking,
title={Ranking Reasoning LLMs under Test-Time Scaling},
author={Hariri, Mohsen and Hinczewski, Michael and Ma, Jing and Chaudhary, Vipin},
journal={arXiv preprint arXiv:2510.04265},
year={2026},
url={https://arxiv.org/abs/2510.04265}
}
License
MIT License. See the LICENSE file for details.
Links
- Landing page: https://mohsenhariri.github.io/scorio/
- Documentation: https://scorio.readthedocs.io/en/latest/
- Repository: https://github.com/mohsenhariri/scorio
- Issues: https://github.com/mohsenhariri/scorio/issues
- ICLR 2026 poster: https://iclr.cc/virtual/2026/poster/10009669
File details
Details for the file scorio-0.2.0.tar.gz.
File metadata
- Download URL: scorio-0.2.0.tar.gz
- Upload date:
- Size: 72.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62fb1fc1a4eff30aeca7db3a32783a66e7d638c916cc6889a225f99e01e5a6a5 |
| MD5 | 2ddfbf5616c183d91b247880c67522aa |
| BLAKE2b-256 | 53cf711b5b4aae277855d86af709d114f7a02b994c9d79cbb7b589a419a03b20 |
File details
Details for the file scorio-0.2.0-py3-none-any.whl.
File metadata
- Download URL: scorio-0.2.0-py3-none-any.whl
- Upload date:
- Size: 82.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ce1aaf22324b1f56ffbd670770b21287368b0fb894a60599263463793896398 |
| MD5 | c05f29b0072a3517c773bf913f7579c4 |
| BLAKE2b-256 | 64f6f0d2cc3ce593f44b594e0652837120a958c71ce43b377f8acb475ba651d0 |