Bayesian evaluation and ranking toolkit
Scorio
News
- April 2026 🎉: Our ranking paper "Ranking Reasoning LLMs under Test-Time Scaling" has been accepted to the ACL 2026 Main Conference!
- February 2026 🎉: Our paper "Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation" has been accepted to ICLR 2026!
- April 2026 🔜: Reasoning traces will be released soon.
Packages
This repository contains two packages:
- scorio: Python implementation
- Scorio.jl: Julia implementation
Quick Start
Python (scorio)
Installation
# Install from PyPI
pip install scorio
# Install latest from GitHub
pip install "git+https://github.com/mohsenhariri/scorio.git"
# Install a specific tag
pip install "git+https://github.com/mohsenhariri/scorio.git@v0.2.0"
# Install from local repository
pip install -e .
Basic Usage
import numpy as np
from scorio import eval
# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = np.array([[0, 1, 2, 2, 1],
              [1, 1, 0, 2, 2]])
# Rubric weights w: length C+1
# Here: 0=incorrect(0.0), 1=partial(0.5), 2=correct(1.0)
w = np.array([0.0, 0.5, 1.0])
# Optional prior outcomes R0: shape (M, D)
R0 = np.array([[0, 2],
               [1, 2]])
# Bayesian evaluation with prior
mu, sigma = eval.bayes(R, w, R0)
print(f"μ = {mu:.6f}, σ = {sigma:.6f}")
# Expected: μ ≈ 0.575, σ ≈ 0.084275
# Bayesian evaluation without prior
mu2, sigma2 = eval.bayes(R, w)
print(f"μ = {mu2:.6f}, σ = {sigma2:.6f}")
# Expected: μ ≈ 0.5625, σ ≈ 0.091998
# Simple average
accuracy = eval.avg(R)
print(f"Average: {accuracy:.6f}")
Julia (Scorio.jl)
Installation
using Pkg
# From local development
Pkg.develop(path="./julia/Scorio.jl")
# Or from Julia General Registry
# Pkg.add("Scorio")
Basic Usage
using Scorio
# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = [0 1 2 2 1;
1 1 0 2 2]
# Rubric weights w: length C+1
# Here: 0=incorrect(0.0), 1=partial(0.5), 2=correct(1.0)
w = [0.0, 0.5, 1.0]
# Optional prior outcomes R0: shape (M, D)
R0 = [0 2;
1 2]
# Bayesian evaluation with prior
mu, sigma = bayes(R, w, R0)
println("μ = $mu, σ = $sigma")
# Expected: μ ≈ 0.575, σ ≈ 0.084275
# Bayesian evaluation without prior
mu2, sigma2 = bayes(R, w)
println("μ = $mu2, σ = $sigma2")
# Expected: μ ≈ 0.5625, σ ≈ 0.091998
# Simple average
accuracy = avg(R)
println("Average: $accuracy")
Evaluation Functions
bayes(R, w, R0=None)
Bayesian performance evaluation with uncertainty quantification using the Bayes@N framework.
- R: M × N integer matrix with entries in {0, ..., C} (outcomes for M questions over N trials)
- w: length C+1 float vector of rubric weights mapping categories to scores
- R0 (optional): M × D integer matrix of prior outcomes
- Returns: (mu, sigma), the posterior estimate and its uncertainty
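To make the meaning of (mu, sigma) concrete, here is a minimal NumPy sketch of one way such a posterior can be computed. It assumes a uniform Dirichlet prior over the C+1 categories for each question and treats questions as independent; the function name bayes_sketch is hypothetical and this is not necessarily how scorio implements bayes internally. Under those assumptions it reproduces the expected values quoted in Basic Usage.

```python
import numpy as np


def bayes_sketch(R, w, R0=None):
    """Posterior mean/std of the weighted score (assumes a uniform Dirichlet prior)."""
    R = np.asarray(R)
    w = np.asarray(w, dtype=float)
    M = R.shape[0]
    K = w.size  # number of categories, C + 1

    # Prior outcomes are treated as extra observed trials for each question.
    if R0 is not None:
        R = np.concatenate([np.asarray(R0), R], axis=1)

    # Per-question category counts plus one pseudo-count per category.
    # Assumes every entry lies in {0, ..., C}.
    counts = np.stack([np.bincount(row, minlength=K) for row in R]) + 1.0
    T = counts.sum(axis=1)        # total Dirichlet concentration per question
    p = counts / T[:, None]       # posterior mean category probabilities

    mean_q = p @ w                                 # per-question posterior mean score
    var_q = (p @ w**2 - mean_q**2) / (T + 1.0)     # per-question posterior variance

    mu = mean_q.mean()
    sigma = np.sqrt(var_q.sum()) / M  # std of the average over independent questions
    return float(mu), float(sigma)


R = np.array([[0, 1, 2, 2, 1],
              [1, 1, 0, 2, 2]])
w = np.array([0.0, 0.5, 1.0])
R0 = np.array([[0, 2],
               [1, 2]])

mu, sigma = bayes_sketch(R, w, R0)    # μ ≈ 0.575, σ ≈ 0.084275
mu2, sigma2 = bayes_sketch(R, w)      # μ ≈ 0.5625, σ ≈ 0.091998
print(f"with prior:    μ = {mu:.4f}, σ = {sigma:.4f}")
print(f"without prior: μ = {mu2:.4f}, σ = {sigma2:.4f}")
```

With binary weights [0, 1] and no prior outcomes, the per-question mean in this sketch reduces to (successes + 1) / (N + 2), the familiar Laplace rule of succession.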
Data and Shape Conventions
- Categories: Encode outcomes per trial as integers in {0, ..., C}
- Weights: Choose rubric weights w of length C+1 (e.g., [0, 1] for binary outcomes)
- Shapes:
  - R is M × N (M questions, N trials)
  - R0 is M × D (M questions, D prior trials)
  - Both must share the same M and category set
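The conventions above can be checked before calling the evaluation functions. The helper below is a hypothetical sketch (check_inputs is not part of the scorio API) that validates the shapes and category ranges with plain NumPy and returns (M, N, C).

```python
import numpy as np


def check_inputs(R, w, R0=None):
    """Raise ValueError if R, w, or R0 break the shape conventions; return (M, N, C)."""
    R = np.asarray(R)
    w = np.asarray(w, dtype=float)
    if R.ndim != 2:
        raise ValueError("R must be a 2-D M x N matrix")
    if w.ndim != 1:
        raise ValueError("w must be a 1-D vector of length C+1")
    C = w.size - 1  # largest valid category index

    arrays = {"R": R}
    if R0 is not None:
        R0 = np.asarray(R0)
        if R0.ndim != 2 or R0.shape[0] != R.shape[0]:
            raise ValueError("R0 must be M x D with the same M as R")
        arrays["R0"] = R0

    for name, A in arrays.items():
        if not np.issubdtype(A.dtype, np.integer):
            raise ValueError(f"{name} must hold integer categories")
        if A.size and (A.min() < 0 or A.max() > C):
            raise ValueError(f"{name} has entries outside 0..{C}")
    return R.shape[0], R.shape[1], C


# Binary outcomes: two questions, three trials each, weights [0, 1]
shape_info = check_inputs([[0, 1, 1], [1, 1, 0]], [0.0, 1.0])
print(shape_info)
```

Running a check like this first tends to give clearer error messages than a shape mismatch surfacing deep inside a computation.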
Requirements
Python
- Python 3.10+
- NumPy 2.0+
Julia
- Julia 1.6 or higher
Documentation
| API | Documentation |
|---|---|
| Python | scorio.readthedocs.io |
| Julia | mohsenhariri.github.io/scorio/julia |
Citation
If you use Scorio in your research, please cite the relevant papers:
Bayesian Evaluation Framework
@inproceedings{hariri2026don,
title={Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation},
author={Hariri, Mohsen and Samandar, Amirhossein and Hinczewski, Michael and Chaudhary, Vipin},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://arxiv.org/abs/2510.04265}
}
Ranking Methods
@article{hariri2026ranking,
title={Ranking Reasoning LLMs under Test-Time Scaling},
author={Hariri, Mohsen and Hinczewski, Michael and Ma, Jing and Chaudhary, Vipin},
journal={arXiv preprint arXiv:2603.10960},
year={2026},
doi={10.48550/arXiv.2603.10960},
url={https://arxiv.org/abs/2603.10960}
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
Links
- Landing Page: mohsenhariri.github.io/scorio
- Python Docs: scorio.readthedocs.io
- Julia Docs: mohsenhariri.github.io/scorio/julia
- Repository: github.com/mohsenhariri/scorio
- Issues: github.com/mohsenhariri/scorio/issues
- Papers: arxiv.org/abs/2510.04265 (Bayesian evaluation), arxiv.org/abs/2603.10960 (ranking)