
Bayesian evaluation and ranking toolkit


Scorio





Packages

This repository contains two packages:

  1. scorio - Python implementation
  2. Scorio.jl - Julia implementation

Quick Start

Python (scorio)

Installation

# Install from PyPI
pip install scorio

# Install latest from GitHub
pip install "git+https://github.com/mohsenhariri/scorio.git"

# Install a specific tag
pip install "git+https://github.com/mohsenhariri/scorio.git@v0.2.0"

# Install from local repository
pip install -e .

Basic Usage

import numpy as np
from scorio import eval

# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = np.array([[0, 1, 2, 2, 1],
              [1, 1, 0, 2, 2]])

# Rubric weights w: length C+1
# Here: 0=incorrect(0.0), 1=partial(0.5), 2=correct(1.0)
w = np.array([0.0, 0.5, 1.0])

# Optional prior outcomes R0: shape (M, D)
R0 = np.array([[0, 2],
               [1, 2]])

# Bayesian evaluation with prior
mu, sigma = eval.bayes(R, w, R0)
print(f"μ = {mu:.6f}, σ = {sigma:.6f}")
# Expected: μ ≈ 0.575, σ ≈ 0.084275

# Bayesian evaluation without prior
mu2, sigma2 = eval.bayes(R, w)
print(f"μ = {mu2:.6f}, σ = {sigma2:.6f}")
# Expected: μ ≈ 0.5625, σ ≈ 0.091998

# Simple average
accuracy = eval.avg(R)
print(f"Average: {accuracy:.6f}")

Julia (Scorio.jl)

Installation

using Pkg

# From local development
Pkg.develop(path="./julia/Scorio.jl")

# Or from Julia General Registry
# Pkg.add("Scorio")

Basic Usage

using Scorio

# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = [0 1 2 2 1;
     1 1 0 2 2]

# Rubric weights w: length C+1
# Here: 0=incorrect(0.0), 1=partial(0.5), 2=correct(1.0)
w = [0.0, 0.5, 1.0]

# Optional prior outcomes R0: shape (M, D)
R0 = [0 2;
      1 2]

# Bayesian evaluation with prior
mu, sigma = bayes(R, w, R0)
println("μ = $mu, σ = $sigma")
# Expected: μ ≈ 0.575, σ ≈ 0.084275

# Bayesian evaluation without prior
mu2, sigma2 = bayes(R, w)
println("μ = $mu2, σ = $sigma2")
# Expected: μ ≈ 0.5625, σ ≈ 0.091998

# Simple average
accuracy = avg(R)
println("Average: $accuracy")

Evaluation Functions

bayes(R, w, R0=None)

Bayesian performance evaluation with uncertainty quantification using the Bayes@N framework.

  • R: M × N integer matrix with entries in {0, ..., C} (outcomes for M questions over N trials)
  • w: length C+1 float vector of rubric weights mapping categories to scores
  • R0 (optional): M × D integer matrix of prior outcomes
  • Returns: (mu, sigma) - posterior mean of the rubric-weighted score and its posterior standard deviation (the uncertainty)
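
The expected values quoted in the usage examples can be reproduced by hand. The following is a minimal sketch, not scorio's own implementation: it assumes a uniform Dirichlet prior over the C+1 categories of each question, with the prior outcomes R0 simply added as extra observed counts. The helper name bayes_sketch is hypothetical and exists only for this illustration.

```python
import numpy as np

def bayes_sketch(R, w, R0=None):
    """Posterior mean and std of the weighted score (illustrative only).

    Assumption: each question has an independent Dirichlet(1, ..., 1)
    prior over its C+1 categories; this is not taken from scorio's source.
    """
    R = np.asarray(R)
    w = np.asarray(w, dtype=float)
    M, _ = R.shape
    K = w.size  # number of categories, C + 1
    if R0 is None:
        R0 = np.empty((M, 0), dtype=int)  # no prior outcomes (D = 0)
    obs = np.concatenate([R, np.asarray(R0)], axis=1)

    # Dirichlet parameters per question: 1 (uniform prior) + category counts
    nu = 1.0 + np.stack([np.bincount(row, minlength=K) for row in obs])
    T = nu.sum(axis=1)           # equals K + N + D for every question
    p = nu / T[:, None]          # posterior mean category probabilities

    mean_q = p @ w                                  # per-question mean score
    var_q = (p @ w**2 - mean_q**2) / (T + 1.0)      # per-question variance

    mu = mean_q.mean()                  # average over the M questions
    sigma = np.sqrt(var_q.sum()) / M    # questions treated as independent
    return float(mu), float(sigma)

R = np.array([[0, 1, 2, 2, 1],
              [1, 1, 0, 2, 2]])
w = np.array([0.0, 0.5, 1.0])
R0 = np.array([[0, 2],
               [1, 2]])

mu, sigma = bayes_sketch(R, w, R0)
print(f"with prior:    mu = {mu:.6f}, sigma = {sigma:.6f}")
mu2, sigma2 = bayes_sketch(R, w)
print(f"without prior: mu = {mu2:.6f}, sigma = {sigma2:.6f}")
```

Under these assumptions the sketch agrees with the values listed in Basic Usage (μ ≈ 0.575, σ ≈ 0.084275 with the prior; μ ≈ 0.5625, σ ≈ 0.091998 without it). For actual evaluations, use the library's bayes function.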

Data and Shape Conventions

  • Categories: Encode outcomes per trial as integers in {0, ..., C}
  • Weights: Choose rubric weights w of length C+1 (e.g., [0, 1] for binary outcomes)
  • Shapes:
    • R is M × N (M questions, N trials)
    • R0 is M × D (M questions, D prior trials)
    • Both must share the same M and category set
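
The conventions above can be checked before calling the library. The helper below is a hypothetical sketch (check_inputs is not part of scorio) that uses only NumPy and raises ValueError when a shape or category constraint is violated. Whether scorio performs similar checks internally is not stated here.

```python
import numpy as np

def check_inputs(R, w, R0=None):
    """Validate R, w, R0 against the shape conventions; return (M, N, C).

    Hypothetical helper for illustration; not a scorio function.
    """
    R = np.asarray(R)
    w = np.asarray(w, dtype=float)
    if R.ndim != 2:
        raise ValueError("R must be 2-D with shape (M, N)")
    if w.ndim != 1 or w.size < 2:
        raise ValueError("w must be 1-D with length C+1 >= 2")
    C = w.size - 1

    arrays = {"R": R}
    if R0 is not None:
        R0 = np.asarray(R0)
        if R0.ndim != 2 or R0.shape[0] != R.shape[0]:
            raise ValueError("R0 must have shape (M, D) with the same M as R")
        arrays["R0"] = R0

    for name, A in arrays.items():
        if not np.issubdtype(A.dtype, np.integer):
            raise ValueError(f"{name} must contain integers")
        # Every entry must be a valid category index into w
        if A.size and (A.min() < 0 or A.max() > C):
            raise ValueError(f"{name} entries must lie in 0..{C}")
    return R.shape[0], R.shape[1], C

# Binary outcomes: categories {0, 1} with weights [0, 1]
R_bin = np.array([[1, 0, 1, 1],
                  [0, 0, 1, 0]])
print(check_inputs(R_bin, [0.0, 1.0]))  # M, N, C
```

A category index larger than C (for example a 2 in R when w has length 2) would be rejected by this check, which is the most common mismatch between an outcome matrix and its rubric.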

Requirements

Python

  • Python 3.10+
  • NumPy 2.0+

Julia

  • Julia 1.6+

Documentation

mohsenhariri.github.io/scorio

API     Documentation                        Status
Python  scorio.readthedocs.io                ReadTheDocs
Julia   mohsenhariri.github.io/scorio/julia  GitHub Pages

Citation

If you use Scorio in your research, please cite the relevant papers:

Bayesian Evaluation Framework

@inproceedings{hariri2026don,
  title={Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation},
  author={Hariri, Mohsen and Samandar, Amirhossein and Hinczewski, Michael and Chaudhary, Vipin},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://arxiv.org/abs/2510.04265}
}

Ranking Methods

@article{hariri2026ranking,
  title={Ranking Reasoning LLMs under Test-Time Scaling},
  author={Hariri, Mohsen and Hinczewski, Michael and Ma, Jing and Chaudhary, Vipin},
  journal={arXiv preprint arXiv:2603.10960},
  year={2026},
  doi={10.48550/arXiv.2603.10960},
  url={https://arxiv.org/abs/2603.10960}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.


