
Bayesian evaluation and ranking toolkit


Scorio


Documentation

mohsenhariri.github.io/scorio

  • Python API (scorio): scorio.readthedocs.io, hosted on ReadTheDocs
  • Julia API (Scorio.jl): mohsenhariri.github.io/scorio/julia, hosted on GitHub Pages

Packages

This repository contains two packages:

  1. scorio - Python implementation
  2. Scorio.jl - Julia implementation

Quick Start

Python (scorio)

Installation

# Install from PyPI
pip install scorio

# Install latest from GitHub
pip install "git+https://github.com/mohsenhariri/scorio.git"

# Install a specific tag
pip install "git+https://github.com/mohsenhariri/scorio.git@v0.2.2"

# Editable install from a local clone of the repository
pip install -e .
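To confirm which version a given install command actually produced, the standard library's importlib.metadata can be queried. This is a small sketch: installed_version is a hypothetical helper name, and the distribution name "scorio" is assumed to match the PyPI project name used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist="scorio"):
    """Return the installed version string of `dist`, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


# Prints a version string, or None if scorio is not installed in this environment
print(installed_version())
```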

Basic Usage

import numpy as np
from scorio import eval

# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = np.array([[0, 1, 2, 2, 1],
              [1, 1, 0, 2, 2]])

# Rubric weights w: length C+1
# Here: 0=incorrect(0.0), 1=partial(0.5), 2=correct(1.0)
w = np.array([0.0, 0.5, 1.0])

# Optional prior outcomes R0: shape (M, D)
R0 = np.array([[0, 2],
               [1, 2]])

# Bayesian evaluation with prior
mu, sigma = eval.bayes(R, w, R0)
print(f"μ = {mu:.6f}, σ = {sigma:.6f}")
# Expected: μ ≈ 0.575, σ ≈ 0.084275

# Bayesian evaluation without prior
mu2, sigma2 = eval.bayes(R, w)
print(f"μ = {mu2:.6f}, σ = {sigma2:.6f}")
# Expected: μ ≈ 0.5625, σ ≈ 0.091998

# Weighted average
accuracy, accuracy_sigma = eval.avg(R, w)
print(f"Average = {accuracy:.6f}, σ = {accuracy_sigma:.6f}")
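For intuition about what the weighted average measures, here is a plain NumPy sketch that maps every category in R to its rubric weight and averages over all M × N entries. weighted_average_sketch is a hypothetical name, and it is an assumption that this equals the point estimate returned by eval.avg; the uncertainty that eval.avg also returns is not reproduced here.

```python
import numpy as np


def weighted_average_sketch(R, w):
    """Hypothetical helper: plain mean rubric score over all M*N outcomes.

    Assumption: this matches the point estimate of eval.avg; the
    uncertainty that eval.avg also returns is not reproduced here.
    """
    R = np.asarray(R, dtype=int)
    w = np.asarray(w, dtype=float)
    scores = w[R]                # replace each category index by its weight, shape (M, N)
    return float(scores.mean())  # average over questions and trials


R = np.array([[0, 1, 2, 2, 1],
              [1, 1, 0, 2, 2]])
w = np.array([0.0, 0.5, 1.0])
print(weighted_average_sketch(R, w))  # 0.6
```

On this data the plain average is 0.6, while the prior-free Bayesian estimate quoted above is 0.5625. Under a uniform Dirichlet prior (our reading of Bayes@N), each question effectively gains one pseudo-observation per category, which pulls the estimate toward the mean of w (here 0.5): (3 + 1.5) / 8 = 0.5625 per question.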

Julia (Scorio.jl)

Installation

using Pkg

# For local development (path relative to the repository root)
Pkg.develop(path="./julia/Scorio.jl")

# Or from the Julia General registry
# Pkg.add("Scorio")

Basic Usage

using Scorio

# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = [0 1 2 2 1;
     1 1 0 2 2]

# Rubric weights w: length C+1
# Here: 0=incorrect(0.0), 1=partial(0.5), 2=correct(1.0)
w = [0.0, 0.5, 1.0]

# Optional prior outcomes R0: shape (M, D)
R0 = [0 2;
      1 2]

# Bayesian evaluation with prior
mu, sigma = bayes(R, w, R0)
println("μ = $mu, σ = $sigma")
# Expected: μ ≈ 0.575, σ ≈ 0.084275

# Bayesian evaluation without prior
mu2, sigma2 = bayes(R, w)
println("μ = $mu2, σ = $sigma2")
# Expected: μ ≈ 0.5625, σ ≈ 0.091998

# Weighted average
accuracy, accuracy_sigma = avg(R, w)
println("Average = $accuracy, σ = $accuracy_sigma")

Evaluation Functions

bayes(R, w, R0=None)

Bayesian performance evaluation with uncertainty quantification using the Bayes@N framework.

  • R: M × N integer matrix with entries in {0, ..., C} (outcomes for M questions over N trials)
  • w: length C+1 float vector of rubric weights mapping categories to scores
  • R0 (optional): M × D integer matrix of prior outcomes, using the same categories {0, ..., C}
  • Returns: (mu, sigma), the posterior mean of the rubric-weighted score and its posterior standard deviation
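To make the computation behind bayes more concrete, below is a from-scratch sketch, not scorio's source. It assumes each question α has a uniform Dirichlet prior over the C+1 categories, updated with the category counts n_αk from R and n⁰_αk from R0, so that ν_αk = 1 + n_αk + n⁰_αk and every row sums to T = 1 + C + D + N. Then μ = (1/M) Σ_α Σ_k (ν_αk/T) w_k and σ² = (1/(M²(T+1))) Σ_α [Σ_k (ν_αk/T) w_k² − (Σ_k (ν_αk/T) w_k)²]. bayes_sketch is a hypothetical name; on the example inputs it reproduces the expected values quoted in Basic Usage.

```python
import numpy as np


def bayes_sketch(R, w, R0=None):
    """Hypothetical from-scratch Bayes@N estimate (not scorio's source).

    Assumes a uniform Dirichlet prior over the C+1 categories of each
    question, updated with the counts observed in R and (optionally) R0.
    Returns the posterior mean and standard deviation of the average score.
    """
    R = np.asarray(R, dtype=int)
    w = np.asarray(w, dtype=float)
    M, N = R.shape
    C = w.size - 1
    R0 = np.zeros((M, 0), dtype=int) if R0 is None else np.asarray(R0, dtype=int)
    D = R0.shape[1]

    # nu[a, k] = 1 (uniform prior) + count of category k in R[a] and R0[a]
    nu = np.ones((M, C + 1))
    for k in range(C + 1):
        nu[:, k] += (R == k).sum(axis=1) + (R0 == k).sum(axis=1)

    T = 1 + C + D + N          # every row of nu sums to T
    p = nu / T                 # posterior mean category probabilities

    mean_q = p @ w             # posterior mean score per question
    second_q = p @ w**2        # posterior mean of the squared weight per question
    # Dirichlet variance of sum_k pi_k * w_k is (second - mean^2) / (T + 1)
    var_q = (second_q - mean_q**2) / (T + 1)

    mu = mean_q.mean()                 # average over the M questions
    sigma = np.sqrt(var_q.sum()) / M   # questions treated as independent
    return float(mu), float(sigma)


R = np.array([[0, 1, 2, 2, 1],
              [1, 1, 0, 2, 2]])
w = np.array([0.0, 0.5, 1.0])
R0 = np.array([[0, 2],
               [1, 2]])

print(bayes_sketch(R, w, R0))  # approximately (0.575, 0.084275)
print(bayes_sketch(R, w))      # approximately (0.5625, 0.091998)
```

Treating questions as independent is what makes the total variance a simple sum of per-question variances divided by M².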

Data and Shape Conventions

  • Categories: Encode outcomes per trial as integers in {0, ..., C}
  • Weights: Choose rubric weights w of length C+1 (e.g., [0, 1] for binary outcomes)
  • Shapes:
    • R is M × N (M questions, N trials)
    • R0 is M × D (M questions, D prior trials)
    • Both must share the same M and category set
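The conventions above can be checked before calling bayes or avg. The following is a minimal validator sketch; check_inputs is a hypothetical helper (not part of scorio), and it returns the inferred (M, N, C, D) so mismatches surface early with a clear message.

```python
import numpy as np


def check_inputs(R, w, R0=None):
    """Hypothetical validator for the shape conventions; returns (M, N, C, D)."""
    R = np.asarray(R)
    w = np.asarray(w, dtype=float)
    if w.ndim != 1 or w.size == 0:
        raise ValueError("w must be a non-empty 1-D vector of length C+1")
    C = w.size - 1

    def check_matrix(X, name):
        # Each outcome matrix must be 2-D, integer-valued, and within {0, ..., C}
        if X.ndim != 2:
            raise ValueError(f"{name} must be a 2-D matrix")
        if not np.issubdtype(X.dtype, np.integer):
            raise ValueError(f"{name} must hold integer categories")
        if X.size and (X.min() < 0 or X.max() > C):
            raise ValueError(f"{name} entries must lie in {{0, ..., {C}}}")

    check_matrix(R, "R")
    M, N = R.shape
    D = 0
    if R0 is not None:
        R0 = np.asarray(R0)
        check_matrix(R0, "R0")
        if R0.shape[0] != M:
            raise ValueError("R0 must have the same number of rows (M) as R")
        D = R0.shape[1]
    return M, N, C, D


R = np.array([[0, 1, 2, 2, 1],
              [1, 1, 0, 2, 2]])
w = np.array([0.0, 0.5, 1.0])
R0 = np.array([[0, 2],
               [1, 2]])
print(check_inputs(R, w, R0))  # (2, 5, 2, 2)
```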

Requirements

Python

  • Python 3.10+
  • NumPy 2.0+

Julia

  • Julia 1.6 or higher

Citation

If you use Scorio in your research, please cite the relevant papers:

Bayesian Evaluation Framework

@inproceedings{hariri2026don,
  title={Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation},
  author={Hariri, Mohsen and Samandar, Amirhossein and Hinczewski, Michael and Chaudhary, Vipin},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=PTXi3Ef4sT},
  doi={10.48550/arXiv.2510.04265}
}

Ranking Methods

@article{hariri2026ranking,
  title={Ranking Reasoning LLMs under Test-Time Scaling},
  author={Hariri, Mohsen and Hinczewski, Michael and Ma, Jing and Chaudhary, Vipin},
  journal={arXiv preprint arXiv:2603.10960},
  year={2026},
  doi={10.48550/arXiv.2603.10960},
  url={https://arxiv.org/abs/2603.10960}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

