
Bayesian evaluation and ranking toolkit


scorio

scorio implements the Bayes@N framework introduced in "Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation".



Installation

pip install scorio

Requires Python 3.9–3.13 and NumPy.

Data and shape conventions

  • Categories: encode each trial's outcome as an integer in {0, ..., C}.
  • Weights: choose rubric weights w of length C+1 (e.g., [0, 1] for binary R).
  • Shapes: R is M x N (M items, N trials per item) and R0, if provided, is M x D; both must share the same M and the same category set.
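
The conventions above can be checked before calling the library. The following is a minimal sketch using only NumPy; `check_inputs` is a hypothetical helper written for illustration and is not part of scorio, and the data are made up.

```python
import numpy as np

# Made-up binary results: M=3 items, N=4 trials each (0=incorrect, 1=correct).
R = np.array([
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
])
w = np.array([0.0, 1.0])          # length C+1, here C=1
R0 = np.array([[1], [0], [1]])    # optional prior outcomes, D=1


def check_inputs(R, w, R0=None):
    """Hypothetical helper (not a scorio function): validate the shape conventions."""
    R = np.asarray(R)
    w = np.asarray(w, dtype=float)
    if R.ndim != 2:
        raise ValueError("R must be 2-D with shape (M, N)")
    if not np.issubdtype(R.dtype, np.integer):
        raise ValueError("R must hold integer category labels")
    C = w.size - 1
    if R.min() < 0 or R.max() > C:
        raise ValueError("entries of R must lie in {0, ..., C}")
    D = 0
    if R0 is not None:
        R0 = np.asarray(R0)
        if R0.ndim != 2 or R0.shape[0] != R.shape[0]:
            raise ValueError("R0 must have shape (M, D) with the same M as R")
        if R0.size and (R0.min() < 0 or R0.max() > C):
            raise ValueError("entries of R0 must lie in {0, ..., C}")
        D = R0.shape[1]
    M, N = R.shape
    return M, N, D, C


print(check_inputs(R, w, R0))  # (3, 4, 1, 1)
```

A label outside {0, ..., C}, or an R0 with a different number of rows, raises ValueError in this sketch; how scorio itself reacts to such inputs is not stated here.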

APIs

  • scorio.eval.bayes(R, w, R0=None) -> (mu: float, sigma: float)

    • R: M x N int array with entries in {0, ..., C}
    • w: length C+1 float array of rubric weights
    • R0 (optional): M x D int array of prior outcomes (same category set as R)
    • Returns the posterior estimate mu of the rubric-weighted performance and its uncertainty sigma (a posterior standard deviation).
  • scorio.eval.avg(R) -> float

    • Returns the naive mean of elements in R. For binary accuracy, encode incorrect=0, correct=1.
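
To make the returned quantities concrete, here is a minimal NumPy sketch of one way such a posterior can be computed. It assumes an independent Dirichlet posterior per item with one pseudo-count per category (a uniform prior) plus the counts from R0 and R. `bayes_sketch` is a hypothetical name and this is not scorio's source code; on the small data set below it gives the values shown in the comments.

```python
import numpy as np


def bayes_sketch(R, w, R0=None):
    """Illustrative only: posterior mean and std under a uniform Dirichlet prior (assumed)."""
    R = np.asarray(R)
    w = np.asarray(w, dtype=float)
    M, N = R.shape
    C = w.size - 1
    R0 = np.zeros((M, 0), dtype=int) if R0 is None else np.asarray(R0)
    D = R0.shape[1]

    cats = np.arange(C + 1)
    counts = (R[:, :, None] == cats).sum(axis=1)    # observed counts, shape (M, C+1)
    prior = (R0[:, :, None] == cats).sum(axis=1)    # prior counts, shape (M, C+1)
    nu = 1 + prior + counts                         # Dirichlet parameters per item
    T = 1 + C + D + N                               # each row of nu sums to T

    p = nu / T                                      # posterior mean category probabilities
    item_mean = p @ w                               # per-item expected rubric score
    item_second = p @ (w ** 2)
    mu = item_mean.mean()
    # Variance of a weighted sum of Dirichlet components, averaged over M independent items.
    var = (item_second - item_mean ** 2).sum() / (M ** 2 * (T + 1))
    return float(mu), float(np.sqrt(var))


R = np.array([[0, 1, 2, 2, 1], [1, 1, 0, 2, 2]])
w = np.array([0.0, 0.5, 1.0])
R0 = np.array([[0, 2], [1, 2]])

print(bayes_sketch(R, w, R0))   # approximately (0.575, 0.084275)
print(bayes_sketch(R, w))       # approximately (0.5625, 0.091998)
print(float(np.mean(R)))        # naive mean of the raw labels: 1.2
```

The naive mean averages the raw category labels, so it only reads as an accuracy when R is binary.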

How to use

import numpy as np
from scorio.eval import bayes

# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = np.array([
    [0, 1, 2, 2, 1],   # Item 1, N=5 trials
    [1, 1, 0, 2, 2],   # Item 2, N=5 trials
])

# Rubric weights w: length C+1. Here: 0=incorrect, 1=partial(0.5), 2=correct(1.0)
w = np.array([0.0, 0.5, 1.0])

# Optional prior outcomes R0: shape (M, D). If omitted, D=0.
R0 = np.array([
    [0, 2],
    [1, 2],
])

# With prior (D=2 → T = 1 + C + D + N = 10)
mu, sigma = bayes(R, w, R0)
print(mu, sigma)      # expected ~ (0.575, 0.084275)

# Without prior (D=0 → T = 1 + C + D + N = 8)
mu2, sigma2 = bayes(R, w)
print(mu2, sigma2)    # expected ~ (0.5625, 0.091998)
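
One common way to report the pair (mu, sigma) is as an approximate interval. The sketch below assumes a normal approximation to the posterior and clips to the range of the rubric weights; `interval` is a hypothetical helper, not a scorio function, and the inputs are the expected values quoted above.

```python
def interval(mu, sigma, z=1.96, lower=0.0, upper=1.0):
    """Hypothetical helper: mu +/- z*sigma, clipped to [lower, upper].

    z=1.96 gives a two-sided 95% interval under a normal approximation (an assumption).
    lower/upper should be the smallest and largest rubric weights.
    """
    return max(lower, mu - z * sigma), min(upper, mu + z * sigma)


mu, sigma = 0.575, 0.084275          # expected output of bayes(R, w, R0) above
lo, hi = interval(mu, sigma)
print(f"{mu:.3f} [{lo:.3f}, {hi:.3f}]")   # 0.575 [0.410, 0.740]
```

With only M=2 items and N=5 trials the interval is wide, which is the kind of uncertainty a single point estimate hides.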

Citing

If you use scorio or Bayes@N, please cite:

@article{hariri2025dontpassk,
  title   = {Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation},
  author  = {Hariri, Mohsen and Samandar, Amirhossein and Hinczewski, Michael and Chaudhary, Vipin},
  journal = {arXiv preprint arXiv:2510.04265},
  year    = {2025},
  url     = {https://scorio.readthedocs.io/}
}

License

MIT License. See the LICENSE file for details.
