A minimalistic Python library to evaluate the pass-at-k metric

pass-at-k

The pass-at-k (pass@k) metric evaluates generative models such as Large Language Models (LLMs), most commonly on code generation tasks. It estimates the probability that at least one of k samples drawn from the model for a given problem is correct, for example by passing the problem's unit tests. This reflects practical use, where a user may be willing to review several model outputs to find a satisfactory one.
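
For reference, the unbiased estimator described in the paper cited in the Citation section (Chen et al., 2021) can be written as follows, where n is the number of samples generated for a problem, c is the number of those samples that are correct, and k (at most n) is the sample budget. The notation is chosen here for illustration; how this package computes the value internally is not shown in this description and may differ.

```latex
% The fraction is the probability that k samples drawn without
% replacement from the n generated samples are all incorrect.
\[
\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\left[\, 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \,\right]
\]
% Worked example for a single problem with n = 8, c = 3, k = 5:
\[
1 - \frac{\binom{5}{5}}{\binom{8}{5}} \;=\; 1 - \frac{1}{56} \;\approx\; 0.9821
\]
```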

Installation

Add pass-at-k to your requirements.txt, then run pip install -r requirements.txt.

Alternatively, install the package directly with pip install pass-at-k.

Usage

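from pass_at_k import pass_at_k

# 8 samples were generated for a problem and 3 of them were correct.
# Estimate the probability that at least one of 5 samples is correct.
pass_at_5 = pass_at_k(
    num_total_samples_n=8,
    num_correct_samples_c=3,
    k=5,
)
# pass_at_5 is approximately 0.9821 (1 - 1/56)

# The same arguments can be passed positionally as (n, c, k).
pass_at_1 = pass_at_k(8, 3, 1)
# pass_at_1 is approximately 0.375 (3/8)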
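
The values above can be cross-checked without the library. The following is a minimal sketch, assuming the standard unbiased estimator 1 - C(n - c, k) / C(n, k); the names reference_pass_at_k and mean_pass_at_k are hypothetical helpers written for this example and are not part of the pass-at-k package, whose own implementation may differ.

```python
from math import comb


def reference_pass_at_k(n: int, c: int, k: int) -> float:
    """Hypothetical helper: unbiased pass@k estimate for one problem.

    n: total samples generated, c: correct samples, k: sample budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("expected 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when k > n - c, so the estimate is exactly 1.0
    # whenever any draw of k samples must contain a correct one.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Hypothetical helper: average the estimate over (n, c) pairs, one per problem."""
    return sum(reference_pass_at_k(n, c, k) for n, c in results) / len(results)


print(round(reference_pass_at_k(8, 3, 5), 4))  # 0.9821
print(reference_pass_at_k(8, 3, 1))            # 0.375
```

A benchmark-level score is usually the mean of the per-problem estimates, which is what mean_pass_at_k sketches: pass it one (n, c) pair per problem together with a single k.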

Citation

The unbiased estimator of the pass-at-k metric was presented in Evaluating Large Language Models Trained on Code:

@misc{chen_evaluating_2021,
	title = {Evaluating {Large} {Language} {Models} {Trained} on {Code}},
	url = {http://arxiv.org/abs/2107.03374},
	abstract = {We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8\% of the problems, while GPT-3 solves 0\% and GPT-J solves 11.4\%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2\% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.},
	urldate = {2023-11-10},
	publisher = {arXiv},
	author = {Chen, Mark and Tworek, Jerry and Jun, Heewoo and Yuan, Qiming and Pinto, Henrique Ponde de Oliveira and Kaplan, Jared and Edwards, Harri and Burda, Yuri and Joseph, Nicholas and Brockman, Greg and Ray, Alex and Puri, Raul and Krueger, Gretchen and Petrov, Michael and Khlaaf, Heidy and Sastry, Girish and Mishkin, Pamela and Chan, Brooke and Gray, Scott and Ryder, Nick and Pavlov, Mikhail and Power, Alethea and Kaiser, Lukasz and Bavarian, Mohammad and Winter, Clemens and Tillet, Philippe and Such, Felipe Petroski and Cummings, Dave and Plappert, Matthias and Chantzis, Fotios and Barnes, Elizabeth and Herbert-Voss, Ariel and Guss, William Hebgen and Nichol, Alex and Paino, Alex and Tezak, Nikolas and Tang, Jie and Babuschkin, Igor and Balaji, Suchir and Jain, Shantanu and Saunders, William and Hesse, Christopher and Carr, Andrew N. and Leike, Jan and Achiam, Josh and Misra, Vedant and Morikawa, Evan and Radford, Alec and Knight, Matthew and Brundage, Miles and Murati, Mira and Mayer, Katie and Welinder, Peter and McGrew, Bob and Amodei, Dario and McCandlish, Sam and Sutskever, Ilya and Zaremba, Wojciech},
	month = jul,
	year = {2021},
	note = {arXiv:2107.03374 [cs]},
	keywords = {Computer Science - Machine Learning},
}
