Evaluate language models using multiple choice items

Reason this release was yanked:

Incorrect supported python versions

LM Pub Quiz

This library implements a knowledge probing approach that uses a language model's inherent ability to estimate the log-likelihood of any given textual statement. For more information, visit the LM Pub Quiz website.
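
To make the idea concrete, here is a minimal sketch of likelihood-based ranking for a single multiple choice item. It is not the library's implementation: the `rank_options` helper and the `TOY_TOKEN_LOG_PROBS` table are hypothetical, and the numbers stand in for the per-token log-probabilities a real language model would produce. The sketch also assumes that a statement is scored by summing its token log-probabilities.

```python
from typing import Dict, List, Tuple

# Hypothetical per-token log-probabilities (made-up values), standing in for
# what a real language model would assign to each answer statement.
TOY_TOKEN_LOG_PROBS: Dict[str, List[float]] = {
    "The capital of France is Paris.": [-1.2, -0.8, -0.3, -0.9, -0.2, -0.5],
    "The capital of France is Rome.": [-1.2, -0.8, -0.3, -0.9, -0.2, -4.1],
    "The capital of France is Oslo.": [-1.2, -0.8, -0.3, -0.9, -0.2, -5.0],
}


def statement_log_likelihood(token_log_probs: List[float]) -> float:
    """Score a whole statement by summing its per-token log-probabilities."""
    return sum(token_log_probs)


def rank_options(options: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (statement, score) pairs sorted from most to least likely."""
    scored = [(text, statement_log_likelihood(lps)) for text, lps in options.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_options(TOY_TOKEN_LOG_PROBS)
best_statement, best_score = ranking[0]
print(best_statement)
```

The top-ranked statement is treated as the model's answer to the item, so no text generation or answer parsing is involved.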

See also

Getting started

This short guide should get you started. For more detailed information visit the documentation.

Installing the Package

You can install the package via pip:

pip install lm-pub-quiz

For alternative methods of installing the package, visit the documentation.

Example Usage

from lm_pub_quiz import Dataset, Evaluator

# Load the dataset
dataset = Dataset.from_name("BEAR")

# Load the model
evaluator = Evaluator.from_model(
    "gpt2",
    model_type="CLM",
)

# Run the evaluation and save the results
results = evaluator.evaluate_dataset(
    dataset,
    template_index=0,
    save_path="gpt2_results",
    batch_size=32,
)

# If the results are analyzed in a different session, they can be loaded from the file system
# results = DatasetResults.from_path("gpt2_results")

print("=== Overall score ===")
print(results.get_metrics("accuracy"))
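
As a rough illustration of what an accuracy score over multiple choice items measures, the sketch below counts how often the highest-scoring option is the correct one. The `accuracy` helper and the toy scores are hypothetical and not part of lm_pub_quiz; the library's own metric may aggregate results differently.

```python
from typing import Sequence


def accuracy(
    option_scores: Sequence[Sequence[float]],
    answer_indices: Sequence[int],
) -> float:
    """Share of items whose highest-scoring option is the correct answer."""
    if len(option_scores) != len(answer_indices):
        raise ValueError("Need exactly one answer index per item.")
    if not option_scores:
        raise ValueError("Cannot compute accuracy for zero items.")

    correct = 0
    for scores, answer in zip(option_scores, answer_indices):
        # Index of the option with the highest log-likelihood score.
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(predicted == answer)
    return correct / len(answer_indices)


# Made-up log-likelihood scores for four items with three options each.
toy_scores = [
    [-3.9, -7.5, -8.4],
    [-6.0, -2.1, -5.5],
    [-4.0, -3.0, -9.0],
    [-1.0, -2.0, -3.0],
]
toy_answers = [0, 1, 0, 0]

toy_accuracy = accuracy(toy_scores, toy_answers)
print(toy_accuracy)
```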

Contributing

We welcome any questions, comments, or even PRs to this project to improve the package.

We use hatch to manage this project. For the most comfortable development experience, please first install hatch using pip or pipx.

Then, to propose a change to the library,

  • test your code locally using hatch run all:test,
  • format the code according to our formatting guidelines using hatch run lint:fmt,
  • check type- and style-consistency using hatch run lint:all, and
  • finally create a pull request describing the changes you propose.

For work on the documentation, use hatch run serve-docs to run a local documentation server.

Download files

Source Distribution

lm_pub_quiz-0.3.0.tar.gz (1.8 MB)

Uploaded Source

Built Distribution

lm_pub_quiz-0.3.0-py3-none-any.whl (40.3 kB)

Uploaded Python 3
