LM Pub Quiz

Evaluate language models using multiple choice items


This library implements a knowledge probing approach that uses a language model's inherent ability to estimate the log-likelihood of any given textual statement. For more information, visit the LM Pub Quiz website.
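The idea of ranking answer options by log-likelihood can be illustrated with a minimal sketch. This is not the library's implementation: the helper names, the "[Y]" placeholder convention, and the hand-set toy log-probabilities are all assumptions made for illustration. With a real language model, the scoring function would instead sum the model's token log-probabilities for each statement.

```python
import math
from typing import Callable, List, Sequence, Tuple


def fill_template(template: str, option: str, slot: str = "[Y]") -> str:
    """Insert one answer option into the template (the "[Y]" slot is an assumed convention)."""
    return template.replace(slot, option)


def rank_options(
    template: str,
    options: Sequence[str],
    score: Callable[[str], float],
) -> Tuple[str, List[float]]:
    """Score every filled-in statement and return the highest-scoring option."""
    scores = [score(fill_template(template, option)) for option in options]
    best_index = max(range(len(options)), key=lambda i: scores[i])
    return options[best_index], scores


# Hypothetical stand-in for a language model: fixed per-token log-probabilities.
TOY_TOKEN_LOG_PROBS = {
    "Paris": math.log(0.6),
    "Berlin": math.log(0.3),
    "Rome": math.log(0.1),
}
DEFAULT_LOG_PROB = math.log(0.5)  # assumed value for every other token


def toy_log_likelihood(statement: str) -> float:
    """Sum token log-probabilities, as a sequence log-likelihood would be computed."""
    tokens = statement.rstrip(".").split()
    return sum(TOY_TOKEN_LOG_PROBS.get(token, DEFAULT_LOG_PROB) for token in tokens)


answer, scores = rank_options(
    "The capital of France is [Y].",
    ["Berlin", "Paris", "Rome"],
    toy_log_likelihood,
)
print(answer, scores)
```

Because every option is scored as a complete statement, the comparison does not depend on the answer being a single token.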

Getting started

This short guide should get you started. For more detailed information visit the documentation.

Installing the Package

You can install the package via pip:

pip install lm-pub-quiz

Alternatively, clone the repository and install it in editable mode with the -e flag, so that changes to the source code take effect without reinstalling:

pip install -e lm-pub-quiz  # Modify the path to the repository if necessary

For alternative methods of installing the package, visit the documentation.

Example Usage

from lm_pub_quiz import Dataset, Evaluator

dataset_path = "<BEAR data path, e.g. ./transformer-knowledge-probe/data/BEAR>"
result_save_path = "<BEAR results save path>"
model_name = "gpt2"

# Load the BEAR dataset from its specific location
dataset = Dataset.from_path(dataset_path)

# Run the BEAR evaluator and save the results
evaluator = Evaluator.from_model(model_name, model_type="CLM", device="cuda")
results = evaluator.evaluate_dataset(dataset, save_path=result_save_path, batch_size=32)
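The example above hardcodes device="cuda". On a machine without a GPU, a small helper along the following lines could choose the device instead. This is a sketch: pick_device is a hypothetical helper, not part of lm-pub-quiz, and it assumes the evaluator also accepts "cpu" as a device string.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)

# Assumed usage with the example above (not verified against the library):
# evaluator = Evaluator.from_model(model_name, model_type="CLM", device=device)
```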

Contributing

We welcome any questions, comments, or even PRs to improve the package.

We use hatch to manage this project. To run the test cases, run hatch run test or hatch run all:test (to test on multiple Python versions). To check formatting and typing, run hatch run lint:all.
