Sample implementation of babymmlu benchmark

babymmlu

Utilities for measuring the babymmlu benchmark (see https://huggingface.co/datasets/ai-forever/baby_mmlu).

Methods

eval_parallel

Calculates babymmlu measures.

| Parameter | Optional | Type | Default value | Description |
|---|---|---|---|---|
| model | No | AutoModelForCausalLM | | Model to evaluate. |
| tokenizer | No | AutoTokenizer | | Tokenizer used with model. |
| dataset | Yes | Dataset (Optional) or str | ai-forever/baby_mmlu | Dataset to evaluate model on. |
| q_batch_size | Yes | int | 10 | Number of questions to process in parallel. |
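
To illustrate what a batch size of 10 means in practice, here is a minimal sketch of splitting a list of questions into batches. The helper `iter_batches` is hypothetical and is not part of babymmlu; how the package actually groups questions internally is not documented here.

```python
def iter_batches(items, batch_size=10):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements.

    Hypothetical helper for illustration only; not part of babymmlu.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 25 placeholder questions split with the documented default of 10.
questions = [f"question {i}" for i in range(25)]
batch_sizes = [len(batch) for batch in iter_batches(questions, batch_size=10)]
```

A larger `q_batch_size` generally means fewer forward passes at the cost of more memory per pass, so it may need lowering on small GPUs.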

Return value

The function returns a tuple with 3 elements: babymmlu measured by crossentropy-per-char, crossentropy-per-token and crossentropy-total.
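
The sketch below shows one common way such measures relate to each other: the total cross-entropy is the negated sum of per-token log-probabilities, and the other two values normalize it by token count and by character count. This is an assumption for illustration. The function `crossentropy_measures` and the sample numbers are hypothetical, and the package's actual scoring and aggregation across questions may differ.

```python
import math


def crossentropy_measures(token_logprobs, text):
    """Return (per_char, per_token, total) cross-entropy in nats.

    Hypothetical illustration, not babymmlu's implementation.
    token_logprobs: natural-log probabilities assigned to each token of `text`.
    """
    if not token_logprobs or not text:
        raise ValueError("token_logprobs and text must be non-empty")
    total = -sum(token_logprobs)              # summed negative log-likelihood
    per_token = total / len(token_logprobs)   # normalized by number of tokens
    per_char = total / len(text)              # normalized by number of characters
    return per_char, per_token, total


# Made-up example: a 4-character answer split into two tokens.
logprobs = [math.log(0.5), math.log(0.25)]
per_char, per_token, total = crossentropy_measures(logprobs, "Rome")
```

Per-character normalization is useful when comparing models with different tokenizers, since token counts for the same text vary between vocabularies while character counts do not.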

load_model_and_tokenizer

Loads model and tokenizer from the same location.

| Parameter | Optional | Type | Description |
|---|---|---|---|
| path | No | str | Path to load model and tokenizer from. |
| use_cuda | Yes | bool | Whether to load model to cuda or to cpu. |

Return value

The function returns a tuple with 2 elements:

  • model
  • tokenizer

load_model

Loads model from the specified location.

| Parameter | Optional | Type | Description |
|---|---|---|---|
| model_path | No | str | Path to load model from. |
| use_cuda | Yes | bool | Whether to load model to cuda or to cpu. |

Return value

The function returns the loaded model.

load_tokenizer

Loads tokenizer from the specified location.

| Parameter | Optional | Type | Description |
|---|---|---|---|
| tokenizer_path | No | str | Path to load tokenizer from. |

Return value

The function returns the loaded tokenizer.
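
When the model and tokenizer live in different locations, `load_model` and `load_tokenizer` can be combined. The wrapper `load_separately` below is a hypothetical convenience function, not part of babymmlu, and passing `use_cuda` by keyword assumes the parameter name matches the table above.

```python
def load_separately(model_path, tokenizer_path, use_cuda=False):
    """Load a model and a tokenizer from two different locations.

    Hypothetical wrapper for illustration; not part of babymmlu.
    """
    # Imported lazily so that defining this helper does not require
    # babymmlu to be installed; calling it does.
    import babymmlu

    model = babymmlu.load_model(model_path, use_cuda=use_cuda)
    tokenizer = babymmlu.load_tokenizer(tokenizer_path)
    return model, tokenizer
```

The returned pair can then be passed to `eval_parallel` in the same way as the result of `load_model_and_tokenizer`.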

Example

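import babymmlu

# Load the model and tokenizer from the same location.
model, tokenizer = babymmlu.load_model_and_tokenizer('ai-forever/rugpt3small_based_on_gpt2')

# Evaluate on the default dataset; the result is a tuple of three measures.
per_char, per_token, total = babymmlu.eval_parallel(model, tokenizer)
print(f'babymmlu crossentropy-per-char: {per_char}, crossentropy-per-token: {per_token}, crossentropy-total: {total}')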
