Sample implementation of babymmlu benchmark

Project description

babymmlu

Utilities for measuring the babymmlu benchmark (see https://huggingface.co/datasets/ai-forever/baby_mmlu).

Methods

eval_parallel

Calculates babymmlu measures.

| Parameter | Optional | Type | Default value | Description |
|---|---|---|---|---|
| model | No | AutoModelForCausalLM | | Model to evaluate. |
| tokenizer | No | AutoTokenizer | | Tokenizer used with the model. |
| dataset | Yes | Dataset or str | ai-forever/baby_mmlu | Dataset to evaluate the model on. |
| q_batch_size | Yes | int | 10 | Number of questions to process in parallel. |
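To illustrate what q_batch_size controls, here is a minimal sketch of splitting a list of questions into groups of at most that size. The helper name `batched` and the placeholder questions are hypothetical; the package's internal batching is not documented here and may differ.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 23 placeholder questions with the documented default of 10 per batch.
questions = [f"question {i}" for i in range(23)]
sizes = [len(batch) for batch in batched(questions, batch_size=10)]
print(sizes)  # [10, 10, 3]
```

A larger batch size generally means fewer forward passes at the cost of more memory per pass.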

Return value

The function returns a tuple with 3 elements: babymmlu measured by crossentropy-per-char, crossentropy-per-token, and crossentropy-total.
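The three measures can be read as one total cross-entropy under different normalizations. The sketch below shows one common interpretation for a single text, assuming natural-log token probabilities. The function name `crossentropy_measures` and the sample values are hypothetical, and the way the package aggregates across questions is not specified in this description.

```python
import math
from typing import List, Tuple


def crossentropy_measures(token_logprobs: List[float], text: str) -> Tuple[float, float, float]:
    """Return (per-char, per-token, total) cross-entropy in nats for one text.

    Assumption: token_logprobs holds the log-probability the model assigned
    to each token of `text`, in order.
    """
    if not token_logprobs or not text:
        raise ValueError("need at least one token and one character")
    total = -sum(token_logprobs)             # summed negative log-likelihood
    per_token = total / len(token_logprobs)  # normalized by token count
    per_char = total / len(text)             # normalized by character count
    return per_char, per_token, total


# Made-up example: two tokens with probabilities 0.5 and 0.25 covering "abcd".
logprobs = [math.log(0.5), math.log(0.25)]
per_char, per_token, total = crossentropy_measures(logprobs, "abcd")
print(per_char, per_token, total)
```

Per-character normalization is useful when comparing models with different tokenizers, because the character count of a text does not depend on how it is tokenized.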

load_model_and_tokenizer

Loads model and tokenizer from the same location.

| Parameter | Optional | Type | Description |
|---|---|---|---|
| path | No | str | Path to load the model and tokenizer from. |
| use_cuda | Yes | bool | Whether to load the model to CUDA or to CPU. |

Return value

The function returns a tuple with 2 elements:

  • model
  • tokenizer
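For orientation, a loader of this shape could be written with the Hugging Face transformers API roughly as follows. This is a sketch under stated assumptions, not the package's actual implementation: the names `resolve_device` and `load_pair` are hypothetical, and the default of `use_cuda=False` is an assumption because the description does not state a default.

```python
def resolve_device(use_cuda: bool) -> str:
    """Return "cuda" only when requested and actually available (hypothetical helper)."""
    if not use_cuda:
        return "cpu"
    import torch  # imported lazily so CPU-only callers do not need it up front

    return "cuda" if torch.cuda.is_available() else "cpu"


def load_pair(path: str, use_cuda: bool = False):
    """Load a causal LM and its tokenizer from the same location (illustrative only)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path)
    model.to(resolve_device(use_cuda))  # move weights to the chosen device
    model.eval()                        # evaluation mode: disables dropout
    return model, tokenizer
```

Falling back to CPU when CUDA is unavailable is a design choice of this sketch; the package itself may behave differently.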

load_model

Loads model from the specified location.

| Parameter | Optional | Type | Description |
|---|---|---|---|
| model_path | No | str | Path to load the model from. |
| use_cuda | Yes | bool | Whether to load the model to CUDA or to CPU. |

Return value

The function returns the loaded model.

load_tokenizer

Loads tokenizer from the specified location.

| Parameter | Optional | Type | Description |
|---|---|---|---|
| tokenizer_path | No | str | Path to load the tokenizer from. |

Return value

The function returns the loaded tokenizer.

Example

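import babymmlu

# Load the model and its tokenizer from the same location.
model, tokenizer = babymmlu.load_model_and_tokenizer('ai-forever/rugpt3small_based_on_gpt2')
# Evaluate on the default dataset (ai-forever/baby_mmlu) with the default batch size.
result = babymmlu.eval_parallel(model, tokenizer)
print('babymmlu crossentropy-per-char, crossentropy-per-token and crossentropy-total', result)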
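Because the result is a plain 3-tuple, it can help to attach names to its positions before logging or comparing runs. The sketch below assumes the element order given under "Return value" (per-char, per-token, total). `BabyMmluResult` and `label_result` are hypothetical names, and the numbers are placeholders rather than real benchmark output.

```python
from typing import NamedTuple, Sequence


class BabyMmluResult(NamedTuple):
    """Named view over the 3-tuple returned by eval_parallel (hypothetical type)."""
    crossentropy_per_char: float
    crossentropy_per_token: float
    crossentropy_total: float


def label_result(result: Sequence[float]) -> BabyMmluResult:
    """Unpack the tuple in its documented order and attach field names."""
    per_char, per_token, total = result
    return BabyMmluResult(per_char, per_token, total)


# Placeholder values standing in for a real eval_parallel result.
placeholder = (0.5, 1.0, 2.0)
labeled = label_result(placeholder)
print(labeled.crossentropy_per_char)
print(labeled._asdict())
```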

Download files

Source Distribution

babymmlu-0.0.4.tar.gz (6.0 kB)

Uploaded Source

Built Distribution

babymmlu-0.0.4-py3-none-any.whl (5.6 kB)

Uploaded Python 3
