Sample implementation of babymmlu benchmark
babymmlu
Utilities for evaluating a causal language model on the babymmlu benchmark (see https://huggingface.co/datasets/ai-forever/baby_mmlu).
Methods
eval_parallel
Calculates babymmlu measures.
| Parameter | Optional | Type | Default value | Description |
|---|---|---|---|---|
| model | No | AutoModelForCausalLM | | Model to evaluate. |
| tokenizer | No | AutoTokenizer | | Tokenizer used with model. |
| dataset | Yes | Dataset (Optional) or str | ai-forever/baby_mmlu | Dataset to evaluate model on. |
| q_batch_size | Yes | int | 10 | Number of questions to process in parallel. |
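The `q_batch_size` parameter sets how many questions are processed together. How the package batches questions internally is not shown here; the following is a minimal sketch of the general idea. The helper name `iter_batches` and the question strings are hypothetical and used only for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical question list; the real benchmark reads questions from the dataset.
questions = [f"question {i}" for i in range(23)]
batch_sizes = [len(batch) for batch in iter_batches(questions, batch_size=10)]
print(batch_sizes)  # [10, 10, 3]
```

A larger batch size generally trades memory for throughput, so a smaller value may be needed when the model barely fits on the device.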
Return value
The function returns a tuple with 3 elements: the babymmlu score measured by crossentropy-per-char, crossentropy-per-token and crossentropy-total, in that order.
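The three values differ only in how the loss is normalized. The exact aggregation used by `eval_parallel` is not documented here, so the sketch below assumes a common convention: the total is the sum of per-token negative log-likelihoods, and the other two values divide that sum by the token count and by the character count. The function `cross_entropy_measures` and its inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cross_entropy_measures(
    token_log_probs: Sequence[float], text: str
) -> Tuple[float, float, float]:
    """Return (per-char, per-token, total) cross-entropy in nats.

    `token_log_probs` holds the natural-log probability the model assigned
    to each token of `text`. This normalization is an assumption, not the
    package's confirmed behavior.
    """
    if not token_log_probs or not text:
        raise ValueError("need at least one token and one character")
    total = -sum(token_log_probs)
    per_token = total / len(token_log_probs)
    per_char = total / len(text)
    return per_char, per_token, total


# Toy input: 4 tokens, each predicted with probability 0.5, over an 8-character answer.
log_probs = [math.log(0.5)] * 4
per_char, per_token, total = cross_entropy_measures(log_probs, "abcdefgh")
print(per_char, per_token, total)
```

Per-character normalization is the only one of the three that stays comparable across models with different tokenizers, since token counts for the same text can differ between vocabularies.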
load_model_and_tokenizer
Loads model and tokenizer from the same location.
| Parameter | Optional | Type | Description |
|---|---|---|---|
| path | No | str | Path to load model and tokenizer from. |
| use_cuda | Yes | bool | Whether to load model to cuda or to cpu. |
Return value
The function returns a tuple with 2 elements:
- model
- tokenizer
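For orientation, a loader of this shape can be sketched with the Hugging Face `transformers` API. This is not the package's source: the function names below are hypothetical, and both the `use_cuda=True` default and the fallback to CPU when CUDA is unavailable are assumptions, since the table above does not state them.

```python
from typing import Any, Tuple


def resolve_device(use_cuda: bool, cuda_available: bool) -> str:
    """Pick 'cuda' only when it is both requested and available (assumed policy)."""
    return "cuda" if use_cuda and cuda_available else "cpu"


def load_model_and_tokenizer_sketch(path: str, use_cuda: bool = True) -> Tuple[Any, Any]:
    """Hypothetical stand-in that loads a model and tokenizer from the same `path`."""
    # Imported lazily so that resolve_device stays usable without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = resolve_device(use_cuda, torch.cuda.is_available())
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path).to(device)
    model.eval()  # switch to inference mode (disables dropout)
    return model, tokenizer
```

Calling `load_model_and_tokenizer_sketch` downloads weights when `path` is a hub identifier, so it is left uncalled here.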
load_model
Loads model from the specified location.
| Parameter | Optional | Type | Description |
|---|---|---|---|
| model_path | No | str | Path to load model from. |
| use_cuda | Yes | bool | Whether to load model to cuda or to cpu. |
Return value
The function returns loaded model.
load_tokenizer
Loads tokenizer from the specified location.
| Parameter | Optional | Type | Description |
|---|---|---|---|
| tokenizer_path | No | str | Path to load tokenizer from. |
Return value
The function returns loaded tokenizer.
Example
import babymmlu
model, tokenizer = babymmlu.load_model_and_tokenizer('ai-forever/rugpt3small_based_on_gpt2')
result = babymmlu.eval_parallel(model, tokenizer)
print('babymmlu crossentropy-per-char, crossentropy-per-token and crossentropy-total', result)
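Because the result is a plain 3-tuple, it can help to name the fields before reporting them. The sketch below assumes the documented order (per-char, per-token, total); `BabyMMLUResult` and `format_result` are hypothetical helpers that are not part of the package, and the numbers are placeholders rather than measured results.

```python
from typing import NamedTuple, Sequence


class BabyMMLUResult(NamedTuple):
    """Hypothetical wrapper naming the three values in their documented order."""
    crossentropy_per_char: float
    crossentropy_per_token: float
    crossentropy_total: float


def format_result(result: Sequence[float]) -> str:
    """Render a 3-element result as 'name=value' pairs with four decimals."""
    named = BabyMMLUResult(*result)
    return ", ".join(f"{name}={value:.4f}" for name, value in named._asdict().items())


# Placeholder numbers for illustration only, not measured results.
report = format_result((0.5, 2.0, 100.0))
print(report)
```

In real use, the tuple returned by `babymmlu.eval_parallel(model, tokenizer)` would be passed to `format_result` in place of the placeholder values.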