A framework for evaluating language models
Scholar Evals (sevals)
Scholar Evals is built on EleutherAI's LM Evaluation Harness and adds:
- A simpler command-line interface
- A UI to visualize results and view model outputs
Installation
pip install sevals
API Keys
Get an API key at usescholar.org/api-keys, then enter it into the sevals CLI when prompted.
Usage
sevals <model> <task> [options]
Examples
Mock/Dummy model:
sevals dummy gsm8k
Local model:
sevals ./path/to/model gsm8k
HuggingFace model:
sevals mistralai/Mistral-7B-v0.1 gsm8k
OpenAI API:
sevals gpt-3.5-turbo gsm8k
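For scripted runs, the invocations above can also be assembled from Python. This is a minimal sketch, assuming `sevals` is installed and on the PATH. `build_sevals_command` and `run_sevals` are hypothetical helpers, not part of sevals, and they use only flags that appear in `sevals --help`.

```python
import subprocess
from typing import List, Optional


def build_sevals_command(
    model: str,
    task: str,
    num_fewshot: Optional[int] = None,
    batch_size: Optional[int] = None,
    output_path: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `sevals <model> <task> [options]`."""
    cmd = ["sevals", model, task]
    if num_fewshot is not None:
        cmd += ["--num_fewshot", str(num_fewshot)]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    if output_path is not None:
        cmd += ["--output_path", output_path]
    return cmd


def run_sevals(cmd: List[str]) -> int:
    """Run the command and return its exit code (hypothetical wrapper)."""
    # The CLI may prompt for an API key on first use, so stdin is left attached.
    return subprocess.run(cmd).returncode


# Building the command has no side effects; nothing is executed here.
print(" ".join(build_sevals_command("dummy", "gsm8k", num_fewshot=5)))
```

Passing the arguments as a list avoids shell quoting problems with model paths. Call `run_sevals` on the result only where sevals is actually installed.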
Tasks
Full list of tasks:
sevals --list_tasks
Documentation
% sevals --help
usage: sevals [-h] [--model_args MODEL_ARGS] [--gen_kwargs GEN_KWARGS] [--list_tasks [search string]] [--list_projects] [-p PROJECT] [--num_fewshot NUM_FEWSHOT] [--batch_size BATCH_SIZE]
              [-o [dir/file.jsonl] [DIR]] [--include_path INCLUDE_PATH] [--verbose]
              [model] [tasks]

positional arguments:
  model                 Model name from HuggingFace or OpenAI, or a path to a local model that can be loaded using `transformers.AutoConfig.from_pretrained`.
                        E.g.:
                        - HuggingFace Model: mistralai/Mistral-7B-v0.1
                        - OpenAI Model: gpt-3
                        - Local Model: ./path/to/model
  tasks                 To get full list of tasks, use the command sevals --list_tasks

optional arguments:
  -h, --help            show this help message and exit
  --model_args MODEL_ARGS
                        String arguments for model, e.g. 'dtype=float32'
  --gen_kwargs GEN_KWARGS
                        String arguments for model generation on greedy_until tasks, e.g. `temperature=0,top_k=0,top_p=0`
  --list_tasks [search string]
                        List all available tasks, that optionally match a search string, and exit.
  --list_projects       List all projects you have on Scholar, and exit.
  -p PROJECT, --project PROJECT
                        ID of Scholar project to store runs/results in.
  --num_fewshot NUM_FEWSHOT
                        Number of examples in few-shot context
  --batch_size BATCH_SIZE
  -o [dir/file.jsonl] [DIR], --output_path [dir/file.jsonl] [DIR]
                        The path to the output file where the result metrics will be saved. If the path is a directory, the results will be saved in the directory. Else the parent directory will be used.
  --include_path INCLUDE_PATH
                        Additional path to include if there are external tasks to include.
  --verbose             Whether to print verbose/detailed logs.
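The help text shows `--model_args` and `--gen_kwargs` taking comma-separated `key=value` strings. A small sketch of building such a string from a dictionary follows. `format_cli_kwargs` is a hypothetical helper, and the rejection of commas and equals signs is an assumption about what the parser cannot represent, not documented sevals behavior.

```python
from typing import Mapping


def format_cli_kwargs(options: Mapping[str, object]) -> str:
    """Join options into the comma-separated key=value form shown in the help text."""
    for key, value in options.items():
        # Assumption: a comma or '=' inside a key or value would be ambiguous
        # in this flat format, so refuse to build such a string.
        if "," in key or "=" in key or "," in str(value):
            raise ValueError(f"cannot encode option {key!r}={value!r}")
    return ",".join(f"{key}={value}" for key, value in options.items())


gen_kwargs = format_cli_kwargs({"temperature": 0, "top_k": 0, "top_p": 0})
print(gen_kwargs)  # temperature=0,top_k=0,top_p=0
```

The resulting string can be passed as a single argument, for example after `--gen_kwargs`, which keeps generation settings in one place when running several tasks.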
Download files
Source Distribution
sevals-0.0.3.tar.gz (638.9 kB)
Built Distribution
sevals-0.0.3-py3-none-any.whl (1.3 MB)
File details
Details for the file sevals-0.0.3.tar.gz.
File metadata
- Download URL: sevals-0.0.3.tar.gz
- Upload date:
- Size: 638.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c498bec40c39fdb67ab30fc70564e7d37cc54874c323cb7571ae30115a16ddf0 |
| MD5 | 04f680358f5e949c91b73fd6b3a587f6 |
| BLAKE2b-256 | a0976177c04d900370214a5d6ba9dcb8da87319471ca589f5b0729469c2b0727 |
File details
Details for the file sevals-0.0.3-py3-none-any.whl.
File metadata
- Download URL: sevals-0.0.3-py3-none-any.whl
- Upload date:
- Size: 1.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bacf6ba311d6ed8c145453c0462f473ebcba91e829c2079d571ecf5eda6da31a |
| MD5 | 9558f70d293349a855d505ce14d19ea7 |
| BLAKE2b-256 | 2ef7998d46b57b40ed92205e1b2bef407127257f40c213d14257512e6e6b7c5b |
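A downloaded file can be checked against the published SHA256 digests before installing. The following is a minimal sketch using only the standard library. `sha256_of` and `matches_published_hash` are hypothetical helper names, and the digests are copied from the file details in this document.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details listed in this document.
EXPECTED_SHA256 = {
    "sevals-0.0.3.tar.gz": "c498bec40c39fdb67ab30fc70564e7d37cc54874c323cb7571ae30115a16ddf0",
    "sevals-0.0.3-py3-none-any.whl": "bacf6ba311d6ed8c145453c0462f473ebcba91e829c2079d571ecf5eda6da31a",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `matches_published_hash(Path("sevals-0.0.3.tar.gz"))` returns `False` for an unknown file name or a mismatched digest, so a failed check should be treated as a reason not to install the file.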