A framework for evaluating language models
Scholar Eval (sevals)
Scholar Eval is built on EleutherAI's LM Evaluation Harness and adds:
- A simpler command-line interface
- A UI to visualize results and view model outputs
Installation
pip install sevals
Usage
sevals <model> <task> [options]
Examples
Mock/Dummy model:
sevals dummy lambada_openai
Local model:
sevals ./path/to/model lambada_openai
HuggingFace model:
sevals mistralai/Mistral-7B-v0.1 lambada_openai
OpenAI API:
sevals gpt-3.5-turbo lambada_openai
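Because the dummy model needs no weights, one reasonable workflow is to run it first as a smoke test and only then move on to a real model. The loop below is a sketch of that idea, not something the project prescribes: the `RAN` variable and the stop-on-failure behaviour are illustrative choices, and `./path/to/model` is the placeholder path from the example above.

```shell
TASK="lambada_openai"
RAN=""

# Evaluate the dummy model first, then the (placeholder) local model.
for MODEL in dummy ./path/to/model; do
    echo "+ sevals $MODEL $TASK"
    # Only invoke sevals when it is actually installed.
    if command -v sevals >/dev/null 2>&1; then
        sevals "$MODEL" "$TASK" || break   # stop at the first failing run
    fi
    RAN="$RAN $MODEL"
done
```

If the dummy run fails, the loop stops before the more expensive evaluation starts.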
Tasks
Full list of tasks:
sevals --list-tasks
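According to the help text, `--list-tasks` also accepts an optional search string. The sketch below narrows the list to tasks matching `lambada`. The search term is only an illustration, and how the string is matched (substring, prefix, or pattern) is an assumption that the help text does not spell out.

```shell
SEARCH="lambada"                        # illustrative search term
LIST_CMD="sevals --list-tasks $SEARCH"
echo "$LIST_CMD"

# Run the command only when sevals is on PATH.
if command -v sevals >/dev/null 2>&1; then
    $LIST_CMD
fi
```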
Documentation
% sevals --help
usage: sevals [-h] [--model_args MODEL_ARGS] [--gen_kwargs GEN_KWARGS] [--list-tasks [search string]] [--list-projects] [-p PROJECT] [--num_fewshot NUM_FEWSHOT] [--batch_size BATCH_SIZE]
[-o [dir/file.jsonl] [DIR]] [--include_path INCLUDE_PATH] [--verbose]
[model] [tasks]
positional arguments:
model Model name from HuggingFace or OpenAI, or a path to a local model that can be loaded using `transformers.AutoConfig.from_pretrained`.
E.g.:
- HuggingFace Model: mistralai/Mistral-7B-v0.1
- OpenAI Model: gpt-3
- Local Model: ./path/to/model
tasks To get full list of tasks, use the command sevals --list-tasks
optional arguments:
-h, --help show this help message and exit
--model_args MODEL_ARGS
String arguments for model, e.g. 'dtype=float32'
--gen_kwargs GEN_KWARGS
String arguments for model generation on greedy_until tasks, e.g. `temperature=0,top_k=0,top_p=0`
--list-tasks [search string]
List all available tasks, that optionally match a search string, and exit.
--list-projects List all projects you have on Scholar, and exit.
-p PROJECT, --project PROJECT
ID of Scholar project to store runs/results in.
--num_fewshot NUM_FEWSHOT
Number of examples in few-shot context
--batch_size BATCH_SIZE
-o [dir/file.jsonl] [DIR], --output_path [dir/file.jsonl] [DIR]
The path to the output file where the result metrics will be saved. If the path is a directory, the results will be saved in the directory. Else the parent directory will be used.
--include_path INCLUDE_PATH
Additional path to include if there are external tasks to include.
--verbose Whether to print verbose/detailed logs.
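The options above can be combined in a single run. The following is a sketch rather than a recommended configuration: the few-shot count, the batch size, and the `./results` directory are hypothetical values, and it assumes `--batch_size` takes an integer, which the help text does not state.

```shell
MODEL="mistralai/Mistral-7B-v0.1"
TASK="lambada_openai"
OUT_DIR="./results"   # hypothetical output directory

# Collect the arguments once so the echoed and executed commands match.
set -- "$MODEL" "$TASK" \
    --model_args "dtype=float32" \
    --num_fewshot 5 \
    --batch_size 8 \
    --output_path "$OUT_DIR"

CMD_LINE="sevals $*"
echo "$CMD_LINE"

# Run only when sevals is installed.
if command -v sevals >/dev/null 2>&1; then
    sevals "$@"
fi
```

`--gen_kwargs` is left out here because, per the help text, it applies to `greedy_until` tasks.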