
A framework for evaluating language models

Project description

Scholar Evals (sevals)

Scholar Evals is built on EleutherAI's LM Evaluation Harness and adds:

  1. A simpler command-line interface
  2. A UI to visualize results and view model outputs

Installation

pip install sevals

API Keys

Get an API key at usescholar.org/api-keys, then enter it when the sevals CLI prompts for it.

Usage

sevals <model> <task> [options]

Examples

Mock/Dummy model:

sevals dummy gsm8k

Local model:

sevals ./path/to/model gsm8k

HuggingFace model:

sevals mistralai/Mistral-7B-v0.1 gsm8k

OpenAI API:

sevals gpt-3.5-turbo gsm8k
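
The four invocations above share the same `sevals <model> <task>` shape, so they can be scripted. The following is a minimal sketch in Python, not part of sevals itself: `build_command` and `run_all` are hypothetical helper names, and the script only prints the commands unless `dry_run` is turned off and a `sevals` executable is found on the PATH.

```python
import shlex
import shutil
import subprocess

# Model identifiers copied from the examples above.
MODELS = [
    "dummy",
    "./path/to/model",
    "mistralai/Mistral-7B-v0.1",
    "gpt-3.5-turbo",
]


def build_command(model, task):
    """Return the argv list for `sevals <model> <task>`."""
    return ["sevals", model, task]


def run_all(models, task="gsm8k", dry_run=True):
    """Print each command; run it only when dry_run is False and sevals is installed."""
    commands = [build_command(model, task) for model in models]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run and shutil.which("sevals"):
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(MODELS)
```

Keeping `dry_run=True` as the default makes it possible to check the generated commands before starting a long evaluation.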

Tasks

Full list of tasks:

sevals --list_tasks

Documentation

% sevals --help
usage: sevals [-h] [--model_args MODEL_ARGS] [--gen_kwargs GEN_KWARGS] [--list_tasks [search string]] [--list_projects] [-p PROJECT] [--num_fewshot NUM_FEWSHOT] [--batch_size BATCH_SIZE]
              [-o [dir/file.jsonl] [DIR]] [--include_path INCLUDE_PATH] [--verbose]
              [model] [tasks]

positional arguments:
  model                 Model name from HuggingFace or OpenAI, or a path to a local model that can be loaded using `transformers.AutoConfig.from_pretrained`.
                        E.g.:
                        - HuggingFace Model: mistralai/Mistral-7B-v0.1
                        - OpenAI Model: gpt-3
                        - Local Model: ./path/to/model
  tasks                 To get full list of tasks, use the command sevals --list_tasks

optional arguments:
  -h, --help            show this help message and exit
  --model_args MODEL_ARGS
                        String arguments for model, e.g. 'dtype=float32'
  --gen_kwargs GEN_KWARGS
                        String arguments for model generation on greedy_until tasks, e.g. `temperature=0,top_k=0,top_p=0`
  --list_tasks [search string]
                        List all available tasks, that optionally match a search string, and exit.
  --list_projects       List all projects you have on Scholar, and exit.
  -p PROJECT, --project PROJECT
                        ID of Scholar project to store runs/results in.
  --num_fewshot NUM_FEWSHOT
                        Number of examples in few-shot context
  --batch_size BATCH_SIZE
  -o [dir/file.jsonl] [DIR], --output_path [dir/file.jsonl] [DIR]
                        The path to the output file where the result metrics will be saved. If the path is a directory, the results will be saved in the directory. Else the parent directory will be used.
  --include_path INCLUDE_PATH
                        Additional path to include if there are external tasks to include.
  --verbose             Whether to print verbose/detailed logs.
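
The help text shows `--model_args` and `--gen_kwargs` taking comma-separated `key=value` strings. As a rough sketch, assuming that format holds for all keys, the strings can be built from dictionaries in Python. The helper names `format_kwargs` and `build_command` are hypothetical, the few-shot count of 5 is an arbitrary illustrative value, and only flags listed in the help output above are used.

```python
import shlex


def format_kwargs(options):
    """Join a dict into a comma-separated key=value string, e.g. 'dtype=float32'."""
    return ",".join(f"{key}={value}" for key, value in options.items())


def build_command(model, tasks, model_args=None, gen_kwargs=None,
                  num_fewshot=None, batch_size=None, output_path=None,
                  verbose=False):
    """Assemble an argv list for sevals using flags from `sevals --help`."""
    cmd = ["sevals", model, tasks]
    if model_args:
        cmd += ["--model_args", format_kwargs(model_args)]
    if gen_kwargs:
        cmd += ["--gen_kwargs", format_kwargs(gen_kwargs)]
    if num_fewshot is not None:  # 0 is a meaningful value, so test against None
        cmd += ["--num_fewshot", str(num_fewshot)]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    if output_path:
        cmd += ["--output_path", output_path]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_command(
    "mistralai/Mistral-7B-v0.1",
    "gsm8k",
    model_args={"dtype": "float32"},
    gen_kwargs={"temperature": 0, "top_k": 0, "top_p": 0},
    num_fewshot=5,  # illustrative value, not a recommended setting
    output_path="results/",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` once sevals is installed.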

Download files

Source distribution: sevals-0.0.3.tar.gz (638.9 kB)

Built distribution: sevals-0.0.3-py3-none-any.whl (1.3 MB, Python 3)
