A framework for evaluating language models

Scholar Evals (sevals)

Scholar Evals is built on EleutherAI's LM Evaluation Harness and adds:

  1. A simpler command-line interface
  2. A UI to visualize results and view model outputs (view example)

Installation

pip install sevals

API Keys

Get an API key at usescholar.org/api-keys, then enter it when the sevals CLI prompts for it.

Usage

sevals <model> <task> [options]

Examples

Mock/Dummy model:

sevals dummy gsm8k

Local model:

sevals ./path/to/model gsm8k

HuggingFace model:

sevals mistralai/Mistral-7B-v0.1 gsm8k

OpenAI API:

sevals gpt-3.5-turbo gsm8k
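
When several evaluations are launched from a script, it can help to assemble the command line programmatically. The sketch below builds an argument list using only flags that appear in the `sevals --help` output. The helper names `build_sevals_command` and `run_sevals` are hypothetical and not part of sevals, treating `--batch_size` as an integer is an assumption (the help text does not describe it), and actually running the command assumes `sevals` is installed and on your PATH.

```python
import shlex
import subprocess
from typing import List, Optional


def build_sevals_command(
    model: str,
    task: str,
    num_fewshot: Optional[int] = None,
    batch_size: Optional[int] = None,  # assumption: an integer value
    output_path: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble an argv list for the sevals CLI (hypothetical helper)."""
    cmd = ["sevals", model, task]
    if num_fewshot is not None:
        cmd += ["--num_fewshot", str(num_fewshot)]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    if output_path is not None:
        cmd += ["--output_path", output_path]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_sevals(cmd: List[str]) -> int:
    """Run the command and return its exit code; assumes sevals is installed."""
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    cmd = build_sevals_command("dummy", "gsm8k", num_fewshot=5, output_path="results/")
    # Print the command instead of running it, so the sketch works without sevals.
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with model paths that contain spaces.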

Tasks

Full list of tasks:

sevals --list_tasks

Documentation

% sevals --help
usage: sevals [-h] [--model_args MODEL_ARGS] [--gen_kwargs GEN_KWARGS] [--list_tasks [search string]] [--list_projects] [-p PROJECT] [--num_fewshot NUM_FEWSHOT] [--batch_size BATCH_SIZE]
              [-o [dir/file.jsonl] [DIR]] [--include_path INCLUDE_PATH] [--verbose]
              [model] [tasks]

positional arguments:
  model                 Model name from HuggingFace or OpenAI, or a path to a local model that can be loaded using `transformers.AutoConfig.from_pretrained`.
                        E.g.:
                        - HuggingFace Model: mistralai/Mistral-7B-v0.1
                        - OpenAI Model: gpt-3
                        - Local Model: ./path/to/model
  tasks                 To get full list of tasks, use the command sevals --list_tasks

optional arguments:
  -h, --help            show this help message and exit
  --model_args MODEL_ARGS
                        String arguments for model, e.g. 'dtype=float32'
  --gen_kwargs GEN_KWARGS
                        String arguments for model generation on greedy_until tasks, e.g. `temperature=0,top_k=0,top_p=0`
  --list_tasks [search string]
                        List all available tasks, that optionally match a search string, and exit.
  --list_projects       List all projects you have on Scholar, and exit.
  -p PROJECT, --project PROJECT
                        ID of Scholar project to store runs/results in.
  --num_fewshot NUM_FEWSHOT
                        Number of examples in few-shot context
  --batch_size BATCH_SIZE
  -o [dir/file.jsonl] [DIR], --output_path [dir/file.jsonl] [DIR]
                        The path to the output file where the result metrics will be saved. If the path is a directory, the results will be saved in the directory. Else the parent directory will be used.
  --include_path INCLUDE_PATH
                        Additional path to include if there are external tasks to include.
  --verbose             Whether to print verbose/detailed logs.
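
Both `--model_args` and `--gen_kwargs` take a single string, and the examples in the help text (`dtype=float32`, `temperature=0,top_k=0,top_p=0`) suggest comma-separated `key=value` pairs. A minimal sketch for producing such a string from a Python dict follows. `format_cli_kwargs` is a hypothetical helper, and the comma-separated format is inferred from those examples rather than from a specification.

```python
from typing import Mapping


def format_cli_kwargs(kwargs: Mapping[str, object]) -> str:
    """Join a mapping into 'key=value,key=value' form (hypothetical helper)."""
    parts = []
    for key, value in kwargs.items():
        text = str(value)
        # A comma in a key or value, or '=' in a key, would make the string ambiguous.
        if "," in key or "=" in key or "," in text:
            raise ValueError(f"cannot encode {key!r}={value!r} unambiguously")
        parts.append(f"{key}={text}")
    return ",".join(parts)


if __name__ == "__main__":
    print(format_cli_kwargs({"temperature": 0, "top_k": 0, "top_p": 0}))
```

The resulting string can be passed as the value of `--gen_kwargs` or `--model_args`.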

Download files

Source Distribution

sevals-0.0.3.tar.gz (638.9 kB)

Uploaded Source

Built Distribution

sevals-0.0.3-py3-none-any.whl (1.3 MB)

Uploaded Python 3

File details

Details for the file sevals-0.0.3.tar.gz.

File metadata

  • Download URL: sevals-0.0.3.tar.gz
  • Upload date:
  • Size: 638.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for sevals-0.0.3.tar.gz
Algorithm Hash digest
SHA256 c498bec40c39fdb67ab30fc70564e7d37cc54874c323cb7571ae30115a16ddf0
MD5 04f680358f5e949c91b73fd6b3a587f6
BLAKE2b-256 a0976177c04d900370214a5d6ba9dcb8da87319471ca589f5b0729469c2b0727
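
A downloaded file can be checked against the SHA256 digest listed above before installing it. The sketch below uses Python's standard `hashlib`. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Digest copied from the listing above; the path is an assumed download location.
    expected = "c498bec40c39fdb67ab30fc70564e7d37cc54874c323cb7571ae30115a16ddf0"
    archive = Path("sevals-0.0.3.tar.gz")
    if archive.exists():
        print("hash matches" if matches_published_hash(archive, expected) else "hash mismatch")
    else:
        print(f"{archive} not found; download it first")
```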

File details

Details for the file sevals-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: sevals-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for sevals-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 bacf6ba311d6ed8c145453c0462f473ebcba91e829c2079d571ecf5eda6da31a
MD5 9558f70d293349a855d505ce14d19ea7
BLAKE2b-256 2ef7998d46b57b40ed92205e1b2bef407127257f40c213d14257512e6e6b7c5b
