Visualize OpenAI evals with Zeno
Zeno 🤝 OpenAI Evals
Use Zeno to visualize the results of OpenAI Evals.
https://user-images.githubusercontent.com/4563691/225655166-9fd82784-cf35-47c1-8306-96178cdad7c1.mov
Example using zeno-evals to explore the results of an OpenAI eval on multiple-choice medicine questions (MedMCQA).
Usage
pip install zeno-evals
Run an evaluation following the OpenAI Evals instructions. This produces a cache file in /tmp/evallogs/.
Pass this file to the zeno-evals command:
zeno-evals /tmp/evallogs/my_eval_cache.jsonl
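To sanity-check a log before opening it in Zeno, you can summarize it with a few lines of standard-library Python. This is a sketch under assumptions that this README does not state: that the log is JSON Lines, that per-sample events carry a "type" field, and that "match" events record a boolean under data["correct"]. Check these against your own file; both function names are hypothetical and are not part of zeno-evals.

```python
import json
from collections import Counter


def summarize_eval_lines(lines):
    """Count events by type and compute accuracy over "match" events.

    Assumed log layout (verify against your own file): one JSON object per
    line; per-sample events have a "type" field; "match" events store a
    boolean under data["correct"]. Lines without "type" (for example a run
    spec or a final report) are skipped.
    """
    type_counts = Counter()
    correct = total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        event_type = record.get("type") if isinstance(record, dict) else None
        if event_type is None:
            continue  # not a per-sample event
        type_counts[event_type] += 1
        if event_type == "match":
            total += 1
            data = record.get("data") or {}
            if data.get("correct") is True:
                correct += 1
    accuracy = correct / total if total else None
    return dict(type_counts), accuracy


def summarize_eval_log(path):
    """Read a log file from disk and summarize it with summarize_eval_lines."""
    with open(path, encoding="utf-8") as f:
        return summarize_eval_lines(f)
```

For example, `summarize_eval_log("/tmp/evallogs/my_eval_cache.jsonl")` returns a dict of event counts and the fraction of "match" events marked correct (or None if there are none). Splitting the parsing from the file I/O keeps the logic easy to test on in-memory lines.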
Example
We include an example that explores the MedMCQA dataset (thanks to @SinanAkkoyun):
zeno-evals ./example_medicine/example.jsonl --functions_file=./example_medicine/distill.py
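The --functions_file option points zeno-evals at a Python file of extra functions (distill.py in the example). This README does not show what those functions look like, so the sketch below deliberately avoids Zeno's own API. It is a plain-Python, hypothetical illustration of the kind of derived per-sample feature such a file can supply, and of comparing accuracy across the slices that feature defines. The "prompt" and "correct" keys, the length cut-offs, and both function names are assumptions; see example_medicine/distill.py for the format zeno-evals actually expects.

```python
from collections import defaultdict


def prompt_length_bucket(prompt, edges=(50, 150)):
    """Hypothetical derived feature: bucket a prompt by its character length."""
    n = len(prompt)
    for edge in edges:
        if n < edge:
            return f"<{edge}"
    return f">={edges[-1]}"


def accuracy_by_slice(samples, feature):
    """Group samples by feature(sample["prompt"]) and compute accuracy per group.

    Each sample is assumed to be a dict with hypothetical "prompt" (str) and
    "correct" (bool) keys.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for sample in samples:
        key = feature(sample["prompt"])
        totals[key] += 1
        hits[key] += int(sample["correct"])
    return {key: hits[key] / totals[key] for key in totals}
```

In Zeno itself, slices like these are explored interactively in the UI; the sketch only shows why a derived column is useful for spotting where a model does better or worse.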
Todo
- Support model-graded evaluations
- Support custom evaluation templates (e.g. BLEU for translation)
Download files
Source distribution: zeno_evals-0.1.3.tar.gz (4.3 kB)
Built distribution: zeno_evals-0.1.3-py3-none-any.whl
Hashes for zeno_evals-0.1.3-py3-none-any.whl:
Algorithm | Hash digest
---|---
SHA256 | 0811334b7d994eaa2263c2ffcba15d8fad8d11c2e65b476da6f72331f6f1cb19
MD5 | 59b0f81bd0e8aa7137144c41467cfb76
BLAKE2b-256 | 425fc943b7ccea068a4816d091ea82513673efdc028369e5c3655f5b9c87d4a0
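The published digests let you check a downloaded wheel before installing it. A minimal sketch using only Python's standard hashlib module: it computes all three digests in one pass over the file and compares them with the values listed for zeno_evals-0.1.3-py3-none-any.whl. The local file path you pass in is an assumption, and the function names are illustrative, not part of any packaging tool.

```python
import hashlib

# Digests listed for zeno_evals-0.1.3-py3-none-any.whl in the table above.
EXPECTED_WHEEL_DIGESTS = {
    "SHA256": "0811334b7d994eaa2263c2ffcba15d8fad8d11c2e65b476da6f72331f6f1cb19",
    "MD5": "59b0f81bd0e8aa7137144c41467cfb76",
    "BLAKE2b-256": "425fc943b7ccea068a4816d091ea82513673efdc028369e5c3655f5b9c87d4a0",
}


def digests_from_chunks(chunks):
    """Feed byte chunks to SHA256, MD5 and BLAKE2b-256 hashers; return hex digests."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    for chunk in chunks:
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def file_digests(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large files are not read into memory at once."""
    with open(path, "rb") as f:
        return digests_from_chunks(iter(lambda: f.read(chunk_size), b""))


def verify_wheel(path, expected=EXPECTED_WHEEL_DIGESTS):
    """Return True only if every computed digest matches the expected one."""
    actual = file_digests(path)
    return all(actual[name] == digest.lower() for name, digest in expected.items())
```

SHA256 is the digest to rely on; MD5 is shown only because it is listed above and is not collision-resistant.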