Visualize OpenAI evals with Zeno
Zeno 🤝 OpenAI Evals
Use Zeno to visualize the results of OpenAI Evals.
https://user-images.githubusercontent.com/4563691/225655166-9fd82784-cf35-47c1-8306-96178cdad7c1.mov
Example using zeno-evals to explore the results of an OpenAI eval on multiple-choice medicine questions (MedMCQA).
Usage
pip install zeno-evals
Run an evaluation following the evals instructions. This will produce a cache file in /tmp/evallogs/. Pass this file to the zeno-evals command:
zeno-evals /tmp/evallogs/my_eval_cache.jsonl
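Before launching Zeno, it can help to check what a cache file actually contains. The minimal sketch below assumes that each line of the .jsonl log is a standalone JSON object and that event records carry a "type" field. Both points are assumptions about the evals log format, not something zeno-evals documents here. The function name summarize_log is hypothetical and not part of zeno-evals.

```python
import json
import sys
from collections import Counter
from pathlib import Path


def summarize_log(path):
    """Count the records in a JSONL eval log, grouped by their "type" field.

    Assumption: each non-blank line holds one JSON object; records without a
    "type" key (for example a run spec or final report) count as "<untyped>".
    """
    counts = Counter()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            kind = record.get("type", "<untyped>") if isinstance(record, dict) else "<untyped>"
            counts[kind] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python summarize_log.py /tmp/evallogs/my_eval_cache.jsonl
    for kind, n in summarize_log(sys.argv[1]).most_common():
        print(f"{kind}: {n}")
```

If almost every record comes back as "<untyped>", the file probably does not follow the assumed layout.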
Examples
Exploring the results of a single model on US tort law questions:
zeno-evals ./examples/example.jsonl
An example comparing the results of two models:
zeno-evals ./examples/crossword-turbo.jsonl --second-results-file ./examples/crossword-turbo-0301.jsonl
Finally, you can pass a file of additional Zeno functions to provide more context for the results. The crossword functions in this example require the wordfreq package:
pip install wordfreq
zeno-evals ./examples/crossword-turbo.jsonl --second-results-file ./examples/crossword-turbo-0301.jsonl --functions_file ./examples/crossword_fns.py
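To script these runs, for instance across several pairs of logs, a minimal sketch can assemble the same command lines shown above and pass them to subprocess. It uses only the two flags that appear in the examples (--second-results-file and --functions_file). The helper names build_zeno_evals_args and run_zeno_evals are hypothetical, and the sketch assumes the zeno-evals executable is on PATH.

```python
import subprocess
from typing import List, Optional


def build_zeno_evals_args(
    results_file: str,
    second_results_file: Optional[str] = None,
    functions_file: Optional[str] = None,
) -> List[str]:
    """Assemble a zeno-evals command line using only the flags from the examples."""
    args = ["zeno-evals", results_file]
    if second_results_file is not None:
        # Compare against a second model's results file.
        args += ["--second-results-file", second_results_file]
    if functions_file is not None:
        # Python file with extra Zeno functions that add context to the results.
        args += ["--functions_file", functions_file]
    return args


def run_zeno_evals(
    results_file: str,
    second_results_file: Optional[str] = None,
    functions_file: Optional[str] = None,
) -> None:
    """Run zeno-evals and wait for it to exit; raises CalledProcessError on failure."""
    cmd = build_zeno_evals_args(results_file, second_results_file, functions_file)
    subprocess.run(cmd, check=True)


# Example (not executed here):
# run_zeno_evals("./examples/crossword-turbo.jsonl",
#                second_results_file="./examples/crossword-turbo-0301.jsonl",
#                functions_file="./examples/crossword_fns.py")
```

Building the argument list separately keeps it easy to inspect or log before anything is launched, and passing a list rather than a shell string avoids quoting problems with paths that contain spaces.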
Download files
Source distribution: zeno_evals-0.1.8.tar.gz (4.6 kB)
Built distribution: zeno_evals-0.1.8-py3-none-any.whl
Hashes for zeno_evals-0.1.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 64ff4b7e1f247be7d591c190cc909eaf02d83f1850549a6d97b33f109830dcf5
MD5 | b20f1d58bd52bec01b76ded2e12bb5e4
BLAKE2b-256 | fafdbfdb3da33301215fef2a999238ad63c1a31f370f7724ebc74d2ced59abe8
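The digests listed for the wheel let you check a downloaded file before installing it. Below is a minimal sketch using Python's standard hashlib module. The function names file_digest_hex and verify are hypothetical, and EXPECTED_SHA256 is copied from the SHA256 row of the hash table for zeno_evals-0.1.8-py3-none-any.whl.

```python
import hashlib

# SHA256 digest listed for zeno_evals-0.1.8-py3-none-any.whl.
EXPECTED_SHA256 = "64ff4b7e1f247be7d591c190cc909eaf02d83f1850549a6d97b33f109830dcf5"


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 16):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED_SHA256, algorithm="sha256"):
    """Return True if the file's digest matches the expected hex string (case-insensitive)."""
    return file_digest_hex(path, algorithm) == expected.lower()
```

For example, verify("zeno_evals-0.1.8-py3-none-any.whl") should return True for an intact copy of the wheel. Passing algorithm="md5" together with the MD5 digest checks the second row the same way.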