Visualize OpenAI evals with Zeno
Zeno 🤝 OpenAI Evals
Use Zeno to visualize the results of OpenAI Evals.
https://user-images.githubusercontent.com/4563691/225655166-9fd82784-cf35-47c1-8306-96178cdad7c1.mov
Example using zeno-evals to explore the results of an OpenAI eval on multiple choice medicine questions (MedMCQA)
Usage
pip install zeno-evals
Run an evaluation following the evals instructions. This will produce a cache file in /tmp/evallogs/.
Pass this file to the zeno-evals command:
zeno-evals /tmp/evallogs/my_eval_cache.jsonl
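If you are not sure which cache file your last run produced, a small helper can pick the newest one. This is a minimal sketch, assuming the logs are `.jsonl` files written directly into `/tmp/evallogs/`; the function names `latest_eval_log` and `zeno_evals_command` are hypothetical and not part of zeno-evals.

```python
from pathlib import Path
from typing import List, Optional


def latest_eval_log(log_dir: str = "/tmp/evallogs") -> Optional[Path]:
    """Return the most recently modified .jsonl file in log_dir, or None if there is none."""
    # Sort by modification time so the newest log ends up last.
    logs = sorted(Path(log_dir).glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    return logs[-1] if logs else None


def zeno_evals_command(log_file: Path) -> List[str]:
    """Build the argument list for the zeno-evals invocation shown above."""
    return ["zeno-evals", str(log_file)]
```

Calling `zeno_evals_command(latest_eval_log())` gives an argument list that could be handed to `subprocess.run`, provided a log file was found.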
Example
Single example looking at US tort law questions:
zeno-evals ./examples/example.jsonl
And an example of comparison between two models:
zeno-evals ./examples/crossword-turbo.jsonl --second-results-file ./examples/crossword-turbo-0301.jsonl
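Before opening the comparison in Zeno, it can help to sanity-check both result files. The sketch below computes a rough accuracy figure from one log. It assumes each line is a JSON object and that graded samples appear as records with `"type": "match"` and a boolean `data.correct` field, which is how evals logs commonly look but is not confirmed by this README; `match_accuracy` is a hypothetical helper, not part of zeno-evals.

```python
import json
from typing import Iterable, Optional


def match_accuracy(lines: Iterable[str]) -> Optional[float]:
    """Return the fraction of 'match' records marked correct, or None if there are none."""
    correct = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: only records of type "match" carry a grading result.
        if record.get("type") != "match":
            continue
        total += 1
        if record.get("data", {}).get("correct"):
            correct += 1
    return correct / total if total else None
```

For example, opening `./examples/crossword-turbo.jsonl` and `./examples/crossword-turbo-0301.jsonl` in turn and passing each file object to `match_accuracy` gives one number per model to compare against what Zeno shows.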
And lastly, we can pass additional Zeno functions to provide more context to the results:
pip install wordfreq
zeno-evals ./examples/crossword-turbo.jsonl --second-results-file ./examples/crossword-turbo-0301.jsonl --functions_file ./examples/crossword_fns.py
File details
Details for the file zeno_evals-0.1.10.tar.gz.
File metadata
- Download URL: zeno_evals-0.1.10.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.8.12 Darwin/22.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ecf12896512ed8ec8bff10cc628c0a840c74b3a7c90c8f068309cf79ac8dbc8 |
| MD5 | 5043ad9214a245ab188597ba3c9e9571 |
| BLAKE2b-256 | f03ffd3e7dfa364afea27e0add96cfb03daeb9cee94f0632a80fffb2e5dcfe95 |
File details
Details for the file zeno_evals-0.1.10-py3-none-any.whl.
File metadata
- Download URL: zeno_evals-0.1.10-py3-none-any.whl
- Upload date:
- Size: 5.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.8.12 Darwin/22.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ee8451d68518fbeba4066f4c48fff2aef5e653299296d67521aa84a03cd677b |
| MD5 | 809a1b87dc0475edc0a65845fa5ef0b9 |
| BLAKE2b-256 | a0f25688d4f1fd25eeaeb850511ac7b01b9fc6c0e9af044648922c56b59fa5d1 |