Visualize OpenAI evals with Zeno
Zeno 🤝 OpenAI Evals
Use Zeno to visualize the results of OpenAI Evals.
https://user-images.githubusercontent.com/4563691/225655166-9fd82784-cf35-47c1-8306-96178cdad7c1.mov
Example using zeno-evals to explore the results of an OpenAI eval on multiple-choice medicine questions (MedMCQA).
Usage
pip install zeno-evals
Run an evaluation following the evals instructions. This will produce a cache file in /tmp/evallogs/.
Pass this file to the zeno-evals command:
zeno-evals /tmp/evallogs/my_eval_cache.jsonl
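If you are not sure which cache file the last run produced, a small helper can pick the newest one. This is a minimal sketch, not part of zeno-evals: the `latest_log` name is hypothetical, and it assumes only that logs are `.jsonl` files written directly into /tmp/evallogs/.

```python
from pathlib import Path
from typing import Optional


def latest_log(log_dir: str = "/tmp/evallogs") -> Optional[Path]:
    """Return the most recently modified .jsonl file in log_dir, or None.

    Hypothetical helper: assumes logs sit directly in log_dir with a
    .jsonl extension, as in the example path above.
    """
    directory = Path(log_dir)
    if not directory.is_dir():
        return None
    logs = list(directory.glob("*.jsonl"))
    if not logs:
        return None
    # Pick the file with the newest modification time.
    return max(logs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    path = latest_log()
    if path is None:
        print("No .jsonl logs found")
    else:
        # Print the command to run rather than running it.
        print(f"zeno-evals {path}")
```

Printing the command instead of executing it keeps the helper free of side effects; copy the printed line into your shell once the path looks right.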
Example
A single example, looking at US tort law questions:
zeno-evals ./examples/example.jsonl
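Before opening a results file in Zeno, it can help to check what it contains. The sketch below is an illustration only: the record schema is defined by the evals library and is not described here, so the code assumes just that each non-empty line holds one JSON value and that some records may carry a "type" key. The `count_record_types` name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_record_types(path: str) -> Counter:
    """Count the records in a JSONL file, grouped by their "type" field.

    Assumption: one JSON value per non-empty line. Records without a
    "type" key are grouped under "<no type>".
    """
    counts: Counter = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if isinstance(record, dict):
                key = record.get("type", "<no type>")
            else:
                key = "<not an object>"
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for record_type, n in count_record_types(sys.argv[1]).most_common():
            print(f"{record_type}: {n}")
```

Saved under a name of your choice, the script takes the results file as its only argument, for example ./examples/example.jsonl.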
A comparison between two models:
zeno-evals ./examples/crossword-turbo.jsonl --second-results-file ./examples/crossword-turbo-0301.jsonl
Lastly, you can pass additional Zeno functions to provide more context for the results:
pip install wordfreq
zeno-evals ./examples/crossword-turbo.jsonl --second-results-file ./examples/crossword-turbo-0301.jsonl --functions_file ./examples/crossword_fns.py