
Visualize OpenAI evals with Zeno


Zeno 🤝 OpenAI Evals

Use Zeno to visualize the results of OpenAI Evals.

https://user-images.githubusercontent.com/4563691/225655166-9fd82784-cf35-47c1-8306-96178cdad7c1.mov

Example using zeno-evals to explore the results of an OpenAI eval on multiple choice medicine questions (MedMCQA)

Usage

pip install zeno-evals

Run an evaluation by following the OpenAI Evals instructions. This produces a .jsonl cache file in /tmp/evallogs/.

Pass this file to the zeno-evals command:

zeno-evals /tmp/evallogs/my_eval_cache.jsonl
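
If it is not obvious which cache file belongs to the latest run, a small helper can pick the newest one. The following is a minimal sketch, assuming the logs are .jsonl files under /tmp/evallogs/ as described above; the function name `latest_eval_log` is hypothetical and not part of zeno-evals.

```python
from pathlib import Path
from typing import Optional


def latest_eval_log(log_dir: str = "/tmp/evallogs") -> Optional[Path]:
    """Return the most recently modified .jsonl file in log_dir, or None."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return None
    # Sort by modification time so the newest log ends up last.
    logs = sorted(directory.glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    return logs[-1] if logs else None


if __name__ == "__main__":
    log = latest_eval_log()
    if log is None:
        print("No .jsonl logs found; run an eval first.")
    else:
        # Print the command instead of running it, so nothing launches implicitly.
        print(f"zeno-evals {log}")
```

The script only prints the command to run; copy it into a shell to start Zeno.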

Example

A single-model example that explores US tort law questions:

zeno-evals ./examples/example.jsonl

A comparison between two models:

zeno-evals ./examples/crossword-turbo.jsonl --second-results-file ./examples/crossword-turbo-0301.jsonl
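
Before opening the comparison in Zeno, it can help to sanity-check both files from the command line. The sketch below is not part of zeno-evals. It assumes each line of the log is a JSON object and that graded samples appear as records with `"type": "match"` and a boolean `data.correct` field; that schema is an assumption about the eval log, so adjust the keys if your files differ. The names `accuracy` and `compare` are hypothetical.

```python
import json
from typing import Dict, Optional


def accuracy(path: str) -> Optional[float]:
    """Fraction of 'match' records marked correct, or None if there are none."""
    correct = total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Assumed schema: {"type": "match", "data": {"correct": true}}
            if record.get("type") != "match":
                continue
            total += 1
            if (record.get("data") or {}).get("correct"):
                correct += 1
    return correct / total if total else None


def compare(first: str, second: str) -> Dict[str, Optional[float]]:
    """Map each results file to its accuracy under the assumed schema."""
    return {first: accuracy(first), second: accuracy(second)}
```

Calling `compare("./examples/crossword-turbo.jsonl", "./examples/crossword-turbo-0301.jsonl")` would return one accuracy per file, or `None` for a file with no matching records.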

Finally, pass a file of additional Zeno functions with --functions_file to add more context to the results:

pip install wordfreq
zeno-evals ./examples/crossword-turbo.jsonl --second-results-file ./examples/crossword-turbo-0301.jsonl --functions_file ./examples/crossword_fns.py
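
The contents of crossword_fns.py are not shown here, so the following is only a plain-Python sketch of the kind of per-example signal such a file might compute. It deliberately omits Zeno's function registration API and the wordfreq dependency; the name `answer_features` and the chosen features are hypothetical.

```python
from typing import Dict, List


def answer_features(answers: List[str]) -> List[Dict[str, object]]:
    """Compute simple per-answer features (illustrative; not part of zeno-evals)."""
    features = []
    for answer in answers:
        text = answer.strip()
        words = text.split()
        features.append(
            {
                "length": len(text),           # characters after trimming whitespace
                "word_count": len(words),      # whitespace-separated tokens
                "is_single_word": len(words) == 1,
            }
        )
    return features
```

Features like these give Zeno extra columns to slice results by, for example to check whether accuracy drops on longer answers.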
