
A library for visualizing model evaluation results


Viseval / Vibes Eval

The original name was viseval, but that name is taken on PyPI, so the package is published as vibes_eval. Credit for the design of the evals goes to @johny-b.

Tools for running model evaluations and visualizing results.

Install

pip install vibes_eval

Core Concept

Viseval assumes you have:

  1. A set of models organized by experimental groups:

models = {
    "baseline": ["model-v1", "model-v2"],
    "intervention": ["model-a", "model-b"],
}

  2. An async function that evaluates a single model and returns a DataFrame:

async def run_eval(model_id: str) -> pd.DataFrame:
    # Returns a DataFrame with results
    # Must include the column specified as 'metric' in VisEval
    return results_df
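To make the contract concrete, here is a minimal, self-contained sketch of such an eval function. The sample list and the random scoring are placeholders (a real implementation would await a model API call); only the shape of the returned DataFrame matters, and it must contain the column later passed as `metric` to `VisEval`:

```python
import asyncio
import random

import pandas as pd


async def run_eval(model_id: str) -> pd.DataFrame:
    """Hypothetical eval: score `model_id` on a few samples.

    The model call is stubbed out with random scores; a real
    implementation would await an API client here.
    """
    samples = ["q1", "q2", "q3"]
    rows = []
    for sample in samples:
        # Placeholder for the actual (async) model call and scoring.
        accuracy = random.random()
        rows.append({"model": model_id, "sample": sample, "accuracy": accuracy})
    # Must include the column named by `metric` (here: "accuracy").
    return pd.DataFrame(rows)


df = asyncio.run(run_eval("model-v1"))
```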

Usage

from vibes_eval import VisEval

# Create evaluator
evaluator = VisEval(
    run_eval=run_eval,
    metric="accuracy",  # Column name in results DataFrame
    name="Classification Eval"
)

# Run eval for all models
results = await evaluator.run(models)

# Create visualizations
results.model_plot()      # Compare individual models
results.group_plot()      # Compare groups (aggregated)
results.histogram()       # Score distributions per group
results.scatter(          # Compare two metrics
    x_column="accuracy",
    y_column="runtime"
)

Freeform questions

One built-in evaluation is provided by the FreeformQuestion class. A freeform question is a question posed to the models, combined with a set of prompts given to an LLM judge. Questions are defined in YAML files. Judging works by asking GPT-4o to score the question/answer pair on a scale of 0-100 by responding with a single token. We then take the top 20 token logprobs and compute the weighted average of those tokens, approximating the expected value of the response. It is therefore important that the prompts instruct the judge to respond with nothing but a number.
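The expected-value step can be sketched in a few lines. This is an illustration of the scheme described above, not the library's actual code: non-numeric tokens are dropped and the remaining probability mass is renormalized before averaging.

```python
import math


def expected_score(top_logprobs: dict[str, float]) -> float:
    """Approximate the judge's expected score from top-k token logprobs.

    `top_logprobs` maps candidate next tokens to log-probabilities
    (e.g. the top 20 returned by the API). Hypothetical helper, for
    illustration only.
    """
    scores, weights = [], []
    for token, logprob in top_logprobs.items():
        token = token.strip()
        # Keep only tokens that parse as a score in [0, 100].
        if token.isdigit() and 0 <= int(token) <= 100:
            scores.append(int(token))
            weights.append(math.exp(logprob))
    total = sum(weights)
    if total == 0:
        raise ValueError("judge produced no numeric tokens")
    # Weighted average over the renormalized numeric tokens.
    return sum(s * w for s, w in zip(scores, weights)) / total
```

For example, if the judge puts probability 0.6 on "80", 0.3 on "90", and 0.1 on a non-numeric token, the estimate is (0.6·80 + 0.3·90) / 0.9 ≈ 83.3.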

Visualizations

  • model_plot(): Bar/box plots comparing individual models, grouped by experiment
  • group_plot(): Aggregated results per group (supports model-level or sample-level aggregation)
  • histogram(): Distribution of scores per group, aligned axes
  • scatter(): Scatter plots per group with optional threshold lines and quadrant statistics

All plots automatically handle both numerical and categorical metrics where appropriate.

