
Plot your data with natural language


EDAplot (VegaChat)

This repository contains a snapshot of the code used for the paper "Generating and Evaluating Declarative Charts Using Large Language Models".

Usage

Set an OpenAI API key as an environment variable and install uv first. If you're unfamiliar with the setup, see the Configuring dev environment section below.
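For example, in a POSIX shell (the OPENAI_API_KEY variable name is the one used throughout this project):

export OPENAI_API_KEY="sk-..."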

Run the interactive Streamlit prototype locally with:

uv run python -m streamlit run frontend/app.py

To use the code as a library, look into api.py.
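As a minimal sketch of library-style usage, something like the following; the imported name and the return value are assumptions for illustration, so check api.py for the actual entry points and signatures:

import pandas as pd

from api import chat_to_chart  # hypothetical name; see api.py for the real API

# Load a dataframe and ask for a chart in natural language.
df = pd.read_csv("cars.csv")
result = chat_to_chart(df, "Show average horsepower by origin as a bar chart")

# The result is assumed here to carry the generated Vega-Lite spec.
print(result.spec)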

Evaluation

Setup

Download evaluation datasets:

  • The NLV Corpus is already included in this repository
  • chart-llm should be cloned into ./dataset/:
    cd dataset
    git clone https://github.com/hyungkwonko/chart-llm.git
    cd ..
    

Benchmarks

Example for running the NLV Corpus benchmark:

uv run python -m scripts.run_benchmark nlv_corpus --dataset_dir dataset/nlv_corpus --output_path out/benchmarks

Run the interactive results report with:

uv run python -m streamlit run benchmark/reports/vega_chat_benchmark_report.py out/benchmarks

where out/benchmarks is the directory containing the saved outputs.

Evals

Our custom test cases (evals) are defined as YAML files. Each eval specifies the actions to take and the checks to perform after each action.
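As a rough illustration only (the field names below are invented; consult the eval YAML files in this repository for the real schema), an eval might look like:

name: scatter_then_color
actions:
  - prompt: Plot horsepower against weight
    checks:
      - mark_type: point
  - prompt: Color the points by origin
    checks:
      - has_encoding: color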

Run the evals with:

uv run python -m scripts.run_benchmark evals --output_path out/evals

Run the interactive results report with:

uv run python -m streamlit run benchmark/reports/evals_report.py out/evals

where out/evals is the directory containing the saved outputs.

Update existing results with new checks using:

uv run python -m scripts.run_eval_checks out/evals/

Request Analyzer

Run the request analyzer benchmark with:

uv run python -m scripts.run_request_analyzer_benchmark --dataset_dir dataset/chart-llm --take_n 180 --output_path out/request_analyzer_benchmark/ chart_llm_gold

View the results with:

uv run python -m streamlit run benchmark/reports/request_analyzer_benchmark_report.py out/request_analyzer_benchmark/

LLM as a judge

Vision Judge

The vision judge uses a multimodal LLM to compare the generated image to the reference image. It can be used to compare results from different plotting libraries (e.g., matplotlib and Vega-Lite).
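As a rough sketch of the idea (the model name, prompt, and scoring rubric below are assumptions; the repository's actual judge prompt may differ), a minimal comparison using the OpenAI SDK could look like:

import base64
from openai import OpenAI

client = OpenAI()

def _encode_png(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

def vision_judge(generated_png: str, reference_png: str) -> str:
    # Ask a multimodal model whether the generated chart matches the reference.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption; the configured judge model may differ
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Compare these two charts. Do they visualize the same "
                         "data with equivalent marks and encodings? Answer with "
                         "a score from 1 to 5 and a short justification."},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64," + _encode_png(generated_png)}},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64," + _encode_png(reference_png)}},
            ],
        }],
    )
    return response.choices[0].message.content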

To run the vision judge evaluation on existing outputs use:

uv run python -m scripts.run_vision_judge example.jsonl

or pass the --vision_judge flag when running scripts/run_benchmark.py.

Vision Judge Benchmark

To evaluate the vision judge, we use a separate benchmark.

Run it with:

uv run python -m scripts.run_vision_judge_benchmark

View the results with:

uv run python -m streamlit run benchmark/reports/vision_judge_benchmark_report.py out/vision_judge_benchmark/

Correlation with Human Judgments

Measuring the correlation between human judgments and different metrics requires running:

  1. vision_judge_human_eval.py to generate an evaluation dataset
  2. human_eval_db.py to store the evaluation dataset in a Postgres database
  3. vision_judge_human_eval_app.py to run the interactive evaluation environment

LIDA Self-Evaluation

LIDA's self-evaluation can be run with:

uv run python -m scripts.run_lida_self_eval example.jsonl

Configuring dev environment

  1. Install uv
  2. Install dependencies:
    uv sync
    
  3. Enable pre-commit:
    uv run pre-commit install
    
  4. Set the OPENAI_API_KEY environment variable to your OpenAI API key

Run tests with:

uv run pytest tests -v
uv run pytest tests -v -m "not external_data"  # To skip tests that require external data

Some tests require first downloading the evaluation datasets (see the Evaluation setup above).

Docker

Build the image and run the container:

docker build -f frontend.Dockerfile -t edaplot .
docker run --rm -p 8501:8501 -e OPENAI_API_KEY -t edaplot
