Plot your data with natural language
EDAplot (VegaChat)
This repository contains a snapshot of the code used for the paper "Generating and Evaluating Declarative Charts Using Large Language Models".
Usage
Set your OpenAI API key as an environment variable and install uv first. If you're unfamiliar with uv, refer to the dev environment section below.
Run the interactive Streamlit prototype locally with:
uv run python -m streamlit run frontend/app.py
To use the code as a library, look into api.py.
Evaluation
Setup
Download evaluation datasets:
- NLV Corpus is included
- chart-llm should be cloned into ./dataset/:
cd dataset
git clone https://github.com/hyungkwonko/chart-llm.git
cd ..
Benchmarks
Example for running the NLV Corpus benchmark:
uv run python -m scripts.run_benchmark nlv_corpus --dataset_dir dataset/nlv_corpus --output_path out/benchmarks
Run the interactive results report with:
uv run python -m streamlit run benchmark/reports/vega_chat_benchmark_report.py out/benchmarks
where out/benchmarks is the directory containing the saved outputs.
Evals
Our custom test cases (evals) are defined as YAML files.
Each eval specifies the actions to take and the checks to perform after each action.
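To illustrate the actions-and-checks pattern, here is a hedged Python sketch of how such an eval could be executed; the field names (`actions`, `checks`, `mark_is`, etc.) and the stub action/check functions are invented for the sketch and are not the project's actual schema:

```python
# Hypothetical eval mirroring the YAML layout described above
# (field names are illustrative, not the project's actual schema).
eval_case = {
    "name": "scatter_plot_basic",
    "actions": [
        {"prompt": "plot horsepower vs mpg"},
        {"prompt": "color the points by origin"},
    ],
    "checks": [
        ["mark_is", "point"],               # checked after the first action
        ["encoding_has_channel", "color"],  # checked after the second action
    ],
}

def run_eval(case, run_action, run_check):
    """Apply each action in order and run the matching check after it."""
    results = []
    for action, check in zip(case["actions"], case["checks"]):
        state = run_action(action["prompt"])
        results.append(run_check(state, check))
    return results

# Stub action/check functions to demonstrate the control flow only.
specs = {
    "plot horsepower vs mpg": {"mark": "point"},
    "color the points by origin": {"mark": "point", "encoding": {"color": {}}},
}
passed = run_eval(
    eval_case,
    run_action=lambda prompt: specs[prompt],
    run_check=lambda spec, chk: (chk[1] == spec.get("mark"))
    if chk[0] == "mark_is"
    else (chk[1] in spec.get("encoding", {})),
)
print(passed)  # [True, True]
```

The real runner plugs the chart-generation model in as the action step and validates the produced Vega-Lite spec in the check step.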
Run the evals with:
uv run python -m scripts.run_benchmark evals --output_path out/evals
Run the interactive results report with:
uv run python -m streamlit run benchmark/reports/evals_report.py out/evals
where out/evals is the directory containing the saved outputs.
Update existing results with new checks using:
uv run python -m scripts.run_eval_checks out/evals/
Request Analyzer
Run the request analyzer benchmark with:
uv run python -m scripts.run_request_analyzer_benchmark --dataset_dir dataset/chart-llm --take_n 180 --output_path out/request_analyzer_benchmark/ chart_llm_gold
View the results with:
uv run python -m streamlit run benchmark/reports/request_analyzer_benchmark_report.py out/request_analyzer_benchmark/
LLM as a judge
Vision Judge
The vision judge uses a multimodal LLM to compare the generated image to the reference image. It can be used to compare results from different plotting libraries (e.g., matplotlib and Vega-Lite).
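The comparison boils down to sending both images to a multimodal model with a judging instruction. A minimal sketch of how such a request payload could be assembled, assuming the OpenAI chat-completions multimodal message format (the prompt wording and helper function here are invented, and no API call is made):

```python
import base64

def build_judge_messages(generated_png: bytes, reference_png: bytes) -> list:
    """Build an OpenAI-style multimodal message asking a model to compare
    a generated chart image against a reference image (no API call here)."""

    def data_url(png: bytes) -> str:
        # Images are passed inline as base64 data URLs.
        return "data:image/png;base64," + base64.b64encode(png).decode("ascii")

    return [
        {"role": "system",
         "content": "You are a judge. Rate how well the generated chart "
                    "matches the reference chart, ignoring library-specific styling."},
        {"role": "user",
         "content": [
             {"type": "text", "text": "Generated chart:"},
             {"type": "image_url", "image_url": {"url": data_url(generated_png)}},
             {"type": "text", "text": "Reference chart:"},
             {"type": "image_url", "image_url": {"url": data_url(reference_png)}},
         ]},
    ]

messages = build_judge_messages(b"\x89PNG...", b"\x89PNG...")
print(messages[1]["content"][1]["image_url"]["url"][:22])  # data:image/png;base64,
```

Because both charts are rendered to pixels before judging, the judge is agnostic to whether the image came from Vega-Lite or matplotlib.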
To run the vision judge evaluation on existing outputs, use:
uv run python -m scripts.run_vision_judge example.jsonl
or use the --vision_judge flag together with scripts/run_benchmark.py.
Vision Judge Benchmark
To evaluate the vision judge, we use a separate benchmark.
Run it with:
uv run python -m scripts.run_vision_judge_benchmark
View the results with:
uv run python -m streamlit run benchmark/reports/vision_judge_benchmark_report.py out/vision_judge_benchmark/
Correlation with Human Judgments
Measuring the correlation between human judgments and different metrics requires running:
- vision_judge_human_eval.py to generate an evaluation dataset
- human_eval_db.py to store the evaluation dataset in a Postgres database
- vision_judge_human_eval_app.py to run the interactive evaluation environment
LIDA Self-Evaluation
LIDA's self-evaluation can be run with:
uv run python -m scripts.run_lida_self_eval example.jsonl
Configuring dev environment
- Install uv
- Install dependencies:
uv sync
- Enable pre-commit:
uv run pre-commit install
- Add your OpenAI API key to the OPENAI_API_KEY environment variable
Run tests with:
uv run pytest tests -v
uv run pytest tests -v -m "not external_data" # To skip tests that require external data
Some tests require the evaluation datasets to be downloaded first (see the Evaluation setup section).
Publishing
To publish a new release to PyPI:
- Tag the release and push the tag:
git tag -a v0.1.2 -m v0.1.2
git push --tags
This sets the package version dynamically.
- The publish.yml workflow will trigger when a new version tag is pushed.
Docker
Build the image and run the container:
docker build -f frontend.Dockerfile -t edaplot .
docker run --rm -p 8501:8501 -e OPENAI_API_KEY -t edaplot