
Plot your data with natural language


EDAplot (VegaChat)

This repository contains a snapshot of the code used for the paper "Generating and Evaluating Declarative Charts Using Large Language Models".

Usage

First, set your OpenAI API key in the OPENAI_API_KEY environment variable and install uv. If you're unfamiliar with either step, see the Configuring dev environment section below.
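As a quick sanity check before launching the app, you can verify that the key is actually visible to Python. This is a stdlib-only sketch, not part of the repository:

```python
import os

def openai_key_configured() -> bool:
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("OPENAI_API_KEY"))

if __name__ == "__main__":
    status = "set" if openai_key_configured() else "NOT set"
    print(f"OPENAI_API_KEY is {status}")
```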

Run the interactive Streamlit prototype locally with:

uv run python -m streamlit run frontend/app.py

To use the code as a library, look into api.py.
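Independent of EDAplot's own entry points (api.py documents the real interface), the output format is declarative Vega-Lite, so a generated chart is just a JSON spec that any Vega-Lite renderer can display. A minimal, illustrative example of the kind of spec involved:

```python
import json

# A minimal Vega-Lite spec of the kind such a tool emits; the data values
# here are purely illustrative.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"year": 2020, "sales": 10}, {"year": 2021, "sales": 15}]},
    "mark": "line",
    "encoding": {
        "x": {"field": "year", "type": "ordinal"},
        "y": {"field": "sales", "type": "quantitative"},
    },
}

# Specs round-trip cleanly through JSON, which is what makes them easy to
# generate with an LLM and to compare or validate in a benchmark.
serialized = json.dumps(spec)
assert json.loads(serialized) == spec
print(serialized[:60] + "...")
```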

Evaluation

Setup

Download evaluation datasets:

  • NLV Corpus is included
  • chart-llm should be cloned into ./dataset/:
    cd dataset
    git clone https://github.com/hyungkwonko/chart-llm.git
    cd ..
    

Benchmarks

Example for running the NLV Corpus benchmark:

uv run python -m scripts.run_benchmark nlv_corpus --dataset_dir dataset/nlv_corpus --output_path out/benchmarks

Run the interactive results report with:

uv run python -m streamlit run benchmark/reports/vega_chat_benchmark_report.py out/benchmarks

where out/benchmarks is the directory containing the saved outputs.

Evals

Our custom test cases (evals) are defined as YAML files. Each eval specifies the actions to take and the checks to perform after each action.
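For illustration only — the actual schema is defined by the repository and the keys below are invented for this sketch — an eval file might look roughly like:

```yaml
# Hypothetical eval file; the real schema lives in the repository's eval definitions.
name: line_chart_basic
actions:
  - prompt: "plot sales over time as a line chart"
    checks:
      - spec_is_valid          # the generated Vega-Lite spec compiles
      - mark_equals: line      # expected mark type
  - prompt: "color the line by region"
    checks:
      - encoding_has: color    # follow-up action modifies the prior chart
```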

Run the evals with:

uv run python -m scripts.run_benchmark evals --output_path out/evals

Run the interactive results report with:

uv run python -m streamlit run benchmark/reports/evals_report.py out/evals

where out/evals is the directory containing the saved outputs.

Update existing results with new checks using:

uv run python -m scripts.run_eval_checks out/evals/

Request Analyzer

Run the request analyzer benchmark with:

uv run python -m scripts.run_request_analyzer_benchmark --dataset_dir dataset/chart-llm --take_n 180 --output_path out/request_analyzer_benchmark/ chart_llm_gold

View the results with:

uv run python -m streamlit run benchmark/reports/request_analyzer_benchmark_report.py out/request_analyzer_benchmark/

LLM as a judge

Vision Judge

The vision judge uses a multimodal LLM to compare the generated image to the reference image. It can be used to compare results from different plotting libraries (e.g., matplotlib and Vega-Lite).

To run the vision judge evaluation on existing outputs use:

uv run python -m scripts.run_vision_judge example.jsonl

Alternatively, pass the --vision_judge flag to scripts/run_benchmark.py.

Vision Judge Benchmark

To evaluate the vision judge itself, we use a separate benchmark.

Run it with:

uv run python -m scripts.run_vision_judge_benchmark

View the results with:

uv run python -m streamlit run benchmark/reports/vision_judge_benchmark_report.py out/vision_judge_benchmark/

Correlation with Human Judgments

Measuring the correlation between human judgments and the different metrics requires running:

  1. vision_judge_human_eval.py to generate an evaluation dataset
  2. human_eval_db.py to store the evaluation dataset in a Postgres database
  3. vision_judge_human_eval_app.py to run the interactive evaluation environment

LIDA Self-Evaluation

LIDA's self-evaluation can be run with:

uv run python -m scripts.run_lida_self_eval example.jsonl

Configuring dev environment

  1. Install uv
  2. Install dependencies:
    uv sync
    
  3. Enable pre-commit:
    uv run pre-commit install
    
  4. Add your OpenAI API key to the OPENAI_API_KEY environment variable

Run tests with:

uv run pytest tests -v
uv run pytest tests -v -m "not external_data"  # To skip tests that require external data

For some tests, you first need to download the evaluation datasets (see the Setup section under Evaluation).

Publishing

To publish a new release to PyPI:

  1. Run git tag -a v0.1.2 -m v0.1.2 and git push --tags. The package version is derived dynamically from the tag.
  2. The publish.yml workflow will trigger when a new version tag is pushed.

Docker

Build the image and run the container:

docker build -f frontend.Dockerfile -t edaplot .
docker run --rm -p 8501:8501 -e OPENAI_API_KEY -t edaplot

Download files

Download the file for your platform.

Source Distribution

edaplot_vl-0.1.0.tar.gz (45.7 kB)


Built Distribution


edaplot_vl-0.1.0-py3-none-any.whl (53.3 kB)


File details

Details for the file edaplot_vl-0.1.0.tar.gz.

File metadata

  • Download URL: edaplot_vl-0.1.0.tar.gz
  • Upload date:
  • Size: 45.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.5

File hashes

Hashes for edaplot_vl-0.1.0.tar.gz:

  • SHA256: f49d41a8f5d4677aa163ff1356ec1c13485005c2345e1331980bb5651de12b6a
  • MD5: b680dae1cb02c6355b00cf4558805d1f
  • BLAKE2b-256: fcfdf72199eaa6cd9790e5b934ab4d801d047711e85d7634bf07094fc620ccfc


File details

Details for the file edaplot_vl-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: edaplot_vl-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 53.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.5

File hashes

Hashes for edaplot_vl-0.1.0-py3-none-any.whl:

  • SHA256: ee85e2843e264d7e5d8c91939bc65495e9fafe656d9015c77963f38641f1313e
  • MD5: 749fae636086f84d2b9fdfc5a2b9a15c
  • BLAKE2b-256: dc5f653b23643e5851d0137a4882d051df4fc6a4a6e28aedf6621676cf3b6d17

