VisEval: A NL2VIS Benchmark

VisEval is a benchmark designed to evaluate visualization generation methods. This repository provides both the toolkit that supports benchmarking and the data used for the benchmark.

What Can VisEval Evaluate

The pipeline of VisEval includes three key modules: the validity checker, the legality checker, and the readability evaluator.

VisEval evaluates generated visualizations along three dimensions:

Whether the generated code can produce the visualization.
Whether the generated visualization meets the query.
Whether the generated visualization is easy to read.
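
The three checks can be pictured as a short pipeline in which a later stage only runs if the earlier one passes. The sketch below is purely illustrative and is not VisEval's implementation: `StageResult`, `run_pipeline`, and the toy checks are hypothetical names, and the stop-at-first-failure ordering is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StageResult:
    """Outcome of one hypothetical evaluation stage."""
    name: str
    passed: bool


# A stage is a (name, check) pair; the check receives the generated code.
Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(code: str, stages: List[Stage]) -> List[StageResult]:
    """Run stages in order and stop after the first failure (an assumption)."""
    results: List[StageResult] = []
    for name, check in stages:
        passed = check(code)
        results.append(StageResult(name, passed))
        if not passed:
            break
    return results


# Toy stand-ins for the three dimensions; real checks would render the chart,
# compare it with the query, and assess readability.
toy_stages: List[Stage] = [
    ("validity", lambda code: "plot" in code),
    ("legality", lambda code: "bar" in code),
    ("readability", lambda code: "title" in code),
]

report = run_pipeline("plot(kind='bar')", toy_stages)
print([(r.name, r.passed) for r in report])
```

In this toy run the first two stages pass and the third fails, so the report has three entries; code that fails the first check produces a single entry.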

Get Started

Install Benchmark Toolkit

pip install --upgrade vis-evaluator
# or `git clone https://github.com/microsoft/VisEval.git && cd VisEval && pip install --upgrade -e .`

Download Benchmark Dataset

To access the dataset, please follow these steps:

Download the dataset from this link.
Once the download is complete, unzip the file to extract the dataset contents.
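
The unzip step can also be scripted. The following is a minimal sketch using only the Python standard library; the archive name `dataset.zip` and the target folder `dataset` are placeholders, not names taken from the VisEval documentation, and `extract_dataset` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive: Path, target_dir: Path) -> list[str]:
    """Extract a zip archive into target_dir and return the member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()


if __name__ == "__main__":
    # Placeholder paths: replace them with the file you actually downloaded.
    archive = Path("dataset.zip")
    if archive.exists():
        members = extract_dataset(archive, Path("dataset"))
        print(f"Extracted {len(members)} entries")
```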

For additional information about the dataset, please refer to the dataset documentation.

Usage & Examples

After installation, you can use VisEval by referring to examples/evaluate.py or as follows:

Create your generation method by inheriting from the Agent class. You can find three examples in the examples/agent directory.

from typing import Optional, Tuple

from viseval.agent import Agent, ChartExecutionResult


class YourAgent(Agent):
    def __init__(self, llm):
        # Keep a handle to the language model used to generate code.
        self.llm = llm

    def generate(
        self, nl_query: str, tables: list[str], config: dict
    ) -> Tuple[str, dict]:
        """Generate code for the given natural language query."""
        pass

    def execute(
        self, code: str, context: dict, log_name: Optional[str] = None
    ) -> ChartExecutionResult:
        """Execute the given code with context and return the result."""
        pass
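
One possible way to approach `execute` is to run the generated code with `exec` and turn any exception into a failed result. The sketch below is self-contained and does not import VisEval: `ExecutionOutcome` is a hypothetical stand-in for `ChartExecutionResult`, whose actual fields are not shown in this document, and `run_generated_code` is an illustrative helper.

```python
import traceback
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionOutcome:
    """Hypothetical stand-in for ChartExecutionResult."""
    status: bool
    error_msg: Optional[str] = None


def run_generated_code(code: str, context: dict) -> ExecutionOutcome:
    """Execute code with a copy of context as its globals."""
    namespace = dict(context)  # copy so the caller's dict is not mutated
    try:
        exec(code, namespace)
    except Exception:
        # Keep only the last traceback line, which names the exception.
        last_line = traceback.format_exc().strip().splitlines()[-1]
        return ExecutionOutcome(status=False, error_msg=last_line)
    return ExecutionOutcome(status=True)


ok = run_generated_code("total = base + 1", {"base": 41})
bad = run_generated_code("1 / 0", {})
print(ok.status, bad.status, bad.error_msg)
```

Running model-generated code with `exec` is unsafe outside a sandbox, so treat this only as an outline of the control flow.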

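Configure the evaluator.

# Evaluator is assumed to be importable from the top-level viseval package, like Dataset.
from viseval import Evaluator

evaluator = Evaluator(webdriver_path, vision_model)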

(You can configure the Evaluator without a webdriver and vision model, in which case the readability evaluation of the generated visualizations is skipped.)

Install webdriver.

# download
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
# install
apt install ./google-chrome-stable_current_amd64.deb
# verify
google-chrome --version

Load the vision model (e.g., GPT-4V).

from langchain_openai import AzureChatOpenAI

import dotenv
# Copy .env.example to .env and put your API keys in the file.
dotenv.load_dotenv()

vision_model = AzureChatOpenAI(
    model_name="gpt-4-turbo-v",
    max_retries=999,
    temperature=0.0,
    request_timeout=20,
    max_tokens=4096,
)
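
Before constructing the model, it can help to confirm that the credentials were actually loaded. The following sketch assumes the environment variable names that `AzureChatOpenAI` reads by default (`AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `OPENAI_API_VERSION`); the repository's `.env.example` may use different names, so adjust the list accordingly. `missing_settings` is a hypothetical helper.

```python
import os
from typing import Iterable, List, Mapping

# Assumed names; check them against .env.example.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
)


def missing_settings(
    names: Iterable[str], environ: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names that are unset or empty in environ."""
    return [name for name in names if not environ.get(name)]


missing = missing_settings(REQUIRED_VARS)
if missing:
    print("Missing settings:", ", ".join(missing))
```

Call it after `dotenv.load_dotenv()` so that values from the `.env` file are already in the environment.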

Evaluate

from viseval import Dataset

# Configure the dataset with the benchmark dataset folder path (`folder`),
# the number of tables required to generate visualizations (`table_type`: all, single, or multiple),
# and whether to include irrelevant tables (`with_irrelevant_tables`).
dataset = Dataset(folder, table_type, with_irrelevant_tables)

# `args.library` is the plotting library selected on the command line (see examples/evaluate.py).
config = {"library": args.library}
result = evaluator.evaluate(agent, dataset, config)
score = result.score()
print(f"Score: {score}")
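
The `args.library` value used in the configuration comes from command-line parsing. A minimal sketch of such a parser is shown below; the option names, the default, and the `matplotlib`/`seaborn` choices are assumptions for illustration and may not match examples/evaluate.py.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose result provides `args.library`."""
    parser = argparse.ArgumentParser(description="Run a benchmark evaluation.")
    parser.add_argument(
        "--benchmark", type=str, required=True,
        help="Path to the unzipped dataset folder.",
    )
    parser.add_argument(
        "--library", type=str, default="matplotlib",
        choices=["matplotlib", "seaborn"],  # assumed set of supported libraries
        help="Plotting library the generated code should use.",
    )
    parser.add_argument(
        "--irrelevant_tables", action="store_true",
        help="Include irrelevant tables in each query.",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)


args = parse_args(["--benchmark", "dataset", "--library", "seaborn"])
config = {"library": args.library}
print(config)
```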

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Project details

Release history

0.0.3 (Apr 9, 2025)
0.0.2 (Jul 19, 2024)
0.0.1 (May 27, 2024), this version

Download files

Source Distribution

vis_evaluator-0.0.1.tar.gz (31.9 kB), uploaded May 27, 2024, Source

Built Distribution

vis_evaluator-0.0.1-py3-none-any.whl (29.8 kB), uploaded May 27, 2024, Python 3

File details

Details for the file vis_evaluator-0.0.1.tar.gz.

File metadata

Download URL: vis_evaluator-0.0.1.tar.gz
Upload date: May 27, 2024
Size: 31.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.0 CPython/3.10.14

File hashes

Hashes for vis_evaluator-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`49071280d1d9ea4b07ccd9730d169fba1190c1bc59a7f442629bb27d33fbcc18`
MD5	`807cccb5193e3550b43657095acc6172`
BLAKE2b-256	`2c34633d2a11b96dee792508d7456791269fd5c9dc19917e395dcc049fac6015`

File details

Details for the file vis_evaluator-0.0.1-py3-none-any.whl.

File metadata

Download URL: vis_evaluator-0.0.1-py3-none-any.whl
Upload date: May 27, 2024
Size: 29.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.0 CPython/3.10.14

File hashes

Hashes for vis_evaluator-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`72e1092977b999629f06ddc49d5c7d1aa2b19d8c6f4822753b3fff77cfd0bd6e`
MD5	`b0fb6bff50fef6588f46760141517444`
BLAKE2b-256	`d5bd2e15e93d159cb3b66cd7b5328da74f7cd34babef10ef45a0deccdd3b9765`
