VisEval: A NL2VIS Benchmark

VisEval is a benchmark designed to evaluate visualization generation methods. In this repository, we provide both the toolkit to support the benchmarking, as well as the data used for benchmarks.

What Can VisEval Evaluate

The pipeline of VisEval includes three key modules: the validity checker, the legality checker, and the readability evaluator.

VisEval evaluates generated visualizations along three dimensions:

  1. Whether the generated code can produce the visualization.
  2. Whether the generated visualization meets the query.
  3. Whether the generated visualization is easy to read.
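As a rough illustration of the first dimension, a validity check can be as simple as running the generated code and recording whether it raises. The sketch below is not VisEval's implementation: `check_validity` and `ValidityResult` are hypothetical names, and a real checker would also isolate the code and confirm that a chart was actually rendered.

```python
import traceback
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidityResult:
    """Outcome of running one piece of generated code (hypothetical type)."""
    ok: bool
    error: Optional[str] = None


def check_validity(code: str) -> ValidityResult:
    """Run `code` in a fresh namespace and report whether it raised.

    Illustration only: exec() provides no sandboxing, so untrusted code
    should only be run this way inside an isolated environment.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception:
        # Keep only the last traceback line, e.g. "NameError: name ...".
        last_line = traceback.format_exc().strip().splitlines()[-1]
        return ValidityResult(ok=False, error=last_line)
    return ValidityResult(ok=True)
```

Calling `check_validity("undefined_name + 1")` returns a result with `ok=False` and an error string that starts with `NameError`.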

Get Started

Install Benchmark Toolkit

pip install --upgrade vis-evaluator
# or `git clone https://github.com/microsoft/VisEval.git && cd VisEval && pip install --upgrade -e .`
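To confirm that the install succeeded, one option is to query the installed distribution's version from Python. This is a small sketch using the standard library; `installed_version` is a hypothetical helper, and `vis-evaluator` is the distribution name from the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the toolkit is installed, otherwise None.
print(installed_version("vis-evaluator"))
```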

Download Benchmark Dataset

To access the dataset, please follow these steps:

  1. Download the dataset from this link.
  2. Once the download is complete, unzip the file to extract the dataset contents.

For additional information about the dataset, please refer to the dataset documentation.
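If you prefer to script the extraction step, the standard `zipfile` module is enough. The sketch below assumes the download is a regular zip archive; `extract_dataset` and any file names you pass to it are hypothetical, since the archive name depends on the download link.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive: Path, target_dir: Path) -> list[str]:
    """Extract `archive` into `target_dir` and return the top-level entry names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    # List what ended up directly under the target folder.
    return sorted(p.name for p in target_dir.iterdir())
```

The returned folder is what you would later pass as the dataset `folder` path.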

Usage & Examples

After installation, you can use VisEval by referring to examples/evaluate.py or as follows:

  1. Create your generation method by inheriting from the Agent Class. You can find three examples in the examples/agent directory.
from typing import Tuple

from viseval.agent import Agent, ChartExecutionResult

class YourAgent(Agent):
    def __init__(self, llm):
        self.llm = llm
    
    def generate(
        self, nl_query: str, tables: list[str], config: dict
    ) -> Tuple[str, dict]:
        """Generate code for the given natural language query."""
        pass

    def execute(
        self, code: str, context: dict, log_name: str = None
    ) -> ChartExecutionResult:
        """Execute the given code with context and return the result"""
        pass
  2. Configure the evaluator.
    evaluator = Evaluator(webdriver_path, vision_model)

(You can configure the Evaluator without a webdriver and vision model, in which case the evaluation of the readability of the generated visualizations will be skipped.)

  • Install webdriver.

    # download
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    # install
    apt install ./google-chrome-stable_current_amd64.deb
    # verify
    google-chrome --version
    
  • Load a vision model (e.g., GPT-4V).

    from langchain_openai import AzureChatOpenAI
    
    import dotenv
    # Copy .env.example to .env and put your API keys in the file.
    dotenv.load_dotenv()
    
    vision_model = AzureChatOpenAI(
        model_name="gpt-4-turbo-v",
        max_retries=999,
        temperature=0.0,
        request_timeout=20,
        max_tokens=4096,
    )
    
  3. Evaluate
from viseval import Dataset

# Configure the dataset with the benchmark dataset folder path (`folder`),
# specify the number of tables required to generate visualizations (`table_type`: all, single, or multiple),
# and indicate whether to include irrelevant tables (`with_irrelevant_tables`).
dataset = Dataset(folder, table_type, with_irrelevant_tables)

# `args` is assumed to hold parsed command-line arguments (see examples/evaluate.py).
config = {"library": args.library}
result = evaluator.evaluate(agent, dataset, config)
score = result.score()
print(f"Score: {score}")
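To make step 1 more concrete, here is one way a `generate` method might turn a query into code. It is a sketch under stated assumptions, not one of the repository's example agents: `SketchAgent` is a hypothetical name, `llm` is assumed to be any callable that maps a prompt string to a reply string, the prompt wording is invented, and the returned dict is assumed to be the context later passed to `execute`. The `Agent` base class is left out so the sketch runs without the toolkit installed; in real use the class would inherit from `viseval.agent.Agent` and also implement `execute`.

```python
import re
from typing import Callable, Tuple

# Markdown code fence, built programmatically to keep this example readable.
FENCE = "`" * 3
CODE_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:python)?\n(.*?)" + re.escape(FENCE), re.DOTALL
)


class SketchAgent:
    """Stand-in with the same `generate` signature as YourAgent (hypothetical)."""

    def __init__(self, llm: Callable[[str], str]):
        # Assumption: `llm` takes a prompt and returns the model's text reply.
        self.llm = llm

    def generate(
        self, nl_query: str, tables: list[str], config: dict
    ) -> Tuple[str, dict]:
        """Build a prompt, query the model, and extract the code it returns."""
        library = config.get("library", "matplotlib")
        prompt = (
            f"Write Python code using {library} to answer: {nl_query}\n"
            f"Data files: {', '.join(tables)}"
        )
        reply = self.llm(prompt)
        # Use the first fenced code block if there is one, else the whole reply.
        match = CODE_BLOCK.search(reply)
        code = match.group(1).strip() if match else reply.strip()
        # Assumption: the second element is context handed on to execute().
        return code, {"prompt": prompt}
```

Passing a stub such as `lambda prompt: "print('hi')"` as `llm` lets you exercise the prompt-building and code-extraction logic without any API calls.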

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
