
Project description

VisEval: An NL2VIS Benchmark

VisEval is a benchmark designed to evaluate visualization generation methods. In this repository, we provide both the toolkit to support benchmarking and the data used for the benchmark.

What Can VisEval Evaluate

The pipeline of VisEval includes three key modules: the validity checker, the legality checker, and the readability evaluator.

VisEval evaluates generated visualizations from three dimensions:

  1. Whether the generated code can produce the visualization.
  2. Whether the generated visualization meets the query.
  3. Whether the generated visualization is easy to read.

Get Started

Install Benchmark Toolkit

pip install --upgrade vis-evaluator
# or `git clone https://github.com/microsoft/VisEval.git && cd VisEval && pip install --upgrade -e .`

Download Benchmark Dataset

To access the dataset, please follow these steps:

  1. Download the dataset from this link.
  2. Once the download is complete, unzip the file to extract the dataset contents.

For additional information about the dataset, please refer to the dataset documentation.

Usage & Examples

After installation, you can use VisEval by referring to examples/evaluate.py or as follows:

  1. Create your generation method by inheriting from the Agent class. You can find three examples in the examples/agent directory.
from typing import Tuple

from viseval.agent import Agent, ChartExecutionResult

class YourAgent(Agent):
    def __init__(self, llm):
        self.llm = llm
    
    def generate(
        self, nl_query: str, tables: list[str], config: dict
    ) -> Tuple[str, dict]:
        """Generate code for the given natural language query."""
        pass

    def execute(
        self, code: str, context: dict, log_name: str = None
    ) -> ChartExecutionResult:
        """Execute the given code with context and return the result"""
        pass
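
For orientation, here is a minimal sketch of what a concrete agent could look like. It assumes a langchain-style chat model with an invoke method; the prompt wording and the context keys are invented for illustration, and execute is left as a stub (the agents in examples/agent show working implementations).

from typing import Tuple

from viseval.agent import Agent, ChartExecutionResult


class SimpleAgent(Agent):
    def __init__(self, llm):
        self.llm = llm

    def generate(
        self, nl_query: str, tables: list[str], config: dict
    ) -> Tuple[str, dict]:
        # Ask the model for plotting code (the prompt text is illustrative only).
        prompt = (
            f"Write {config['library']} code that reads the data files {tables} "
            f"and draws a chart answering: {nl_query}"
        )
        code = self.llm.invoke(prompt).content
        context = {"tables": tables}  # handed back to execute()
        return code, context

    def execute(
        self, code: str, context: dict, log_name: str = None
    ) -> ChartExecutionResult:
        # Run `code` (ideally sandboxed), capture the rendered chart or any error,
        # and wrap the outcome in a ChartExecutionResult; see examples/agent.
        raise NotImplementedError
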
  2. Configure the evaluator.
    from viseval import Evaluator

    evaluator = Evaluator(webdriver_path, vision_model)

(You can configure the Evaluator without a webdriver and vision model; in that case, the readability evaluation of the generated visualizations is skipped.)
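
A minimal sketch of that lighter configuration (passing None for both arguments is an assumption about how the omission is expressed, not the documented signature):

    evaluator = Evaluator(None, None)  # validity and legality checks only; readability is skipped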

  • Install webdriver.

    # download
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    # install
    apt install ./google-chrome-stable_current_amd64.deb
    # verify
    google-chrome --version
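
If the webdriver_path passed to the Evaluator should point to a ChromeDriver executable (an assumption here; adjust if the Evaluator expects something else), the third-party webdriver-manager package offers one way to obtain it:

    from webdriver_manager.chrome import ChromeDriverManager

    # Downloads a ChromeDriver matching the installed Chrome and returns its path.
    webdriver_path = ChromeDriverManager().install()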
    
  • Load a vision model (e.g., GPT-4V).

    from langchain_openai import AzureChatOpenAI
    
    import dotenv
    # Copy .env.example to .env and put your API keys in the file.
    dotenv.load_dotenv()
    
    vision_model = AzureChatOpenAI(
        model_name="gpt-4-turbo-v",
        max_retries=999,
        temperature=0.0,
        request_timeout=20,
        max_tokens=4096,
    )
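
If you call the OpenAI API directly rather than through Azure, an analogous setup with ChatOpenAI from the same package should work (the model name below is only an example of a vision-capable model):

    from langchain_openai import ChatOpenAI

    vision_model = ChatOpenAI(
        model="gpt-4o",  # any vision-capable chat model
        temperature=0.0,
        max_tokens=4096,
    )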
    
  3. Evaluate.
from viseval import Dataset

# Configure the dataset with the benchmark dataset folder path (`folder`),
# the number of tables required to generate visualizations (`table_type`: all, single, or multiple),
# and whether to include irrelevant tables (`with_irrelevant_tables`).
dataset = Dataset(folder, table_type, with_irrelevant_tables)

config = {"library": args.library}  # target visualization library, e.g., "matplotlib"
result = evaluator.evaluate(agent, dataset, config)
score = result.score()
print(f"Score: {score}")

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Privacy Statement

This project has adopted the Microsoft Privacy Statement.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vis_evaluator-0.0.2.tar.gz (32.2 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vis_evaluator-0.0.2-py3-none-any.whl (30.8 kB)

Uploaded Python 3

File details

Details for the file vis_evaluator-0.0.2.tar.gz.

File metadata

  • Download URL: vis_evaluator-0.0.2.tar.gz
  • Upload date:
  • Size: 32.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for vis_evaluator-0.0.2.tar.gz
Algorithm Hash digest
SHA256 442491118e3755a88ce1d1a397179353dbb5fc0c972b4bc0cf64f5e3fbe0634b
MD5 385d1c5eb65d3b4dffdb192031b76e06
BLAKE2b-256 da2cb723efc828f247ba82fc8e4c9c1007e5a1c071d7fd2bff3d8a72d93a03d6

See more details on using hashes here.

File details

Details for the file vis_evaluator-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: vis_evaluator-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 30.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for vis_evaluator-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 868428f2b2f4543d1fdd9f94fa138cea2e5708b1279c65820901c3920e187552
MD5 b2c0be720b981acc974e72adbef8e62c
BLAKE2b-256 7f4bdf3523743879139739363f412be5a8ec39460fc248e495ce3f1fa3f80957

See more details on using hashes here.
