VisEval: An NL2VIS Benchmark

VisEval is a benchmark designed to evaluate visualization generation methods. This repository provides both the toolkit that supports benchmarking and the benchmark data.

What Can VisEval Evaluate

The pipeline of VisEval includes three key modules: the validity checker, the legality checker, and the readability evaluator.

VisEval evaluates generated visualizations along three dimensions, one per module:

  1. Whether the generated code can produce the visualization.
  2. Whether the generated visualization meets the query.
  3. Whether the generated visualization is easy to read.
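The three dimensions above can be pictured as a staged pipeline. The following is a minimal sketch, not VisEval's implementation: the `StageResult` class, the three toy check functions, and the assumption that later stages are skipped after a failure are all hypothetical and only illustrate the ordering of the checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StageResult:
    """Outcome of one hypothetical evaluation stage."""
    passed: bool
    detail: str = ""


def check_validity(code: str) -> StageResult:
    # Toy stand-in: only checks that the code parses (a real checker would run it).
    try:
        compile(code, "<generated>", "exec")
    except SyntaxError as exc:
        return StageResult(False, f"syntax error: {exc.msg}")
    return StageResult(True)


def check_legality(code: str) -> StageResult:
    # Toy stand-in: pretend the query asked for a bar chart.
    ok = ".bar(" in code
    return StageResult(ok, "" if ok else "expected a bar chart")


def check_readability(code: str) -> StageResult:
    # Toy stand-in: require both axis labels to be set.
    ok = "xlabel" in code and "ylabel" in code
    return StageResult(ok, "" if ok else "missing axis labels")


STAGES: list[tuple[str, Callable[[str], StageResult]]] = [
    ("validity", check_validity),
    ("legality", check_legality),
    ("readability", check_readability),
]


def run_pipeline(code: str) -> dict[str, StageResult]:
    """Run the stages in order and record each result."""
    results: dict[str, StageResult] = {}
    for name, check in STAGES:
        results[name] = check(code)
        if not results[name].passed:
            break  # assumption: later stages are skipped after a failure
    return results
```

In this sketch, code that fails to parse yields only a `validity` entry, so the result dictionary also shows how far a visualization got.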

Get Started

Install Benchmark Toolkit

pip install --upgrade vis-evaluator
# or `git clone https://github.com/microsoft/VisEval.git && cd VisEval && pip install --upgrade -e .`

Download Benchmark Dataset

To access the dataset, please follow these steps:

  1. Download the dataset from this link.
  2. Once the download is complete, unzip the file to extract the dataset contents.

For additional information about the dataset, please refer to the dataset documentation.
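If you prefer to script the unzip step, a minimal sketch using only the Python standard library might look like this. The function name `extract_dataset` and any paths you pass to it are placeholders, not part of the toolkit.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive: Path, target_dir: Path) -> list[str]:
    """Unzip `archive` into `target_dir` and return the archive's member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

For example, `extract_dataset(Path("dataset.zip"), Path("dataset"))`, where both paths are hypothetical; the resulting folder is what you later pass to `Dataset`.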

Usage & Examples

After installation, you can use VisEval by referring to examples/evaluate.py or as follows:

  1. Create your generation method by inheriting from the Agent Class. You can find three examples in the examples/agent directory.
from typing import Tuple

from viseval.agent import Agent, ChartExecutionResult

class YourAgent(Agent):
    def __init__(self, llm):
        self.llm = llm
    
    def generate(
        self, nl_query: str, tables: list[str], config: dict
    ) -> Tuple[str, dict]:
        """Generate code for the given natural language query."""
        pass

    def execute(
        self, code: str, context: dict, log_name: str = None
    ) -> ChartExecutionResult:
        """Execute the given code with context and return the result"""
        pass
  2. Configure the evaluator.
    from viseval import Evaluator

    # webdriver_path and vision_model are set up as described below.
    evaluator = Evaluator(webdriver_path, vision_model)

(You can configure the Evaluator without a webdriver and vision model, in which case the readability evaluation of the generated visualizations is skipped.)

  • Install webdriver.

    # download
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    # install
    apt install ./google-chrome-stable_current_amd64.deb
    # verify
    google-chrome --version
    
  • Load vision model (e.g., GPT-4V).

    from langchain_openai import AzureChatOpenAI
    
    import dotenv
    # Copy .env.example to .env and put your API keys in the file.
    dotenv.load_dotenv()
    
    vision_model = AzureChatOpenAI(
        model_name="gpt-4-turbo-v",
        max_retries=999,
        temperature=0.0,
        request_timeout=20,
        max_tokens=4096,
    )
    
  3. Evaluate.
from viseval import Dataset

# Configure the dataset with the benchmark dataset folder path (`folder`),
# specify the number of tables required to generate visualizations (`table_type`: all, single, or multiple),
# and indicate whether to include irrelevant tables (`with_irrelevant_tables`).
dataset = Dataset(folder, table_type, with_irrelevant_tables)

# `args` is assumed here to hold parsed command-line arguments, as in examples/evaluate.py.
config = {"library": args.library}
result = evaluator.evaluate(agent, dataset, config)
score = result.score()
print(f"Score: {score}")
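The `execute` method in step 1 is left as a stub. One way to approach it is to run the generated code in a fresh namespace and capture any exception. The sketch below is self-contained and uses a hypothetical `ExecutionOutcome` dataclass as a stand-in; it is not VisEval's `ChartExecutionResult`, whose fields are not shown here, and `run_generated_code` is an illustrative name.

```python
import traceback
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionOutcome:
    """Hypothetical stand-in for a result object (not ChartExecutionResult)."""
    status: bool
    error_msg: Optional[str] = None
    namespace: Optional[dict] = None


def run_generated_code(code: str, context: dict) -> ExecutionOutcome:
    """Execute `code` with `context` as its globals and report success or failure."""
    namespace = dict(context)  # copy so the caller's context is not mutated
    try:
        exec(compile(code, "<generated>", "exec"), namespace)
    except Exception:
        # Keep a short traceback so the failure can be logged or inspected.
        return ExecutionOutcome(False, error_msg=traceback.format_exc(limit=1))
    return ExecutionOutcome(True, namespace=namespace)
```

Because `exec` runs arbitrary code, LLM-generated code should only be executed in a sandboxed or otherwise isolated environment.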

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Privacy Statement

This project has adopted the Microsoft Privacy Statement.

Citation

If you find that VisEval helps your research, please consider citing it:

@misc{chen2024viseval,
      title={VisEval: A Benchmark for Data Visualization in the Era of Large Language Models}, 
      author={Nan Chen and Yuge Zhang and Jiahang Xu and Kan Ren and Yuqing Yang},
      year={2024},
      eprint={2407.00981},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
}
