LIDA: Automatic Generation of Visualizations from Data

LIDA: Automatic Generation of Visualizations and Infographics using Large Language Models

LIDA uses off-the-shelf large language models to generate grammar-agnostic visualization specifications and data-faithful infographics.

Note: To create visualizations, LIDA generates and executes code. Run LIDA only in a secure environment, and acknowledge this by setting the environment variable LIDA_ALLOW_CODE_EVAL=1.
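
In a notebook or script, the same acknowledgement can be made from Python. This is a minimal sketch that assumes LIDA reads the variable from the process environment; the project description does not say when the variable is read, so it is set before any LIDA import to be safe.

```python
import os

# Opt in to code execution. Set this before importing or calling LIDA
# (assumption: the variable is read from the current process environment).
os.environ["LIDA_ALLOW_CODE_EVAL"] = "1"
```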

How it works

LIDA comprises 4 modules: a SUMMARIZER that converts data into a rich but compact natural language summary, a GOAL EXPLORER that enumerates visualization goals given the data, a VISGENERATOR that generates, refines, executes, and filters visualization code, and an INFOGRAPHER module (tbd) that yields data-faithful stylized graphics using image generation models (IGMs). LIDA provides a Python API and a hybrid user interface (direct manipulation and multilingual natural language) for interactive chart, infographic, and data story generation.

[Figure: LIDA components]

Details on the components of LIDA are described in the paper here and in this tutorial notebook.

Requirements and Installation

Verify Environment - Python 3.10+. Set up your Python environment and verify that it runs Python 3.10 or higher (preferably, use Conda).
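
The version requirement can be checked from the interpreter itself. The helper name `meets_minimum` below is illustrative and not part of LIDA.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old, 3.10+ required"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```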

Once the requirements are met, set your OpenAI API key and run the following commands to install the library:

export OPENAI_API_KEY=<your key>
pip install lida

Alternatively, you can install the library in dev mode by cloning this repo and running pip install -e . in the repository root.

Features and Python API

LIDA provides a Python API for generating visualizations and infographics: data summary generation, visualization goal generation, visualization generation, and visualization editing. Learn more about the API (e.g., switching between LLM providers such as Cohere, PaLM, Hugging Face, etc.) by running the tutorial notebook.

Data Summarization

from lida.modules import Manager

lida = Manager()
summary = lida.summarize("data/cars.json") # generate data summary

Visualization Goal Generation

goals = lida.generate_goals(summary, n=5) # generate goals

Visualization Generation

# generate code specifications for charts
vis_specs = lida.generate_viz(summary=summary, goal=goals[0], library="matplotlib") # altair, matplotlib etc

# execute code to return charts (raster images or other formats)
charts = lida.execute_viz(code_specs=vis_specs)
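
The steps above can be chained into one helper. This is a sketch, not part of the LIDA API: `generate_first_chart` is a hypothetical name, and it only assumes that the object passed in exposes the four calls shown above with the same keyword arguments.

```python
def generate_first_chart(manager, data_path, n_goals=5, library="matplotlib"):
    """Summarize the data, generate goals, and chart the first goal.

    `manager` is any object exposing summarize, generate_goals,
    generate_viz and execute_viz as used in the snippets above.
    """
    summary = manager.summarize(data_path)
    goals = manager.generate_goals(summary, n=n_goals)
    if not goals:
        raise ValueError("no visualization goals were generated")
    vis_specs = manager.generate_viz(summary=summary, goal=goals[0], library=library)
    charts = manager.execute_viz(code_specs=vis_specs)
    return summary, goals, charts


# Usage with the real library (requires lida and an API key):
# from lida.modules import Manager
# summary, goals, charts = generate_first_chart(Manager(), "data/cars.json")
```

Passing the manager in, rather than constructing it inside the helper, keeps the function testable with a stand-in object and leaves provider configuration to the caller.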

Visualization Editing

# modify chart using natural language
instructions = ["convert this to a bar chart", "change the color to red", "change y axes label to Fuel Efficiency"]
vis_specs = lida.edit_viz(code=charts[0].code,  summary=summary, instructions=instructions, library="matplotlib")
edited_chartspecs = lida.execute_viz(code_specs=vis_specs, data=lida.data)  # `lida` is the Manager instance created above

LIDA also supports other operations such as visualization explanation, repair, and recommendation. See the tutorial notebook for more details.

Getting Started

The fastest and recommended way to get started after installation is to try the web UI or run the tutorial notebook.

Web UI

You can use the library from the bundled UI by running the following command:

lida ui --port=8080

Then navigate to http://localhost:8080/ in your browser.

Finally, you can call LIDA from your application via its web API. To view the web API specification, navigate to http://localhost:8080/api/docs in your browser.
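
Before wiring an application to the web API, it can help to confirm that the server started by `lida ui` is answering. The sketch below uses only the Python standard library and only the documentation URL mentioned above. The helper names `api_docs_url` and `docs_reachable` are hypothetical, and no other endpoint is assumed.

```python
from urllib.error import URLError
from urllib.request import urlopen


def api_docs_url(host="localhost", port=8080):
    """Build the URL of the web API specification page."""
    return f"http://{host}:{port}/api/docs"


def docs_reachable(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200, False on any network error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        return False


if __name__ == "__main__":
    url = api_docs_url()
    print(url, "reachable" if docs_reachable(url) else "not reachable")
```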

Documentation and Citation

A short paper describing LIDA (accepted at the ACL 2023 conference) is available here.

@article{dibia2023lida,
      title={LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models},
      author={Victor Dibia},
      year={2023},
      eprint={2303.02927},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

LIDA builds on insights into the automatic generation of visualizations from an earlier paper: Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks.
