pygwalker: turn your data into an interactive UI for data exploration and visualization

PyGWalker: A Python Library for Exploratory Data Analysis with Visualization

PyGWalker can simplify your Jupyter Notebook data analysis and data visualization workflow, by turning your pandas dataframe into an interactive user interface for visual exploration.

PyGWalker (pronounced like "Pig Walker", just for fun) is named as an abbreviation of "Python binding of Graphic Walker". It integrates Jupyter Notebook with Graphic Walker, an open-source alternative to Tableau. It allows data scientists to visualize, clean, and annotate data with simple drag-and-drop operations and even natural language queries.

Tip: If you want more AI features, we also built runcell, an AI code agent in Jupyter that understands your code, data, and cells and can generate code, execute cells, and take actions for you. It can be used in JupyterLab after installing it with pip install runcell.

Visit Google Colab, Kaggle Code or Graphic Walker Online Demo to test it out!

If you prefer using R, check GWalkR, the R wrapper of Graphic Walker. If you prefer a Desktop App that can be used offline and without any coding, check out PyGWalker Desktop.

Features

PyGWalker is a Python library that simplifies data analysis and visualization workflows by turning pandas DataFrames into interactive visual interfaces. It offers a variety of features that make it a powerful tool for data exploration:

  • Interactive Data Exploration:
    • Drag-and-drop interface for easy visualization creation.  
    • Real-time updates as you make changes to the visualization.
    • Ability to zoom, pan, and filter the data.  
  • Data Cleaning and Transformation:
    • Visual data cleaning tools to identify and remove outliers or inconsistencies.  
    • Ability to create new variables and features based on existing data.  
  • Advanced Visualization Capabilities:
    • Support for various chart types (bar charts, line charts, scatter plots, etc.).
    • Customization options for colors, labels, and other visual elements.  
    • Interactive features like tooltips and drill-down capabilities.  
  • Integration with Jupyter Notebooks:
    • Seamless integration with Jupyter Notebooks for a smooth workflow.  
  • Open-Source and Free:
    • Available for free and allows for customization and extension.

Getting Started

Check out our video tutorials on using pygwalker, pygwalker + streamlit, and pygwalker + snowflake: How to explore data with PyGWalker in Python.

Setup pygwalker

Before using pygwalker, make sure to install the packages through the command line using pip or conda.

pip

pip install pygwalker

Note

For an early trial, you can install with pip install pygwalker --upgrade to keep your version up to date with the latest release, or even pip install pygwalker --upgrade --pre to obtain the latest features and bug fixes.

Conda-forge

conda install -c conda-forge pygwalker

or

mamba install -c conda-forge pygwalker

See conda-forge feedstock for more help.
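
After installing, it can be useful to confirm which version ended up in the active environment, since some features described later depend on the version (for example, the config command is listed for pygwalker >= 0.3.10). The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pygwalker")
    if found is None:
        print("pygwalker is not installed in this environment")
    else:
        print(f"pygwalker {found} is installed")
```

Running this with the same interpreter that backs your notebook kernel avoids the common case where pip installed the package into a different environment.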

Use pygwalker in Jupyter Notebook

Quick Start

Import pygwalker and pandas into your Jupyter Notebook to get started.

import pandas as pd
import pygwalker as pyg

You can use pygwalker without breaking your existing workflow. For example, you can call up PyGWalker with the dataframe loaded in this way:

df = pd.read_csv('./bike_sharing_dc.csv')
walker = pyg.walk(df)

That's it. Now you have an interactive UI to analyze and visualize data with simple drag-and-drop operations.
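
If the bike-sharing CSV is not at hand, a small stand-in file is enough to try the same two calls. The sketch below writes one with the standard library and then wraps the quick-start calls in a function meant to be run from a notebook cell. The column names and values are hypothetical and are not taken from the real bike_sharing_dc.csv; `write_sample_csv` and `explore` are illustrative helper names.

```python
import csv
from pathlib import Path

# Hypothetical columns and values, chosen only for illustration.
SAMPLE_ROWS = [
    {"date": "2024-01-01", "season": "winter", "temperature": 3.5, "rentals": 120},
    {"date": "2024-04-01", "season": "spring", "temperature": 14.0, "rentals": 340},
    {"date": "2024-07-01", "season": "summer", "temperature": 27.5, "rentals": 510},
    {"date": "2024-10-01", "season": "fall", "temperature": 16.0, "rentals": 390},
]


def write_sample_csv(path: str) -> Path:
    """Write the stand-in rows to `path` as a CSV file with a header row."""
    target = Path(path)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(SAMPLE_ROWS[0]))
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return target


def explore(path: str):
    """Load the CSV with pandas and open the PyGWalker UI (run inside a notebook)."""
    # Imported lazily so the CSV helper above works without these packages.
    import pandas as pd
    import pygwalker as pyg

    df = pd.read_csv(path)
    return pyg.walk(df)
```

In a notebook cell, `walker = explore(write_sample_csv("./sample.csv"))` should then behave like the quick-start snippet above.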

Cool things you can do with PyGWalker:

  • Change the mark type to make a different kind of chart, for example a line chart.

  • To compare different measures, create a concat view by adding more than one measure to rows or columns.

  • To make a facet view of several subviews divided by the values of a dimension, put dimensions into rows or columns.

  • PyGWalker includes a data table that gives a quick view of the data along with its distribution and profiling. You can also add filters or change data types in the table.

  • You can save the data exploration result to a local file.

Better Practices

There are some important parameters you should know when using pygwalker:

  • spec: saves and loads the chart configuration (a JSON string or a file path).
  • kernel_computation: uses DuckDB as the computing engine, which lets you handle larger datasets faster on your local machine.
  • use_kernel_calc: deprecated; use kernel_computation instead.

df = pd.read_csv('./bike_sharing_dc.csv')
walker = pyg.walk(
    df,
    spec="./chart_meta_0.json",    # This JSON file stores your chart state. Click the save button in the UI manually when you finish a chart; 'autosave' will be supported in the future.
    kernel_computation=True,       # With `kernel_computation=True`, pygwalker uses DuckDB as the computing engine, which supports exploring bigger datasets (<=100GB).
)
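
Because the spec file is only written when you click save in the UI, it may be missing, empty, or hand-edited into invalid JSON by the time a notebook is rerun. The sketch below checks for those cases before passing the path on. The helper names `spec_state` and `open_with_spec` are made up for this example, and the check only tests for well-formed JSON; it assumes nothing about the keys pygwalker stores inside the file.

```python
import json
from pathlib import Path


def spec_state(spec_path: str) -> str:
    """Classify a spec file as 'missing', 'empty', 'invalid' or 'ok'."""
    path = Path(spec_path)
    if not path.exists():
        return "missing"
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return "empty"
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return "invalid"
    return "ok"


def open_with_spec(df, spec_path: str = "./chart_meta_0.json"):
    """Open PyGWalker with a spec path, refusing files that are not valid JSON."""
    # Imported lazily so spec_state can be used without pygwalker installed.
    import pygwalker as pyg

    if spec_state(spec_path) == "invalid":
        raise ValueError(f"{spec_path} exists but does not contain valid JSON")
    # Same arguments as the example above.
    return pyg.walk(df, spec=spec_path, kernel_computation=True)
```

Failing early on a corrupt file is a design choice for this sketch; it keeps a damaged chart configuration from being silently replaced the next time you save.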

Example in local notebook

Example in cloud notebook

Programmatic Export of Charts

After saving a chart from the UI, you can retrieve the image directly from Python.

walker = pyg.walk(df, spec="./chart_meta_0.json")
# edit the chart in the UI and click the save button
walker.save_chart_to_file("Chart 1", "chart1.svg", save_type="svg")
png_bytes = walker.export_chart_png("Chart 1")
svg_bytes = walker.export_chart_svg("Chart 1")
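
The export calls above hand the image back to Python rather than writing it anywhere. Assuming, as the variable name `png_bytes` suggests, that `export_chart_png` returns the raw image bytes, a small helper can check the data and write it to disk. The names `looks_like_png` and `save_png` are illustrative only.

```python
from pathlib import Path

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data: bytes) -> bool:
    """Return True if `data` begins with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def save_png(data: bytes, path: str) -> Path:
    """Write PNG bytes to `path`, rejecting data that is clearly not a PNG."""
    if not looks_like_png(data):
        raise ValueError("data does not start with the PNG signature")
    target = Path(path)
    target.write_bytes(data)
    return target
```

With a saved chart, usage would look like `save_png(walker.export_chart_png("Chart 1"), "chart1.png")`.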

Use pygwalker in Streamlit

Streamlit allows you to host a web version of pygwalker without having to work out the details of how a web application works.

Here is an example app built with pygwalker and Streamlit:

from pygwalker.api.streamlit import StreamlitRenderer
import pandas as pd
import streamlit as st

# Adjust the width of the Streamlit page
st.set_page_config(
    page_title="Use Pygwalker In Streamlit",
    layout="wide"
)

# Add Title
st.title("Use Pygwalker In Streamlit")

# You should cache your pygwalker renderer, if you don't want your memory to explode
@st.cache_resource
def get_pyg_renderer() -> "StreamlitRenderer":
    df = pd.read_csv("./bike_sharing_dc.csv")
    # If you want to use feature of saving chart config, set `spec_io_mode="rw"`
    return StreamlitRenderer(df, spec="./gw_config.json", spec_io_mode="rw")


renderer = get_pyg_renderer()

renderer.explorer()
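
A Streamlit script like the one above is started through the Streamlit runner rather than by executing the file directly. As a rough sketch, the launch command can be built and run from Python. The script name `app.py` is hypothetical, and the helper names are made up for this example; port 8501 is Streamlit's usual default.

```python
import subprocess
import sys
from typing import List


def streamlit_command(script: str, port: int = 8501) -> List[str]:
    """Build the command that serves `script` with the current interpreter's Streamlit."""
    return [
        sys.executable, "-m", "streamlit", "run", script,
        "--server.port", str(port),
    ]


def launch(script: str, port: int = 8501) -> None:
    """Start the Streamlit server and block until it exits."""
    subprocess.run(streamlit_command(script, port), check=True)
```

Calling `launch("app.py")` is equivalent to running `streamlit run app.py` in a terminal, with the advantage that it uses the same Python environment in which pygwalker is installed.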

API Reference

pygwalker.walk

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dataset | Union[DataFrame, Connector] | - | The dataframe or connector to be used. |
| gid | Union[int, str] | None | ID for the GraphicWalker container div, formatted as 'gwalker-{gid}'. |
| env | Literal['Jupyter', 'JupyterWidget'] | 'JupyterWidget' | Environment using pygwalker. |
| field_specs | Optional[Dict[str, FieldSpec]] | None | Specifications of fields. Will be automatically inferred from dataset if not specified. |
| hide_data_source_config | bool | True | If True, hides DataSource import and export button. |
| theme_key | Literal['vega', 'g2'] | 'g2' | Theme type for the GraphicWalker. |
| appearance | Literal['media', 'light', 'dark'] | 'media' | Theme setting. 'media' will auto-detect the OS theme. |
| spec | str | "" | Chart configuration data. Can be a configuration ID, JSON, or remote file URL. |
| use_preview | bool | True | If True, uses the preview function. |
| kernel_computation | bool | False | If True, uses kernel computation for data. |
| **kwargs | Any | - | Additional keyword arguments. |
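
Several of the parameters above accept only a fixed set of string values. The sketch below collects keyword arguments for `pyg.walk` and rejects values that are not listed in the table for `theme_key` and `appearance`. The helper `walk_options` is hypothetical and is not part of pygwalker; it only mirrors the allowed values documented here.

```python
from typing import Any, Dict, Tuple

# Allowed values copied from the parameter table above.
ALLOWED_VALUES: Dict[str, Tuple[str, ...]] = {
    "theme_key": ("vega", "g2"),
    "appearance": ("media", "light", "dark"),
}


def walk_options(**options: Any) -> Dict[str, Any]:
    """Return `options` unchanged after checking the string-valued choices."""
    for name, value in options.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
    return dict(options)
```

In a notebook this could be used as `pyg.walk(df, **walk_options(appearance="dark", theme_key="vega", kernel_computation=True))`.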

Development

See local-development.

Tested Environments

  • Jupyter Notebook
  • Google Colab
  • Kaggle Code
  • Jupyter Lab
  • Jupyter Lite
  • Databricks Notebook (Since version 0.1.4a0)
  • Jupyter Extension for Visual Studio Code (Since version 0.1.4a0)
  • Most web applications compatible with IPython kernels. (Since version 0.1.4a0)
  • Streamlit (Since version 0.1.4.9), enabled with pyg.walk(df, env='Streamlit')
  • DataCamp Workspace (Since version 0.1.4a0)
  • Panel. See panel-graphic-walker.
  • marimo (Since version 0.4.9.11)
  • Hex Projects
  • ...feel free to raise an issue for more environments.

Configuration And Privacy Policy (pygwalker >= 0.3.10)

You can use pygwalker config to set your privacy configuration.

$ pygwalker config --help

usage: pygwalker config [-h] [--set [key=value ...]] [--reset [key ...]] [--reset-all] [--list]

Modify configuration file. (default: ~/Library/Application Support/pygwalker/config.json) 
Available configurations:

- privacy  ['offline', 'update-only', 'events'] (default: events).
    "offline": fully offline, no data is send or api is requested
    "update-only": only check whether this is a new version of pygwalker to update
    "events": share which events about which feature is used in pygwalker, it only contains events data about which feature you arrive for product optimization. No DATA YOU ANALYSIS IS SEND. Events data will bind with a unique id, which is generated by pygwalker when it is installed based on timestamp. We will not collect any other information about you.
    
- kanaries_token  ['your kanaries token'] (default: empty string).
    your kanaries token, you can get it from https://kanaries.net.
    refer: https://space.kanaries.net/t/how-to-get-api-key-of-kanaries.
    by kanaries token, you can use kanaries service in pygwalker, such as share chart, share config.
    

options:
  -h, --help            show this help message and exit
  --set [key=value ...]
                        Set configuration. e.g. "pygwalker config --set privacy=update-only"
  --reset [key ...]     Reset user configuration and use default values instead. e.g. "pygwalker config --reset privacy"
  --reset-all           Reset all user configuration and use default values instead. e.g. "pygwalker config --reset-all"
  --list                List current used configuration.

For more details, see How to set your privacy configuration?
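
The same setting can be applied from a setup script instead of an interactive shell. The sketch below builds the `pygwalker config --set privacy=...` command shown in the help text and restricts the level to the three documented values. The helper names `privacy_command` and `apply_privacy` are made up for this example.

```python
import subprocess
from typing import List

# Levels listed in the help output above.
PRIVACY_LEVELS = ("offline", "update-only", "events")


def privacy_command(level: str) -> List[str]:
    """Build the CLI call that sets the pygwalker privacy level."""
    if level not in PRIVACY_LEVELS:
        raise ValueError(f"privacy level must be one of {PRIVACY_LEVELS}, got {level!r}")
    return ["pygwalker", "config", "--set", f"privacy={level}"]


def apply_privacy(level: str) -> None:
    """Run the command; requires the pygwalker CLI to be on PATH."""
    subprocess.run(privacy_command(level), check=True)
```

For example, `apply_privacy("offline")` runs `pygwalker config --set privacy=offline`, after which `pygwalker config --list` should show the new value.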

License

Apache License 2.0

Contribution Guideline

You are encouraged to contribute to PyGWalker in any way that suits your interests. This may include:

  • Answering questions and providing support
  • Sharing ideas for new features
  • Reporting bugs and glitches
  • Contributing code to the project
  • Offering suggestions for website improvements and better documentation

Resources

PyGWalker Cloud is released! You can now save your charts to the cloud, publish an interactive cell as a web app, and use advanced GPT-powered features. Check out PyGWalker Cloud for more details.
