A scatter plot extension for Jupyter Notebook and Lab
Why?
Imagine trying to explore an embedding space of millions of data points. Besides plotting the space as a 2D scatter, the exploration typically involves three things. First, we want to interactively adjust the view (e.g., via panning and zooming) and the visual point encoding (e.g., the point color, opacity, or size). Second, we want to be able to select and highlight points. Third, we want to compare multiple embeddings (e.g., via animation, color, or point connections). The goal of jupyter-scatter is to support all three requirements and scale to millions of points.
pip install jupyter-scatter
If you are using JupyterLab <=2:
jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-scatter
For a minimal working example, take a look at test-environments.
To play with the following examples yourself, open notebooks/get-started.ipynb.
Also check out our full-blown tutorial that we first presented at the SciPy '23 conference.
In the simplest case, you can pass the x/y coordinates to the plot function as follows:
import jscatter
import numpy as np

x = np.random.rand(500)
y = np.random.rand(500)

jscatter.plot(x, y)
Say your data is stored in a Pandas dataframe like the following:
import pandas as pd

# Just some random float and int values
data = np.random.rand(500, 4)
df = pd.DataFrame(data, columns=['mass', 'speed', 'pval', 'group'])
# We'll convert the `group` column to strings to ensure it's recognized as
# categorical data. This will come in handy in the advanced example.
df['group'] = df['group'].map(lambda c: chr(65 + round(c)), na_action=None)
You can then visualize this data by referencing column names:
jscatter.plot(data=df, x='mass', y='speed')
Often you want to customize the visual encoding, such as the point color, size, and opacity.
jscatter.plot(
    data=df,
    x='mass',
    y='speed',
    size=8,  # static encoding
    color_by='group',  # data-driven encoding
    opacity_by='density',  # view-driven encoding
)
In the above example, we chose a static point size of 8. In contrast, the point color is data-driven and assigned based on the categorical group value. The point opacity is view-driven and defined dynamically by the number of points currently visible in the view.
Also notice how jscatter picks an appropriate default color map based on the data type used for the color encoding. In this example, jscatter uses the colorblind-safe color map from Okabe and Ito because the data type is categorical and the number of categories is small enough for that palette, which defines eight colors.
Important: in order for jscatter to recognize categorical data, the dtype of the corresponding column needs to be categorical (in pandas, the category dtype).
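As a minimal sketch of the dtype requirement, and assuming it refers to pandas' category dtype, a string column can be converted with astype. The small dataframe below is made up for illustration and is not the df from the earlier example.

```python
import pandas as pd

# Hypothetical toy data, only used to illustrate the dtype conversion
toy = pd.DataFrame({
    'mass': [0.1, 0.4, 0.7, 0.9],
    'group': ['A', 'B', 'A', 'B'],
})

# Convert the string column to pandas' categorical dtype
toy['group'] = toy['group'].astype('category')

# Check that the column is now categorical and list its categories
is_categorical = isinstance(toy['group'].dtype, pd.CategoricalDtype)
categories = list(toy['group'].cat.categories)
print(is_categorical, categories)
```

The converted dataframe can then be passed via data= with color_by='group', as in the example above.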
You can, of course, customize the color map and many other parameters of the visual encoding as shown next.
Functional API Example
The flat API can get overwhelming when you want to customize a lot of properties. Therefore, jscatter provides a functional API that groups properties by type and exposes them via meaningfully-named methods.
scatter = jscatter.Scatter(data=df, x='mass', y='speed')
scatter.selection(df.query('mass < 0.5').index)
scatter.color(by='mass', map='plasma', order='reverse')
scatter.opacity(by='density')
scatter.size(by='pval', map=[2, 4, 6, 8, 10])
scatter.height(480)
scatter.background('black')
scatter.show()
When you update properties dynamically, i.e., after having called scatter.show(), the plot will update automatically. For instance, try calling scatter.xy('speed', 'mass') and you will see how the points are mirrored along the diagonal.
Moreover, all arguments are optional. If you specify arguments, the methods act as setters and change the properties. If you call a method without any arguments, it acts as a getter and returns the property (or properties). For example, scatter.selection() will return the currently selected points.
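The getter/setter convention described above can be sketched in plain Python. TinyScatter and its default values are hypothetical and only illustrate the pattern; this is not how jscatter itself is implemented.

```python
_UNSET = object()  # sentinel so that None could still be passed as a real value


class TinyScatter:
    """Hypothetical stand-in for illustration; not jscatter's actual class."""

    def __init__(self):
        self._height = 240
        self._selection = []

    def height(self, value=_UNSET):
        if value is _UNSET:
            return self._height  # no argument: act as a getter
        self._height = int(value)  # argument given: act as a setter

    def selection(self, indices=_UNSET):
        if indices is _UNSET:
            return list(self._selection)  # return a copy of the selection
        self._selection = list(indices)


plot = TinyScatter()
plot.height(480)           # setter call
plot.selection([3, 1, 4])  # setter call
current_height = plot.height()        # getter call
current_selection = plot.selection()  # getter call
print(current_height, current_selection)
```

A sentinel object is used instead of None as the default so that a caller could still pass None as a value to set.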
Finally, the scatter plot is interactive and supports two-way communication. Hence, if you select some points with the lasso tool and then call scatter.selection(), you will get the current selection.
Linking Scatter Plots
To explore multiple scatter plots and link their view, selection, and hover interactions, use jscatter.link():
jscatter.link([
    jscatter.Scatter(data=embeddings, x='pcaX', y='pcaY', **config),
    jscatter.Scatter(data=embeddings, x='tsneX', y='tsneY', **config),
    jscatter.Scatter(data=embeddings, x='umapX', y='umapY', **config),
    jscatter.Scatter(data=embeddings, x='caeX', y='caeY', **config)
], rows=2)
See notebooks/linking.ipynb for more details.
Visualize Millions of Data Points
With jupyter-scatter you can easily visualize and interactively explore datasets with millions of points.
In the following we're visualizing 5 million points generated with the Rössler attractor.
points = np.asarray(roesslerAttractor(5000000))
jscatter.plot(points[:,0], points[:,1], height=640)
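The roesslerAttractor helper is not defined in this README. As a rough stand-in, the sketch below integrates the Rössler system with a simple Euler step. The function name roessler_attractor_sketch, the step size, the starting point, and the parameters a=0.2, b=0.2, c=5.7 are assumptions for illustration and may differ from what the notebook uses.

```python
def roessler_attractor_sketch(num, dt=0.01, a=0.2, b=0.2, c=5.7,
                              start=(1.0, 1.0, 1.0)):
    """Return `num` (x, y, z) points of the Rössler system (Euler steps).

    Hypothetical helper for illustration; not the notebook's implementation.
    """
    x, y, z = start
    points = []
    for _ in range(num):
        points.append((x, y, z))
        # Rössler equations: dx/dt = -y - z, dy/dt = x + a*y, dz/dt = b + z*(x - c)
        dx = -y - z
        dy = x + a * y
        dz = b + z * (x - c)
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
    return points


# A small sample; a pure Python loop would be slow for millions of points
sample = roessler_attractor_sketch(1000)
print(len(sample), sample[0])
```

The returned list of tuples can be passed to np.asarray to obtain an array with one column per coordinate.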
See notebooks/examples.ipynb for more details.
While jscatter is primarily developed for Jupyter Lab and Notebook, it also runs just fine in Google Colab. See jupyter-scatter-colab-test.ipynb for an example.
Setting up a development environment
- Conda >= 4.8
git clone https://github.com/flekschas/jupyter-scatter/ jscatter && cd jscatter
conda env create -f environment.yml && conda activate jscatter
pip install -e ".[test]"
After changing Python code: simply restart the kernel.
After changing JavaScript code: run cd js && npm run build. Alternatively, you can run npm run watch to rebundle the code on the fly.
Setting up a test environment
Go to test-environments and follow the detailed instructions.