Data visualization toolchain based on aggregating into a grid
NOTE: Due to non-Python dependencies, Datashader is not distributed via PyPI; this page is only a placeholder. See below!
Traditional visualization systems treat plotting as a unitary process that transforms incoming data into an onscreen or printed image, with parameters specified beforehand that affect the final result. This approach works for small collections of data that can be viewed in their entirety, but for large datasets the visualization is often the only practical way to understand what the data contains, and there is no objective way to choose the parameters in advance that will reveal it.
The datashader library breaks up the rendering pipeline into a series of stages where user-defined computations can be performed, allowing the visualization to adapt to and reveal the underlying properties of the dataset. That is, the datashader pipeline allows computation on the visualization itself, not just on the dataset, which lets it perform automatic ranging and scaling that take the current visualization constraints into account. For instance, where a traditional system would use a transparency/opacity parameter to show the density of overlapping points in a scatterplot, datashader can automatically calculate how many datapoints are mapped to each pixel, scaling the representation to accurately convey the data at every location, with no saturation, overplotting, or underplotting issues.
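As a rough illustration of the per-pixel counting described above, here is a plain-NumPy sketch of the underlying idea (this is not datashader's actual API; the grid size, data, and log-scaling choice are arbitrary examples):

```python
import numpy as np

# Synthetic example data: 100,000 points from a 2D Gaussian.
rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, 100_000)
y = rng.normal(0.0, 1.0, 100_000)

# Aggregate the points into a 300x400 "pixel" grid: each cell holds the
# number of datapoints that fall into it.
counts, _, _ = np.histogram2d(x, y, bins=(300, 400))

# A fixed-opacity scatterplot saturates wherever many points overlap; by
# rescaling the per-pixel counts (here with a log transform), every
# density level stays distinguishable, from single points to dense cores.
img = np.log1p(counts)
img /= img.max()  # normalize to [0, 1] for display
```

Datashader performs this kind of aggregation and rescaling as explicit, user-customizable pipeline stages, accelerated with Numba so it scales to very large datasets.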
Datashader is distributed via conda and git. Datashader itself is pure Python, but for it to have acceptable performance it strictly requires Numba, which relies on complicated non-Python dependencies that PyPI and pip cannot manage well. To avoid those problems, datashader is not currently available via PyPI/pip, and so this page is only a placeholder to ensure that other projects do not cause confusion by choosing a similar name.
To get started with datashader, see the datashader website.