A collection of ready-made statistical graphics for vega

vega is a statistical graphics system for the web, meaning the plots are displayed in a browser. As an added bonus, it adds interactions, again through web technologies: selecting data points, revealing information on hover, etc. Interaction and the web are clearly the future of statistical graphics. Even ggvis, the successor to the famous ggplot for R, is based on vega.

altair is a Python package that produces vega graphics. Like vega, it adopts an approach to describing statistical graphics known as the grammar of graphics, which underlies other well-known packages such as ggplot for R. It represents an extremely useful compromise between power and flexibility. Its elements are data, marks (points, lines), encodings (relations between data and marks), scales, etc.

Sometimes we want to skip all of that and just produce a boxplot (or a heatmap or a histogram; the argument is the same) by calling:

from altair_recipes import boxplot
from vega_datasets import data

boxplot(data.iris(), columns="petalLength", group_by="species")

because:

  • It’s a well-known type of statistical graphic that everyone can recognize and understand on the fly.

  • Creativity is nice in statistical graphics, as in many other endeavors, but it is dangerous: there are more bad charts out there than good ones. The grammar of graphics is no insurance against them.

  • While it’s simple to put together a boxplot in altair, it isn’t trivial: there are rectangles, vertical lines, horizontal lines (whiskers) and points (outliers), each related to a different statistic of the data. It takes about 30 lines of code and, unless you run them, it’s hard to tell you are looking at a boxplot.

  • One doesn’t always need the control that the grammar of graphics affords. There are times when I need to see a boxplot as quickly as possible, and others, for instance when preparing a publication, when I need to control every detail.
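The point about each boxplot element reflecting its own statistic is easy to make concrete. The following is a minimal sketch in plain Python, not code from altair_recipes; `boxplot_stats` is a hypothetical helper, and it assumes the common Tukey convention in which the whiskers reach the most extreme data points within 1.5 times the interquartile range (IQR) of the box.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Return the statistic behind each visual element of a boxplot.

    Assumes the Tukey convention: whiskers reach the most extreme data
    points within 1.5 * IQR of the box; anything beyond is an outlier.
    Assumes at least two values.
    """
    xs = sorted(values)
    # Quartiles with linear interpolation between data points ("inclusive" method).
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "box": (q1, q3),                        # rectangle
        "median": q2,                           # line across the box
        "whiskers": (inliers[0], inliers[-1]),  # lines out to the most extreme inliers
        "outliers": [x for x in xs if x < low_fence or x > high_fence],  # points
    }


print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
# → {'box': (3.0, 7.0), 'median': 5.0, 'whiskers': (1, 8), 'outliers': [100]}
```

Four elements, four different computations: this is the bookkeeping a ready-made boxplot hides.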

The boxplot is not the only example. The scatterplot, the quantile-quantile plot and the heatmap are important idioms, battle-tested in data analysis practice, and they deserve their own abstraction. Other packages offering an abstraction above the grammar level are:

  • seaborn and the graphical subset of pandas both provide high-level statistical graphics primitives (higher than the grammar of graphics), and they are quite successful, but not web-based.

  • ggplot, even though it is named after the Grammar of Graphics, slipped in some more complex charts, such as geom_boxplot, pretending they are elements of the grammar, because sometimes even R developers are lazy. But a boxplot is not a geom or mark: it’s a combination of several of them, plus certain statistics and so on. I suspect the authors of altair know better than to mix the two levels.

altair_recipes aims to fill this space above altair while making full use of its features. It provides a growing list of “classic” statistical graphics without going down to the grammar level. At the same time it is hoped that, over time, it can become a repository of examples and model best practices for altair, a computable form of its gallery.

There is one more thing. It’s nice to have all these famous chart types available at a keystroke, but we still have to decide which type of graphic to use and, in certain cases, how variables in the data map to channels in the graphic (what becomes a coordinate, what becomes color, etc.). That is still work, and things can still go wrong, sometimes in subtle ways.

Enter autoplot. autoplot inspects the data, selects a suitable type of graphic and generates it. While no claim is made that the result is optimal, it makes reasonable choices and avoids common pitfalls, like overlapping points in scatterplots. There are interesting research efforts aimed at characterizing the optimal graphic for a given data set, but their goal is more ambitious than just selecting from a repertoire of pre-defined graphic types, and they are fairly complex. Therefore, at this time autoplot is based on a set of reasonable heuristics derived from decades of experience, such as:

  • use stripplots and scatterplots to display continuous data, and barcharts for discrete data

  • use opacity to counter mark overlap, but not with discrete color maps

  • switch to summaries (counts and averages) when the amount of overlap is too high

  • use facets for discrete data
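To illustrate how heuristics like these can be turned into code, here is a deliberately simplified sketch. It is not autoplot’s implementation: `choose_chart`, `Choice`, the 5000-row threshold and the opacity formula are all hypothetical, and the real heuristics are likely more nuanced.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical threshold: above this many rows, individual marks overlap too much.
MAX_POINTS = 5000


@dataclass
class Choice:
    chart: str                   # which pre-defined chart type to draw
    opacity: float = 1.0         # below 1.0 to counter overlapping marks
    facet: Optional[str] = None  # discrete column to facet on
    color: Optional[str] = None  # discrete column mapped to a color scale


def choose_chart(columns: Dict[str, str], n_rows: int) -> Choice:
    """Pick a chart from column kinds ("continuous" or "discrete") and the row count."""
    continuous = [name for name, kind in columns.items() if kind == "continuous"]
    discrete = [name for name, kind in columns.items() if kind == "discrete"]
    if not continuous:
        # Only discrete data: count occurrences with a barchart.
        return Choice("barchart")
    # Use facets for discrete data; a second discrete column goes to color.
    facet = discrete[0] if discrete else None
    color = discrete[1] if len(discrete) > 1 else None
    if n_rows > MAX_POINTS:
        # Too much overlap: switch to a summary of binned counts.
        summary = "heatmap" if len(continuous) > 1 else "histogram"
        return Choice(summary, facet=facet)
    chart = "scatterplot" if len(continuous) > 1 else "stripplot"
    # Opacity counters overlap, but is not combined with a discrete color map.
    opacity = 1.0 if color else max(0.1, min(1.0, 100 / n_rows))
    return Choice(chart, opacity=opacity, facet=facet, color=color)


print(choose_chart({"sepalLength": "continuous", "petalLength": "continuous"}, 400))
# → Choice(chart='scatterplot', opacity=0.25, facet=None, color=None)
```

Keeping the rules as plain, ordered conditions makes each one easy to read against the list above and easy to tweak.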

autoplot is a work in progress, and perhaps always will be; feedback is most welcome. A large number of charts generated with it are available at the end of the Examples page and should give a good idea of what it does. In particular, in this first iteration no attempt is made to detect whether a dataset represents a function or a relation, hence scatterplots are preferred over line plots. Moreover, there is no special support for evenly spaced data, such as time series.
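For a sense of what these two missing checks would involve, here is a minimal sketch of each in plain Python. Neither is part of autoplot; `is_function` and `is_evenly_spaced` are hypothetical helpers, and the floating-point tolerance is an arbitrary choice.

```python
import math


def is_function(xs, ys):
    """True if each x value is paired with a single y value.

    Such data could be drawn as a line plot; otherwise it is a relation,
    better shown as a scatterplot.
    """
    first_y = {}
    for x, y in zip(xs, ys):
        # setdefault stores y the first time x is seen and returns the stored value.
        if first_y.setdefault(x, y) != y:
            return False
    return True


def is_evenly_spaced(xs, rel_tol=1e-9):
    """True if the distinct, sorted xs have a constant step, as in a regular time series."""
    distinct = sorted(set(xs))
    steps = [b - a for a, b in zip(distinct, distinct[1:])]
    return all(math.isclose(step, steps[0], rel_tol=rel_tol) for step in steps)


print(is_function([1, 2, 3], [4, 5, 6]), is_function([1, 1, 2], [4, 5, 6]))  # → True False
print(is_evenly_spaced([0.1, 0.3, 0.2]), is_evenly_spaced([1, 2, 4]))        # → True False
```

The tolerance matters for the second check because steps such as 0.2 - 0.1 and 0.3 - 0.2 differ slightly in floating point.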

Features

  • Free software: BSD license.

  • Fully documented.

  • Highly consistent API enforced with autosig.

  • Near 100% regression test coverage.

  • Support for both wide and long format.

  • Data can be provided as a dataframe or as a URL pointing to a csv or json file.

  • All charts produced are valid altair charts and can be modified, combined, saved, served and embedded exactly like any other altair chart.
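As a reminder of what the two formats mean: in wide format each measured variable has its own column, while in long format a single value column is paired with a column naming the variable. The plain-Python sketch below converts one into the other, similar in spirit to `pandas.melt`; `wide_to_long` is a hypothetical helper for illustration, not part of altair_recipes.

```python
def wide_to_long(rows, id_vars, var_name="variable", value_name="value"):
    """Reshape wide records into long ones: one output row per (record, measured column)."""
    long_rows = []
    for row in rows:
        # Identifier columns are copied unchanged into every output row.
        ids = {key: row[key] for key in id_vars}
        for key, value in row.items():
            if key not in id_vars:
                long_rows.append({**ids, var_name: key, value_name: value})
    return long_rows


wide = [
    {"species": "setosa", "petalLength": 1.4, "petalWidth": 0.2},
    {"species": "virginica", "petalLength": 6.0, "petalWidth": 2.5},
]
for record in wide_to_long(wide, id_vars=["species"]):
    print(record)
```

Each wide record with two measured columns yields two long records, so the example prints four rows.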

Chart types

  • autocorrelation

  • barchart

  • boxplot

  • heatmap

  • histogram, in simple and multi-variable versions

  • qqplot

  • scatterplot, in simple and all-vs-all versions

  • smoother, a smoothing line with IQR (interquartile range) shading

  • stripplot
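Several of these chart types rest on a small computation. The sketch below shows textbook versions of three of them in plain Python: the sample autocorrelation, the standard-normal quantiles a normal qqplot (one common variant) plots the sorted sample against, and a rolling median with an interquartile band of the kind a smoother can shade. These are assumptions about the general techniques, not altair_recipes’ code; the package’s smoother in particular may use a different window or method, and all function names are hypothetical.

```python
from statistics import NormalDist, quantiles


def autocorrelation(xs, max_lag):
    """Sample autocorrelation for lags 0..max_lag, normalized by the lag-0 sum of squares."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def normal_qq_points(xs):
    """Pair each sorted sample value with the matching standard-normal quantile."""
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1), where inv_cdf is finite.
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(sorted(xs))]


def rolling_quartiles(ys, window):
    """Centered rolling (Q1, median, Q3); the window is truncated at both ends."""
    half = window // 2
    band = []
    for i in range(len(ys)):
        chunk = ys[max(0, i - half): i + half + 1]
        if len(chunk) < 2:
            band.append((chunk[0],) * 3)  # a single point is its own quartiles
        else:
            band.append(tuple(quantiles(chunk, n=4, method="inclusive")))
    return band


print(autocorrelation([1, 2, 3, 4], max_lag=1))  # → [1.0, 0.25]
print(rolling_quartiles([1, 2, 3, 4, 5], window=3))
```

In the smoother sketch, the median would be drawn as the line and the band between Q1 and Q3 as the shaded area.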

See Examples.

Credits

This package was created with Cookiecutter and the elgertam/cookiecutter-pipenv project template, based on audreyr/cookiecutter-pypackage.

History

0.6.0 (2019-01-25)

  • Fine-tuned API:
      * no faceting, but all returned charts are facet-able
      * color made a bool option when a separate color dimension can’t work
      * eliminated some special cases from autoplot for very small datasets
      * some refactoring in boxplot and autoplot to shrink and clarify the code

0.5.0 (2019-01-17)

  • Autoplot for automatic statistical graphics

  • Stripplots and barcharts

0.4.0 (2018-09-25)

  • Custom height and width for all charts

0.3.2 (2018-09-21)

  • Dealt with breaking changes from autosig, but code is simpler and paves the way for some new features

0.3.1 (2018-09-20)

  • Addressing a documentation mishap

0.3.0 (2018-09-20)

  • Better readme and a raft of examples

  • Some test flakiness addressed

0.2.4 (2018-08-29)

  • One more issue with col resolution

  • Switch to using docstring support in autosig

0.2.3 (2018-08-29)

  • Some issues with processing of columns and group_by args

  • Fixed travis-ci build (3.6 only, 3.5 looks like a minor RNG issue)

0.2.2 (2018-08-28)

  • Switch to a simpler, flatter API a la qplot

  • Added two types of heatmaps

  • Extensive use of autosig features for API consistency and reduced boilerplate

  • Fixed build to follow requests model (pip for users, pipenv for devs)

0.1.2 (2018-08-14)

  • Fixed a number of loose ends particularly wrt docs

0.1.0 (2018-08-06)

  • First release on PyPI.

Download files

Source Distribution

altair_recipes-0.6.0.tar.gz (3.6 MB)

Built Distribution

altair_recipes-0.6.0-py2.py3-none-any.whl (17.4 kB)

File details

Details for the file altair_recipes-0.6.0.tar.gz.

File metadata

  • Download URL: altair_recipes-0.6.0.tar.gz
  • Size: 3.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.6.6

File hashes

Hashes for altair_recipes-0.6.0.tar.gz
Algorithm Hash digest
SHA256 22bf9f1d889c210fa977602b8537e412aaf3f00b3234ea65c4aaa66203bd056e
MD5 ea2c27fdbe96247bc28fc60e7f94908b
BLAKE2b-256 71aeb46860e15f870725bad3eed651850bac41b10ec54fdab4ca40cd574bae2a

File details

Details for the file altair_recipes-0.6.0-py2.py3-none-any.whl.

File metadata

  • Download URL: altair_recipes-0.6.0-py2.py3-none-any.whl
  • Size: 17.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.6.6

File hashes

Hashes for altair_recipes-0.6.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 8060e82983cb2940c5a08b3ed297f4e74e4a21a7f2d3ad0a9613070327bbdd4b
MD5 aac121142b4b3747df1efcd20d0a0c22
BLAKE2b-256 8a50bf91b4abfe2d61effaac5797ab6bd8e5e80743bd22d130b37cc4ea78e324
