Data Visualization library using matplotlib for both long and wide data


Dexplot

Dexplot is a Python library for delivering beautiful data visualizations with a simple and intuitive user experience.

Goals

The primary goals for dexplot are:

- Maintain a very consistent API with as few functions as necessary to make the desired statistical plots
- Allow the user to tweak the plots without digging into matplotlib

Installation

pip install dexplot

Built for long and wide data

Dexplot is primarily built for long data, which is a form of data where each row represents a single observation and each column represents a distinct quantity. It is often referred to as "tidy" data. Here, we have some long data.

[Figure: a sample of long data]

Dexplot can also handle wide data, where multiple columns contain values that represent the same kind of quantity. Below, the data above has been aggregated to show the mean price for each combination of neighborhood and property type. It is now wide data, as each column contains the same quantity (price).
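The difference between the two shapes is easy to reproduce in pandas. This sketch uses a few made-up rows (not the real dataset) and `DataFrame.pivot_table` to turn long data into wide data by taking the mean price for each neighborhood and property type combination.

```python
import pandas as pd

# Made-up long data: one row per listing (observation), one column per quantity.
long_df = pd.DataFrame({
    'neighborhood': ['Shaw', 'Shaw', 'Shaw', 'Capitol Hill', 'Capitol Hill'],
    'property_type': ['House', 'House', 'Apartment', 'House', 'Apartment'],
    'price': [400, 200, 150, 120, 90],
})

# Wide data: one row per neighborhood, one column per property type,
# and every cell holds the same quantity (mean price).
wide_df = long_df.pivot_table(index='neighborhood', columns='property_type',
                              values='price', aggfunc='mean')
print(wide_df)
```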

[Figure: the same data in wide form, mean price by neighborhood and property type]

Usage

Dexplot provides a small number of powerful functions that all work similarly. Most plotting functions have the following signature:

dxp.plotting_func(x, y, data, aggfunc, split, row, col, orientation, ...)

x - Column name along the x-axis
y - Column name along the y-axis
data - Pandas DataFrame
aggfunc - Name of a pandas aggregation function as a string, such as 'min', 'max', or 'mean'
split - Column name to split data into distinct groups
row - Column name to split data into distinct subplots row-wise
col - Column name to split data into distinct subplots column-wise
orientation - Either vertical ('v') or horizontal ('h'). Default for most plots is vertical.

When aggfunc is provided, x is the grouping variable and y is aggregated for vertical plots; the roles are reversed for horizontal plots. The best way to learn how to use dexplot is with the examples below.
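The role swap between vertical and horizontal plots can be sketched in plain pandas. The helper `aggregate_like_dexplot` below is hypothetical and is not part of dexplot; it only encodes the rule stated above, using made-up data.

```python
import pandas as pd


def aggregate_like_dexplot(x, y, data, aggfunc, orientation='v'):
    """Hypothetical helper, NOT part of dexplot: groups by x and aggregates y
    for vertical plots, and does the reverse for horizontal plots."""
    if orientation == 'v':
        group_col, agg_col = x, y
    elif orientation == 'h':
        group_col, agg_col = y, x
    else:
        raise ValueError("orientation must be 'v' or 'h'")
    return data.groupby(group_col)[agg_col].agg(aggfunc)


# Made-up data for illustration only.
df = pd.DataFrame({'neighborhood': ['Shaw', 'Shaw', 'Edgewood'],
                   'price': [100, 200, 50]})

vertical = aggregate_like_dexplot('neighborhood', 'price', df, 'median')
horizontal = aggregate_like_dexplot('price', 'neighborhood', df, 'median',
                                    orientation='h')
print(vertical.equals(horizontal))  # True: same aggregation, axes swapped
```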

Families of plots

There are two primary families of plots: aggregation and distribution. Aggregation plots summarize a sequence of values with a single value, using the function provided to aggfunc. Distribution plots take a sequence of values and depict the shape of its distribution.

Aggregation
- bar
- line
- scatter
- count
Distribution
- box
- violin
- hist
- kde

Comparison with Seaborn

If you have used the Seaborn library, you should notice a lot of similarities; much of dexplot was inspired by Seaborn. Below is a list of extra features in dexplot that are not found in Seaborn:

- The ability to graph relative frequency percentages and normalize over any number of variables
- Far fewer public functions
- No need for multiple functions to do the same thing
- Ability to make grids with a single function instead of having to use a higher-level function like catplot
- Pandas groupby methods are available as strings
- Ability to sort by values
- Ability to sort x/y labels lexicographically
- Both x/y labels and titles are automatically wrapped so that they don't overlap
- The figure size (plus several other options) can be changed without using matplotlib
- Only matplotlib objects are returned

Examples

Most of the examples below use long data.

Aggregating plots - bar, line and scatter

We'll begin by covering the plots that aggregate. An aggregation is defined as a function that summarizes a sequence of numbers with a single value.

The examples come from the Airbnb dataset, which contains many property rental listings from the Washington D.C. area.

import dexplot as dxp
airbnb = dxp.load_dataset('airbnb')
airbnb.head()

	neighborhood	property_type	accommodates	bathrooms	bedrooms	price	cleaning_fee	rating	superhost	response_time	latitude	longitude
0	Shaw	Townhouse	16	3.5	4	433	250	95.0	No	within an hour	38.90982	-77.02016
1	Brightwood Park	Townhouse	4	3.5	4	154	50	97.0	No	NaN	38.95888	-77.02554
2	Capitol Hill	House	2	1.5	1	83	35	97.0	Yes	within an hour	38.88791	-76.99668
3	Shaw	House	2	2.5	1	475	0	98.0	No	NaN	38.91331	-77.02436
4	Kalorama Heights	Apartment	3	1.0	1	118	15	91.0	No	within an hour	38.91933	-77.04124

There are more than 4,000 listings in our dataset. We will use bar charts to aggregate the data.

airbnb.shape

(4581, 12)

Vertical bar charts

To perform an aggregation, you must supply a value for aggfunc. Here, we find the median price per neighborhood. Notice that the long labels wrap automatically.

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median')

[Figure: bar chart of median price per neighborhood]

Components of the groupby aggregation

Any time the aggfunc parameter is set, dexplot performs a groupby aggregation, which always consists of three components:

Grouping column - unique values of this column form independent groups (neighborhood)
Aggregating column - the column that will get summarized with a single value (price)
Aggregating function - a function that returns a single value (median)

The general format for doing this in pandas is:

df.groupby('grouping column').agg({'aggregating column': 'aggregating function'})

Specifically, the following code is executed within dexplot.

airbnb.groupby('neighborhood').agg({'price': 'median'})

	price
neighborhood
Brightwood Park	87.0
Capitol Hill	129.5
Columbia Heights	95.0
Dupont Circle	125.0
Edgewood	100.0
Kalorama Heights	118.0
Shaw	133.5
Union Station	120.0
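To make the three components concrete, here is a rough sketch that performs the same kind of groupby aggregation on made-up data and draws the result with plain matplotlib. It approximates the idea only; it is not dexplot's internal code.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'neighborhood': ['Shaw', 'Shaw', 'Edgewood', 'Edgewood', 'Capitol Hill'],
    'price': [120, 150, 90, 110, 130],
})

# Grouping column, aggregating column, aggregating function.
medians = df.groupby('neighborhood').agg({'price': 'median'})['price']

# One bar per group; the bar height is the aggregated value.
fig, ax = plt.subplots()
bars = ax.bar(list(medians.index), medians.values)
ax.set_xlabel('neighborhood')
ax.set_ylabel('price')
```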

Sorting the bars

By default, the grouping column (the x-axis here) is sorted in alphabetical order. Use the sort parameter to specify how it is sorted:

lex_asc - sort lexicographically A to Z (default)
lex_desc - sort lexicographically Z to A
asc - sort values from least to greatest
desc - sort values from greatest to least
None - use the order of appearance in the DataFrame
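Each sort option has a close pandas counterpart. The `sort_groups` helper below is hypothetical (not dexplot's code) and simply maps the documented options onto `sort_index` and `sort_values` for an already aggregated Series built from made-up data.

```python
import pandas as pd


def sort_groups(agg, sort='lex_asc'):
    """Hypothetical helper: maps each documented sort option onto pandas."""
    if sort == 'lex_asc':
        return agg.sort_index()
    if sort == 'lex_desc':
        return agg.sort_index(ascending=False)
    if sort == 'asc':
        return agg.sort_values()
    if sort == 'desc':
        return agg.sort_values(ascending=False)
    if sort is None:
        return agg  # assumes agg was built with groupby(..., sort=False)
    raise ValueError(f'unknown sort option: {sort!r}')


df = pd.DataFrame({'neighborhood': ['Edgewood', 'Shaw', 'Capitol Hill', 'Shaw'],
                   'price': [100, 130, 120, 140]})
# sort=False keeps groups in order of first appearance in the DataFrame.
agg = df.groupby('neighborhood', sort=False)['price'].median()

print(list(sort_groups(agg, 'lex_desc').index))  # ['Shaw', 'Edgewood', 'Capitol Hill']
print(list(sort_groups(agg, 'asc').index))       # ['Edgewood', 'Capitol Hill', 'Shaw']
print(list(sort_groups(agg, None).index))        # ['Edgewood', 'Shaw', 'Capitol Hill']
```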

fig = dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median', sort='lex_desc')
fig

[Figure: median price per neighborhood, neighborhoods sorted Z to A]

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median', sort='asc')

[Figure: median price per neighborhood, sorted from least to greatest]

Specify order with `x_order`

Set a specific order for the values on the x-axis by passing a list of them to x_order. This can also act as a filter to limit the number of bars.

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median',
        x_order=['Dupont Circle', 'Edgewood', 'Union Station'])

[Figure: median price for Dupont Circle, Edgewood, and Union Station only]
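In pandas terms, an ordering list like x_order behaves much like label-based selection with `.loc`: the result follows the list's order, and groups left out of the list disappear. This is a sketch of the idea with made-up numbers, not dexplot's implementation.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'neighborhood': ['Shaw', 'Edgewood', 'Union Station', 'Dupont Circle', 'Shaw'],
    'price': [130, 100, 120, 125, 140],
})
medians = df.groupby('neighborhood')['price'].median()

# Selecting with a list both reorders and filters: 'Shaw' is dropped.
x_order = ['Dupont Circle', 'Edgewood', 'Union Station']
ordered = medians.loc[x_order]
print(ordered.index.tolist())  # ['Dupont Circle', 'Edgewood', 'Union Station']
```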

Horizontal bars

Set orientation to 'h' for horizontal bars. When you do this, you'll need to switch x and y since the grouping column (neighborhood) will be along the y-axis and the aggregating column (price) will be along the x-axis.

dxp.bar(x='price', y='neighborhood', data=airbnb, aggfunc='median', orientation='h')

[Figure: horizontal bar chart of median price per neighborhood]

Split bars into groups

You can split each bar into further groups by setting the split parameter to another column.

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median', split='superhost')

[Figure: median price per neighborhood, bars split by superhost]

We can use the pivot_table method to replicate the results in pandas.

airbnb.pivot_table(index='neighborhood', columns='superhost', 
                   values='price', aggfunc='median')

superhost	No	Yes
neighborhood
Brightwood Park	85.0	90.0
Capitol Hill	129.0	130.0
Columbia Heights	90.5	103.0
Dupont Circle	120.0	135.0
Edgewood	100.0	100.0
Kalorama Heights	110.0	124.0
Shaw	130.0	135.0
Union Station	120.0	125.0

Set the order of the unique split values with split_order, which can also act as a filter.

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median', 
        split='superhost', split_order=['Yes', 'No'])

[Figure: the same split bars with superhost 'Yes' ordered before 'No']

Stacked bar charts

Stack all the split groups one on top of the other by setting stacked to True.

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median', 
        split='superhost', split_order=['Yes', 'No'], stacked=True)

[Figure: superhost groups stacked within each neighborhood's bar]
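Stacking boils down to starting each group's bar where the previous group's bar ended. The sketch below shows that mechanism with matplotlib's `bottom` argument to `Axes.bar`, using a few median prices from the pivot table above; it illustrates the technique and is not dexplot's own code.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Rows are neighborhoods, columns are superhost groups (medians from the pivot table).
table = pd.DataFrame({'Yes': [130.0, 103.0], 'No': [129.0, 90.5]},
                     index=['Capitol Hill', 'Columbia Heights'])

fig, ax = plt.subplots()
bottom = [0.0] * len(table)
containers = []
for group in table.columns:
    # Each group's bars start at the running total of the groups below it.
    bars = ax.bar(list(table.index), table[group], bottom=bottom, label=group)
    containers.append(bars)
    bottom = [b + h for b, h in zip(bottom, table[group])]
ax.legend(title='superhost')
```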

Split into multiple plots

It's possible to split the data further into separate plots by the unique values in a different column with the row or col parameter. Here, each kind of property_type has its own plot.

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median', 
        split='superhost', col='property_type')

[Figure: median price per neighborhood split by superhost, one subplot per property_type]
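Splitting by col amounts to filtering the data once per unique value and drawing each subset on its own axes. Here is a minimal matplotlib sketch of that idea on made-up data, leaving out the split parameter for brevity; it is an illustration, not how dexplot builds its grids.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'neighborhood': ['Shaw', 'Shaw', 'Edgewood', 'Shaw', 'Edgewood'],
    'property_type': ['House', 'Apartment', 'House', 'House', 'Apartment'],
    'price': [150, 100, 90, 170, 80],
})

# One subplot per unique value of the "col" column.
col_values = sorted(df['property_type'].unique())
fig, axes = plt.subplots(1, len(col_values), sharey=True, squeeze=False)
for ax, value in zip(axes[0], col_values):
    subset = df[df['property_type'] == value]
    medians = subset.groupby('neighborhood')['price'].median()
    ax.bar(list(medians.index), medians.values)
    ax.set_title(f'property_type = {value}')
```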

If there isn't room for all of the plots, set the wrap parameter to an integer giving the maximum number of plots per row or column.

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median', 
        split='superhost', col='property_type', wrap=2)

[Figure: property_type subplots wrapped to at most two per row]
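The grid size implied by a wrap value is simple arithmetic. Assuming plots fill each row before moving to the next (an assumption about dexplot's layout), the hypothetical `grid_shape` helper below computes how many rows and columns are needed.

```python
import math


def grid_shape(n_plots, wrap):
    """Hypothetical helper: rows and columns needed when at most `wrap`
    plots are placed in each row."""
    n_cols = min(n_plots, wrap)
    n_rows = math.ceil(n_plots / wrap)
    return n_rows, n_cols


print(grid_shape(5, 2))  # (3, 2): two full rows plus one partial row
```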

Use col_order to both filter and set a specific order for the plots.

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median',
        split='superhost', col='property_type', col_order=['House', 'Condominium'])

[Figure: subplots for House and Condominium only]

Splits can be made simultaneously along rows and columns.

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median',
        split='superhost', col='property_type', col_order=['House', 'Condominium', 'Apartment'],
        row='bedrooms', row_order=[0, 1, 2, 3])

[Figure: grid of subplots with property_type across columns and bedrooms (0 to 3) down rows]

By default, all axis limits are shared. Allow each plot to set its own limits with the sharex and sharey parameters.

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median',
        split='superhost', col='property_type', col_order=['House', 'Condominium', 'Apartment'],
        row='bedrooms', row_order=[0, 1, 2, 3], sharey=False)

[Figure: the same grid, with each subplot using its own y-axis limits]
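Assuming sharex and sharey behave like the matplotlib options of the same name in `plt.subplots` (an assumption, since dexplot's internals are not shown here), this sketch shows the effect: shared axes use one range for all panels, while unshared axes autoscale independently.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; no display needed
import matplotlib.pyplot as plt

same_limits = {}
for share in (True, False):
    fig, (ax1, ax2) = plt.subplots(1, 2, sharey=share)
    ax1.bar(['a', 'b'], [1, 2])      # small values
    ax2.bar(['a', 'b'], [100, 200])  # values 100 times larger
    # Record whether both panels ended up with identical y-axis limits.
    same_limits[share] = ax1.get_ylim() == ax2.get_ylim()
    plt.close(fig)

print(same_limits)
```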

Set the width of each bar with `size`

The width of the bars is set with the size parameter.

dxp.bar(x='neighborhood', y='price', data=airbnb, aggfunc='median', split='property_type',
        split_order=['Apartment', 'House'], x_order=['Dupont Circle', 'Capitol Hill', 'Union Station'],
        size=.5)

[Figure: bars of width 0.5 for Apartment and House in Dupont Circle, Capitol Hill, and Union Station]

Distribution plots - box, violin, histogram, kde

Distribution plots work similarly, but do not have an aggfunc since they do not aggregate.

dxp.box(x='price', y='neighborhood', data=airbnb)

[Figure: box plots of price for each neighborhood]
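Where an aggregation plot reduces each group to one number, a box plot summarizes each group with several. As a rough illustration on made-up data (not dexplot's code), these are the quartile-based statistics a conventional box plot draws, plus the common 1.5 × IQR rule for flagging outliers, which is matplotlib's default whisker setting; whether dexplot changes that default is not stated here.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'neighborhood': ['Shaw'] * 5 + ['Edgewood'] * 5,
    'price': [100, 120, 130, 150, 400, 60, 80, 90, 100, 110],
})

# Minimum, first quartile, median, third quartile, and maximum per group.
summary = (df.groupby('neighborhood')['price']
             .quantile([0, 0.25, 0.5, 0.75, 1])
             .unstack())

# Points above the upper fence are usually drawn as individual outliers.
iqr = summary[0.75] - summary[0.25]
upper_fence = summary[0.75] + 1.5 * iqr
print(summary)
print(upper_fence)
```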


Release history

- 0.1.4 (Jun 16, 2020)
- 0.1.3 (Jun 14, 2020)
- 0.1.2 (Jun 14, 2020)
- 0.1.1 (Jun 12, 2020)
- 0.1.0 (Jun 11, 2020)
- 0.0.10 (Jun 6, 2020), the version described on this page
- 0.0.9 (Oct 7, 2018)
- 0.0.8 (Oct 7, 2018)
- 0.0.7 (Oct 7, 2018)
- 0.0.6 (Oct 4, 2018)
- 0.0.5 (Oct 4, 2018)
- 0.0.4 (Oct 4, 2018)
- 0.0.3 (Oct 4, 2018)
- 0.0.2 (Sep 14, 2018)
- 0.0.1 (Aug 21, 2018)

Download files

Source distribution

dexplot-0.0.10.tar.gz (165.8 kB), uploaded Jun 6, 2020

Built distribution

dexplot-0.0.10-py3-none-any.whl (167.9 kB), uploaded Jun 6, 2020, Python 3

Hashes for dexplot-0.0.10.tar.gz

Algorithm	Hash digest
SHA256	`21bd7a4e0d551c44b441b5267d94dec364fec1f4901fa134a030ade25b5ef50f`
MD5	`293f7d270f210b199ce9173bd22af284`
BLAKE2b-256	`4f570346bded26795f8a4e08c327af2241cc937b34efb92e736545f134b80b66`

Hashes for dexplot-0.0.10-py3-none-any.whl

Algorithm	Hash digest
SHA256	`9f435e612f5bcececf28ff153e2b062b526423f256dffaabbebc51dce60349b0`
MD5	`bca6fdbc5832a3e3ee0ddc17e3939174`
BLAKE2b-256	`7a0b06a8cce185113ffd1fab42820bfd624e51a30340810e386c704e6def4477`
