Simple plotting library for both long and wide data integrated with DataFrames

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Dexplot

A Python library for making data visualizations.

The current aim of Dexplot is to make data visualization creation in Python more robust and straightforward. Dexplot is built on top of Matplotlib and accepts Pandas DataFrames as inputs.

Installation

pip install dexplot

Goals

The primary goals for Dexplot are:

Maintain a very consistent API with as few functions as necessary to make the desired statistical plots
Allow the user to tweak the plots without digging into Matplotlib

Tidy Data from Pandas

Dexplot's plotting functions only accept Pandas DataFrames in "tidy" form as input.
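
As a rough illustration of what "tidy" form means, the sketch below reshapes a small wide table into one row per observation using plain Pandas. The DataFrame and its column names are made up for this example, and it assumes "tidy" means one column per variable and one row per observation.

```python
import pandas as pd

# Hypothetical wide-form data: one salary column per year.
wide = pd.DataFrame({
    "dept": ["Police", "Fire"],
    "salary_2015": [50000.0, 52000.0],
    "salary_2016": [51000.0, 53000.0],
})

# Melt the year columns into rows so that each row is a single observation.
tidy = wide.melt(id_vars="dept", var_name="year", value_name="salary")

# Keep only the year digits, e.g. "salary_2016" -> "2016".
tidy["year"] = tidy["year"].str.replace("salary_", "", regex=False)

print(tidy)
```

The resulting table has one column each for the department, the year, and the salary, which is the shape a tidy-data plotting function expects.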

Sample plots

Dexplot currently maintains one primary function, aggplot, which is used to aggregate data and can create five different kinds of plots:

bar
line
box
hist
kde

There are 7 primary parameters to aggplot:

agg - Name of column to be aggregated. If it is a column with string/categorical values, then the counts or frequency percentage will be returned.
groupby - Name of column whose unique values will form independent groups. This is used in a similar fashion as the group by SQL clause.
data - The Pandas DataFrame
hue - The name of the column to further group the data within a single plot
row - The name of the column whose unique values split the data into separate rows
col - The name of the column whose unique values split the data into separate columns
kind - The kind of plot to create. One of the five strings from above.

City of Houston Data

To get started, we will use City of Houston employee data collected from the year 2016. It contains public information from about 1500 employees and is located in Dexplot's GitHub repository.

import pandas as pd
import dexplot as dxp

emp = pd.read_csv('data/employee.csv')
emp.head()

	title	dept	salary	race	gender	experience	experience_level
0	POLICE OFFICER	Houston Police Department-HPD	45279.0	White	Male	1	Novice
1	ENGINEER/OPERATOR	Houston Fire Department (HFD)	63166.0	White	Male	34	Veteran
2	SENIOR POLICE OFFICER	Houston Police Department-HPD	66614.0	Black	Male	32	Veteran
3	ENGINEER	Public Works & Engineering-PWE	71680.0	Asian	Male	4	Novice
4	CARPENTER	Houston Airport System (HAS)	42390.0	White	Male	3	Novice

Plotting the average salary by department

The agg parameter is very important: it names the column that will be aggregated (summarized by a single point statistic, like the mean or median). It is the first parameter and the only one you must specify (besides data). If this column is numeric, then by default its mean will be calculated. Here, we also specify the groupby parameter, whose unique values form the independent groups and label the x-axis.

dxp.aggplot(agg='salary', groupby='dept', data=emp)

<matplotlib.axes._subplots.AxesSubplot at 0x1146b7550>
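
The numbers behind a plot like this can be approximated with a plain Pandas groupby. This is only a sketch of the equivalent aggregation on a made-up DataFrame (emp_toy is hypothetical), not Dexplot's own code.

```python
import pandas as pd

# Hypothetical stand-in for the employee data.
emp_toy = pd.DataFrame({
    "dept": ["Police", "Police", "Fire", "Fire"],
    "salary": [40000.0, 60000.0, 50000.0, 70000.0],
})

# One mean salary per department: the bar heights in a mean-by-group plot.
mean_salary = emp_toy.groupby("dept")["salary"].mean()
print(mean_salary)
```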

Make horizontal with the `orient` parameter

The orient parameter controls whether the plot will be horizontal or vertical. By default, it is set to 'v' (vertical), as in the plot above.

dxp.aggplot(agg='salary', groupby='dept', data=emp, orient='h')

<matplotlib.axes._subplots.AxesSubplot at 0x114a37438>

Controlling the figure size

One of the goals of Dexplot is to not have you dip down into the details of Matplotlib. We can use the figsize parameter to change the size of our plot.

dxp.aggplot(agg='salary', groupby='dept', data=emp, orient='h', figsize=(8, 4))

<matplotlib.axes._subplots.AxesSubplot at 0x1149cee80>

Adding another dimension with `hue`

The hue parameter may be used to further subdivide each unique value in the groupby column. Notice that long tick labels are automatically wrapped.

dxp.aggplot(agg='salary', groupby='dept', data=emp, hue='gender')

<matplotlib.axes._subplots.AxesSubplot at 0x1170f2518>
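
To give a feel for what wrapping a long tick label involves, here is a minimal sketch using Python's standard textwrap module. The helper wrap_label and the width of 15 characters are assumptions made for this example; Dexplot's actual wrapping logic may differ.

```python
import textwrap


def wrap_label(label, width=15):
    """Hypothetical helper: break a label into lines of at most `width` characters."""
    return "\n".join(textwrap.wrap(label, width=width))


wrapped = wrap_label("Houston Police Department-HPD")
print(wrapped)
```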

Aggregating a String/Categorical column

It is possible to use a string/categorical column as the aggregating variable. In this instance, the counts of the unique values of that column will be returned. Because this is already doing a groupby, you cannot specify a groupby column in this instance. Let's get the count of employees by race.

dxp.aggplot(agg='race', data=emp, figsize=(8, 4))

<matplotlib.axes._subplots.AxesSubplot at 0x1173e1fd0>
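
Counting the unique values of a string column corresponds to value_counts in plain Pandas. The short Series below is made up for illustration.

```python
import pandas as pd

# Hypothetical categorical column.
race = pd.Series(["White", "Black", "White", "Asian", "White", "Black"], name="race")

# Number of rows for each unique value.
counts = race.value_counts()
print(counts)
```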

Using `hue` with a String/Categorical column

Using a groupby is not allowed when a string/categorical column is being aggregated. However, we can still subdivide the groups further by specifying hue.

dxp.aggplot(agg='race', data=emp, hue='dept')

<matplotlib.axes._subplots.AxesSubplot at 0x1176cf6d8>

Getting the frequency percentage with `normalize`

It is possible to turn the raw counts into percentages by passing a value to normalize. Let's find the percentage of all employees by race.

dxp.aggplot(agg='race', data=emp, normalize='all', figsize=(8, 4))

<matplotlib.axes._subplots.AxesSubplot at 0x1171bdba8>
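
Normalizing over all rows amounts to dividing each count by the total number of rows. A minimal Pandas sketch on made-up data, assuming that is what normalize='all' computes:

```python
import pandas as pd

# Hypothetical categorical column.
race = pd.Series(["White", "Black", "White", "Asian", "White", "Black"], name="race")

# Fraction of all rows that fall in each category.
fractions = race.value_counts(normalize=True)
print(fractions)
```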

You can normalize over any variable

The normalize parameter can be 'all', any one of 'agg', 'hue', 'row', or 'col', or a tuple containing any number of those four names. For instance, in the following plot, you can normalize by either agg or hue.

dxp.aggplot(agg='race', data=emp, hue='dept', normalize='agg')

<matplotlib.axes._subplots.AxesSubplot at 0x117abc6a0>

Data normalized by race

As you can see, the data was normalized by race. For example, the graph shows that about 30% of black employees were members of the police department. We can also normalize by department, as in the following plot, where about 10% of the Health & Human Services employees were Asian.

dxp.aggplot(agg='race', data=emp, hue='dept', normalize='hue')

<matplotlib.axes._subplots.AxesSubplot at 0x117c0fb38>
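
The difference between normalizing by race and by department can be sketched with pd.crosstab. The data is made up, and mapping normalize='agg' to row-wise fractions and normalize='hue' to column-wise fractions is an assumption based on the description above.

```python
import pandas as pd

# Hypothetical employee records.
toy = pd.DataFrame({
    "race": ["White", "White", "Black", "Black", "Black", "Asian"],
    "dept": ["Police", "Fire", "Police", "Police", "Fire", "Fire"],
})

# Fractions within each race (each row sums to 1).
by_race = pd.crosstab(toy["race"], toy["dept"], normalize="index")

# Fractions within each department (each column sums to 1).
by_dept = pd.crosstab(toy["race"], toy["dept"], normalize="columns")

print(by_race)
print(by_dept)
```

In the first table, a cell answers "what share of this race works in this department?"; in the second, "what share of this department is of this race?".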

Other kinds of plots `line`, `box`, `hist`, and `kde`

aggplot is capable of making four other kinds of plots. The line plot is very similar to the bar plot but simply connects the values together. Let's go back to a numeric column and calculate the median salary by department across each gender.

dxp.aggplot(agg='salary', data=emp, groupby='dept', hue='gender', kind='line', aggfunc='median')

<matplotlib.axes._subplots.AxesSubplot at 0x117fe55f8>

`aggfunc` can take any string value that Pandas can

There are more than a dozen string values that aggfunc can take. These are simply passed to the Pandas groupby method, which does the aggregation.
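
As a sketch of how string aggregation names work in Pandas itself, the loop below passes a few of them to a groupby. The data is made up, and the exact way Dexplot forwards aggfunc is not shown here.

```python
import pandas as pd

# Hypothetical salary data.
toy = pd.DataFrame({
    "dept": ["Police", "Police", "Police", "Fire", "Fire"],
    "salary": [40000.0, 50000.0, 90000.0, 45000.0, 55000.0],
})

# Pandas resolves each string to the matching aggregation method.
for aggfunc in ("mean", "median", "max"):
    result = toy.groupby("dept")["salary"].agg(aggfunc)
    print(aggfunc, result.to_dict())

median_salary = toy.groupby("dept")["salary"].agg("median")
```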

All plots can be both vertical and horizontal

We can rotate all plots with orient.

dxp.aggplot(agg='salary', data=emp, groupby='dept', hue='gender', kind='line', aggfunc='median', orient='h')

<matplotlib.axes._subplots.AxesSubplot at 0x1181b6240>

Boxplots

Here is the same data plotted as a box plot. This isn't actually an aggregation, so the aggfunc parameter is meaningless here. Instead, all the values of the particular group are plotted.

dxp.aggplot(agg='salary', data=emp, groupby='dept', hue='gender', kind='box', orient='h')

<matplotlib.axes._subplots.AxesSubplot at 0x118379390>
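
A box plot conventionally marks the quartiles of each group instead of a single statistic. The sketch below computes those quartiles per group with Pandas on made-up data; it illustrates the idea only and is not how Dexplot or Matplotlib draws the boxes internally.

```python
import pandas as pd

# Hypothetical salary data for two departments.
toy = pd.DataFrame({
    "dept": ["Police"] * 5 + ["Fire"] * 3,
    "salary": [40000.0, 50000.0, 60000.0, 70000.0, 80000.0,
               30000.0, 50000.0, 70000.0],
})

# First quartile, median, and third quartile of salary for each department.
quartiles = toy.groupby("dept")["salary"].quantile([0.25, 0.5, 0.75]).unstack()
print(quartiles)
```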

Histograms and KDEs

As with boxplots, histograms and KDEs do not use aggfunc because they are not aggregating but simply displaying all the data. Also, it is not possible to use both groupby and hue with these plots.

dxp.aggplot(agg='salary', data=emp, groupby='dept', kind='hist', orient='v')

<matplotlib.axes._subplots.AxesSubplot at 0x118a7ac88>

dxp.aggplot(agg='salary', data=emp, groupby='dept', kind='kde', orient='v')

<matplotlib.axes._subplots.AxesSubplot at 0x118cb77f0>

Splitting into separate plots

The row and col parameters can be used to split the data into separate plots. Each unique value of row or col will create a new plot. A one-item tuple containing the entire Figure is returned.

dxp.aggplot(agg='salary', data=emp, groupby='experience_level', kind='kde', orient='v', row='dept')

(<Figure size 576x1152 with 6 Axes>,)

Use the `wrap` parameter to make new rows/columns

Set the wrap parameter to an integer to determine where a new row/column will be formed.

dxp.aggplot(agg='salary', data=emp, groupby='experience_level', kind='box', orient='v', row='dept', wrap=3)

(<Figure size 864x720 with 6 Axes>,)

`wrap` works for both `row` and `col`

dxp.aggplot(agg='salary', data=emp, groupby='experience_level', kind='box', orient='v', col='dept', wrap=5)

(<Figure size 1296x576 with 6 Axes>,)
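
One way to reason about wrap is as a cap on how many plots go along the split direction before a new column (for row) or a new row (for col) is started. The helper grid_shape below is hypothetical and reflects that reading of the description; it is not taken from Dexplot's source.

```python
import math


def grid_shape(n_plots, wrap, split_by):
    """Hypothetical helper: return (n_rows, n_cols) for a wrapped grid of plots."""
    if split_by not in ("row", "col"):
        raise ValueError("split_by must be 'row' or 'col'")
    filled = min(wrap, n_plots)            # plots along the split direction
    overflow = math.ceil(n_plots / wrap)   # lines needed in the other direction
    return (filled, overflow) if split_by == "row" else (overflow, filled)


# Six departments split by row with wrap=3, then by col with wrap=5.
print(grid_shape(6, 3, "row"))
print(grid_shape(6, 5, "col"))
```

Under this assumption, six plots with row and wrap=3 form three rows and two columns, while col and wrap=5 gives two rows and five columns with some slots left empty.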

Use both `row` and `col` for an entire grid

By using both row and col, you can maximize the number of variables you divide the data into.

dxp.aggplot(agg='salary', data=emp, groupby='gender', kind='kde', row='dept', col='experience_level')

(<Figure size 1008x1152 with 18 Axes>,)

Normalize by more than one variable

Before, we normalized by just a single variable. It is possible to normalize by multiple variables with a tuple. Here we normalize by department and gender, so all the blue bars for each department should sum to 1.

dxp.aggplot(agg='dept', data=emp, hue='gender', kind='bar', row='race', normalize=('agg', 'hue'))

(<Figure size 720x1008 with 5 Axes>,)
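
Normalizing over two variables can be sketched in Pandas by dividing each count by the total for its (department, gender) pair. The data is made up, and treating normalize=('agg', 'hue') this way is an assumption drawn from the description above.

```python
import pandas as pd

# Hypothetical employee records.
toy = pd.DataFrame({
    "race":   ["White", "White", "Black", "Black", "White", "Black"],
    "dept":   ["Police", "Police", "Police", "Fire", "Fire", "Fire"],
    "gender": ["Male", "Female", "Male", "Male", "Male", "Male"],
})

# Count employees for every observed (race, dept, gender) combination.
counts = toy.groupby(["race", "dept", "gender"]).size()

# Total for each (dept, gender) pair, aligned back to the counts.
totals = counts.groupby(level=["dept", "gender"]).transform("sum")
normalized = counts / totals

# Each (dept, gender) pair now sums to 1 across races.
check = normalized.groupby(level=["dept", "gender"]).sum()
print(normalized)
```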

Normalize by three variables

Here we normalize by race, experience level, and gender. Each set of orange/blue bars within each plot will sum to 1.

dxp.aggplot(agg='dept', data=emp, hue='gender', kind='bar', row='race', 
            col='experience_level', normalize=('hue', 'col', 'row'), orient='h')

(<Figure size 1008x1008 with 15 Axes>,)

Scatterplot

scatterplot is the only other currently available function. It plots two continuous-valued variables against each other. It does not do any aggregating; it plots the raw data as it sees it. It can split the data into groups or new plots with hue, row, and col.

dxp.scatterplot('experience', 'salary', data=emp)

<matplotlib.axes._subplots.AxesSubplot at 0x1271ec710>

Split data in the same plot with `hue`

dxp.scatterplot('experience', 'salary', data=emp, hue='gender')

<matplotlib.axes._subplots.AxesSubplot at 0x1274f1a20>

Plot a regression line by setting `fit_reg` equal to `True`

By default it plots the 95% confidence interval around the mean.

dxp.scatterplot('experience', 'salary', data=emp, hue='gender', fit_reg=True)

<matplotlib.axes._subplots.AxesSubplot at 0x127670e10>
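
For intuition about the fitted line itself, here is a minimal least-squares sketch using NumPy on made-up, perfectly linear data. It does not compute the confidence interval, and it is not necessarily the method Dexplot uses for fit_reg.

```python
import numpy as np

# Hypothetical data: salary grows by exactly 1000 per year of experience.
experience = np.array([1.0, 5.0, 10.0, 20.0, 30.0])
salary = 40000.0 + 1000.0 * experience

# Fit a straight line (degree-1 polynomial) by ordinary least squares.
slope, intercept = np.polyfit(experience, salary, deg=1)

# Predicted salary at each experience value along the fitted line.
fitted = slope * experience + intercept
print(slope, intercept)
```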

Further split the data into separate plots with `row` and `col`

dxp.scatterplot('experience', 'salary', data=emp, hue='gender', row='dept', wrap=3)

(<Figure size 864x720 with 6 Axes>,)

dxp.scatterplot('experience', 'salary', data=emp, hue='gender', row='dept', col='experience_level')

(<Figure size 1008x1152 with 18 Axes>,)

Use the `s` parameter to change the size of each marker

Set s to the name of a column containing numeric values to set each marker size individually. We need to create another numeric variable first since the dataset only contains two.

import numpy as np
emp['num'] = np.random.randint(10, 300, len(emp))

dxp.scatterplot('experience', 'salary', data=emp, hue='gender', row='dept', wrap=3, s='num')

(<Figure size 864x720 with 6 Axes>,)

Comparison with Seaborn

If you have used the Seaborn library, then you should notice a lot of similarities. Much of Dexplot was inspired by Seaborn. Below is a list of the extra features in Dexplot not found in Seaborn:

The ability to graph frequency percentage and normalize over any number of variables
Far fewer public functions. Only two at the moment
No need for multiple functions to do the same thing. Seaborn has both countplot and barplot
Ability to make grids with a single function instead of having to use a higher level function like catplot
Pandas groupby methods are available as strings
Both x/y-labels and titles are automatically wrapped so that they don't overlap
The figure size (plus several other options) is available to change without dipping down into matplotlib
No new types like FacetGrid. Only matplotlib objects are returned

Release history

0.1.4

Jun 16, 2020

0.1.3

Jun 14, 2020

0.1.2

Jun 14, 2020

0.1.1

Jun 12, 2020

0.1.0

Jun 11, 2020

0.0.10

Jun 6, 2020

0.0.9

Oct 7, 2018

0.0.8

Oct 7, 2018

0.0.7

Oct 7, 2018

This version

0.0.6

Oct 4, 2018

0.0.5

Oct 4, 2018

0.0.4

Oct 4, 2018

0.0.3

Oct 4, 2018

0.0.2

Sep 14, 2018

0.0.1

Aug 21, 2018

Download files

Source Distribution

dexplot-0.0.6.tar.gz (17.9 kB view details)

Uploaded Oct 4, 2018 Source

Built Distribution

dexplot-0.0.6-py3-none-any.whl (18.2 kB view details)

Uploaded Oct 4, 2018 Python 3

File details

Details for the file dexplot-0.0.6.tar.gz.

File metadata

Download URL: dexplot-0.0.6.tar.gz
Upload date: Oct 4, 2018
Size: 17.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.6.4

File hashes

Hashes for dexplot-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`93603fe6cd0364ac3df48ce58baa7d49db4493201e65152cd492ea447afc1ab7`
MD5	`ac2fc289bccd315708386103dc9a6d7b`
BLAKE2b-256	`bf3c8f2876106f30963b66a82c19b9510c2d34ffd9f3edab03b854fc71124353`

File details

Details for the file dexplot-0.0.6-py3-none-any.whl.

File metadata

Download URL: dexplot-0.0.6-py3-none-any.whl
Upload date: Oct 4, 2018
Size: 18.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.6.4

File hashes

Hashes for dexplot-0.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d952fb166365458fd717cd79dfda3bbb677a8ecaa291c6db68b494c11e237017`
MD5	`e966abeb6bd16c2ba39fc9d3e70fee0b`
BLAKE2b-256	`0c2ad254df6da96f66cbf48af98788def73ff9115cef2d8e6fb0f88a41a63d26`
