Bubble plot - data visualization package

bubble_plot

Hi everyone!

I love data visualizations! And if you love them too, I think you will find this bubble plot very nice and useful.

How to install
Motivation & Usage
Usage Example
Usage Example 2
Dependencies
Contact

How to install

Very simple - just write in your command line:

pip install bubble_plot

Motivation & Usage

The goal of the bubble plot is to make it easy to visualize linear and non-linear relationships between numerical and categorical features in a dataset. The bubble plot is a kind of two-dimensional histogram drawn with bubbles, and it works for every combination of categorical and numerical features.

The size of each bubble is proportional to the number of data points that fall in that bucket.

Function signature:

bubble_plot(df, x, y, z_boolean=None, ordered_x_values=None, ordered_y_values=None, bins_x=10, bins_y=10, fontsize=16,
            figsize=(15,10), maximal_bubble_size=5000, normalization_by_all=False, log=False)

For numerical features, the values are presented in buckets. Ten equally spaced bins are used by default; you can provide specific bins or a different number of bins through the bins_x and bins_y parameters.
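To make the default bucketing concrete, here is a minimal pure-Python sketch of equal-width binning. The helper name equal_width_bins is hypothetical and is not part of bubble_plot; the package's own binning may differ in details such as how bucket edges are closed.

```python
def equal_width_bins(values, n_bins=10):
    """Assign each value to one of n_bins equally spaced buckets.

    Hypothetical helper for illustration; not part of bubble_plot.
    Returns (edges, indices): the n_bins + 1 bucket edges and, for each
    value, the index of the bucket it falls in.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    indices = []
    for v in values:
        if width == 0:
            # All values are identical, so everything goes in bucket 0.
            indices.append(0)
        else:
            # Clamp the maximum value into the last bucket.
            indices.append(min(int((v - lo) / width), n_bins - 1))
    return edges, indices


ages = [17, 22, 25, 31, 38, 44, 52, 67]
edges, indices = equal_width_bins(ages, n_bins=5)
print(edges)    # [17.0, 27.0, 37.0, 47.0, 57.0, 67.0]
print(indices)  # [0, 0, 0, 1, 2, 2, 3, 4]
```

Each bubble on a numerical axis then corresponds to one of these buckets rather than to a single raw value.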

For categorical features, the values are presented by category. If you would like the categories to appear in a specific order, supply a list of the values in that order through the ordered_x_values / ordered_y_values parameters.

Any pairing works: numerical vs. numerical, numerical vs. categorical, categorical vs. numerical, or categorical vs. categorical.
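As an illustration of what an explicit category order means for an axis, the sketch below maps categories to axis positions. The helper order_categories and the alphabetical fallback are assumptions made for this example, not the package's documented behavior.

```python
def order_categories(observed, ordered_values=None):
    """Map each category to its position along an axis.

    Hypothetical helper for illustration; not part of bubble_plot.
    With ordered_values the caller's order is used; otherwise the
    categories are sorted alphabetically (an assumption of this sketch).
    """
    if ordered_values is not None:
        order = list(ordered_values)
    else:
        order = sorted(set(observed))
    return {category: position for position, category in enumerate(order)}


education = ["Masters", "HS-grad", "Bachelors", "HS-grad", "Doctorate"]
positions = order_categories(
    education,
    ordered_values=["HS-grad", "Bachelors", "Masters", "Doctorate"],
)
print(positions)
```

With the package itself, the same list would be passed as ordered_x_values (or ordered_y_values) in the bubble_plot call.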

Setting normalization_by_all to False plots P(y|x), the distribution of y given x: each column of the plot is an independent 1D histogram of the y values for that x. Setting normalization_by_all to True plots the joint distribution P(x,y), which is in effect a 2D histogram drawn with bubbles.
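The difference between the two normalizations can be sketched with plain counts. The function names below are hypothetical and only illustrate the two definitions; they are not taken from the package.

```python
from collections import Counter


def joint_distribution(pairs):
    """P(x, y): each cell's count divided by the total number of points."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def conditional_distribution(pairs):
    """P(y | x): each cell's count divided by the count of its x column."""
    counts = Counter(pairs)
    x_totals = Counter(x for x, _ in pairs)
    return {(x, y): n / x_totals[x] for (x, y), n in counts.items()}


# Toy (x, y) pairs that are already bucketed.
pairs = [("young", "low"), ("young", "low"), ("young", "high"), ("old", "high")]

p_joint = joint_distribution(pairs)
p_cond = conditional_distribution(pairs)

print(p_joint[("old", "high")])  # 0.25: one point out of four overall
print(p_cond[("old", "high")])   # 1.0: the only point in the "old" column
```

In the conditional view every column sums to 1, so a sparsely populated x value can still show a large bubble; in the joint view all cells together sum to 1.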

Setting the log parameter to True applies the natural logarithm, element-wise, to the counts. This makes the difference between the largest and the smallest bubble much smaller, so it is worth using when the frequencies of different values differ widely.
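A small numeric sketch shows how much the natural log compresses the range of counts. The counts are made up for illustration, and how the package treats a bucket holding a single point (where log(1) = 0) is not covered here.

```python
import math

# Made-up bucket counts spanning three orders of magnitude.
counts = {"A": 10, "B": 100, "C": 10000}

# Element-wise natural log of the counts.
log_counts = {bucket: math.log(n) for bucket, n in counts.items()}

raw_ratio = max(counts.values()) / min(counts.values())
log_ratio = max(log_counts.values()) / min(log_counts.values())

print(raw_ratio)            # 1000.0
print(round(log_ratio, 6))  # 4.0
```

The largest bucket is 1000 times the smallest in raw counts but only about 4 times the smallest after the log transform.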

Setting the z_boolean parameter to the name of a boolean field, or of a categorical field with two categories, makes the color of each bucket proportional to the ratio of the z values within that bucket: (boolean_z==value_1).sum() / ((boolean_z==value_1).sum() + (boolean_z==value_2).sum()).
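The per-bucket ratio that drives the color can be sketched as follows. The helper z_ratio_per_bucket and the toy rows are hypothetical; the sketch assumes z takes exactly two values, so the bucket total equals the sum of the two category counts.

```python
from collections import defaultdict


def z_ratio_per_bucket(rows, value_1):
    """Share of rows with z == value_1 in each (x_bucket, y_bucket).

    Hypothetical helper for illustration; not part of bubble_plot.
    rows is an iterable of (x_bucket, y_bucket, z) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for x_bucket, y_bucket, z in rows:
        totals[(x_bucket, y_bucket)] += 1
        if z == value_1:
            hits[(x_bucket, y_bucket)] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


rows = [
    ("30-40", "40-50", ">50K"),
    ("30-40", "40-50", "<=50K"),
    ("30-40", "40-50", ">50K"),
    ("20-30", "30-40", "<=50K"),
]
ratios = z_ratio_per_bucket(rows, value_1=">50K")
print(ratios)
```

A ratio near 1 means almost every point in the bucket belongs to value_1, and a ratio near 0 means almost none do.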

Usage Example

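# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older scikit-learn.
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_boston
from bubble_plot.bubble_plot import bubble_plot

sns.set_style("darkgrid")

# Load the Boston housing data into a DataFrame, one column per feature
data = load_boston()
df = pd.DataFrame(columns=data['feature_names'], data=data['data'])
df['target'] = data['target']

# Numerical vs. numerical: RM on the x axis, target on the y axis
bubble_plot(df, x='RM', y='target')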

[Figure: the resulting bubble plot of RM (x) vs. target (y).]

Usage Example 2

Census income dataset: plot age vs. hours per week vs. income level. How is that even possible? Can we visualize three dimensions of information in a two-dimensional plot?

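import pandas as pd
import seaborn as sns
from bubble_plot.bubble_plot import bubble_plot

sns.set_style("darkgrid")

# adult.csv: a local copy of the census income dataset, with the income level in a 'target' column
df = pd.read_csv("adult.csv")

# z_boolean colors each bubble by the ratio of the two 'target' categories in its bucket
bubble_plot(df, x='age', y='hours-per-week', z_boolean='target')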

[Figure: the resulting bubble plot. P(x,y), x: age, y: working hours, color: proportional to the rate of high-income people within each bucket.]

In this bubble plot, we see the joint distribution of hours-per-week vs. age, P(x,y), but here the color is proportional to the rate of high-income people among all the people in the bucket: #(>50K) / (#(>50K) + #(≤50K)). By supplying the z_boolean variable, we added a third dimension to the plot through the color of the bubble.

The pinker the bubble, the higher the ratio for the given boolean feature/target z. The colormap runs from cyan for the lower ratios to pink for the higher ratios.

This plot clearly shows that a higher income is much more common among people older than 30 who work more than 40 hours a week.

Dependencies

pandas
numpy
matplotlib

Contact

More usage examples and explanations can be found at: https://medium.com/@DataLady/exploring-the-census-income-dataset-using-bubble-plot-cfa1b366313b

Please let me know if you have any questions. My email is meir.shir86@gmail.com.

Enjoy, Shir

Release history

0.3.4 (this version): Jul 11, 2019
0.3.3: Jul 11, 2019
0.3.2: Jul 11, 2019
0.3.1: Jul 7, 2019
0.3: Jul 6, 2019
0.2: Jul 5, 2019
0.1: Jul 5, 2019
0.0.2: Jul 5, 2019

Download files

Source distribution: none available for this release.

Built distribution: bubble_plot-0.3.4-py3-none-any.whl (9.0 kB), uploaded Jul 11, 2019, Python 3

File metadata

Download URL: bubble_plot-0.3.4-py3-none-any.whl
Upload date: Jul 11, 2019
Size: 9.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.29.1 CPython/3.7.2

File hashes for bubble_plot-0.3.4-py3-none-any.whl

Algorithm	Hash digest
SHA256	`b45aa6fd67984ad2f698e323ab878c6328abfed6b4b96599430eebc3922fa5e6`
MD5	`6e7b44bfcbf9686a420e62d512cb0882`
BLAKE2b-256	`61792aa812c81b2911df1e14966f144e71978e36b58ab06f41022017beec113e`
