Kesh Utils for Data science/EDA/Data preparation
Chart + Util = Chartil
During the EDA/data preparation stage, I use a few fixed chart types to analyse the relationships among features. Some are simple, such as univariate charts, and some are complex: 3D plots, or plots of more than three features.
Over time it became hard to maintain all the relevant code without repeating it. Instead, I developed a simple, single API to plot various types of relationships, which hides the technical/code details from the data science task and approach.
Using this approach, I need just one API:
from KUtils.eda import chartil
chartil.plot(dataframe, [list of columns]) or
chartil.plot(dataframe, [list of columns], {optional_settings})
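To make the idea of "one API, many charts" concrete, here is a minimal sketch of how a default chart could be chosen from the column types alone. It is not chartil's implementation: `suggest_chart`, `is_categorical`, `MAX_CATEGORY_LEVELS` and the threshold of 10 are hypothetical names and values chosen for illustration, and the chart names are copied from the comments in the examples in this README.

```python
import pandas as pd

# Hypothetical threshold: numeric columns with this many distinct values
# or fewer are treated as categorical (e.g. a 0/1 target column).
MAX_CATEGORY_LEVELS = 10

# Default chart per (continuous count, categorical count), taken from the
# comments in this README's examples; other combinations are left out.
DEFAULT_CHARTS = {
    (0, 1): "barchart",
    (1, 0): "boxplot",
    (2, 0): "scatter",
    (1, 1): "grouped boxplot",
    (3, 0): "3D scatter",
    (0, 3): "paired barchart",
    (2, 1): "scatter with colored groups",
    (1, 2): "grouped boxplot",
}


def is_categorical(series):
    """Guess whether a column is categorical (assumption, not chartil's rule)."""
    if not pd.api.types.is_numeric_dtype(series):
        return True
    return series.nunique() <= MAX_CATEGORY_LEVELS


def suggest_chart(df, columns):
    """Return a default chart name for the given columns, or 'unspecified'."""
    n_cat = sum(is_categorical(df[c]) for c in columns)
    n_cont = len(columns) - n_cat
    return DEFAULT_CHARTS.get((n_cont, n_cat), "unspecified")


# Small made-up frame, used only to exercise the sketch.
demo_df = pd.DataFrame({
    "age": list(range(30, 70)),
    "chol": [180 + 2 * i for i in range(40)],
    "sex": ["Male", "Female"] * 20,
    "target": [0, 1] * 20,
})

print(suggest_chart(demo_df, ["age"]))
print(suggest_chart(demo_df, ["chol", "age", "target"]))
```

Counting distinct values is one way to treat a numeric 0/1 column such as `target` as categorical; the real library may use a different rule.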
Demo code:
Load UCI Dataset. Download From here
import pandas as pd
heart_disease_df = pd.read_csv('../input/uci/heart.csv')
heart_disease_df['age_bin'] = pd.cut(heart_disease_df['age'], [0, 32, 40, 50, 60, 70, 100], labels=['<32', '33-40','41-50','51-60','61-70', '71+'])
heart_disease_df['sex'] = heart_disease_df['sex'].map({1:'Male', 0:'Female'})
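To see what the two preparation steps produce, the following sketch applies the same `pd.cut` edges and labels, and the same `map`, to a few made-up values (the sample ages and codes are illustrative, not taken from the dataset).

```python
import pandas as pd

# Made-up sample values, chosen to land on and around the bin edges.
ages = pd.Series([29, 32, 33, 45, 70, 77])
edges = [0, 32, 40, 50, 60, 70, 100]
labels = ['<32', '33-40', '41-50', '51-60', '61-70', '71+']

# pd.cut uses right-inclusive intervals by default: (0, 32], (32, 40], ...
age_bin = pd.cut(ages, edges, labels=labels)
binned = age_bin.astype(str).tolist()
print(binned)

# Map the numeric sex codes to readable labels.
sex = pd.Series([1, 0, 1]).map({1: 'Male', 0: 'Female'}).tolist()
print(sex)
```

Because the intervals are right-inclusive, an age of exactly 32 falls into the bin labelled `<32`.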
Heatmap
chartil.plot(heart_disease_df, heart_disease_df.columns) # Send all column names
![Heatmap Numerical](https://raw.githubusercontent.com/KeshavShetty/ds/master/Roughbook/misc_resources/heatmap1.png)
chartil.plot(heart_disease_df, heart_disease_df.columns, optional_settings={'include_categorical':True})
![Heatmap With categorical](https://raw.githubusercontent.com/KeshavShetty/ds/master/Roughbook/misc_resources/heatmap2.png)
chartil.plot(heart_disease_df, heart_disease_df.columns, optional_settings={'include_categorical':True, 'sort_by_column':'trestbps'})
![Heatmap With categorical and ordered by a column](https://raw.githubusercontent.com/KeshavShetty/ds/master/Roughbook/misc_resources/heatmap3.png)
Uni-categorical
chartil.plot(heart_disease_df, ['target']) # Barchart as count plot
![Uni Categorical](https://raw.githubusercontent.com/KeshavShetty/ds/master/Roughbook/misc_resources/uni_categorical.png)
Uni-Continuous
chartil.plot(heart_disease_df, ['age']) # Boxplot
![Uni boxplot](https://raw.githubusercontent.com/KeshavShetty/ds/master/Roughbook/misc_resources/uni_boxplot.png)
chartil.plot(heart_disease_df, ['age'], chart_type='barchart') # Force barchart on continuous by auto-creating 10 equal bins
![Uni barchart_forced](https://raw.githubusercontent.com/KeshavShetty/ds/master/Roughbook/misc_resources/uni_barchart_forced.png)
chartil.plot(heart_disease_df, ['age'], chart_type='barchart', optional_settings={'no_of_bins':5}) # Create custom number of bins
![Uni uni_barchart_forced_custom_bin_size](https://raw.githubusercontent.com/KeshavShetty/ds/master/Roughbook/misc_resources/uni_barchart_forced_custom_bin_size.png)
chartil.plot(heart_disease_df, ['age'], chart_type='distplot')
![Uni distplot](https://raw.githubusercontent.com/KeshavShetty/ds/master/Roughbook/misc_resources/uni_distplot.png)
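Forcing a bar chart on a continuous column means binning it first. How chartil bins internally is not shown here; as a rough sketch, equal-width binning can be done with plain pandas by passing an integer to `pd.cut`. The sample ages and the variable names below are made up for illustration.

```python
import pandas as pd

# Made-up ages; none sit exactly on a bin edge.
ages = pd.Series([29, 35, 40, 44, 52, 54, 57, 58, 63, 77])
no_of_bins = 4

# An integer `bins` asks pd.cut for that many equal-width intervals
# spanning the observed minimum and maximum.
binned = pd.cut(ages, bins=no_of_bins)

# sort=False keeps the intervals in ascending order instead of by count.
counts = binned.value_counts(sort=False)
bar_heights = counts.tolist()
print(counts)
```

The counts per interval are the bar heights a bar chart of the binned column would show.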
Uni-categorical with optional_settings
chartil.plot(heart_disease_df, ['age_bin']) # Barchart as count plot
chartil.plot(heart_disease_df, ['age_bin'], optional_settings={'sort_by_value':True})
chartil.plot(heart_disease_df, ['age_bin'], optional_settings={'sort_by_value':True, 'limit_bars_count_to':5})
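The option names suggest that `sort_by_value` orders the bars by count and `limit_bars_count_to` keeps only the first few bars. That reading is an assumption, and the sketch below only shows the equivalent data preparation in plain pandas on made-up values, not what chartil does internally.

```python
import pandas as pd

# Made-up category values with distinct counts, so the order is unambiguous.
age_bin = pd.Series(['51-60'] * 4 + ['41-50'] * 3 + ['61-70'] * 2 + ['33-40'])

# value_counts sorts by count, largest first, by default.
sorted_counts = age_bin.value_counts()

# Keep only the top N bars (N is a hypothetical limit for this example).
limit = 2
top = sorted_counts.head(limit)
top_labels = top.index.tolist()
top_values = top.tolist()
print(top)
```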
Bi Category vs Category (& Univariate Segmented)
chartil.plot(heart_disease_df, ['sex', 'target'])
chartil.plot(heart_disease_df, ['sex', 'target'], chart_type='crosstab')
chartil.plot(heart_disease_df, ['sex', 'target'], chart_type='stacked_barchart')
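A crosstab of two categorical columns is a table of counts for each pair of levels. As a small sketch of the data behind such a chart, `pd.crosstab` computes it directly; the rows below are made up, and whether chartil calls `pd.crosstab` itself is not stated here.

```python
import pandas as pd

# Made-up rows: four males and three females.
df = pd.DataFrame({
    'sex': ['Male', 'Male', 'Male', 'Male', 'Female', 'Female', 'Female'],
    'target': [1, 0, 0, 0, 1, 1, 0],
})

# Counts of each (sex, target) combination.
table = pd.crosstab(df['sex'], df['target'])
print(table)

# Row-wise proportions, the numbers a 100% stacked bar chart would show.
proportions = pd.crosstab(df['sex'], df['target'], normalize='index')
print(proportions)
```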
Bi Continuous vs Continuous
chartil.plot(heart_disease_df, ['chol', 'thalach']) # Scatter plot
Bi Continuous vs Category
chartil.plot(heart_disease_df, ['thalach', 'sex']) # Grouped box plot (Segmented univariate)
chartil.plot(heart_disease_df, ['thalach', 'sex'], chart_type='distplot') # Distplot
Multi 3 Continuous
chartil.plot(heart_disease_df, ['chol', 'thalach', 'trestbps']) # Colored 3D scatter plot
Multi 3 Categorical
chartil.plot(heart_disease_df, ['age_bin', 'sex', 'target']) # Paired barchart
Multi 2 Continuous, 1 Category
chartil.plot(heart_disease_df, ['chol', 'thalach', 'target']) # Scatter plot with colored groups
Multi 1 Continuous, 2 Category
chartil.plot(heart_disease_df, ['thalach', 'sex', 'target']) # Grouped boxplot
chartil.plot(heart_disease_df, ['thalach', 'sex', 'target'], chart_type='violinplot') # Grouped violin plot
Multi 3 Continuous, 1 category
chartil.plot(heart_disease_df, ['chol', 'thalach', 'trestbps', 'target']) # Group Color highlighted 3D plot
Multi 2 Continuous, 3 Category
chartil.plot(heart_disease_df, ['sex','cp','target','thalach','trestbps']) # Paired scatter plot
Hashes for kesh_utils-0.1.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9d6ce9494ed330c721c71fedbd1567dedd311e693327f5cf6daddb64bcebdd7f
MD5 | f59ba4af321fa050a2af53b5f49a68eb
BLAKE2b-256 | befebe6ef651213fe4f75ef38a37dd07b3bceef084bdf7a651dd26cbfaf6778d