Kesh Utils for Data science/EDA/Data preparation
Chart + Util = Chartil
Data visualization: Simple, Single unified API for plotting and charting
During EDA and data preparation we rely on a small, fixed set of chart types to analyse the relationships among features. Some are simple, such as univariate charts, and some are complex, such as 3D plots or plots of more than three features.
This is a simple, single API for plotting various types of relationships. It hides the technical and code details from the data science task and approach, removes the difficulty of maintaining several APIs or libraries, and avoids repeated code.
With this approach we need just one API call (the library decides the rest):
from KUtils.eda import chartil
chartil.plot(dataframe, [list of columns]) or
chartil.plot(dataframe, [list of columns], {optional_settings})
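To illustrate what "the library decides the rest" could mean, here is a minimal sketch of a dispatcher that picks a chart name from the number of continuous and categorical columns. It is not chartil's implementation: `suggest_chart` and `CHART_BY_COUNTS` are hypothetical names, the mapping is copied from the comments in the demo that follows, and the heatmap fallback is an assumption.

```python
import pandas as pd

# Hypothetical mapping from (continuous count, categorical count) to a chart
# name, copied from the comments in the demo; not chartil's internal table.
CHART_BY_COUNTS = {
    (0, 1): "barchart",
    (2, 0): "scatter",
    (1, 1): "grouped boxplot",
    (3, 0): "3D scatter",
    (0, 3): "paired barchart",
    (2, 1): "scatter with colored groups",
    (1, 2): "grouped boxplot",
    (3, 1): "group-colored 3D scatter",
}


def suggest_chart(df, columns):
    """Hypothetical helper: pick a chart name from the column dtypes."""
    n_continuous = sum(pd.api.types.is_numeric_dtype(df[c]) for c in columns)
    n_categorical = len(columns) - n_continuous
    # Assumption: any combination not listed falls back to a heatmap.
    return CHART_BY_COUNTS.get((n_continuous, n_categorical), "heatmap")


# Made-up rows, only to exercise the helper.
df = pd.DataFrame({
    "chol": [233.0, 250.0, 204.0],
    "thalach": [150.0, 187.0, 172.0],
    "sex": ["Male", "Female", "Male"],
    "target": pd.Series([1, 1, 0]).astype("category"),
})
print(suggest_chart(df, ["chol", "thalach"]))
print(suggest_chart(df, ["thalach", "sex", "target"]))
```

Keying the lookup on dtype counts is why the data preparation step matters: a column left as an integer would be counted as continuous.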
Demo code:
Load the UCI heart disease dataset:
import pandas as pd
heart_disease_df = pd.read_csv('../input/uci/heart.csv')
Quick data preparation
column_to_convert_to_categorical = ['target', 'cp', 'fbs', 'exang', 'restecg', 'slope', 'ca', 'thal']
for col in column_to_convert_to_categorical:
    heart_disease_df[col] = heart_disease_df[col].astype('category')
heart_disease_df['age_bin'] = pd.cut(heart_disease_df['age'], [0, 32, 40, 50, 60, 70, 100], labels=['<32', '33-40','41-50','51-60','61-70', '71+'])
heart_disease_df['sex'] = heart_disease_df['sex'].map({1:'Male', 0:'Female'})
heart_disease_df.info()
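Note that `pd.cut` uses right-inclusive bins by default, so an age of exactly 32 lands in the bin labelled '<32' and an age of 40 lands in '33-40'. A small check with made-up ages (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Made-up ages chosen to sit on and around the bin edges.
ages = pd.Series([29, 32, 33, 40, 41, 77])
bins = [0, 32, 40, 50, 60, 70, 100]
labels = ['<32', '33-40', '41-50', '51-60', '61-70', '71+']

# Default right=True gives intervals (0, 32], (32, 40], (40, 50], ...
age_bin = pd.cut(ages, bins, labels=labels)
print(age_bin.tolist())
```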
Heatmap
chartil.plot(heart_disease_df, heart_disease_df.columns) # Send all column names
chartil.plot(heart_disease_df, heart_disease_df.columns, optional_settings={'include_categorical':True} )
chartil.plot(heart_disease_df, heart_disease_df.columns, optional_settings={'include_categorical':True, 'sort_by_column':'trestbps'} )
# Force a heatmap when you pass only a few columns; otherwise the tool picks a different chart
chartil.plot(heart_disease_df, ['chol', 'thalach', 'trestbps'], chart_type='heatmap')
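A heatmap over numeric columns is commonly drawn from a pairwise correlation matrix. The source does not say what chartil computes, so the sketch below only assumes that convention and builds the matrix with pandas; the rows are made up for illustration.

```python
import pandas as pd

# Made-up rows using three of the continuous column names from the demo.
df = pd.DataFrame({
    'chol': [233, 250, 204, 236, 354],
    'thalach': [150, 187, 172, 178, 163],
    'trestbps': [145, 130, 130, 120, 120],
})

# Pairwise Pearson correlation (the pandas default); this symmetric matrix
# is what a correlation heatmap would color.
corr = df.corr()
print(corr.round(2))
```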
Uni-categorical
chartil.plot(heart_disease_df, ['target']) # Barchart as count plot
Uni-Continuous
chartil.plot(heart_disease_df, ['age'])
chartil.plot(heart_disease_df, ['age'], chart_type='barchart') # Force a barchart on a continuous column by auto-creating 10 equal bins
chartil.plot(heart_disease_df, ['age'], chart_type='barchart', optional_settings={'no_of_bins':5}) # Create custom number of bins
chartil.plot(heart_disease_df, ['age'], chart_type='distplot')
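Forcing a barchart on a continuous column means the values are first grouped into bins. Assuming "equal bins" means equal-width intervals (the source does not specify), the grouping can be reproduced with `pd.cut` and an integer bin count. The ages below are made up.

```python
import pandas as pd

# Made-up ages for illustration.
ages = pd.Series([29, 35, 41, 45, 52, 54, 58, 61, 63, 77])

no_of_bins = 5  # mirrors the 'no_of_bins' setting used above
# An integer bin count asks pd.cut for that many equal-width intervals.
binned = pd.cut(ages, bins=no_of_bins)

# Count per interval; these counts would be the bar heights.
counts = binned.value_counts(sort=False)
print(counts)
```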
Uni-categorical with optional_settings
chartil.plot(heart_disease_df, ['age_bin']) # Barchart as count plot
chartil.plot(heart_disease_df, ['age_bin'], optional_settings={'sort_by_value':True})
chartil.plot(heart_disease_df, ['age_bin'], optional_settings={'sort_by_value':True, 'limit_bars_count_to':5})
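The two settings above can be pictured as operations on a count table. The sketch assumes that `sort_by_value` orders bars by descending count and that `limit_bars_count_to` keeps the first N bars; both are assumptions about chartil's behavior, and the data is made up.

```python
import pandas as pd

# Made-up age-bin labels for illustration.
age_bin = pd.Series(['51-60', '41-50', '51-60', '61-70', '51-60', '41-50', '33-40'])

# value_counts sorts by count, descending, by default.
counts = age_bin.value_counts()

# Keep only the two most frequent categories.
top = counts.head(2)
print(top)
```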
Bi Category vs Category (& Univariate Segmented)
chartil.plot(heart_disease_df, ['sex', 'target'])
chartil.plot(heart_disease_df, ['sex', 'target'], chart_type='crosstab')
chartil.plot(heart_disease_df, ['sex', 'target'], chart_type='stacked_barchart')
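Both the crosstab and the stacked barchart start from a contingency table of the two categorical columns. As a rough sketch of that underlying table, using pandas directly rather than chartil and made-up rows:

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Male', 'Female'],
    'target': [1, 1, 0, 1, 0],
})

# Rows are 'sex' values, columns are 'target' values, cells are counts.
table = pd.crosstab(df['sex'], df['target'])
print(table)
```

With matplotlib installed, `table.plot(kind='bar', stacked=True)` draws a stacked barchart from the same table.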
Bi Continuous vs Continuous
chartil.plot(heart_disease_df, ['chol', 'thalach']) # Scatter plot
Bi Continuous vs Category
chartil.plot(heart_disease_df, ['thalach', 'sex']) # Grouped box plot (Segmented univariate)
chartil.plot(heart_disease_df, ['thalach', 'sex'], chart_type='distplot') # Distplot
Multi 3 Continuous
chartil.plot(heart_disease_df, ['chol', 'thalach', 'trestbps']) # Colored 3D scatter plot
Multi 3 Categorical
chartil.plot(heart_disease_df, ['sex', 'age_bin', 'target']) # Paired barchart
Multi 2 Continuous, 1 Category
chartil.plot(heart_disease_df, ['chol', 'thalach', 'target']) # Scatter plot with colored groups
Multi 1 Continuous, 2 Category
chartil.plot(heart_disease_df, ['thalach', 'sex', 'target']) # Grouped boxplot
chartil.plot(heart_disease_df, ['thalach', 'sex', 'target'], chart_type='violinplot') # Grouped violin plot
Multi 3 Continuous, 1 category
chartil.plot(heart_disease_df, ['chol', 'thalach', 'trestbps', 'target']) # Group Color highlighted 3D plot
Multi 2 Continuous, 3 category
chartil.plot(heart_disease_df, ['sex','cp','target','thalach','trestbps']) # Paired scatter plot
Auto Linear Regression
Todo