Kesh Utils for Data science/EDA/Data preparation

Project description

Chart + Util = Chartil

During the EDA and data preparation stage, I use a small, fixed set of chart types to analyse the relationships among features. Some are simple, such as univariate charts; others are complex, such as 3D plots or plots of more than three features.

Over time it became hard to maintain all the relevant code without repeating it. Instead, I developed a simple, single API that plots various types of relationships and hides the technical and code details from the data science task and approach.

Using this approach, I need just one API:

from KUtils.eda import chartil

chartil.plot(dataframe, [list of columns]) or
chartil.plot(dataframe, [list of columns], {optional_settings})
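
To illustrate how a single entry point can pick a chart from the columns it receives, here is a minimal sketch in plain Python. It is not the KUtils implementation: `choose_chart`, `is_continuous`, the `DEFAULT_CHART` table, the chart names, and the `max_categories` threshold are all hypothetical, and the table only mirrors the default charts named in the demo code on this page.

```python
from numbers import Real

# Hypothetical lookup: (continuous count, categorical count) -> default chart.
# The chart names are illustrative labels, not identifiers from KUtils.
DEFAULT_CHART = {
    (0, 1): "barchart",
    (1, 0): "boxplot",
    (2, 0): "scatter",
    (1, 1): "grouped_boxplot",
    (3, 0): "3d_scatter",
    (0, 3): "paired_barchart",
    (2, 1): "grouped_scatter",
    (1, 2): "grouped_boxplot",
    (3, 1): "grouped_3d_scatter",
    (2, 3): "paired_scatter",
}


def is_continuous(values, max_categories=10):
    """Assumed rule: numeric with more than max_categories distinct values."""
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    return numeric and len(set(values)) > max_categories


def choose_chart(table, columns):
    """Return the default chart name for the given columns of a dict-of-lists table."""
    n_continuous = sum(is_continuous(table[name]) for name in columns)
    n_categorical = len(columns) - n_continuous
    key = (n_continuous, n_categorical)
    if key not in DEFAULT_CHART:
        raise ValueError(f"no default chart for {key} in this sketch")
    return DEFAULT_CHART[key]


# Toy columns standing in for a dataframe (values are made up for illustration).
table = {
    "age": [29, 34, 41, 45, 52, 54, 57, 58, 61, 63, 67, 71],
    "chol": [204, 182, 235, 264, 196, 239, 241, 248, 330, 254, 229, 265],
    "sex": ["Male", "Female"] * 6,
    "target": [0, 1] * 6,
}
print(choose_chart(table, ["age"]))
print(choose_chart(table, ["chol", "age", "target"]))
```

Note that `target` is numeric but has only two distinct values, so the assumed rule treats it as categorical, which matches how the demo uses it.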

Demo code:

Load UCI Dataset. Download From here

import pandas as pd

heart_disease_df = pd.read_csv('../input/uci/heart.csv')

heart_disease_df['age_bin'] = pd.cut(heart_disease_df['age'], [0, 32, 40, 50, 60, 70, 100], labels=['<32', '33-40', '41-50', '51-60', '61-70', '71+'])
heart_disease_df['sex'] = heart_disease_df['sex'].map({1:'Male', 0:'Female'})
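
`pd.cut` uses right-closed intervals by default, so with these edges an age of exactly 32 is labelled '<32', and an age of 0 falls outside every bin. The sketch below reproduces that rule in plain Python for illustration only; `age_bin` is a hypothetical helper, not part of pandas or KUtils.

```python
from bisect import bisect_left

EDGES = [0, 32, 40, 50, 60, 70, 100]
LABELS = ['<32', '33-40', '41-50', '51-60', '61-70', '71+']


def age_bin(age):
    """Label an age using right-closed bins (low, high], or None if out of range."""
    # bisect_left finds the first edge >= age, so the bin index is one less.
    index = bisect_left(EDGES, age) - 1
    if 0 <= index < len(LABELS):
        return LABELS[index]
    return None


for age in (32, 33, 70, 71):
    print(age, age_bin(age))
```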

Heatmap

chartil.plot(heart_disease_df, heart_disease_df.columns) # Send all column names: heatmap of numerical columns

chartil.plot(heart_disease_df, heart_disease_df.columns, optional_settings={'include_categorical':True}) # Heatmap including categorical columns

chartil.plot(heart_disease_df, heart_disease_df.columns, optional_settings={'include_categorical':True, 'sort_by_column':'trestbps'}) # Heatmap including categorical columns, ordered by a column

Uni-categorical

chartil.plot(heart_disease_df, ['target']) # Barchart as count plot

Uni-Continuous

chartil.plot(heart_disease_df, ['age']) # Box plot

chartil.plot(heart_disease_df, ['age'], chart_type='barchart') # Force a barchart on a continuous column by automatically creating 10 equal bins

chartil.plot(heart_disease_df, ['age'], chart_type='barchart', optional_settings={'no_of_bins':5}) # Create a custom number of bins

chartil.plot(heart_disease_df, ['age'], chart_type='distplot') # Distribution plot
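
The forced barchart above bins a continuous column before counting. As a rough sketch of that idea, assuming "equal bins" means equal-width bins (the source does not say whether they are equal-width or equal-frequency), the counts behind the bars could be computed like this. `equal_width_counts` is a hypothetical helper, not a KUtils function.

```python
def equal_width_counts(values, no_of_bins=10):
    """Count values in no_of_bins equal-width bins spanning min..max."""
    low, high = min(values), max(values)
    width = (high - low) / no_of_bins
    counts = [0] * no_of_bins
    for value in values:
        if width == 0:
            index = 0  # all values identical: everything goes in the first bin
        else:
            # Clamp so the maximum value lands in the last bin.
            index = min(int((value - low) / width), no_of_bins - 1)
        counts[index] += 1
    return counts


ages = list(range(10))  # toy data, not taken from the heart dataset
print(equal_width_counts(ages, no_of_bins=5))
```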

Uni-categorical with optional_settings

chartil.plot(heart_disease_df, ['age_bin']) # Barchart as count plot

chartil.plot(heart_disease_df, ['age_bin'], optional_settings={'sort_by_value':True}) # Bars sorted by value

chartil.plot(heart_disease_df, ['age_bin'], optional_settings={'sort_by_value':True, 'limit_bars_count_to':5}) # Bars sorted by value, limited to 5 bars
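
The two settings above can be pictured as a sort followed by a truncation of the category counts. This is a minimal sketch under two assumptions that the source does not confirm: sorting is by descending count, and unsorted bars keep first-seen order. `bar_heights` is a hypothetical helper; only its parameter names are borrowed from the settings keys.

```python
from collections import Counter


def bar_heights(values, sort_by_value=False, limit_bars_count_to=None):
    """Return (label, count) pairs, optionally sorted by count and truncated."""
    counts = Counter(values)
    if sort_by_value:
        bars = counts.most_common()  # descending by count
    else:
        bars = list(counts.items())  # first-seen order
    if limit_bars_count_to is not None:
        bars = bars[:limit_bars_count_to]
    return bars


labels = ['41-50', '51-60', '51-60', '61-70', '51-60', '41-50']  # toy data
print(bar_heights(labels))
print(bar_heights(labels, sort_by_value=True, limit_bars_count_to=2))
```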

Bi Category vs Category (& Univariate Segmented)

chartil.plot(heart_disease_df, ['sex', 'target'])

chartil.plot(heart_disease_df, ['sex', 'target'], chart_type='crosstab') # Crosstab

chartil.plot(heart_disease_df, ['sex', 'target'], chart_type='stacked_barchart') # Stacked barchart
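
A crosstab of two categorical columns is a table of counts for each pair of values; the same counts give the segment heights of a stacked barchart. The following sketch computes those counts in plain Python for illustration. `crosstab` here is a hypothetical helper operating on a list of dicts, not the KUtils or pandas function.

```python
from collections import Counter


def crosstab(rows, row_key, col_key):
    """Count co-occurrences of two categorical fields as a nested dict."""
    pair_counts = Counter((row[row_key], row[col_key]) for row in rows)
    row_labels = sorted({r for r, _ in pair_counts})
    col_labels = sorted({c for _, c in pair_counts})
    # A Counter returns 0 for pairs that never occur.
    return {r: {c: pair_counts[(r, c)] for c in col_labels} for r in row_labels}


records = [  # toy rows, not taken from the heart dataset
    {"sex": "Male", "target": 1},
    {"sex": "Male", "target": 0},
    {"sex": "Female", "target": 1},
    {"sex": "Male", "target": 1},
]
print(crosstab(records, "sex", "target"))
```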

Bi Continuous vs Continuous

chartil.plot(heart_disease_df, ['chol', 'thalach']) # Scatter plot

Bi Continuous vs Category

chartil.plot(heart_disease_df, ['thalach', 'sex']) # Grouped box plot (segmented univariate)

chartil.plot(heart_disease_df, ['thalach', 'sex'], chart_type='distplot') # Distplot
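
A grouped box plot summarises a continuous column separately for each category. As a sketch of the numbers behind each box, the code below groups values and computes the minimum, quartiles, and maximum with the standard library. `grouped_box_stats` is a hypothetical helper, and plotting libraries may use different quartile and whisker conventions than the "inclusive" method assumed here.

```python
from collections import defaultdict
from statistics import quantiles


def grouped_box_stats(rows, value_key, group_key):
    """Return {group: (min, q1, median, q3, max)} for a continuous field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    stats = {}
    for label, values in groups.items():
        # Each group needs at least two values for quantiles().
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        stats[label] = (min(values), q1, median, q3, max(values))
    return stats


rows = (  # toy rows, not taken from the heart dataset
    [{"sex": "Male", "thalach": v} for v in (150, 160, 170, 180, 190)]
    + [{"sex": "Female", "thalach": v} for v in (140, 150, 160)]
)
print(grouped_box_stats(rows, "thalach", "sex"))
```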

Multi 3 Continuous

chartil.plot(heart_disease_df, ['chol', 'thalach', 'trestbps']) # Colored 3D scatter plot

Multi 3 Categorical

chartil.plot(heart_disease_df, ['age_bin', 'sex', 'target']) # Paired barchart

Multi 2 Continuous, 1 Category

chartil.plot(heart_disease_df, ['chol', 'thalach', 'target']) # Scatter plot with colored groups

Multi 1 Continuous, 2 Category

chartil.plot(heart_disease_df, ['thalach', 'sex', 'target']) # Grouped boxplot

chartil.plot(heart_disease_df, ['thalach', 'sex', 'target'], chart_type='violinplot') # Grouped violin plot

Multi 3 Continuous, 1 category

chartil.plot(heart_disease_df, ['chol', 'thalach', 'trestbps', 'target']) # Group color highlighted 3D plot

Multi 2 Continuous, 3 Category

chartil.plot(heart_disease_df, ['sex','cp','target','thalach','trestbps']) # Paired scatter plot

Download files

Source Distribution

kesh-utils-0.2.0.tar.gz (12.6 kB)

Built Distribution

kesh_utils-0.2.0-py3-none-any.whl (13.1 kB, Python 3)