A preliminary EDA helper.

Project description

prelim_eda_helper

This package is a preliminary exploratory data analysis (EDA) tool that produces useful feature plots and relevant summary information, simplifying an otherwise tedious step of any data science project. Specifically, it lets users target any two features, whether numeric or categorical, and create visualizations supplemented with summary and test statistics.

This package provides a streamlined, easy-to-use solution for basic EDA tasks that would otherwise require a significant amount of code. Similar packages are published on PyPI.

prelim_eda_helper enables users to write quick visualization queries. Because effects that look strong on a graph are not necessarily statistically meaningful, prelim_eda_helper combines graphical visualizations with preliminary statistical test results. We aim to help researchers get a quick sense of what their data look like without having to build charts and run tests separately in the early stages of a project. We believe this combination of graphical and statistical output is what makes prelim_eda_helper a unique yet handy helper package.

To achieve this goal, prelim_eda_helper creates charts with the visualization library altair and conducts statistical tests with scipy.

Usage

Installation

$ pip install prelim_eda_helper

initialize_helper

Enables plotting data sets with more than 5000 rows.

from prelim_eda_helper import initialize_helper
initialize_helper()

num_dist_by_cat

Creates a pair of plots showing the distribution of the numeric variable when grouped by the categorical variable. Output includes a histogram and boxplot. In addition, basic test statistics will be provided for user reference.

from prelim_eda_helper import num_dist_by_cat
num_dist_by_cat(num = 'x', cat = 'group', data = data, title_hist = 'Distribution of X', title_boxplot = 'X Separated by Group', lab_num = 'X', lab_cat = 'Group', num_on_x = True, stat = True)
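
The README does not say which test statistics are reported. As a rough sketch of the kind of check involved, the snippet below compares a numeric column across the levels of a categorical column with scipy, using Welch's t-test for two groups and a one-way ANOVA for three or more. The helper name `group_test`, the toy data, the use of a pandas DataFrame, and the choice of tests are all assumptions made for illustration; this is not the package's confirmed implementation.

```python
import pandas as pd
from scipy import stats


def group_test(data, num, cat):
    """Hypothetical helper: test whether `num` differs across levels of `cat`."""
    # Collect the numeric values for each category level, dropping missing values.
    samples = [grp[num].dropna().to_numpy() for _, grp in data.groupby(cat)]
    if len(samples) == 2:
        # Two groups: Welch's t-test (does not assume equal variances).
        result = stats.ttest_ind(samples[0], samples[1], equal_var=False)
        name = "Welch t-test"
    else:
        # Three or more groups: one-way ANOVA.
        result = stats.f_oneway(*samples)
        name = "one-way ANOVA"
    return name, float(result.statistic), float(result.pvalue)


# Toy data, made up for illustration.
data = pd.DataFrame({
    "x": [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1],
    "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
})
test_name, statistic, p_value = group_test(data, num="x", cat="group")
print(test_name, statistic, p_value)
```

A small p-value here only says the group means are unlikely to be equal; the histogram and boxplot are still needed to judge the size and shape of the difference.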

num_dist_scatter

Creates a scatter plot of two numeric variables. The plot can include a regression trendline and confidence interval bands. Spearman's and Pearson's correlation coefficients are also returned to help the user assess the relationship between the features.

from prelim_eda_helper import num_dist_scatter
num_dist_scatter(num1 = 'x', num2 = 'y', data = data, title = 'Scatter plot with X and Y', stat = False, trend = None)
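
To illustrate why both coefficients are worth reporting, here is a minimal sketch that computes them directly with scipy on made-up data. The variables `x` and `y` are illustrative only, and how `num_dist_scatter` computes or formats these values internally is not shown in this README.

```python
from scipy import stats

# Made-up data: y increases monotonically with x, but not linearly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]

# Pearson measures linear association; Spearman measures monotonic (rank) association.
pearson_r, pearson_p = stats.pearsonr(x, y)
spearman_rho, spearman_p = stats.spearmanr(x, y)

print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```

Because the relationship is perfectly monotonic but curved, Spearman's rho reaches 1 while Pearson's r stays slightly below it.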

cat_dist_heatmap

Creates concatenated charts showing a heatmap of two categorical variables and a bar chart of how often each value of these variables occurs.

from prelim_eda_helper import cat_dist_heatmap
cat_dist_heatmap(cat_1 = 'group1', cat_2 = 'group2', data = data, title = 'How are Group1 and Group2 distributed?', lab_1 = 'group1', lab_2 = 'group2', heatmap = True, barchart = True)
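
The numbers behind a heatmap and bar chart of this kind are simple counts. The sketch below, which assumes the data is a pandas DataFrame and uses made-up values, builds the joint counts with `pd.crosstab` and the per-variable counts with `value_counts`. It shows the underlying tabulation only and is not taken from the package's source.

```python
import pandas as pd

# Made-up categorical data for illustration.
data = pd.DataFrame({
    "group1": ["a", "a", "b", "b", "b"],
    "group2": ["x", "y", "x", "x", "y"],
})

# Joint counts of each (group1, group2) pair: the values a heatmap would colour.
counts = pd.crosstab(data["group1"], data["group2"])

# Occurrence counts per variable: the values a bar chart would show.
marginal_1 = data["group1"].value_counts()
marginal_2 = data["group2"].value_counts()

print(counts)
```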

num_dist_summary

Creates a distribution plot of the given numeric variable and provides a statistical summary of the feature. In addition, the correlation values of the variable with other numeric features will be provided based on a given threshold.

from prelim_eda_helper import num_dist_summary
num_dist_summary(num = 'x', data = data, title = 'Distribution of X', lab = 'X', thresh_corr = 0.0, stat = True)
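
One plausible reading of the threshold is that only correlations whose absolute value is at or above `thresh_corr` are reported. The sketch below implements that reading with pandas. The helper name `correlated_features`, the toy data, and the absolute-value interpretation are assumptions; the package may apply the threshold differently.

```python
import pandas as pd


def correlated_features(data, num, thresh_corr=0.0):
    """Hypothetical helper: correlations of `num` with the other numeric columns,
    keeping those whose absolute value is at least `thresh_corr`."""
    # Restrict to numeric columns, then take the row of correlations for `num`.
    corr = data.select_dtypes("number").corr()[num].drop(num)
    return corr[corr.abs() >= thresh_corr]


# Made-up data: y is perfectly correlated with x, z is negatively correlated,
# w is uncorrelated, and label is non-numeric and therefore ignored.
data = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 3, 4, 1, 2],
    "w": [1, 3, 2, 3, 1],
    "label": ["a", "b", "c", "d", "e"],
})
result = correlated_features(data, num="x", thresh_corr=0.5)
print(result)
```

With the default of 0.0, every numeric feature passes the filter, which matches the usage shown above.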

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

prelim_eda_helper was created by Mehwish Nabi, Morris Chan, Xinry LU, Austin Shih. It is licensed under the terms of the MIT license.

Credits

prelim_eda_helper was created with cookiecutter and the py-pkgs-cookiecutter template.
