Basic EDA functions implemented

Project description

eda_mds: Simplified Exploratory Data Analysis

Basic EDA functions implemented to improve on core Pandas DataFrame functions.

Installation

This project has not yet been uploaded to PyPI. Please see contributing for instructions to install locally.

Summary

This package is designed to kick-start the EDA stage of a machine learning or analytics project. Its primary objective is to improve upon the popular EDA functions in the pandas package. It provides four functions that deliver insights and identify potential problems in a dataset.

Function Descriptions

  • cor_eda: This function accepts a dataset and isolates its continuous numerical variables. It calculates the correlation between each pair of these variables from scratch and displays the results in a table.
  • info_na: This function replicates and extends the behaviour of pandas.DataFrame.info. It adds information about null values in rows and columns.
  • cat_var_stats: This function creates summary statistics for the categorical variables in the dataframe. It reports the number of unique values, the frequency of each value, and a suggested category binning.
  • describe_outliers: This function extends pandas.DataFrame.describe for numeric data by providing a count of lower-tail and upper-tail outliers for a given threshold.
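
The ideas behind these descriptions can be sketched with plain pandas. The sketch below is not the eda_mds implementation: the helper names `pearson_from_scratch` and `count_outliers` are hypothetical, the 1.5 × IQR rule is an assumed threshold, and the sample data is made up for illustration.

```python
import pandas as pd


def pearson_from_scratch(x: pd.Series, y: pd.Series) -> float:
    """Pearson correlation computed without calling DataFrame.corr (hypothetical helper)."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / ((dx ** 2).sum() * (dy ** 2).sum()) ** 0.5)


def count_outliers(s: pd.Series, k: float = 1.5) -> tuple:
    """Count values below Q1 - k*IQR and above Q3 + k*IQR (assumed threshold rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower = int((s < q1 - k * iqr).sum())
    upper = int((s > q3 + k * iqr).sum())
    return lower, upper


# Made-up sample data: two numeric columns and one categorical column.
df = pd.DataFrame({
    "height": [1.0, 2.0, 3.0, 4.0, 100.0],
    "weight": [2.0, 4.0, 6.0, 8.0, 200.0],
    "colour": ["red", "red", "blue", None, "red"],
})

# Isolate the numeric columns, as cor_eda is described as doing.
numeric = df.select_dtypes(include="number")
r = pearson_from_scratch(numeric["height"], numeric["weight"])

# Lower-tail and upper-tail outlier counts per numeric column.
outliers = {col: count_outliers(numeric[col]) for col in numeric.columns}

# Basic categorical summaries: unique values and value frequencies.
n_unique = df["colour"].nunique()
freq = df["colour"].value_counts()
```

Here `weight` is exactly twice `height`, so the hand-computed correlation should agree with `numeric["height"].corr(numeric["weight"])`.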

Python Ecosystem Integration

Our functions are heavily inspired by the pandas package for Python. EDA functions such as pandas.DataFrame.info, pandas.DataFrame.describe and pandas.DataFrame.corr are recreated and improved upon in this package. Our functions also depend on the pandas.DataFrame object.
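
For reference, the pandas baseline that these functions build on looks roughly like the following. This is a minimal sketch using only standard pandas calls on made-up data. The null counts at the end show the kind of extra information that info_na is described as adding, not its actual output.

```python
import io

import pandas as pd

# Made-up sample data with a few missing values.
df = pd.DataFrame({
    "a": [1.0, None, 3.0],
    "b": ["x", "y", None],
    "c": [10, 20, 30],
})

# DataFrame.info prints its report; capture it in a buffer to inspect it as text.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()

# DataFrame.describe summarises numeric columns by default.
summary = df.describe()

# DataFrame.corr gives pairwise Pearson correlation; restrict to numeric columns first.
corr = df.select_dtypes(include="number").corr()

# Null counts per column and the number of rows containing at least one null.
nulls_per_column = df.isna().sum()
rows_with_nulls = int(df.isna().any(axis=1).sum())
```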

Usage

Function Usage

Each function takes a pandas.DataFrame object. Please see the included vignette for intended use.

Contributing

Package created by Koray Tecimer, Paolo De Lagrave-Codina, Nicole Bidwell, Simon Frew.

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

eda_mds was created by Koray Tecimer, Paolo De Lagrave-Codina, Nicole Bidwell, Simon Frew. It is licensed under the terms of the MIT license.

Credits

eda_mds was created with cookiecutter and the py-pkgs-cookiecutter template.
