eda_mds: Simplified Exploratory Data Analysis

Basic EDA functions implemented to improve on core Pandas DataFrame functions.

Summary

This package was created to kick-start the EDA stage of a machine learning or analytics project. Its primary objective is to improve upon the popular EDA functions in the pandas package. It provides four functions that deliver insights and identify potential problems in a dataset.

Function Descriptions

  • cor_eda: This function accepts a dataset and isolates its continuous numerical variables. It calculates the correlation between each pair of these variables from scratch and displays the results in a table.
  • info_na: This function replicates and extends the behaviour of pandas.DataFrame.info. It adds information about null values in rows and columns.
  • cat_var_stats: This function creates summary statistics about the categorical variables in the dataframe. The number of unique values, the frequency of values, and a suggested category binning are included.
  • describe_outliers: This function extends the functionality of pandas.DataFrame.describe for numeric data by providing a count of lower-tail and upper-tail outliers for a given threshold.
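
To make the ideas behind cor_eda and describe_outliers concrete, here is a minimal standard-library sketch. It is not the package's implementation. The helper names pearson and tail_outlier_counts are hypothetical, the choice of the Pearson coefficient is an assumption (the description above does not name a correlation method), and so is reading the threshold as a number of sample standard deviations from the mean.

```python
import math
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def tail_outlier_counts(values, threshold=3.0):
    """Count values beyond `threshold` sample standard deviations from the mean.

    Assumption: the threshold is measured in standard deviations; the
    package may define it differently.
    """
    m, s = mean(values), stdev(values)
    lower = sum(1 for v in values if v < m - threshold * s)
    upper = sum(1 for v in values if v > m + threshold * s)
    return lower, upper


# Made-up illustrative numbers, not taken from any real dataset.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(pearson(xs, ys))  # perfectly linear, so the coefficient is 1 up to rounding
print(tail_outlier_counts([1, 2, 3, 4, 5, 100], threshold=2.0))  # -> (0, 1)
```

In the package itself these computations are applied to the columns of a pandas.DataFrame; the plain lists here only keep the sketch free of dependencies.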

Python Ecosystem Integration

Our functions are heavily inspired by the pandas package for Python. EDA functions such as pandas.DataFrame.info, pandas.DataFrame.describe and pandas.DataFrame.corr are recreated and improved upon in this package. Our functions also depend on the pandas.DataFrame object.

Installation

User Setup

This package can be installed from PyPI by running the following command in your terminal.

$ pip install eda_mds

Developer Setup

Here's how to install eda_mds for local development.

  1. Clone a copy of eda_mds locally, by running the following command in your terminal.

    $ git clone https://github.com/UBC-MDS/eda_mds.git
    
  2. Create and activate a new conda environment with poetry installed.

    $ conda create -n eda_mds_dev python=3.9 poetry
    
    $ conda activate eda_mds_dev 
    
  3. Navigate to the root directory.

    $ cd path/to/eda_mds
    
  4. Install eda_mds using poetry.

    $ poetry install
    

Running the Tests and Coverage

  1. Navigate to the root directory.
    $ cd path/to/eda_mds
    
  2. Run the tests.
    $ pytest
    
  3. Generate the coverage report.
    $ coverage report
    

Usage

Function Usage

The steps below show how to start using the functions in this package once you have completed the installation. Please see the vignette for detailed usage. Note: Each function takes a pandas.DataFrame object as input.

  1. Import the functions and pandas.

    from eda_mds import info_na, describe_outliers, cat_var_stats, cor_eda
    import pandas as pd

  2. Load your dataset of choice.

    df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')

  3. Begin using the functions!

    info_na(df)
    describe_outliers(df)
    cat_var_stats(df)
    cor_eda(df)

Contributing

Package created by Koray Tecimer, Paolo De Lagrave-Codina, Nicole Bidwell, Simon Frew.

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

eda_mds was created by Koray Tecimer, Paolo De Lagrave-Codina, Nicole Bidwell, Simon Frew. Code is licensed under the terms of the MIT license. Non-code portions, specifically vignettes and related documentation, are licensed under the terms of the Creative Commons Zero v1.0 Universal license.

Credits

eda_mds was created with cookiecutter and the py-pkgs-cookiecutter template.
