Basic EDA functions implemented

Project description

eda_mds: Simplified Exploratory Data Analysis

Documentation Status License: MIT Python 3.9.0 version release

Basic EDA functions implemented to improve on core Pandas DataFrame functions.

Installation

eda_mds is available on PyPI and can be installed with pip install eda_mds. To install from source for local development, follow the local development steps further down.

Summary

This package is designed to kick-start the EDA stage of a machine learning or analytics project. Its primary objective is to improve upon the popular EDA functions in the pandas package. It provides four functions that deliver insights and identify potential problems in a dataset.

Function Descriptions

  • cor_eda: This function accepts a dataset and isolates its continuous numerical variables. It calculates the correlation between each pair of these variables from scratch and displays the results in a table.
  • info_na: This function replicates and extends the behaviour of pandas.DataFrame.info. It adds information about null values in rows and columns.
  • cat_var_stats: This function creates summary statistics for the categorical variables in the dataframe. The number of unique values, the frequency of values, and a suggested category binning are included.
  • describe_outliers: This function extends the functionality of pandas.DataFrame.describe for numeric data by providing a count of lower-tail and upper-tail outliers for a given threshold.
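
The ideas behind three of these functions can be sketched with plain pandas. The snippet below is not the eda_mds implementation: the helper names null_summary, category_summary and count_outliers, the sample data, and the use of an IQR multiplier as the outlier threshold are all assumptions made for illustration.

```python
import pandas as pd

# Small made-up dataset, used only for this sketch.
df = pd.DataFrame(
    {
        "age": [22.0, 38.0, None, 35.0, 80.0, 29.0, 31.0, 27.0],
        "fare": [7.25, 71.28, 7.92, 53.10, 512.33, 8.05, 8.46, 13.00],
        "deck": ["C", None, None, "C", "B", None, "E", None],
    }
)


def null_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of missing values per column."""
    counts = frame.isna().sum()
    return pd.DataFrame({"n_null": counts, "pct_null": counts / len(frame) * 100})


def category_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: unique-value count and most frequent value
    for each non-numeric column (missing values are ignored)."""
    cats = frame.select_dtypes(exclude="number")
    return pd.DataFrame({"n_unique": cats.nunique(), "top": cats.mode().iloc[0]})


def count_outliers(frame: pd.DataFrame, threshold: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: count values outside the IQR fences.

    Assumption: `threshold` is an IQR multiplier. The package may define
    its threshold differently.
    """
    nums = frame.select_dtypes(include="number")
    q1, q3 = nums.quantile(0.25), nums.quantile(0.75)
    iqr = q3 - q1
    lower = (nums < q1 - threshold * iqr).sum()
    upper = (nums > q3 + threshold * iqr).sum()
    return pd.DataFrame({"lower_tail": lower, "upper_tail": upper})


print(null_summary(df))
print(category_summary(df))
print(count_outliers(df))
```

Each helper returns a small DataFrame indexed by column name, so the three summaries can be read side by side.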

Python Ecosystem Integration

Our functions are heavily inspired by the pandas package for Python. EDA functions such as pandas.DataFrame.info, pandas.DataFrame.describe and pandas.DataFrame.corr are recreated and improved upon in this package. Our functions also depend on the pandas.DataFrame object.
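
For reference, the pandas calls that this package builds on look roughly like the following. The small DataFrame is made up for illustration, and only standard pandas methods are used.

```python
import io

import pandas as pd

# Made-up data: two numeric columns and one text column.
df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [2, 4, 6, 8],
        "label": ["a", "b", "a", "b"],
    }
)

# DataFrame.info prints to stdout by default; pass a buffer to capture the text.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

# DataFrame.describe summarises the numeric columns by default.
summary = df.describe()

# DataFrame.corr computes pairwise Pearson correlations; selecting the
# numeric columns first keeps the text column out of the calculation.
correlations = df.select_dtypes(include="number").corr()

print(info_text)
print(summary)
print(correlations)
```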

Installation

Here's how to set up eda_mds for local development.

  1. Clone a copy of eda_mds locally.

    $ git clone https://github.com/UBC-MDS/eda_mds.git
    
  2. Create and activate a new conda environment with poetry installed.

    $ conda create -n eda_mds_dev python=3.9 poetry
    
    $ conda activate eda_mds_dev 
    
  3. Navigate to the root directory

    $ cd path/to/eda_mds
    
  4. Install eda_mds using poetry:

    $ poetry install
    
  • To install the package from PyPI instead:

    $ pip install eda_mds
    

Running the Tests and Coverage

  1. Navigate to the root directory.
    $ cd path/to/eda_mds
    
  2. Run the tests.
    $ pytest
    
  3. Print the coverage report. This assumes coverage data has already been collected, for example with coverage run -m pytest.
    $ coverage report
    

Usage

Function Usage

The steps below show how to start using the functions in this package. Please see the vignette for the intended use.

Each function takes a pandas.DataFrame object.

  1. Import the functions and pandas.

    from eda_mds import info_na, describe_outliers, cat_var_stats, cor_eda
    import pandas as pd

  2. Load your dataset of choice.

    df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')

  3. Begin using the functions!

    info_na(df)

Contributing

Package created by Koray Tecimer, Paolo De Lagrave-Codina, Nicole Bidwell, Simon Frew.

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

eda_mds was created by Koray Tecimer, Paolo De Lagrave-Codina, Nicole Bidwell, Simon Frew. It is licensed under the terms of the MIT license.

Credits

eda_mds was created with cookiecutter and the py-pkgs-cookiecutter template.

Download files

Source Distribution

eda_mds-2.1.3.tar.gz (9.1 kB)

Built Distribution

eda_mds-2.1.3-py3-none-any.whl (9.5 kB, Python 3)