eda_mds: Simplified Exploratory Data Analysis


Basic EDA functions implemented to improve on core Pandas DataFrame functions.

Summary

This package was created to kick-start the EDA stage of a machine learning or analytics project. Its primary objective is to improve upon the popular EDA functions in the pandas package. It provides four functions that deliver insights and identify potential problems in a dataset.

Function Descriptions

  • cor_eda: This function accepts a dataset and isolates its continuous numerical variables. It calculates the correlation between each pair of these variables from scratch and displays the results in a table.
  • info_na: This function replicates and extends the behaviour of pandas.DataFrame.info. It adds information about null values in rows and columns.
  • cat_var_stats: This function creates summary statistics for the categorical variables in the dataframe. The number of unique values, the frequency of values, and a suggested category binning are included.
  • describe_outliers: This function extends the functionality of pandas.DataFrame.describe for numeric data by providing a count of lower-tail and upper-tail outliers for a given threshold.
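
To make the descriptions above concrete, here is a minimal sketch of the kinds of quantities they mention, written with plain pandas and NumPy. It is not the eda_mds implementation: the helper names `pearson` and `outlier_counts`, the sample data, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration, and the package's own threshold definition may differ.

```python
import numpy as np
import pandas as pd

# Small illustrative dataset (hypothetical values, not taken from the package).
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, np.nan, 54.0, 2.0, 80.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.07, 512.33],
    "embarked": ["S", "C", "S", "S", "Q", "S", None, "C"],
})


def pearson(x: pd.Series, y: pd.Series) -> float:
    """Pearson correlation computed by hand on rows where both values exist."""
    pair = pd.concat([x, y], axis=1).dropna()
    dx = pair.iloc[:, 0] - pair.iloc[:, 0].mean()
    dy = pair.iloc[:, 1] - pair.iloc[:, 1].mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def outlier_counts(s: pd.Series, k: float = 1.5) -> tuple[int, int]:
    """Count values below Q1 - k*IQR and above Q3 + k*IQR (assumed rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return int((s < q1 - k * iqr).sum()), int((s > q3 + k * iqr).sum())


null_per_column = df.isna().sum()                   # null values in each column
rows_with_null = int(df.isna().any(axis=1).sum())   # rows with at least one null
embarked_freq = df["embarked"].value_counts()       # category frequencies, nulls excluded
r_age_fare = pearson(df["age"], df["fare"])         # correlation "from scratch"
fare_low, fare_high = outlier_counts(df["fare"])    # lower-tail and upper-tail counts
```

The hand-written `pearson` helper should agree with `df["age"].corr(df["fare"])`, which is a convenient sanity check when computing correlations manually.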

Python Ecosystem Integration

Our functions are heavily inspired by the pandas package for Python. EDA functions such as pandas.DataFrame.info, pandas.DataFrame.describe and pandas.DataFrame.corr are recreated and improved upon in this package. Our functions also depend on the pandas.DataFrame object.
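
For reference, the three pandas methods named above can be called as follows. This sketch uses only pandas itself, and the sample data is hypothetical; it shows the baseline output that this package aims to extend, not the package's own output.

```python
import io

import pandas as pd

# Hypothetical sample data with two numeric columns and one categorical column.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "embarked": ["S", "C", "S", "S"],
})

# DataFrame.info prints its report and returns None, so capture it in a buffer.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()

# DataFrame.describe summarises numeric columns by default.
summary = df.describe()

# DataFrame.corr is applied to the numeric columns only.
corr = df.select_dtypes(include="number").corr()

print(info_text)
print(summary)
print(corr)
```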

Installation

User Setup

This package can be installed from PyPI by running the following command in your terminal.

$ pip install eda_mds

Developer Setup

Here's how to install eda_mds for local development.

  1. Clone a copy of eda_mds locally by running the following command in your terminal.

    $ git clone https://github.com/UBC-MDS/eda_mds.git
    
  2. Create and activate a new conda environment with poetry installed.

    $ conda create -n eda_mds_dev python=3.9 poetry
    
    $ conda activate eda_mds_dev 
    
  3. Navigate to the root directory.

    $ cd path/to/eda_mds
    
  4. Install eda_mds using poetry.

    $ poetry install
    

Running the Tests and Coverage

  1. Navigate to the root directory.
    $ cd path/to/eda_mds
    
  2. Run the tests.
    $ pytest
    
  3. Generate the coverage report.
    $ coverage report
    

Usage

Function Usage

The steps below show how to start using the functions in this package once installation is complete. Please see the vignette for detailed usage. Note: Each function takes a pandas.DataFrame object as input.

  1. Import the functions and pandas.

    from eda_mds import info_na, describe_outliers, cat_var_stats, cor_eda
    import pandas as pd

  2. Load your dataset of choice.

    df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')

  3. Begin using the functions!

    info_na(df)
    describe_outliers(df)
    cat_var_stats(df)
    cor_eda(df)

Contributing

Package created by Koray Tecimer, Paolo De Lagrave-Codina, Nicole Bidwell, Simon Frew.

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

eda_mds was created by Koray Tecimer, Paolo De Lagrave-Codina, Nicole Bidwell, Simon Frew. Code is licensed under the terms of the MIT license. Non-code portions, specifically vignettes and related documentation, are licensed under the terms of the Creative Commons Zero v1.0 Universal license.

Credits

eda_mds was created with cookiecutter and the py-pkgs-cookiecutter template.

Download files

Source Distribution

eda_mds-2.1.8.tar.gz (10.0 kB)

Uploaded Source

Built Distribution

eda_mds-2.1.8-py3-none-any.whl (11.9 kB)

Uploaded Python 3

File details

Details for the file eda_mds-2.1.8.tar.gz.

File metadata

  • Download URL: eda_mds-2.1.8.tar.gz
  • Size: 10.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for eda_mds-2.1.8.tar.gz
Algorithm Hash digest
SHA256 9dfc3cae3fa1e861621627e927d313f31c4c733e39a6790ce2c656f483823339
MD5 8a467daee4ce51de15cb57d940736874
BLAKE2b-256 622c1ecde11f5b13115eb5052a24386ab1ef557182c6cea668964154a7368725

File details

Details for the file eda_mds-2.1.8-py3-none-any.whl.

File metadata

  • Download URL: eda_mds-2.1.8-py3-none-any.whl
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for eda_mds-2.1.8-py3-none-any.whl
Algorithm Hash digest
SHA256 709751d0e67514263d2fe1f109ab24a44951fb65d1f3242bd4ae49f69a2a260b
MD5 55b9d33d657c415bebf83a6c2294fb76
BLAKE2b-256 2566df441236f4fc89d2509e5feeddcbd8fb5304a695cd270b8aa36ee3a6916c
