Make exploratory data analysis easier!

pyeasyeda

Exploratory data analysis (EDA) is an essential part of every analysis, so this package provides data scrubbing and visualization tools for preliminary EDA on raw data. Use it to clean a dataset and to visualize relationships between features so that trends become visible.

Functions

  • clean_up - This function takes in a pandas dataframe object and performs initial steps of EDA on unstructured data. It returns a clean dataset by removing null values and identifying potential outliers in numeric variables based on a defined threshold.

  • birds_eye_view - This function takes in a pandas dataframe object and visualizes the distributions of variables in the form of histograms and density plots. It also generates a correlation heatmap for numeric variables to study their relationships.

  • close_up - This function takes in a pandas dataframe object and creates a scatterplot of the variable(s) most strongly correlated with the dependent variable. The plot also includes a trend line to model the correlation between the variables.

  • summary_suggestions - This function takes in a pandas dataframe object and outputs two tables: one of summary statistics for the numeric and categorical variables, and one of the percentage of unique values in each categorical variable.
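
The steps described in the list above can be sketched with plain pandas and NumPy. This is an illustration only, not the package's implementation: the function names (`sketch_clean_up`, `sketch_strongest_correlate`, `sketch_trend_line`, `sketch_unique_percentages`), the z-score outlier rule, the default threshold, and the toy data are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def sketch_clean_up(df, threshold=3.0):
    """Drop rows with nulls, then flag numeric values far from their column mean.

    Assumption: outliers are identified by absolute z-score; the package may use
    a different rule.
    """
    cleaned = df.dropna().reset_index(drop=True)
    numeric = cleaned.select_dtypes(include="number")
    # Columns with zero spread produce NaN z-scores, which are never flagged.
    z_scores = (numeric - numeric.mean()) / numeric.std(ddof=0)
    outlier_mask = z_scores.abs() > threshold
    return cleaned, outlier_mask


def sketch_strongest_correlate(df, target):
    """Name of the numeric column with the largest absolute Pearson correlation to target."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.abs().idxmax()


def sketch_trend_line(df, x, y):
    """Slope and intercept of a least-squares straight line through (x, y)."""
    slope, intercept = np.polyfit(df[x], df[y], deg=1)
    return float(slope), float(intercept)


def sketch_unique_percentages(df):
    """Percentage of unique values in each non-numeric column."""
    categorical = df.select_dtypes(exclude="number")
    return categorical.nunique() * 100 / len(df)


# Toy data, made up for this example: one missing value and one extreme row.
toy = pd.DataFrame(
    {
        "species": ["a", "a", "b", "b", "b", None],
        "mass": [1.0, 2.0, 3.0, 4.0, 50.0, 6.0],
        "length": [2.0, 4.0, 6.0, 8.0, 100.0, 12.0],
        "noise": [5.0, 1.0, 4.0, 2.0, 3.0, 0.0],
    }
)

cleaned, outlier_mask = sketch_clean_up(toy, threshold=1.5)
best_feature = sketch_strongest_correlate(cleaned, target="mass")
slope, intercept = sketch_trend_line(cleaned, x="mass", y=best_feature)
unique_pct = sketch_unique_percentages(cleaned)

print(cleaned.describe())    # summary statistics for the numeric columns
print(outlier_mask.sum())    # number of flagged values per numeric column
print(best_feature, slope, intercept)
print(unique_pct)
```

A low threshold is used on the toy data because, with only five rows, no z-score can exceed 2; on a real dataset a larger value would be more typical.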

Other packages that offer similar functionality are:

Installation

$ pip install pyeasyeda

Usage

After installing the package with the command above, start a Python session from the root of the project repo and run the following commands as a quick demo. The CSV path in the demo is relative to the repo root.

python
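import pandas as pd
from pyeasyeda.clean_up import clean_up
from pyeasyeda.birds_eye_view import birds_eye_view
from pyeasyeda.close_up import close_up
from pyeasyeda.summary_suggestions import summary_suggestions
# Load the sample dataset shipped with the repo's tests
df = pd.read_csv("tests/data/penguins_test.csv")
# Remove null values and identify potential outliers
clean_up(df)
# Histograms, density plots and a correlation heatmap
plots = birds_eye_view(df)
# Scatterplot with a trend line for the most strongly correlated variable(s)
close_up(df, 1)
# Summary statistics and percentage of unique values
summary_suggestions(df)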
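
The demo above reads a CSV from the project repo, so it only works from a checkout of the repository. A minimal sketch for trying things out elsewhere is to build a small stand-in dataframe in memory. The column names and values below are made up for illustration and are not the contents of `tests/data/penguins_test.csv`; the check uses only the standard library's `importlib.util.find_spec` and does not call any pyeasyeda function.

```python
import importlib.util

import pandas as pd

# Hypothetical stand-in data: column names and values are invented for this example.
demo_df = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
        "bill_length_mm": [39.1, None, 46.1, 50.0, 46.5],
        "body_mass_g": [3750.0, 3800.0, 4500.0, 5700.0, 3500.0],
    }
)

# Check whether the package can be imported in this environment before running the demo.
pyeasyeda_installed = importlib.util.find_spec("pyeasyeda") is not None
print(f"pyeasyeda importable: {pyeasyeda_installed}")

# Quick look at what the dataframe contains: column types and missing values.
print(demo_df.dtypes)
print(demo_df.isna().sum())
```

A dataframe like `demo_df` could then be passed to the functions in place of `df`, assuming they accept any pandas dataframe with numeric and categorical columns as the descriptions suggest.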

For example usage of the package, see the official documentation at pyeasyeda/example on Read the Docs.

Documentation

The official documentation is hosted at pyeasyeda on Read the Docs.

Contributors

This Python package was developed by James Kim, Kristin Bunyan, Luming Yang and Sukhleen Kaur. The team is from the Master of Data Science program at the University of British Columbia.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

pyeasyeda was created by James Kim, Kristin Bunyan, Luming Yang and Sukhleen Kaur. It is licensed under the terms of the MIT license.

Credits

pyeasyeda was created with cookiecutter and the py-pkgs-cookiecutter template.
