A package for streamlining EDA processes for basic Data Analysis
Project description
gdphelper
This package takes the URL of any of the several dozen GDP-related CSV datasets on the Canadian Government Open Data Portal and downloads, cleans, loads, summarizes, and visualizes the data it contains.
It contains four functions:
- gdpimporter: Downloads the zipped data, extracts it, renames the appropriate CSV file, and returns a data frame along with the title from the metadata.
- gdpcleaner: Loads the data, removes spurious columns, renames the columns that are kept, and scrubs any data issues. Returns a basic data frame and some category flags.
- gdpdescribe: Evaluates the data category and generates summary statistics by year, region, industry, etc.
- gdpplotter: Generates a set of visualizations of the data set according to the user's choices.
This package is built on several popular packages in the Python ecosystem, including zipfile, matplotlib, and pandas.
What sets this package apart is that it bundles these common steps into a single workflow, from downloading the data to performing simple EDA, specifically for GDP-related data from the Canadian Government Open Data Portal.
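To make the download-and-extract step more concrete, here is a minimal sketch of pulling the data CSV out of a zip archive using only the standard library. It is not gdphelper's implementation: the helper names read_csv_from_zip and make_example_zip, the "_MetaData.csv" suffix, and the column names are assumptions chosen for illustration, and the archive is built in memory so the sketch runs offline.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_bytes, skip_suffix="_MetaData.csv"):
    """Return (member_name, rows) for the first data CSV in a zip archive.

    rows is a list of dicts keyed by the CSV header. skip_suffix is an
    assumption about how the accompanying metadata file is named.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        data_names = [
            name for name in archive.namelist()
            if name.lower().endswith(".csv") and not name.endswith(skip_suffix)
        ]
        if not data_names:
            raise ValueError("no data CSV found in archive")
        name = data_names[0]
        with archive.open(name) as raw:
            # utf-8-sig drops a byte-order mark if the file starts with one
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            rows = list(csv.DictReader(text))
    return name, rows


def make_example_zip():
    """Build a tiny in-memory archive (hypothetical data) for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "example.csv",
            "REF_DATE,GEO,VALUE\n2019,Canada,100.5\n2020,Canada,98.2\n",
        )
        archive.writestr("example_MetaData.csv", "Cube Title\nExample table\n")
    return buffer.getvalue()


name, rows = read_csv_from_zip(make_example_zip())
print(name, len(rows), rows[0]["VALUE"])  # example.csv 2 100.5
```

For a real table, the bytes could instead come from urllib.request.urlopen(url).read(); the package itself returns a pandas data frame rather than a list of dicts.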
Installation
$ pip install gdphelper
Usage
from gdphelper import gdpimporter
from gdphelper import gdpcleaner
from gdphelper import gdpdescribe
from gdphelper import gdpplotter
URL = "https://www150.statcan.gc.ca/n1/tbl/csv/36100400-eng.zip"
data_frame, title = gdpimporter.gdpimporter(URL)
clean_frame = gdpcleaner.gdpcleaner(data_frame)
gdpdescribe.gdpdescribe(clean_frame, "Value", "Location", stats=["mean", "median", "sd", "min", "max", "range_"], dec=2)
gdpplotter.gdpplotter(clean_frame)
For more detailed documentation, see: https://gdphelper.readthedocs.io/en/latest/
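As an illustration of the kind of grouped summary that the gdpdescribe call above requests, the following dependency-free sketch computes the same named statistics per group. It is not the package's code: describe_by_group is a hypothetical helper, the sample rows are made up, and it assumes that "sd" means the sample standard deviation and "range_" means max minus min.

```python
import statistics
from collections import defaultdict


def describe_by_group(rows, value_key, group_key, dec=2):
    """Summarise value_key for each distinct group_key in a list of dict rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(float(row[value_key]))

    summary = {}
    for group, values in groups.items():
        summary[group] = {
            "mean": round(statistics.mean(values), dec),
            "median": round(statistics.median(values), dec),
            # sample standard deviation needs at least two observations
            "sd": round(statistics.stdev(values), dec) if len(values) > 1 else None,
            "min": min(values),
            "max": max(values),
            "range_": round(max(values) - min(values), dec),
        }
    return summary


# Made-up rows standing in for a cleaned data frame
rows = [
    {"Location": "Ontario", "Value": 10.0},
    {"Location": "Ontario", "Value": 14.0},
    {"Location": "Quebec", "Value": 8.0},
    {"Location": "Quebec", "Value": 9.0},
    {"Location": "Quebec", "Value": 13.0},
]
summary = describe_by_group(rows, "Value", "Location")
print(summary["Ontario"]["mean"], summary["Quebec"]["median"])  # 12.0 9.0
```

With pandas, the equivalent would typically be a groupby on the grouping column followed by an aggregation; the plain-Python version is used here only to keep the sketch self-contained.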
Contributors
- Aldo Barros aldosaltao@gmail.com
- Gabe Fairbrother gfairbrother@gmail.com
- Wanying Ye wanying.ye2020@gmail.com
- Ramiro Mejia ramiromejiap@gmail.com
Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
License
gdphelper was created by Aldo Barros, Gabriel Fairbrother, Ramiro Mejia, and Wanying Ye. It is licensed under the terms of the MIT license.
Credits
gdphelper was created with cookiecutter and the py-pkgs-cookiecutter template.
Project details
File details
Details for the file gdphelper-1.1.11.tar.gz.
File metadata
- Download URL: gdphelper-1.1.11.tar.gz
- Upload date:
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 12afa16b57b94e0be45d890ef1ac04a6e450af782dbe8af58f168b0077833c5e
MD5 | 45242c36e6e7b5f56c8d0fc3da00f164
BLAKE2b-256 | bfa82ee986e88174c9c462a45af977299472c58d3e6599c190a3be885f46bfde
File details
Details for the file gdphelper-1.1.11-py3-none-any.whl.
File metadata
- Download URL: gdphelper-1.1.11-py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | df807e2a65e994f96cb099500969f732f9d3adf5bbcefd1a93e279a1f661c602
MD5 | 6a54e3dc03f1754134caf755b7614511
BLAKE2b-256 | 79c89417b28c9b55843454ba077a943bbc74e7c376f1bef111494cb0ad56deef