Utility functions to help with exploratory data analysis on top of the Pandas APIs

Pandas Data Exploration Utility Package

Overview

Pandas Data Exploration Utility is an interactive, notebook-based library for quickly profiling a dataset and exploring the shape of the data and the relationships within it. Using existing APIs from ipywidgets, Plotly, and Pandas, it creates a flexible point-and-click widget that lets the user explore and visualize the dataset.
This is a work in progress, and I welcome any suggestions on features and/or enhancements.

Installation

pip install Pandas-Data-Exploration-Utility-Package

Usage

Visualization Module

import pandas as pd
import pandas_exploration_util.viz.explore as pe

# Load the sample data, parsing the first column as dates
global_temp = pd.read_csv("./data/GlobalTemperatures.csv", parse_dates=[0])

# Render the interactive exploration widget in the notebook
pe.generate_widget(global_temp)

See /test for sample data and a test Jupyter notebook:
https://github.com/yifeihuang/pandas_exploration_util/tree/master/test


Pareto plot

Visualize the top values of any column, ranked by an aggregation of any other column. Supported aggregation functions include 'count', 'sum', 'mean', 'std', 'max', 'min', and 'uniques'.
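The ranking behind a Pareto plot can be reproduced in plain Pandas. The following is a minimal sketch, not the package's own implementation: the helper name pareto_table, the top_n parameter, and the mapping of 'uniques' to the Pandas "nunique" aggregation are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical mapping from the aggregation names listed above to Pandas
# aggregation names. Treating "uniques" as "nunique" is an assumption.
AGG_TO_PANDAS = {
    "count": "count",
    "sum": "sum",
    "mean": "mean",
    "std": "std",
    "max": "max",
    "min": "min",
    "uniques": "nunique",
}


def pareto_table(df, group_col, value_col, agg="sum", top_n=10):
    """Rank the values of group_col by an aggregation of value_col."""
    return (
        df.groupby(group_col)[value_col]
        .agg(AGG_TO_PANDAS[agg])
        .sort_values(ascending=False)  # largest aggregate first
        .head(top_n)                   # keep only the top values
    )


# Small made-up dataset used only for this example
sales = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south", "north"],
    "revenue": [10, 20, 30, 5, 25, 15],
})
top_regions = pareto_table(sales, "region", "revenue", agg="sum", top_n=2)
print(top_regions)
```

The resulting Series holds one aggregated value per category, ordered from largest to smallest, which is the shape of data a ranked bar chart needs.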

Distribution plot

Visualize the distribution of any numerical column. Binning is determined automatically by the Plotly histogram method.
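To get a feel for what automatic binning produces before opening the widget, the counts can be previewed outside the notebook. This sketch swaps in NumPy's "auto" bin rule as a stand-in; Plotly's own autobinning may choose different bin edges, and the helper name preview_bins is hypothetical.

```python
import numpy as np
import pandas as pd


def preview_bins(series, bins="auto"):
    """Return histogram counts and bin edges for a numeric Series.

    Uses NumPy's automatic bin selection, which is a stand-in and not
    the rule Plotly applies.
    """
    values = series.dropna().to_numpy()  # missing values are not binned
    counts, edges = np.histogram(values, bins=bins)
    return counts, edges


# Small made-up sample with one missing value
temps = pd.Series([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, None, 10.0])
counts, edges = preview_bins(temps)
print(counts, edges)
```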

X-Y plot

Visualize an X-Y scatter of any column against an aggregation of any other column. Supported aggregation functions include 'count', 'sum', 'mean', 'std', 'max', 'min', and 'uniques'.
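The data behind such a plot is one aggregated Y value per distinct X value. A minimal Pandas sketch follows, assuming a hypothetical helper xy_table that takes Pandas aggregation names directly (for example "nunique" where the widget says 'uniques'); it is an illustration, not the package's code.

```python
import pandas as pd


def xy_table(df, x_col, y_col, agg="mean"):
    """Aggregate y_col for each distinct x_col value, ordered along x."""
    return (
        df.groupby(x_col)[y_col]
        .agg(agg)
        .sort_index()    # order points along the x axis
        .reset_index()   # return x and y as ordinary columns
    )


# Small made-up dataset used only for this example
readings = pd.DataFrame({
    "year": [2001, 2000, 2001, 2000, 2002],
    "temp": [9.0, 8.0, 11.0, 10.0, 12.0],
})
points = xy_table(readings, "year", "temp", agg="mean")
print(points)
```

Unlike the Pareto ranking, the rows here are ordered by the X column rather than by the aggregated value, so trends along X stay visible.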

Recommended development setup

Local Dev

  1. Set up virtualenv
  2. Create a virtual environment using virtualenv /path/to/env/dir
  3. Activate the virtual environment using source /path/to/env/dir/bin/activate
  4. Clone the repo locally
  5. Navigate to the root directory of the repo, where setup.py lives
  6. Install the module in development mode using python setup.py develop
  7. Run the Jupyter notebook from the virtual environment directory; it should have been installed as a dependency of the module
  8. Dev away
  9. When done, uninstall the package using python setup.py develop --uninstall
  10. Deactivate the environment using deactivate

Building and distributing

See https://packaging.python.org/tutorials/packaging-projects/
Assuming all relevant tools are installed and the relevant project files are properly defined:

  1. Build the distribution using python3 setup.py sdist bdist_wheel
  2. Upload the distribution using twine upload dist/*{version}*
