Explore and transform your data

Exploretransform

Explore and transform your datasets

Exploretransform is a collection of data exploration functions and custom pipeline transformers. It aims to streamline exploratory data analysis and extend some of scikit-learn's data transformers.

Package Guide

Examples of using the exploretransform functions and classes are in examples.ipynb in the GitHub repository. The docstring for each function or class can be viewed in IPython or Jupyter with ?name.

Installation

Install from PyPI:

!pip install exploretransform

Import the exploretransform package:

import exploretransform as et
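To confirm which release ended up installed, the standard library's importlib.metadata can be queried without importing the package itself. This is a minimal sketch; the helper name installed_version is hypothetical and is not part of exploretransform.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("exploretransform"))
```

A None result means the pip install step has not been run in the environment the interpreter is using.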

 

Summary of Functions and Classes

Function / Class       Description
loadboston             loads the Boston housing dataset
peek                   returns dtype, levels, number of observations, and first five observations for a dataframe
explore                provides various statistics on a dataframe (zeros, inf, missing, levels, dtypes)
nested                 takes a list, series or dataframe and returns the location of nested objects
freq                   for categorical or ordinal features, provides the count, percent, and cumulative percent for each level
plotfreq               generates a bar plot using the data generated by freq
corrtable              generates a table of all pairwise correlations and uses the average correlation for the row and column to decide on potential drop/filter candidates
calcdrop               analyzes corrtable output and determines which features should be filtered/dropped
skewstats              returns the skewness statistics and magnitude for each numeric feature
ascores                calculates various association scores (kendall, pearson, mic, dcor, spearman) between predictors and target
ColumnSelect           custom transformer that selects columns for pipeline
CategoricalOtherLevel  custom transformer that creates "other" level in categorical / ordinal data based on threshold
CorrelationFilter      custom transformer that filters numeric features based on pairwise correlation
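As an illustration of the kind of table freq is described as producing (count, percent, and cumulative percent for each level), here is a minimal pandas sketch. It is not exploretransform's implementation; the function name freq_table, its column names, and the toy data are assumptions made for this example.

```python
import pandas as pd


def freq_table(values: pd.Series) -> pd.DataFrame:
    """Count, percent, and cumulative percent for each level of a feature."""
    # value_counts sorts levels by count, descending, and skips missing values.
    counts = values.value_counts()
    total = counts.sum()
    table = pd.DataFrame({"level": counts.index, "count": counts.to_numpy()})
    table["percent"] = (100 * table["count"] / total).round(2)
    # Accumulate the raw counts first so rounding errors do not add up.
    table["cum_percent"] = (100 * table["count"].cumsum() / total).round(2)
    return table


# Toy data, not taken from the Boston housing dataset.
towns = pd.Series(["Boston", "Boston", "Cambridge", "Newton"], name="town")
print(freq_table(towns))
```

A cumulative percent column like this makes it easy to see how many levels cover most of the data, which is the sort of information a threshold-based "other" level would rely on.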

 

How to use exploretransform

More examples of using the exploretransform functions and classes are in examples.ipynb. The docstring for any function or class can be viewed in IPython or Jupyter by prefixing its name with ?, for example:

?et.explore

loadboston()

df, X, y = et.loadboston()

explore()

et.explore(X)
    variable  obs  q_zer  p_zer  q_na  p_na  q_inf  p_inf     dtype
 0      town  506      0   0.00     0   0.0      0    0.0    object
 1       lon  506      0   0.00     0   0.0      0    0.0   float64
 2       lat  506      0   0.00     0   0.0      0    0.0   float64
 3      crim  506      0   0.00     0   0.0      0    0.0   float64
 4        zn  506    372  73.52     0   0.0      0    0.0   float64
 5     indus  506      0   0.00     0   0.0      0    0.0   float64
 6      chas  506      0   0.00     0   0.0      0    0.0  category
 7       nox  506      0   0.00     0   0.0      0    0.0   float64
 8        rm  506      0   0.00     0   0.0      0    0.0   float64
 9       age  506      0   0.00     0   0.0      0    0.0   float64
10       dis  506      0   0.00     0   0.0      0    0.0   float64
11       rad  506      0   0.00     0   0.0      0    0.0  category
12       tax  506      0   0.00     0   0.0      0    0.0     int64
13   ptratio  506      0   0.00     0   0.0      0    0.0   float64
14         b  506      0   0.00     0   0.0      0    0.0   float64
15     lstat  506      0   0.00     0   0.0      0    0.0   float64

 

Column    Description
variable  name of the variable
obs       number of observations
q_zer     number of zeros
p_zer     percentage of zeros
q_na      number of missing values
p_na      percentage of missing values
q_inf     number of infinite values
p_inf     percentage of infinite values
dtype     data type (dtype) of the column
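To make the column definitions concrete, the following sketch computes the same kinds of statistics with plain pandas and NumPy. It is an approximation written for illustration, not the code behind et.explore; the function name explore_sketch, the choice to count zeros and infinities only in numeric columns, and the toy dataframe are all assumptions.

```python
import numpy as np
import pandas as pd


def explore_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column counts and percentages of zeros, missing, and infinite values."""
    n = len(df)

    def pct(count: int) -> float:
        # Percentage of all observations, guarded against empty dataframes.
        return round(100 * count / n, 2) if n else 0.0

    rows = []
    for name in df.columns:
        col = df[name]
        numeric = pd.api.types.is_numeric_dtype(col)
        # Assumption: zeros and infinities are only meaningful for numeric data.
        q_zer = int((col == 0).sum()) if numeric else 0
        q_inf = int(np.isinf(col.astype(float)).sum()) if numeric else 0
        q_na = int(col.isna().sum())
        rows.append({
            "variable": name, "obs": n,
            "q_zer": q_zer, "p_zer": pct(q_zer),
            "q_na": q_na, "p_na": pct(q_na),
            "q_inf": q_inf, "p_inf": pct(q_inf),
            "dtype": str(col.dtype),
        })
    return pd.DataFrame(rows)


# Toy data chosen to exercise each statistic once.
toy = pd.DataFrame({
    "a": [0.0, 1.5, np.nan, np.inf],
    "b": ["x", "y", None, "z"],
})
print(explore_sketch(toy))
```

Here obs is the total number of rows, so each percentage is taken over all observations, including missing ones.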

 

Release History

  • 1.0.0
    • First release
  • 1.0.1 - 1.0.7
    • Minor adjustments to get package working correctly

 

Meta

Brian Pietracatella – bpietrac@gmail.com

Distributed under the MIT license. See LICENSE for more information.

https://github.com/bxp151/exploretransform
