Exploretransform

Explore and transform your datasets

Exploretransform is a collection of data exploration functions and custom pipeline transformers. It aims to streamline exploratory data analysis and extend some of scikit-learn's data transformers.

Package Guide

Examples of using the exploretransform functions and classes are contained in examples.ipynb. Details about each function or class (docstrings) can be accessed with ?name in IPython or Jupyter.

Installation

Install from PyPI (the leading ! runs the command from a Jupyter notebook; omit it in a terminal):

!pip install exploretransform

Import the exploretransform package:

import exploretransform as et

 

Summary of Functions and Classes

Function / Class       Description
loadboston             loads the Boston housing dataset
peek                   returns dtype, levels, # of observations, and first five observations for a dataframe
explore                provides various statistics on a dataframe (zeros, inf, missing, levels, dtypes)
nested                 takes a list, series or dataframe and returns the location of nested objects
freq                   for categorical or ordinal features, provides the count, percent, and cumulative percent for each level
plotfreq               generates a bar plot using the data generated by freq
corrtable              generates a table of all pairwise correlations and uses the average correlation of each row and column to flag potential drop/filter candidates
calcdrop               analyzes corrtable output and determines which features should be filtered/dropped
skewstats              returns the skewness statistics and magnitude for each numeric feature
ascores                calculates various association scores (kendall, pearson, mic, dcor, spearman) between predictors and target
ColumnSelect           custom transformer that selects columns for pipeline
CategoricalOtherLevel  custom transformer that creates "other" level in categorical / ordinal data based on threshold
CorrelationFilter      custom transformer that filters numeric features based on pairwise correlation
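
To make the freq description concrete, the following is a minimal pandas sketch of a per-level frequency table (count, percent, cumulative percent). It is an illustration only, not the library's implementation: the function name freq_sketch and the output column names level, count, perc and cumperc are assumptions chosen for this example, and missing values are dropped because that is the default of value_counts.

```python
import pandas as pd


def freq_sketch(series: pd.Series) -> pd.DataFrame:
    """Hypothetical stand-in for a freq-style table; not exploretransform's code."""
    # value_counts sorts levels by descending count and drops missing values.
    counts = series.value_counts()
    out = counts.rename_axis("level").reset_index(name="count")

    total = out["count"].sum()
    # Percent of observations in each level.
    out["perc"] = (100 * out["count"] / total).round(2)
    # Cumulative percent is computed from the counts, not from the rounded
    # percentages, so that the last row does not drift away from 100.
    out["cumperc"] = (100 * out["count"].cumsum() / total).round(2)
    return out


if __name__ == "__main__":
    print(freq_sketch(pd.Series(["a", "b", "a", "c", "a", "b"])))
```

A table like this is what a bar plot of level frequencies would be drawn from.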

 

How to use exploretransform

More examples of using the exploretransform functions and classes are contained in examples.ipynb. Details about each function or class (docstrings) can be accessed using ? in IPython or Jupyter, for example:

?et.explore

loadboston()

df, X, y = et.loadboston()

explore()

et.explore(X)
    variable  obs  q_zer  p_zer  q_na  p_na  q_inf  p_inf     dtype
 0      town  506      0   0.00     0   0.0      0    0.0    object
 1       lon  506      0   0.00     0   0.0      0    0.0   float64
 2       lat  506      0   0.00     0   0.0      0    0.0   float64
 3      crim  506      0   0.00     0   0.0      0    0.0   float64
 4        zn  506    372  73.52     0   0.0      0    0.0   float64
 5     indus  506      0   0.00     0   0.0      0    0.0   float64
 6      chas  506      0   0.00     0   0.0      0    0.0  category
 7       nox  506      0   0.00     0   0.0      0    0.0   float64
 8        rm  506      0   0.00     0   0.0      0    0.0   float64
 9       age  506      0   0.00     0   0.0      0    0.0   float64
10       dis  506      0   0.00     0   0.0      0    0.0   float64
11       rad  506      0   0.00     0   0.0      0    0.0  category
12       tax  506      0   0.00     0   0.0      0    0.0     int64
13   ptratio  506      0   0.00     0   0.0      0    0.0   float64
14         b  506      0   0.00     0   0.0      0    0.0   float64
15     lstat  506      0   0.00     0   0.0      0    0.0   float64

 

Column    Description
variable  name of variable
obs       number of observations
q_zer     number of zeros
p_zer     percentage of zeros
q_na      number of missing values
p_na      percentage of missing values
q_inf     number of infinite values
p_inf     percentage of infinite values
dtype     Python dtype
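
For readers who want to see how such columns can be derived, here is a rough pandas/NumPy sketch that produces a table with the same column names. It is an assumption-laden illustration rather than the code behind et.explore: the function name explore_sketch is hypothetical, percentages are taken relative to the number of rows and rounded to two decimals, and infinite values are only counted for numeric columns.

```python
import numpy as np
import pandas as pd


def explore_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for an explore-style summary; not exploretransform's code."""
    n = len(df)

    def pct(count: int) -> float:
        # Percentage of rows, guarded against an empty dataframe.
        return round(100 * count / n, 2) if n else 0.0

    rows = []
    for name in df.columns:
        col = df[name]
        q_zer = int((col == 0).sum())   # values equal to zero
        q_na = int(col.isna().sum())    # missing values (NaN / None)
        if pd.api.types.is_numeric_dtype(col):
            q_inf = int(np.isinf(col.astype("float64")).sum())
        else:
            q_inf = 0                   # infinity is only defined for numbers
        rows.append({
            "variable": name,
            "obs": n,
            "q_zer": q_zer, "p_zer": pct(q_zer),
            "q_na": q_na, "p_na": pct(q_na),
            "q_inf": q_inf, "p_inf": pct(q_inf),
            "dtype": str(col.dtype),
        })
    return pd.DataFrame(rows)


if __name__ == "__main__":
    demo = pd.DataFrame({"a": [0.0, 1.0, np.inf, np.nan]})
    print(explore_sketch(demo))
```

Columns with a high p_zer, such as zn in the output above, are the kind of feature this summary is meant to surface.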

 

Release History

  • 0.1.0
    • First release

 

Meta

Brian Pietracatella – bpietrac@gmail.com

Distributed under the MIT license. See LICENSE for more information.

https://github.com/bxp151/exploretransform
