Explore and transform your data
Exploretransform
Explore and transform your datasets
Exploretransform is a collection of data exploration functions and custom pipeline transformers. It aims to streamline exploratory data analysis and extend some of scikit-learn's data transformers.
Package Guide
Examples of using the exploretransform functions and classes are contained in examples.ipynb. Details about each function or class (docstrings) can be accessed in IPython or Jupyter using ?name.
Installation
Install from PyPI (the leading ! runs the command from a Jupyter notebook cell; omit it in a shell):
!pip install exploretransform
Import the exploretransform package:
import exploretransform as et
Summary of Functions and Classes
Function / Class | Description |
---|---|
loadboston | loads the Boston housing dataset |
peek | returns dtype, levels, # of observations, and first five observations for a dataframe |
explore | provides various statistics on a dataframe (zeros, inf, missing, levels, dtypes) |
nested | takes a list, series or dataframe and returns the location of nested objects |
freq | for categorical or ordinal features, provides the count, percent, and cumulative percent for each level |
plotfreq | generates a bar plot using the data generated by freq |
corrtable | generates a table of all pairwise correlations and uses the average correlation for the row and column to decide on potential drop/filter candidates |
calcdrop | analyzes corrtable output and determines which features should be filtered/dropped |
skewstats | returns the skewness statistics and magnitude for each numeric feature |
ascores | calculates various association scores (kendall, pearson, mic, dcor, spearman) between predictors and target |
ColumnSelect | custom transformer that selects columns for pipeline |
CategoricalOtherLevel | custom transformer that creates "other" level in categorical / ordinal data based on threshold |
CorrelationFilter | custom transformer that filters numeric features based on pairwise correlation |
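To make the freq description above concrete, here is a minimal plain-Python sketch of a frequency table with count, percent, and cumulative percent per level. It is not the package's implementation: the name `freq_table`, the descending sort order, the rounding to two decimals, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def freq_table(values):
    """Return (level, count, percent, cumulative percent) rows.

    Hypothetical helper for illustration only; et.freq may order,
    round, or label its output differently.
    """
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    # most_common() yields levels ordered by descending count
    for level, count in counts.most_common():
        percent = 100.0 * count / total
        cumulative += percent
        rows.append((level, count, round(percent, 2), round(cumulative, 2)))
    return rows


# Made-up categorical data, not taken from the Boston dataset
sample = ["a", "b", "a", "c", "a", "b", "a", "a"]
for row in freq_table(sample):
    print(row)
```

The cumulative percent column is what makes a table like this useful for choosing a threshold below which rare levels could be grouped into an "other" level.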
How to use exploretransform
More examples of using the exploretransform functions and classes are contained in examples.ipynb. The docstring for any function or class can be displayed in IPython or Jupyter by prefixing its name with ? (in a plain Python session, use the built-in help() instead):
?et.explore
loadboston()
df, X, y = et.loadboston()
explore()
et.explore(X)
| | variable | obs | q_zer | p_zer | q_na | p_na | q_inf | p_inf | dtype |
---|---|---|---|---|---|---|---|---|---|
0 | town | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | object |
1 | lon | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
2 | lat | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
3 | crim | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
4 | zn | 506 | 372 | 73.52 | 0 | 0.0 | 0 | 0.0 | float64 |
5 | indus | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
6 | chas | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | category |
7 | nox | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
8 | rm | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
9 | age | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
10 | dis | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
11 | rad | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | category |
12 | tax | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | int64 |
13 | ptratio | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
14 | b | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
15 | lstat | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
Column | Description |
---|---|
variable | name of variable |
obs | number of observations |
q_zer | number of zeros |
p_zer | percentage of zeros |
q_na | number of missing values |
p_na | percentage of missing values |
q_inf | number of infinite values |
p_inf | percentage of infinite values |
dtype | Python dtype |
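The column definitions above can be illustrated with a small plain-Python sketch that computes the same counts and percentages for a single column. This is an assumption about how such statistics could be derived, not the code behind et.explore: the function name `column_stats`, the treatment of `None` and NaN as missing, and the two-decimal rounding are all hypothetical.

```python
import math


def column_stats(values):
    """Count zeros, missing values, and infinities in one column.

    Hypothetical helper for illustration only; et.explore works on a
    whole dataframe and may define "missing" differently.
    """
    obs = len(values)

    def is_number(v):
        # bool is a subclass of int, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    q_zer = sum(1 for v in values if is_number(v) and v == 0)
    q_na = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    q_inf = sum(1 for v in values if isinstance(v, float) and math.isinf(v))

    def pct(q):
        # Percentages use the number of observations as the denominator
        return round(100.0 * q / obs, 2) if obs else 0.0

    return {
        "obs": obs,
        "q_zer": q_zer, "p_zer": pct(q_zer),
        "q_na": q_na, "p_na": pct(q_na),
        "q_inf": q_inf, "p_inf": pct(q_inf),
    }


# Made-up column with the same number of zeros as the zn row (372 of 506)
zn_like = [0.0] * 372 + [12.5] * 134
print(column_stats(zn_like))
```

With 372 zeros in 506 observations, the sketch gives the same p_zer of 73.52 that the table reports for zn, which suggests the percentages are taken over all observations.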
Release History
- 0.1.0
- First release
Meta
Brian Pietracatella – bpietrac@gmail.com
Distributed under the MIT license. See LICENSE for more information.