Explore and transform your data
Exploretransform
Explore and transform your datasets
Exploretransform is a collection of data exploration functions and custom pipeline transformers. It aims to streamline exploratory data analysis and to extend some of scikit-learn's data transformers.
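A "custom pipeline transformer" here means a class that follows the scikit-learn transformer pattern: it has `fit` and `transform` methods and builds on `BaseEstimator` and `TransformerMixin`, so it can be a step in a `Pipeline`. The sketch below shows that pattern with a hypothetical `ColumnSelectSketch`. It only mirrors the idea behind the package's ColumnSelect. Its constructor and behavior are assumptions and are not exploretransform's API.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnSelectSketch(TransformerMixin, BaseEstimator):
    """Hypothetical column selector; not exploretransform's ColumnSelect."""

    def __init__(self, columns):
        # Store constructor arguments unchanged so get_params/set_params work.
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; only check that the requested columns exist.
        missing = set(self.columns) - set(X.columns)
        if missing:
            raise KeyError(f"columns not found: {sorted(missing)}")
        return self

    def transform(self, X):
        # Keep only the requested columns, in the requested order.
        return X[list(self.columns)]


X = pd.DataFrame({
    "rm": [6.5, 6.4, 7.2],
    "lstat": [4.98, 9.14, 4.03],
    "town": ["a", "b", "c"],
})

# Select the numeric columns first, then scale them.
pipe = Pipeline([
    ("select", ColumnSelectSketch(["rm", "lstat"])),
    ("scale", StandardScaler()),
])
Xt = pipe.fit_transform(X)
print(Xt.shape)  # (3, 2)
```

`fit` returns `self` and `transform` returns a DataFrame. That lets the selector sit at the front of a `Pipeline` and pass only the chosen columns to later steps such as `StandardScaler`.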
Package Guide
Examples of using the exploretransform functions and classes are in examples.ipynb in the GitHub repository. Each function's or class's docstring can be viewed in IPython or Jupyter with ?name.
Installation
From PyPI (the leading ! runs the command from a Jupyter notebook cell; omit it in a terminal):
!pip install exploretransform
Import the exploretransform package:
import exploretransform as et
Summary of Functions and Classes
| Function / Class | Description |
|---|---|
| loadboston | loads the Boston housing dataset |
| peek | returns dtype, levels, # of observations, and first five observations for a dataframe |
| explore | provides various statistics on a dataframe (zeros, inf, missing, levels, dtypes) |
| nested | takes a list, series or dataframe and returns the location of nested objects |
| freq | for categorical or ordinal features, provides the count, percent, and cumulative percent for each level |
| plotfreq | generates a bar plot using the data generated by freq |
| corrtable | generates a table of all pairwise correlations and uses the average correlation for the row and column to decide on potential drop/filter candidates |
| calcdrop | analyzes corrtable output and determines which features should be filtered/dropped |
| skewstats | returns the skewness statistics and magnitude for each numeric feature |
| ascores | calculates various association scores (Kendall, Pearson, MIC, dcor, Spearman) between predictors and target |
| ColumnSelect | custom transformer that selects columns for a pipeline |
| CategoricalOtherLevel | custom transformer that creates "other" level in categorical / ordinal data based on threshold |
| CorrelationFilter | custom transformer that filters numeric features based on pairwise correlation |
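The freq row above describes a per-level table of count, percent, and cumulative percent. Below is a minimal pandas sketch of that computation. `freq_sketch` and its column names are hypothetical, and this is not the package's implementation or output format.

```python
import pandas as pd


def freq_sketch(s: pd.Series) -> pd.DataFrame:
    """Hypothetical freq-style table (not exploretransform's code)."""
    # One row per level, most frequent first; missing values kept as a level.
    out = s.value_counts(dropna=False).to_frame("count")
    out["percent"] = 100 * out["count"] / out["count"].sum()
    # Running total of the percentages down the sorted levels.
    out["cumulative_percent"] = out["percent"].cumsum()
    return out


chas = pd.Series(["0", "0", "0", "1"], name="chas")
print(freq_sketch(chas))
```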
How to use exploretransform
More examples of using the exploretransform functions and classes are in examples.ipynb. Each function's or class's docstring can be viewed in IPython or Jupyter with ?, for example:
?et.explore
loadboston()
df, X, y = et.loadboston()
explore()
et.explore(X)
| | variable | obs | q_zer | p_zer | q_na | p_na | q_inf | p_inf | dtype |
|---|---|---|---|---|---|---|---|---|---|
| 0 | town | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | object |
| 1 | lon | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
| 2 | lat | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
| 3 | crim | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
| 4 | zn | 506 | 372 | 73.52 | 0 | 0.0 | 0 | 0.0 | float64 |
| 5 | indus | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
| 6 | chas | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | category |
| 7 | nox | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
| 8 | rm | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
| 9 | age | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
| 10 | dis | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
| 11 | rad | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | category |
| 12 | tax | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | int64 |
| 13 | ptratio | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
| 14 | b | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
| 15 | lstat | 506 | 0 | 0.00 | 0 | 0.0 | 0 | 0.0 | float64 |
| Column | Description |
|---|---|
| variable | name of variable |
| obs | number of observations |
| q_zer | number of zeros |
| p_zer | percentage of zeros |
| q_na | number of missing values |
| p_na | percentage of missing values |
| q_inf | number of infinite values |
| p_inf | percentage of infinite values |
| dtype | pandas dtype |
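The column definitions above can be approximated with plain pandas. The sketch below is a hypothetical `explore_sketch`, not exploretransform's code. It assumes that zeros and infinities are counted only for numeric dtypes and that percentages are rounded to two decimals. The real function may handle these cases differently.

```python
import numpy as np
import pandas as pd


def explore_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical explore-style summary (not exploretransform's implementation)."""
    n = len(df)

    def pct(q: int) -> float:
        # Percentage of observations, rounded to two decimals; 0.0 for empty frames.
        return round(100 * q / n, 2) if n else 0.0

    rows = []
    for col in df.columns:
        s = df[col]
        q_na = int(s.isna().sum())
        if pd.api.types.is_numeric_dtype(s):
            # Convert to a float array so bool and nullable dtypes work with np.isinf.
            values = s.to_numpy(dtype=float, na_value=np.nan)
            q_zer = int((values == 0).sum())
            q_inf = int(np.isinf(values).sum())
        else:
            # Assumption: zeros and infinities are only counted for numeric columns.
            q_zer = q_inf = 0
        rows.append({
            "variable": col, "obs": n,
            "q_zer": q_zer, "p_zer": pct(q_zer),
            "q_na": q_na, "p_na": pct(q_na),
            "q_inf": q_inf, "p_inf": pct(q_inf),
            "dtype": str(s.dtype),
        })
    return pd.DataFrame(rows)


df = pd.DataFrame({"zn": [0.0, 0.0, 12.5, np.inf], "town": ["a", None, "b", "c"]})
print(explore_sketch(df))
```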
Release History
- 1.0.0
  - First release
- 1.0.1 - 1.0.7
  - Minor adjustments to get the package working correctly
Meta
Brian Pietracatella – bpietrac@gmail.com
Distributed under the MIT license. See LICENSE for more information.
Download files
Source Distribution
Built Distribution
File details
Details for the file exploretransform-1.0.7.tar.gz.
File metadata
- Download URL: exploretransform-1.0.7.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0.post20200814 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b44d53d4620f1c7c6bd96790d12704f023be647fd96ce5f7af4ae4522f83ccff |
| MD5 | 30bc9bd46ffaf39ed83776ffe38cd009 |
| BLAKE2b-256 | 857829e85a35ef875e9d1b277e240e9a00597d8ac5f655d43d9cc49e0a68b493 |
File details
Details for the file exploretransform-1.0.7-py3-none-any.whl.
File metadata
- Download URL: exploretransform-1.0.7-py3-none-any.whl
- Upload date:
- Size: 11.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0.post20200814 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32c3e7aa73c802efca7152db78a48438f733809da92c7e53183b7389d5844580 |
| MD5 | 556123b62c46c6a7f915c71c468b2518 |
| BLAKE2b-256 | 9391be8914c651fe7268fb5bc91ea2910d98d7d17d1035b4c94059abc632d1d2 |