Project description

PreProcess1

Super Easy Way of PreProcessing your Data!

Good News!

Preprocess1 simplifies the preprocessing steps that are often essential for ML modelling, such as imputation and one hot encoding. There are over 20 preprocessing steps available. A summary of the options is below:

Auto infer data types
Impute (simple or with surrogate columns)
Ordinal Encoder
Drop categorical variables that have zero variance or near-zero variance
Club rare categorical levels (those at the bottom 5% of the variable's distribution) together as a new level (other_infrequent)
Club unseen levels in test dataset with most/least frequent levels in train dataset
Reduce high cardinality in categorical features using clustering or counts
Generate sub-features from a time feature, such as 'month', 'weekday', 'is_month_end', 'is_month_start' & 'hour'
Group features by calculating min, max, mean, median & sd of similar features
Make nonlinear features (polynomial, sin, cos & tan)
Scale & Power Transform (zscore, minmax, yeo-johnson, quantile, maxabs, robust), including an option to transform the target variable
Apply binning to variables when numeric features are provided as a list
Detect & remove outliers using isolation forest, KNN and PCA
Apply clusters to segment entire data
One Hot / Dummy encoding
Remove special characters such as commas and square brackets from column names to make them compatible with JSON-dependent models
Feature Selection through Random Forest, LightGBM and Pearson Correlation
Fix multicollinearity
Feature Interaction (DFS): multiply, divide, add and subtract features
Apply dimension reduction techniques such as pca_liner, pca_kernal, incremental or Tsne. Except for pca_liner, all methods take only the number of components (as an integer), i.e. no explained-variance option is available
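To make a couple of the options above concrete, here is a minimal sketch in plain pandas of clubbing rare levels into other_infrequent and then one hot encoding. It does not use Preprocess1 itself: the helper names (fit_frequent_levels, club_levels), the 5% frequency rule and the toy data are assumptions made for illustration. Unseen test levels are simply folded into the same bucket here, which is a simplification of the most/least frequent option listed above.

```python
import pandas as pd


def fit_frequent_levels(train_col: pd.Series, threshold: float = 0.05) -> set:
    """Return the levels whose share of the training column is at least `threshold`."""
    freq = train_col.value_counts(normalize=True)
    return set(freq[freq >= threshold].index)


def club_levels(col: pd.Series, frequent: set, other: str = "other_infrequent") -> pd.Series:
    """Replace every level that is not in `frequent` (rare or unseen) with `other`."""
    return col.where(col.isin(frequent), other)


# Toy data (hypothetical): level "C" is rare in train, level "D" never appears in train
train = pd.DataFrame({"city": ["A"] * 60 + ["B"] * 37 + ["C"] * 3})
test = pd.DataFrame({"city": ["A", "C", "D"]})

# Learn the frequent levels on the training data only, then apply them to both sets
frequent = fit_frequent_levels(train["city"])
train_clubbed = club_levels(train["city"], frequent)
test_clubbed = club_levels(test["city"], frequent)

# One hot encode; reindex the test columns so both frames share the same layout
train_dummies = pd.get_dummies(train_clubbed, prefix="city", dtype=int)
test_dummies = pd.get_dummies(test_clubbed, prefix="city", dtype=int).reindex(
    columns=train_dummies.columns, fill_value=0
)
```

The point of the split into a "fit" helper and an "apply" helper is that the set of frequent levels is decided once, on the training data, and then reused unchanged on new data.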

Install the library with pip:

pip install preprocess1

Then import the toolkit in Python:

from preprocess1 import toolkit as t

You can use the methods individually by calling the respective class, for example:

binn = t.Binning(['feature_tobin'])
binned_data = binn.fit_transform(training_data)
binned_new_data = binn.transform(test_data)
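The fit_transform / transform pattern above can be illustrated with a small stand-in written in plain pandas. QuantileBinner below is a hypothetical class, not the library's Binning implementation; the quantile strategy, the n_bins parameter and the toy data are all assumptions. It only shows the idea that bin edges are learned from the training data and then reused on new data.

```python
import numpy as np
import pandas as pd


class QuantileBinner:
    """Hypothetical stand-in for a binning transformer: learns edges on fit, reuses them on transform."""

    def __init__(self, columns, n_bins=4):
        self.columns = columns
        self.n_bins = n_bins
        self.edges_ = {}

    def fit(self, df):
        for col in self.columns:
            # Quantile edges come from the training data only
            _, edges = pd.qcut(df[col], q=self.n_bins, retbins=True, duplicates="drop")
            # Open the outer edges so out-of-range values in new data still fall into a bin
            self.edges_[col] = np.concatenate(([-np.inf], edges[1:-1], [np.inf]))
        return self

    def transform(self, df):
        out = df.copy()
        for col in self.columns:
            # labels=False returns the integer index of the bin
            out[col] = pd.cut(df[col], bins=self.edges_[col], labels=False)
        return out

    def fit_transform(self, df):
        return self.fit(df).transform(df)


# Toy data (hypothetical)
training_data = pd.DataFrame({"feature_tobin": np.arange(1.0, 101.0)})
test_data = pd.DataFrame({"feature_tobin": [-5.0, 30.0, 250.0]})

binn = QuantileBinner(["feature_tobin"])
binned_data = binn.fit_transform(training_data)
binned_new_data = binn.transform(test_data)
```

In this sketch the test values -5 and 250 lie outside the training range, yet they still land in the first and last bins because the edges were fixed at fit time.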

However, the library offers more than that: pre-built, complete pipelines that apply the preprocessing transformers together. Path1 is for supervised ML problems, and Path2 is for unsupervised ML problems. Below is how you use Path1:

# apply the path to the training dataset while clubbing rare categorical levels & scaling numerical features
# imputation & one hot encoding are applied automatically
data_training_transformed = t.Preprocess_Path_One(training_data, 'target_column', club_rare_levels=True, scale_data=True)
# apply the fitted pipeline to the test dataset: transform only, so that nothing is re-fitted on the test data
# (assumption: the fitted pipeline is available as `pipe` on the toolkit module after the call above)
data_test_transformed = t.pipe.transform(test_data)
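For readers who want to see what "fit on train, transform on test" means for a whole pipeline, here is a rough scikit-learn analogue. It is a sketch of the general idea (imputation, scaling and one hot encoding chained together), not the internals of Preprocess_Path_One; the column names and the toy data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (hypothetical): one numeric column with a missing value, one categorical column
training_data = pd.DataFrame({
    "age": [20.0, 30.0, np.nan, 50.0],
    "city": ["A", "B", "A", "B"],
    "target_column": [0, 1, 0, 1],
})
test_data = pd.DataFrame({"age": [np.nan, 40.0], "city": ["B", "C"]})

# Numeric columns: mean imputation followed by z-score scaling
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
# Categorical columns: one hot encoding; levels unseen during fit become all-zero rows
categorical = OneHotEncoder(handle_unknown="ignore")

pipe = ColumnTransformer(
    [("num", numeric, ["age"]), ("cat", categorical, ["city"])],
    sparse_threshold=0,  # always return a dense array
)

# Statistics (means, scales, category lists) are learned on the training data only
X_train = pipe.fit_transform(training_data.drop(columns="target_column"))
# The test data is only transformed with those learned statistics
X_test = pipe.transform(test_data)
```

Calling fit_transform on the test set instead would re-learn the means and category lists from the test data, so the train and test features would no longer be on the same footing.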

You can find more information under the docstring of each class/function. Enjoy coding! Please share your ideas, suggestions and critique with me.

Release history

Version	Release date
0.1.42 (this version)	Aug 1, 2020
0.1.41	Jul 6, 2020
0.1.40	Jul 6, 2020
0.1.39	Jul 6, 2020
0.1.38	May 26, 2020
0.1.37	May 26, 2020
0.1.36	Apr 29, 2020
0.1.35	Apr 29, 2020
0.1.34	Feb 9, 2020
0.1.33	Feb 5, 2020
0.1.32	Feb 4, 2020
0.1.31	Feb 3, 2020
0.1.30	Jan 29, 2020
0.1.29	Jan 29, 2020
0.1.28	Jan 27, 2020
0.1.27	Jan 26, 2020
0.1.26	Jan 26, 2020
0.1.25	Jan 20, 2020
0.1.24	Jan 20, 2020
0.1.23	Jan 18, 2020
0.1.22	Jan 17, 2020
0.1.21	Jan 17, 2020
0.1.20	Jan 16, 2020
0.1.19	Jan 16, 2020
0.1.18	Jan 16, 2020
0.1.17	Jan 15, 2020
0.1.16	Jan 15, 2020
0.1.15	Jan 14, 2020
0.1.14	Jan 14, 2020
0.1.13	Jan 14, 2020
0.1.12	Jan 14, 2020
0.1.11	Jan 13, 2020
0.1.10	Jan 13, 2020
0.1.9	Jan 13, 2020
0.1.8	Jan 11, 2020
0.1.7	Jan 11, 2020
0.1.6	Jan 11, 2020
0.1.5	Jan 11, 2020
0.1.4	Jan 10, 2020
0.1.3	Jan 10, 2020
0.1.2	Dec 18, 2019
0.1.1	Dec 18, 2019
0.1.0	Dec 7, 2019
0.0.9	Nov 25, 2019
0.0.5	Nov 25, 2019
0.0.4	Nov 25, 2019
0.0.3	Nov 25, 2019

Download files

Source Distribution

preprocess1-0.1.42.tar.gz (29.6 kB)

Uploaded Aug 1, 2020 Source

Built Distribution

preprocess1-0.1.42-py3-none-any.whl (41.2 kB)

Uploaded Aug 1, 2020 Python 3

File details

Details for the file preprocess1-0.1.42.tar.gz.

File metadata

Download URL: preprocess1-0.1.42.tar.gz
Upload date: Aug 1, 2020
Size: 29.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.4rc1

File hashes

Hashes for preprocess1-0.1.42.tar.gz
Algorithm	Hash digest
SHA256	`22045d7174552832d98f7cab0862f7a5b07831f26673a2e762da27f4fa92ac03`
MD5	`66aa0ee2bda1fb6898938665596857cb`
BLAKE2b-256	`0c12e44e25dd08b1a77e229b0f29ed1ff58c2649c2017c0f62b0f1c9461b4850`
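A downloaded archive can be checked against the SHA256 digest listed above with Python's standard hashlib module. The sketch below assumes the file was saved under its original name in the current directory; the helper name sha256_of is made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the table above
EXPECTED = "22045d7174552832d98f7cab0862f7a5b07831f26673a2e762da27f4fa92ac03"
# Assumption: the archive sits in the current working directory
archive = Path("preprocess1-0.1.42.tar.gz")

if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED else "MISMATCH")
```

pip can perform the same check at install time when a requirements file pins the digest with --hash.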

File details

Details for the file preprocess1-0.1.42-py3-none-any.whl.

File metadata

Download URL: preprocess1-0.1.42-py3-none-any.whl
Upload date: Aug 1, 2020
Size: 41.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.4rc1

File hashes

Hashes for preprocess1-0.1.42-py3-none-any.whl
Algorithm	Hash digest
SHA256	`54ae56d1c3884432e1da8c2959c4172ed8ffd5cf0719e313a2a556e329e7f94e`
MD5	`bf8cd340c202b77586d4e7e89b7b1232`
BLAKE2b-256	`210c2b9cec0516e91a0c53b4d34c67512b893686e7a3bb2591f0765d5fdf1899`
