PreProcess1

Super Easy Way of PreProcessing your Data!

Good News!

Preprocess1 simplifies preprocessing steps that are sometimes essential for ML modelling, such as imputation and one-hot encoding. More than 20 preprocessing steps are available. A summary of the options is below:

  • Auto infer data types
  • Impute (simple or with surrogate columns)
  • Ordinal Encoder
  • Drop categorical variables that have zero variance or near-zero variance
  • Club rare categorical levels (those in the bottom 5% of the variable's distribution) together as a new level
    (other_infrequent)
  • Club unseen levels in test dataset with most/least frequent levels in train dataset
  • Reduce high cardinality in categorical features using clustering or counts
  • Generate sub-features from a time feature, such as 'month', 'weekday', 'is_month_end', 'is_month_start' & 'hour'
  • Group features by calculating min, max, mean, median & sd of similar features
  • Make nonlinear features (polynomial, sin, cos & tan)
  • Scale & power transform (zscore, minmax, yeo-johnson, quantile, maxabs, robust), including an option to transform the target variable
  • Apply binning to variables when numeric features are provided as a list
  • Detect & remove outliers using isolation forest, KNN and PCA
  • Apply clusters to segment entire data
  • One Hot / Dummy encoding
  • Remove special characters (commas, square brackets etc.) from column names to make them compatible with JSON-dependent models
  • Feature Selection through Random Forest, LightGBM and Pearson Correlation
  • Fix multicollinearity
  • Feature interaction (DFS): multiply, divide, add and subtract features
  • Apply dimension reduction techniques such as pca_liner, pca_kernal, incremental or tsne. Except for pca_liner, all methods take only the number of components (as an integer), i.e. no explained-variance option is available
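
To make two of the steps above concrete, here is a minimal, dependency-free sketch of clubbing rare categorical levels and deriving sub-features from a timestamp. It is not preprocess1's implementation: the function names club_rare_levels and time_features are hypothetical, and reading "bottom 5%" as "relative frequency below 0.05" is an assumption made for illustration.

```python
from collections import Counter
from datetime import datetime
import calendar


def club_rare_levels(values, threshold=0.05, new_level="other_infrequent"):
    """Replace levels whose relative frequency is below `threshold`.

    Hypothetical helper: the 5% cut-off rule is an assumption,
    not the library's documented behaviour.
    """
    counts = Counter(values)
    total = len(values)
    rare = {level for level, n in counts.items() if n / total < threshold}
    return [new_level if v in rare else v for v in values]


def time_features(ts):
    """Derive calendar sub-features from a single datetime (hypothetical helper)."""
    last_day = calendar.monthrange(ts.year, ts.month)[1]
    return {
        "month": ts.month,
        "weekday": ts.weekday(),  # Monday is 0, Sunday is 6
        "is_month_end": ts.day == last_day,
        "is_month_start": ts.day == 1,
        "hour": ts.hour,
    }


# 'teal' (2%) and 'pink' (1%) fall below the 5% threshold and are merged.
colours = ["red"] * 60 + ["blue"] * 37 + ["teal"] * 2 + ["pink"] * 1
clubbed = club_rare_levels(colours)

# 29 February 2020 is the last day of a leap-year February.
features = time_features(datetime(2020, 2, 29, 14, 30))
```

In a real workflow the set of rare levels would be learned from the training data only and then reused on the test data.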

You can install the library with pip and then import the toolkit in Python:

pip install preprocess1
from preprocess1 import toolkit as t

You can use the methods individually by calling the respective class, for example:

binn = t.Binning(['feature_tobin'])
binned_data = binn.fit_transform(training_data)
binned_new_data = binn.transform(test_data)
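
The fit_transform / transform pattern shown here means that bin edges are learned once from the training data and then reused unchanged on new data. The sketch below illustrates that idea with a hypothetical EqualWidthBinner written in plain Python; it is a stand-in for explanation only and does not reflect how t.Binning chooses its bins.

```python
from bisect import bisect_right


class EqualWidthBinner:
    """Hypothetical stand-in for illustration; not preprocess1's Binning class."""

    def __init__(self, n_bins=4):
        self.n_bins = n_bins
        self.edges_ = None

    def fit(self, values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / self.n_bins
        # Keep interior edges only, so values outside the training
        # range fall into the first or last bin.
        self.edges_ = [lo + width * i for i in range(1, self.n_bins)]
        return self

    def transform(self, values):
        if self.edges_ is None:
            raise RuntimeError("call fit() before transform()")
        return [bisect_right(self.edges_, v) for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


train = [0, 10, 20, 30, 40]
test = [-5, 15, 100]

binner = EqualWidthBinner(n_bins=4)
train_bins = binner.fit_transform(train)  # edges are learned here
test_bins = binner.transform(test)        # the same edges are reused here
```

Because the edges come from the training data alone, out-of-range test values such as -5 and 100 are assigned to the outermost bins rather than changing the binning.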

However, the library offers more than individual transformers. It ships pre-built, complete pipelines that chain the preprocessing transformers together. Path1 is for supervised ML, and Path2 is for unsupervised ML problems. Below is how you use Path1:

# fit the path on the training dataset while clubbing rare categorical levels & scaling numerical features
# imputation & one-hot encoding are applied automatically
data_training_transformed = t.Preprocess_Path_One(training_data, 'target_column', club_rare_levels=True, scale_data=True)
# apply the already-fitted pipeline to the test dataset: use transform, not fit_transform,
# so that nothing is re-learned from the test data
# (assumption: the fitted pipeline is exposed as the module-level object t.pipe)
data_test_transformed = t.pipe.transform(test_data)
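
As a rough illustration of why a pipeline is fitted on the training data and only applied to the test data, the sketch below chains a mean imputer and a z-score scaler in plain Python. All class names (MeanImputer, ZScoreScaler, MiniPipeline) are hypothetical and are not part of preprocess1; the point is only that the statistics used on the test data come from the training data.

```python
from statistics import mean, pstdev


class MeanImputer:
    """Fill missing values (None) with the mean learned during fit."""

    def fit(self, col):
        self.fill_ = mean(v for v in col if v is not None)
        return self

    def transform(self, col):
        return [self.fill_ if v is None else v for v in col]


class ZScoreScaler:
    """Standardise with the mean and standard deviation learned during fit."""

    def fit(self, col):
        self.mean_ = mean(col)
        self.std_ = pstdev(col) or 1.0  # guard against a constant column
        return self

    def transform(self, col):
        return [(v - self.mean_) / self.std_ for v in col]


class MiniPipeline:
    """Hypothetical single-column pipeline, for illustration only."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, col):
        for step in self.steps:
            col = step.fit(col).transform(col)
        return col

    def transform(self, col):
        # No fitting here: every step reuses what it learned from training data.
        for step in self.steps:
            col = step.transform(col)
        return col


train = [1.0, None, 3.0]
test = [None, 5.0]

pipe = MiniPipeline([MeanImputer(), ZScoreScaler()])
train_out = pipe.fit_transform(train)
test_out = pipe.transform(test)
```

Here the missing test value is filled with the training mean (2.0) and therefore scales to 0.0, which would not be the case if the pipeline were refitted on the test data.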

You can find more information in the docstring of each class/function. Enjoy coding! Please share your ideas, suggestions and critique with me.

License

Copyright 2019-2020 Fahad Akbar fahad.akbar@gmail.com
