A Python package to handle EDA and feature extraction, and to return the best hyperparameters for a tabular classification problem.


EDA FEATURE_EXTRACTOR MODEL

A Python package to perform EDA and feature selection, and to display the best hyperparameters for a pre-built classification model.

Intended for datasets that contain no NaN or null values. It can be used for standard classification tasks; the next update will support regression problems and incorporate a sorted arrangement of variables.

Before using the package, ensure that no null or NaN values remain.
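One way to meet this requirement is to report the missing values per column and drop the affected rows before calling the package. This assumes the data is loaded as a pandas DataFrame. The helper name `drop_missing` is hypothetical. Dropping rows is only one option; imputing values (for example with a column mean) may suit some datasets better.

```python
import numpy as np
import pandas as pd


def drop_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: report NaN/null counts, then drop incomplete rows."""
    missing = df.isna().sum()
    # Print only the columns that actually contain missing values.
    print(missing[missing > 0])
    # Drop every row with at least one NaN/None and renumber the index.
    return df.dropna().reset_index(drop=True)


# Small example frame: row 1 has a NaN age, row 2 has a None city.
raw = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "city": ["A", "B", None, "A"],
    "label": [0, 1, 1, 0],
})
clean = drop_missing(raw)
print(clean)
```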

pip install eda-fe-model

pip install eda-fe-model==0.2.2

Using the library

from eda_fe_model import package

package.EDA()
package.feature_extraction()

Use to_categorical from keras.utils to one-hot encode the labels.

package.build_best_model()
package.model_create()
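For reference, `to_categorical` turns a vector of integer class labels into one-hot rows with `float32` entries. The NumPy sketch below only mimics that mapping to show the expected shape. It assumes labels are integers in `0..num_classes-1`. `one_hot` is a hypothetical helper; in practice you would call `to_categorical(y_train)` directly.

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """Hypothetical helper mimicking keras.utils.to_categorical for 1-D integer labels."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Assume classes are numbered 0..max(label).
        num_classes = labels.max() + 1
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Row i gets a 1.0 in column labels[i] and zeros elsewhere.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


print(one_hot([0, 2, 1, 2]))
```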

EDA

package.EDA accepts the following:

        dataset = the input dataset
        columns_drop = columns to drop, as a list; accepts None
        one_hot_encode = True/False
        label_encode = True/False
        normalize = True/False
        standardize = True/False
        target_varaible = the single target column, y
        test_size = proportion of the dataset to hold out for testing
        random_state = random seed used when splitting the dataset

If the dataset only consists of categorical variables, set normalize or standardize to True.

Returns the split dataset as x_train, x_test, y_train, y_test, in that order.
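The README does not show how `package.EDA` is implemented. The sketch below is a hypothetical stand-in, `eda_sketch`, that illustrates the kind of pipeline these parameters describe, assuming pandas and NumPy. It performs four steps:

- drop columns;
- one-hot encode non-numeric predictors;
- split the rows with a seeded shuffle;
- standardize using training-set statistics.

It covers only a subset of the parameters, with no `label_encode` or `normalize`. Its `target` argument plays the role of `target_varaible`.

```python
import numpy as np
import pandas as pd


def eda_sketch(dataset, target, columns_drop=None, one_hot_encode=True,
               standardize=True, test_size=0.2, random_state=0):
    """Hypothetical stand-in for package.EDA; not the package's actual code."""
    # Work on a copy so the caller's DataFrame is never mutated.
    df = dataset.drop(columns=columns_drop) if columns_drop else dataset.copy()
    y = df.pop(target)
    if one_hot_encode:
        # Expand non-numeric predictors into 0/1 indicator columns.
        df = pd.get_dummies(df, dtype=float)
    x = df.to_numpy(dtype=float)
    # Shuffle row indices reproducibly, then hold out test_size of them.
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(x))
    n_test = int(round(len(x) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    x_train, x_test = x[train_idx], x[test_idx]
    if standardize:
        # Use training-split statistics only, so the test set does not leak in.
        mean, std = x_train.mean(axis=0), x_train.std(axis=0)
        std[std == 0] = 1.0  # guard against constant columns
        x_train, x_test = (x_train - mean) / std, (x_test - mean) / std
    y = y.to_numpy()
    return x_train, x_test, y[train_idx], y[test_idx]


data = pd.DataFrame({
    "id": range(10),
    "height": [150, 160, 170, 180, 165, 155, 175, 185, 158, 172],
    "color": ["red", "blue"] * 5,
    "label": [0, 1] * 5,
})
x_train, x_test, y_train, y_test = eda_sketch(
    data, "label", columns_drop=["id"], test_size=0.2, random_state=42)
print(x_train.shape, x_test.shape)  # (8, 3) (2, 3)
```

Fitting the standardization on the training split only is a design choice that keeps test-set statistics out of training. The README does not say whether the package does the same.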

FEATURE EXTRACTION

package.feature_extraction accepts the following:

        train_X = train dataset consisting of predictors
        train_Y = train labels
        test_X = test dataset consisting of predictors
        test_Y = test labels
        rfe = True/False; whether to apply RFE-based feature selection
        dim_out = used only if rfe=True; the output dimension, i.e. the number of features to select
        distribution = the distribution of the data to use for the GLM

If rfe is False, set dim_out and distribution to None; the function then returns the input x and y for the train and test datasets unchanged.
If convergence errors appear, try a different distribution.

Returns the x_train and x_test datasets containing only the selected predictors (dim_out of them).
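RFE commonly refers to recursive feature elimination: fit a model, discard the weakest feature, and repeat until `dim_out` features remain. The README does not describe how the package combines RFE with the GLM. The following is therefore only a sketch of the elimination loop. Ordinary least squares on standardized features is swapped in for the GLM, and `rfe_sketch` is a hypothetical name.

```python
import numpy as np


def rfe_sketch(x, y, dim_out):
    """Hypothetical recursive feature elimination using an OLS fit in place of a GLM."""
    # Standardize so coefficient magnitudes are comparable across features.
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    remaining = list(range(x.shape[1]))
    while len(remaining) > dim_out:
        # Fit least squares (with an intercept column) on the surviving features.
        design = np.column_stack([np.ones(len(x)), x[:, remaining]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        # Drop the feature whose coefficient has the smallest magnitude.
        weakest = int(np.argmin(np.abs(coef[1:])))
        remaining.pop(weakest)
    return remaining


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
# Only features 0 and 3 drive the target; the rest are noise.
y = 3.0 * x[:, 0] - 2.0 * x[:, 3] + rng.normal(scale=0.1, size=200)
selected = rfe_sketch(x, y, dim_out=2)
print(selected)  # [0, 3]
```

Apply the returned column indices to both splits, e.g. `x_train[:, selected]` and `x_test[:, selected]`, so that train and test keep the same predictors.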

BUILD BEST MODEL

package.build_best_model accepts the following:

        x = train dataset consisting of predictors
        y = one-hot encoded training labels

Returns a RandomizedSearchCV object. If the returned object is stored as results:

Best score: results.best_score_
Best parameters: results.best_params_
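`RandomizedSearchCV` (from scikit-learn) evaluates a sample of hyperparameter configurations and exposes the winner through `best_score_` and `best_params_`. The minimal pure-Python sketch below shows the same idea without cross-validation. `score_fn` stands in for a cross-validated score. The hyperparameter names `units` and `learning_rate` are hypothetical, not the ones the package searches over.

```python
import random


def random_search(score_fn, param_distributions, n_iter=10, seed=0):
    """Hypothetical random search: sample n_iter configurations, keep the best."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_iter):
        # Draw one candidate value per hyperparameter.
        params = {name: rng.choice(values)
                  for name, values in param_distributions.items()}
        score = score_fn(**params)
        if score > best_score:
            best_score, best_params = score, params
    # Analogous to RandomizedSearchCV's best_score_ and best_params_.
    return best_score, best_params


def toy_score(units, learning_rate):
    """Toy objective that peaks at units=64, learning_rate=0.01."""
    return -abs(units - 64) / 64 - abs(learning_rate - 0.01) * 10


grid = {"units": [16, 32, 64, 128], "learning_rate": [0.1, 0.01, 0.001]}
best_score, best_params = random_search(toy_score, grid, n_iter=30, seed=1)
print(best_score, best_params)
```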

CREATING THE MODEL

package.model_create accepts the best parameters from build_best_model() and trains the model for a user-specified number of epochs.

        x = the new training dataset, consisting only of the predictors
        y = one-hot encoded training labels
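The README does not show what model `model_create` builds. The NumPy sketch below only illustrates what training for a user-specified number of epochs on one-hot labels involves. It trains a softmax regression with full-batch gradient descent. `train_softmax`, its learning rate, and the synthetic data are assumptions, not the package's model.

```python
import numpy as np


def train_softmax(x, y_onehot, epochs=100, learning_rate=0.1):
    """Hypothetical trainer: softmax regression, one full-batch update per epoch."""
    n, d = x.shape
    k = y_onehot.shape[1]
    w = np.zeros((d, k))
    b = np.zeros(k)
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of mean cross-entropy w.r.t. the logits is (probs - y) / n.
        grad = (probs - y_onehot) / n
        w -= learning_rate * x.T @ grad
        b -= learning_rate * grad.sum(axis=0)
    return w, b


rng = np.random.default_rng(0)
# Three well-separated clusters, one per class.
centers = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
labels = np.repeat(np.arange(3), 50)
x = centers[labels] + rng.normal(scale=0.5, size=(150, 2))
y_onehot = np.eye(3)[labels]  # one-hot labels, as to_categorical would produce
w, b = train_softmax(x, y_onehot, epochs=300, learning_rate=0.1)
accuracy = np.mean(np.argmax(x @ w + b, axis=1) == labels)
print(f"training accuracy: {accuracy:.2f}")
```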


Release history

0.3.2 (Nov 4, 2020)
0.3.1 (Nov 3, 2020)
0.3.0 (Nov 3, 2020)
0.2.4 (Nov 3, 2020), this version

Download files

Source Distribution

eda-fe-model-0.2.4.tar.gz (5.5 kB), uploaded Nov 3, 2020

Built Distribution

eda_fe_model-0.2.4-py3-none-any.whl (5.5 kB), uploaded Nov 3, 2020, Python 3

Hashes for eda-fe-model-0.2.4.tar.gz
Algorithm	Hash digest
SHA256	`9ae91e18346333027570206dc7fda8ac7d97cd3f0d7d3d0ecc413ddb3eb3657f`
MD5	`44ce8d891ae34a1f9078994cf33273b6`
BLAKE2b-256	`dc01140ad8357dccd7a9ac7117a43bfec8f9e1c5a111afcc75ee498ccf7be4e0`

Hashes for eda_fe_model-0.2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e5fbd33bef81f920d99cfff73189fe620c943eeea36e182c7dfcb302b38fca4a`
MD5	`9f11f257a3ea4e23cd13c433243be976`
BLAKE2b-256	`df1d367a388abfbb54ac9d18fa8d991767d6cbe1ca2249331f5084d7eeb8085e`