Toolbox for reinforced developing of machine learning models (as proof-of-concept)

Happy ;) Learning

Description:

Toolbox for reinforced developing of machine learning models (as proof-of-concept) in Python. It is specifically designed to evolve and optimize machine learning models using evolutionary algorithms, both on the feature engineering side and on the hyper parameter tuning side.

Table of Contents:

  1. Installation
  2. Requirements
  3. Introduction
    • Practical Usage
    • Feature Engineer
    • Feature Learning
    • Feature Tournament
    • Feature Selector
    • ModelGenerator
    • NetworkGenerator
    • ClusteringGenerator
    • GeneticAlgorithm
    • SwarmIntelligence
    • DataMiner
    • TextMiner
  4. Documentation & Examples

1. Installation:

You can install Happy Learning on any operating system via pip install happy_learning.

2. Requirements:

  • ...

3. Introduction:

  • Practical Usage:

Happy Learning covers all stages of the model development process: feature engineering, feature and model selection, and hyper parameter optimization.

  • Feature Engineer:

Process your tabular data smartly. The Feature Engineer module is equipped with all necessary (tabular) feature processing methods. Moreover, it captures metadata about the data set, such as the measurement scale of each feature and the processing steps taken. To scale to big data sets, it writes a separate temporary data file for each feature and loads a file only when that feature is processed.
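The per-feature file idea can be illustrated with a minimal sketch. The class FeatureFileStore, its methods and the pickle-based layout are assumptions made for this illustration only; they are not the Feature Engineer API.

```python
import os
import pickle
import tempfile


class FeatureFileStore:
    """Keep each feature (column) in its own temporary file (hypothetical helper)."""

    def __init__(self):
        # Assumed layout: one pickle file per feature inside a temp directory.
        self._dir = tempfile.TemporaryDirectory()
        self._paths = {}

    def save(self, name, values):
        """Write one feature to disk, reusing its file if it already exists."""
        path = self._paths.setdefault(
            name, os.path.join(self._dir.name, f"feature_{len(self._paths)}.pkl")
        )
        with open(path, "wb") as file:
            pickle.dump(list(values), file)

    def load(self, name):
        """Read a single feature back into memory."""
        with open(self._paths[name], "rb") as file:
            return pickle.load(file)

    def close(self):
        """Remove all temporary files."""
        self._dir.cleanup()


store = FeatureFileStore()
store.save("age", [23, 41, 35])
store.save("income", [2100.0, 3900.0, 3050.0])

# Process one feature at a time: load it, transform it, write the result back.
age = store.load("age")
store.save("age_squared", [value ** 2 for value in age])
squared = store.load("age_squared")
store.close()
```

Only the feature currently being processed has to fit into memory, which is the point of splitting the data set by column.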

  • Feature Learning:

It combines the feature engineering module and the genetic algorithm module to create a reinforcement learning environment that smartly generates new features. The module creates separate learning environments for categorical and continuous features. The categorical features are one-hot encoded and then unified (one-hot merging), whereas the (semi-) continuous features are systematically processed using several transformation and interaction methods.
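As a rough illustration of the categorical branch, the sketch below one-hot encodes a feature and then merges two of the resulting dummy columns. Interpreting "one-hot merging" as an element-wise logical OR is an assumption, and the function names are hypothetical.

```python
def one_hot(values):
    """Return {category: [0/1, ...]} for a list of categorical values."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


def merge_one_hot(column_a, column_b):
    """Merge two dummy columns with an element-wise logical OR (assumed rule)."""
    return [a | b for a, b in zip(column_a, column_b)]


color = ["red", "blue", "green", "red"]
dummies = one_hot(color)

# New candidate feature: 1 where the color is red or blue, 0 otherwise.
red_or_blue = merge_one_hot(dummies["red"], dummies["blue"])
```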

  • Feature Tournament:

Feature tournament is a process to evaluate the importance of each feature with regard to a specific target feature. It uses the concept of (Additive) Shapley Values to calculate the importance score.
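To make the Shapley concept concrete, here is a small sketch that computes exact Shapley values for a toy value function. The scores per feature subset are made-up numbers, and the function shapley_values is illustrative; it is not how the Feature Tournament is implemented.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a set function `value`."""
    n = len(features)
    scores = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Weight of all coalitions of this size that exclude the feature.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(frozenset(subset) | {feature})
                without_feature = value(frozenset(subset))
                total += weight * (with_feature - without_feature)
        scores[feature] = total
    return scores


# Toy value function (made-up numbers): model score for each feature subset.
SCORES = {
    frozenset(): 0.50,
    frozenset({"age"}): 0.60,
    frozenset({"income"}): 0.70,
    frozenset({"age", "income"}): 0.90,
}

importance = shapley_values(["age", "income"], SCORES.__getitem__)
```

The two importance scores add up to the gain of the full feature set over the empty one (0.90 - 0.50), which is the additivity property the name refers to.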

-- Data Typing:

    Check whether the data types reported by Pandas match the real data types occurring in the data

  • Feature Selector:

The Feature Selector module applies the feature tournament to calculate feature importance scores and automatically selects the best n features based on these scores.
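The selection step itself amounts to ranking features by score and keeping the first n. A minimal sketch, assuming the importance scores are available as a plain dictionary (the function name and the numbers are hypothetical):

```python
def select_top_features(scores, n):
    """Return the names of the n features with the highest importance score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


# Made-up importance scores, e.g. as produced by a feature tournament.
importance = {"age": 0.15, "income": 0.25, "zip_code": 0.02, "tenure": 0.11}
best = select_top_features(importance, n=2)
```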

  • ModelGenerator:

The ModelGenerator module generates supervised machine learning models and all necessary hyper parameters for structured (tabular) data.

  -- Model / Hyper parameter:

     Classification models ...
        -> Ada Boosting (ada)
        -> Cat Boost (cat)
        -> Gradient Boosting Decision Tree (gbo)
        -> K-Nearest Neighbor (knn)
        -> Linear Discriminant Analysis (lida)
        -> Logistic Regression (log)
        -> Quadratic Discriminant Analysis (qda)
        -> Random Forest (rf)
        -> Support-Vector Machine (svm)
        -> Nu-Support-Vector Machine (nusvm)
        -> Extreme Gradient Boosting Decision Tree (xgb)

     Regression models ...
        -> Ada Boosting (ada)
        -> Cat Boost (cat)
        -> Elastic Net (elastic)
        -> Generalized Additive Models (gam)
        -> Gradient Boosting Decision Tree (gbo)
        -> K-Nearest Neighbor (knn)
        -> Random Forest (rf)
        -> Support-Vector Machine (svm)
        -> Nu-Support-Vector Machine (nusvm)
        -> Extreme Gradient Boosting Decision Tree (xgb)
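The idea of generating a model together with its hyper parameters can be sketched as random sampling from a search space. The dictionary PARAM_SPACE, its value ranges and the function generate_model are assumptions for illustration; only the short names (rf, knn, gbo) come from the list above.

```python
import random

# Hypothetical search spaces (made-up ranges), keyed by model short name.
PARAM_SPACE = {
    "rf": {"n_estimators": (50, 500), "max_depth": (2, 12)},
    "knn": {"n_neighbors": (2, 30)},
    "gbo": {"n_estimators": (50, 500), "max_depth": (2, 8)},
}


def generate_model(rng, model_name=None):
    """Draw a model name (if none is given) and random integer hyper parameters."""
    name = model_name or rng.choice(sorted(PARAM_SPACE))
    params = {
        param: rng.randint(low, high)
        for param, (low, high) in PARAM_SPACE[name].items()
    }
    return {"model": name, "params": params}


rng = random.Random(42)
candidate = generate_model(rng)                        # random model type
knn_candidate = generate_model(rng, model_name="knn")  # fixed model type
```

A configuration produced this way is the kind of candidate that an optimizer such as the GeneticAlgorithm or SwarmIntelligence module can evaluate and refine.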
  • NetworkGenerator:

The NetworkGenerator module generates neural network architectures and all necessary hyper parameters for text data using PyTorch.

  -- Model / Hyper parameter:

     -> Attention Network (att)
     -> Gated Recurrent Unit (gru)
     -> Long Short-Term Memory (lstm)
     -> Multi-Layer Perceptron (mlp)
     -> Recurrent Neural Network (rnn)
     -> Recurrent Convolutional Neural Network (rcnn)
     -> Self-Attention (self)
     -> Transformer (trans)

  • ClusteringGenerator:

The ClusteringGenerator module generates unsupervised machine learning models and all necessary hyper parameters for text clustering.

  -- Model / Hyper parameter:

     -> Gibbs-Sampling Dirichlet Multinomial Modeling (gsdmm)
     -> Latent Dirichlet Allocation (lda)
     -> Latent Semantic Indexing (lsi)
     -> Non-Negative Matrix Factorization (nmf)

  • GeneticAlgorithm:

Reinforcement learning module used either to find the fittest model / hyper parameter configuration or to engineer (tabular) features. It captures several evaluation statistics on the evolution process as well as the model performance metrics. Moreover, it is able to transfer knowledge across re-trainings.

-- Model / Hyper parameter Optimization:

    Optimize model / hyper parameter selection ...
        -> Sklearn models
        -> Popular "stand alone" models like XGBoost, CatBoost, etc.
        -> Deep Learning models (using PyTorch only)
        -> Text clustering models (document & short-text)

-- Feature Engineering / Selection:

    Optimize feature engineering / selection using processing methods from Feature Engineer module ...
        -> Choose only features of fittest models to apply feature engineering based on the action space of the Feature Engineer module
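The evolutionary loop behind such an optimizer can be sketched in a few lines. This is a generic genetic algorithm under stated assumptions: the fitness function is a toy stand-in for a validation score, and all names, ranges and rates are made up rather than taken from the GeneticAlgorithm module.

```python
import random


def fitness(config):
    """Toy stand-in for a validation score; peaks at depth=6, rate=0.1."""
    return -((config["depth"] - 6) ** 2) - 100 * (config["rate"] - 0.1) ** 2


def random_config(rng):
    """Draw one random hyper parameter configuration."""
    return {"depth": rng.randint(1, 12), "rate": rng.uniform(0.01, 0.5)}


def crossover(rng, a, b):
    """Child inherits each hyper parameter from one of its two parents."""
    return {key: rng.choice([a[key], b[key]]) for key in a}


def mutate(rng, config, prob=0.2):
    """Re-draw each hyper parameter with a small probability."""
    fresh = random_config(rng)
    return {k: fresh[k] if rng.random() < prob else v for k, v in config.items()}


def evolve(rng, pop_size=20, generations=15):
    population = [random_config(rng) for _ in range(pop_size)]
    history = []  # best fitness per generation (evolution statistics)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # the fittest half survives
        children = [
            mutate(rng, crossover(rng, *rng.sample(parents, 2)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history


best, history = evolve(random.Random(0))
```

Because the fittest half is carried over unchanged, the best fitness recorded in history never decreases from one generation to the next.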
  • SwarmIntelligence:

Reinforcement learning module used either to find the fittest model / hyper parameter configuration or to engineer (tabular) features. It captures several evaluation statistics on the evolution process as well as the model performance metrics. Moreover, it is able to transfer knowledge across re-trainings.

-- Model / Hyper parameter Optimization:

    Optimize model / hyper parameter selection ...
        -> Sklearn models
        -> Popular "stand alone" models like XGBoost, CatBoost, etc.
        -> Deep Learning models (using PyTorch only)
        -> Text clustering models (document & short-text)

-- Feature Engineering / Selection:

    Optimize feature engineering / selection using processing methods from Feature Engineer module ...
        -> Choose only features of fittest models to apply feature engineering based on the action space of the Feature Engineer module
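For comparison with the genetic approach, a generic particle swarm search can be sketched as follows. The loss function is a toy stand-in for a validation error, and the search ranges, coefficients and function names are assumptions; this is not the SwarmIntelligence module's implementation.

```python
import random


def loss(position):
    """Toy stand-in for a validation error; minimum at (6.0, 0.1)."""
    depth, rate = position
    return (depth - 6.0) ** 2 + 100 * (rate - 0.1) ** 2


def swarm_search(rng, n_particles=15, iterations=40,
                 inertia=0.7, cognitive=1.5, social=1.5):
    bounds = [(1.0, 12.0), (0.01, 0.5)]  # made-up search ranges
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * len(bounds) for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    global_best = min(personal_best, key=loss)[:]
    for _ in range(iterations):
        for i, pos in enumerate(positions):
            for d, (lo, hi) in enumerate(bounds):
                # Pull towards the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * rng.random() * (personal_best[i][d] - pos[d])
                    + social * rng.random() * (global_best[d] - pos[d])
                )
                # Move and clip to the allowed range.
                pos[d] = min(hi, max(lo, pos[d] + velocities[i][d]))
            if loss(pos) < loss(personal_best[i]):
                personal_best[i] = pos[:]
                if loss(pos) < loss(global_best):
                    global_best = pos[:]
    return global_best


best = swarm_search(random.Random(0))
```

Each particle is one candidate configuration; instead of crossover and mutation, candidates move through the search space guided by the best positions found so far.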
  • DataMiner:

Combines all modules for handling structured (tabular) data sets. It uses the ...

    -> Feature Engineer module to pre-process the data in general (imputation, label encoding, date feature processing, etc.)
    -> Feature Learning module to smartly engineer tabular features
    -> Feature Selector module to select the most important features
    -> GeneticAlgorithm / SwarmIntelligence module to find a suitable model and hyper parameter configuration by itself

  • TextMiner:

Makes text data (natural language) usable by generating various numerical features that describe the text.

-- Segmentation:

    Categorize potential text features into following segments ...
        -> Web features
            1) URL
            2) EMail
        -> Enumerated features
        -> Natural language (original text features)
        -> Identifier (original id features)
        -> Unknown
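A segmentation like the one above can be approximated with a few heuristics. The rules, regular expressions and label names in this sketch are assumptions chosen for illustration; the TextMiner may well use different criteria.

```python
import re

URL_PATTERN = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def segment_text_feature(values, enum_separator=","):
    """Assign one segment label to a list of string values (heuristic sketch)."""
    values = [v.strip() for v in values if v and v.strip()]
    if not values:
        return "unknown"
    if all(URL_PATTERN.match(v) for v in values):
        return "url"
    if all(EMAIL_PATTERN.match(v) for v in values):
        return "email"
    if all(enum_separator in v for v in values):
        return "enumerated"
    # Assumed rule: unique values without spaces look like identifiers.
    if len(set(values)) == len(values) and all(" " not in v for v in values):
        return "identifier"
    if any(" " in v for v in values):
        return "natural_language"
    return "unknown"


label = segment_text_feature(["The quick brown fox", "Hello world"])
```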

-- Simple text processing:
    Apply simple processing methods to text features
        -> Merge two text features by given separator
        -> Replace occurrences
        -> Subset data set or feature list by given string

-- Language methods:
    Apply methods to ...
        -> ... detect language in text
        -> ... translate using Google Translate under the hood

-- Generate linguistic features:
    Apply semantic text processing to generate numeric features
        -> Clean text counter (text after removing stop words, punctuation and special characters and lemmatizing)
        -> Part-of-Speech Tagging counter & labels
        -> Named Entity Recognition counter & labels
        -> Dependencies counter & labels (Tree based / Noun Chunks)
        -> Emoji counter & labels

-- Generate similarity / clustering features:
    Apply similarity methods to generate continuous features using word embeddings
        -> TF-IDF
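How TF-IDF turns texts into continuous similarity features can be shown with a plain-Python sketch. It uses the basic weighting (term frequency times the logarithm of the inverse document frequency, without smoothing) and whitespace tokenization; the function names and example texts are made up, and the TextMiner may use a different variant.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (plain TF-IDF, no smoothing)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "happy learning with python",
    "happy clustering of short text",
    "python text mining",
]
vectors = tf_idf(docs)

# Continuous feature: similarity between the first and the third text.
similarity = cosine_similarity(vectors[0], vectors[2])
```

Terms that occur in fewer documents receive a higher weight, so "learning" counts for more than "happy" in the first text.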

4. Documentation & Examples:

See methodology.pdf for the documentation and the Jupyter notebooks for examples. Happy ;) Learning

Download files

Source Distribution

happy_learning-0.4.6.tar.gz (240.7 kB)

Uploaded Source

Built Distributions

happy_learning-0.4.6-py3.7.egg (494.8 kB)

Uploaded Source

happy_learning-0.4.6-py3-none-any.whl (229.8 kB)

Uploaded Python 3
