
General utilities to streamline data science and machine learning routines in python



JLpyUtils

Author: John T. Leonard
Repo: JLpyUtils

Custom modules/classes/methods for various data science, computer vision, and machine learning operations in python

Installing & Importing

In your command line interface (CLI):

$ pip install --upgrade JLpyUtils

After this, the package can be imported into a Jupyter notebook or any Python session via the command: import JLpyUtils
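A quick way to confirm the install worked is to ask the import system whether it can locate the package. This is a minimal sketch using only the standard library; the helper name is_installed is illustrative and not part of JLpyUtils.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate `package_name`."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("JLpyUtils"):
    print("JLpyUtils is available; use `import JLpyUtils`")
else:
    print("JLpyUtils not found; run: pip install --upgrade JLpyUtils")
```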

Modules:

JLpyUtils.ML
JLpyUtils.plot
JLpyUtils.img
JLpyUtils.video
JLpyUtils.file_utils
JLpyUtils.summary_tables
JLpyUtils.kaggle

Modules Overview

Below, we highlight several of the most interesting modules in more detail.

JLpyUtils.ML

Machine learning module for python focusing on streamlining and wrapping sklearn, xgboost, dask_ml, & tensorflow/keras functions

JLpyUtils.ML Sub-Modules:

JLpyUtils.ML.preprocessing 
JLpyUtils.ML.model_selection
JLpyUtils.ML.NeuralNet
JLpyUtils.ML.inspection
JLpyUtils.ML.postprocessing

The sub-modules within JLpyUtils.ML are summarized below:

JLpyUtils.ML.preprocessing

Functions related to preprocessing/feature engineering for machine learning

The main class of interest is the JLpyUtils.ML.preprocessing.Preprocessing_pipe class, which iterates through a standard preprocessing sequence and saves the resulting engineered data. The standard sequence is:

  1. LabelEncode.categorical_features
  2. Scale.continuous_features
    • for Scaler_ID in Scalers_dict.keys()
  3. Impute.categorical_features
    • for Imputer_cat_ID in Imputer_categorical_dict.keys():
      • for Imputer_iter_class_ID in Imputer_categorical_dict[Imputer_cat_ID].keys():
  4. Imputer.continuous_features
    • for Imputer_cont_ID in Imputer_continuous_dict.keys():
      • for Imputer_iter_reg_ID in Imputer_continuous_dict[Imputer_cont_ID].keys():
  5. OneHotEncode
  6. CorrCoeffThreshold
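The nested loops in steps 2 to 4 imply that the pipeline branches once per scaler and imputer option. The sketch below only enumerates those branches; it does not call JLpyUtils. The dictionary names follow the sequence above, but their keys and values are hypothetical placeholders, and treating every combination as its own branch is an assumption.

```python
# Hypothetical option IDs; the real keys depend on how the dicts are configured.
Scalers_dict = {"MinMaxScaler": None, "StandardScaler": None}
Imputer_categorical_dict = {
    "most_frequent": {None: None},          # no inner iterative model
    "iterative": {"classifier_A": None},    # one inner classifier option
}
Imputer_continuous_dict = {
    "median": {None: None},                 # no inner iterative model
    "iterative": {"regressor_A": None},     # one inner regressor option
}


def enumerate_paths(scalers, imputers_cat, imputers_cont):
    """List every (scaler, cat imputer, cat model, cont imputer, cont model) branch."""
    paths = []
    for scaler_id in scalers:
        for cat_id, cat_models in imputers_cat.items():
            for cat_model_id in cat_models:
                for cont_id, cont_models in imputers_cont.items():
                    for cont_model_id in cont_models:
                        paths.append(
                            (scaler_id, cat_id, cat_model_id, cont_id, cont_model_id)
                        )
    return paths


paths = enumerate_paths(Scalers_dict, Imputer_categorical_dict, Imputer_continuous_dict)
for path in paths:
    print(path)
```

Because the number of branches is the product of the option counts, adding one more scaler here would add four more engineered datasets.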

JLpyUtils.ML.model_selection

Functions/classes for running hyperparameter searches across multiple types of models & comparing those models

The main classes of interest are JLpyUtils.ML.model_selection.GridSearchCV and JLpyUtils.ML.model_selection.BayesianSearchCV. They run grid-search and Bayesian hyperparameter optimizations across different types of models and compare the results, so that one can find the best-of-best (BoB) model. The .fit functions of both classes are compatible with sklearn models, tensorflow/keras models, and xgboost models. Check out the docstrings of each class for additional notes on implementation.
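The best-of-best idea can be illustrated without any ML library: search each model's grid, keep the winner per model, then compare the winners. This is a minimal sketch, not the JLpyUtils implementation; best_of_best, grid, and toy_score are hypothetical names, and toy_score stands in for a real cross-validated score.

```python
from itertools import product


def grid(param_grid):
    """Yield every parameter combination in a {name: [values]} grid."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def best_of_best(models, score_fn):
    """Search each model's grid, then compare the per-model winners.

    models:   {model_id: param_grid}
    score_fn: callable(model_id, params) -> float, higher is better
    """
    winners = {}
    for model_id, param_grid in models.items():
        scored = [(score_fn(model_id, params), params) for params in grid(param_grid)]
        best_score, best_params = max(scored, key=lambda pair: pair[0])
        winners[model_id] = {"score": best_score, "params": best_params}
    bob_id = max(winners, key=lambda model_id: winners[model_id]["score"])
    return bob_id, winners


def toy_score(model_id, params):
    """Toy stand-in for a cross-validated score."""
    if model_id == "linear":
        return 0.80 - abs(params["alpha"] - 0.1)
    return 0.90 - 0.01 * abs(params["max_depth"] - 5)


models = {
    "linear": {"alpha": [0.01, 0.1, 1.0]},
    "tree": {"max_depth": [3, 5, 7]},
}
bob_id, winners = best_of_best(models, toy_score)
print(bob_id, winners[bob_id])
```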

JLpyUtils.ML.NeuralNet

Sub-modules/functions/classes for streamlining common neural-net architectures implemented in tensorflow/keras.

The most notable sub-modules are the DenseNet and Conv2D modules, which provide keras implementations of a general dense neural network and a 2D convolutional neural network. The depth and general architecture of each network are defined by generic hyperparameters, so that one can easily perform a grid search across multiple neural network architectures.
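To make "architecture defined by generic hyperparameters" concrete, the sketch below maps a few scalar hyperparameters to a list of dense layer widths, which is the kind of value a grid search can sweep. The parameter names n_layers, initial_units, and scaling_factor are assumptions for illustration and may not match the DenseNet module's actual arguments.

```python
from itertools import product


def dense_layer_units(n_layers, initial_units, scaling_factor, min_units=1):
    """Return the width of each dense layer, scaling the width layer by layer."""
    units = []
    current = float(initial_units)
    for _ in range(n_layers):
        units.append(max(min_units, int(round(current))))
        current *= scaling_factor
    return units


# A small architecture grid: every (depth, width) pair becomes one candidate network.
architectures = [
    dense_layer_units(n_layers, initial_units, scaling_factor=0.5)
    for n_layers, initial_units in product([1, 2, 3], [32, 64])
]
for layer_widths in architectures:
    print(layer_widths)
```

Describing a network with a handful of scalars keeps the search space small and uniform across architectures, at the cost of not being able to express arbitrary layer layouts.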

JLpyUtils.ML.inspection

Functions to inspect features and/or models after training

JLpyUtils.ML.postprocessing

ML model outputs postprocessing helper functions

JLpyUtils.plot

This module contains helper functions related to common plotting operations via matplotlib.

The most notable functions are:

JLpyUtils.plot.corr_matrix(): Plot a correlation matrix chart

JLpyUtils.plot.ccorr_pareto(): Plot a pareto bar-chart for 1 label of interest within a correlation dataframe

JLpyUtils.plot.hist_or_bar(): Iterate through each column in a dataframe and plot the histogram or bar chart for the data.
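The data behind a correlation pareto chart is a ranking of features by the strength of their correlation with one label. The sketch below computes that ranking in plain Python so it runs without pandas or matplotlib; it does not reproduce the plotting functions above, and the column names and values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_ranking(columns, label):
    """Rank every other column by |correlation| with `label`, strongest first."""
    target = columns[label]
    corr = {
        name: pearson(values, target)
        for name, values in columns.items()
        if name != label
    }
    return sorted(corr.items(), key=lambda item: abs(item[1]), reverse=True)


data = {
    "price": [1.0, 2.0, 3.0, 4.0],
    "size": [2.0, 4.0, 6.0, 8.0],     # moves exactly with price
    "age": [4.0, 3.0, 1.0, 2.0],      # tends to fall as price rises
    "noise": [1.0, -1.0, -1.0, 1.0],  # unrelated to price
}
ranking = correlation_ranking(data, label="price")
for name, value in ranking:
    print(f"{name}: {value:+.2f}")
```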

JLpyUtils.img

This module contains functions/classes related to image analysis, most of which wrap SciKit image functions in some way.

The most notable functions are:

JLpyUtils.img.auto_crop.use_edges(): Use skimage.feature.canny method to find edges in the image passed and autocrop on the outermost edges

JLpyUtils.img.decompose_video_to_img(): Use cv2 to pull out image frames from a video and save them as png files
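Cropping on the outermost edges amounts to taking the bounding box of all edge pixels. The sketch below shows only that step on nested lists, assuming an edge mask has already been computed (for example by skimage.feature.canny); crop_to_edges is a hypothetical helper, not the auto_crop.use_edges implementation.

```python
def crop_to_edges(image, edge_mask):
    """Crop `image` to the bounding box of the truthy pixels in `edge_mask`.

    Both arguments are row-major nested lists with identical shapes.
    """
    rows = [r for r, row in enumerate(edge_mask) if any(row)]
    cols = [c for c in range(len(edge_mask[0])) if any(row[c] for row in edge_mask)]
    if not rows or not cols:
        return image  # no edges found; leave the image untouched
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# 4x5 toy image whose pixel value encodes its position (row * 10 + column).
image = [[r * 10 + c for c in range(5)] for r in range(4)]
edge_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
cropped = crop_to_edges(image, edge_mask)
for row in cropped:
    print(row)
```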

JLpyUtils.kaggle

This module contains functions for interacting with kaggle. The simplest and most useful function is:

JLpyUtils.kaggle.competition_download_files(competition)

where competition is the competition name, such as "home-credit-default-risk"
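For comparison, the same download can be requested through the official kaggle command line tool. The sketch below only builds the command and prints it; it makes no claim about how competition_download_files works internally. It assumes the kaggle CLI is installed and API credentials are configured, and build_download_command and the "data" folder are hypothetical.

```python
import shlex


def build_download_command(competition, path="."):
    """Build the kaggle CLI arguments that download a competition's files into `path`."""
    return ["kaggle", "competitions", "download", "-c", competition, "-p", path]


cmd = build_download_command("home-credit-default-risk", path="data")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```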

JLpyUtils.file_utils

This module contains simple but useful helper functions to save and load standard file types, including 'hdf', 'csv', 'json', and 'dill'. Essentially, the save and load functions take care of the boilerplate operations involved in saving or loading the file types listed above.
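The boilerplate such helpers hide is mostly dispatching on the file type. The sketch below shows that pattern for 'json' and 'csv' using only the standard library; 'hdf' and 'dill' are left out because they need third-party packages. The save and load signatures here are assumptions and may differ from those in JLpyUtils.file_utils.

```python
import csv
import json
import os
import tempfile


def save(obj, path):
    """Save `obj` to `path`, choosing the writer from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".json":
        with open(path, "w", encoding="utf-8") as f:
            json.dump(obj, f)
    elif ext == ".csv":
        # Expects a non-empty list of dicts that share the same keys.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(obj[0].keys()))
            writer.writeheader()
            writer.writerows(obj)
    else:
        raise ValueError(f"unsupported file type: {ext!r}")


def load(path):
    """Load the object stored at `path`, choosing the reader from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".json":
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    if ext == ".csv":
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))  # csv values come back as strings
    raise ValueError(f"unsupported file type: {ext!r}")


with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "config.json")
    save({"lr": 0.1}, json_path)
    json_roundtrip = load(json_path)

    csv_path = os.path.join(tmp, "rows.csv")
    save([{"a": 1, "b": 2}], csv_path)
    csv_roundtrip = load(csv_path)

print(json_roundtrip, csv_roundtrip)
```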

Example Notebooks

Basic notebook examples can be found in the [notebooks](notebooks) folder.


