General utilities to streamline data science and machine learning routines in Python
JLpyUtils
Author: John T. Leonard
Repo: JLpyUtils
Custom modules/classes/methods for various data science, computer vision, and machine learning operations in Python
Installing & Importing
In your command line interface (CLI):
$ pip install --upgrade JLpyUtils
After this, the package can be imported into a Jupyter notebook, or Python in general, via the command:
import JLpyUtils
Modules:
JLpyUtils.ML
JLpyUtils.plot
JLpyUtils.img
JLpyUtils.video
JLpyUtils.file_utils
JLpyUtils.summary_tables
JLpyUtils.kaggle
Modules Overview
Below, we highlight several of the most interesting modules in more detail.
JLpyUtils.ML
Machine learning module for Python that streamlines and wraps sklearn, xgboost, dask_ml, and tensorflow/keras functions
JLpyUtils.ML Sub-Modules:
JLpyUtils.ML.preprocessing
JLpyUtils.ML.model_selection
JLpyUtils.ML.NeuralNet
JLpyUtils.ML.inspection
JLpyUtils.ML.postprocessing
The sub-modules within JLpyUtils.ML are summarized below:
JLpyUtils.ML.preprocessing
Functions related to preprocessing/feature engineering for machine learning
The main class of interest is the JLpyUtils.ML.preprocessing.Preprocessing_pipe class, which iterates through a standard preprocessing sequence and saves the resulting engineered data. The standard sequence is:
- LabelEncode.categorical_features
- Scale.continuous_features
  - for Scaler_ID in Scalers_dict.keys():
- Impute.categorical_features
  - for Imputer_cat_ID in Imputer_categorical_dict.keys():
    - for Imputer_iter_class_ID in Imputer_categorical_dict[Imputer_cat_ID].keys():
- Impute.continuous_features
  - for Imputer_cont_ID in Imputer_continuous_dict.keys():
    - for Imputer_iter_reg_ID in Imputer_continuous_dict[Imputer_cont_ID].keys():
- OneHotEncode
- CorrCoeffThreshold
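The loops in the sequence above suggest that one engineered dataset is produced per combination of scaler and imputers. A minimal standard-library sketch of that enumeration follows. The dictionary contents and the `flatten` and `enumerate_variants` helpers are hypothetical and are not part of JLpyUtils; only the dictionary names mirror the loop variables listed above.

```python
from itertools import product

# Hypothetical option dictionaries; the keys stand in for the IDs looped over above.
Scalers_dict = {"MinMaxScaler": None, "StandardScaler": None}
Imputer_categorical_dict = {
    "most_frequent": {"default": None},
    "iterative": {"RandomForestClassifier": None},
}
Imputer_continuous_dict = {
    "median": {"default": None},
    "iterative": {"RandomForestRegressor": None},
}


def flatten(nested):
    """Turn {outer: {inner: ...}} into a list of (outer, inner) ID pairs."""
    return [(outer, inner) for outer, inners in nested.items() for inner in inners]


def enumerate_variants(scalers, cat_imputers, cont_imputers):
    """Yield one tuple of IDs per preprocessing variant, outermost loop first."""
    for scaler_id, (cat_id, cat_sub), (cont_id, cont_sub) in product(
        scalers, flatten(cat_imputers), flatten(cont_imputers)
    ):
        yield scaler_id, cat_id, cat_sub, cont_id, cont_sub


variants = list(
    enumerate_variants(Scalers_dict, Imputer_categorical_dict, Imputer_continuous_dict)
)
for variant in variants:
    print(variant)
```

With two scalers, two categorical imputer options, and two continuous imputer options, this enumerates eight variants, which is why the number of saved datasets grows quickly as options are added.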
JLpyUtils.ML.model_selection
Functions/classes for running hyperparameter searches across multiple types of models & comparing those models
The main classes of interest are JLpyUtils.ML.model_selection.GridSearchCV and JLpyUtils.ML.model_selection.BayesianSearchCV, which run grid-search and Bayesian hyperparameter optimizations across different types of models and compare the results so that one can find the best-of-best (BoB) model. The .fit methods of both classes are compatible with sklearn, tensorflow/keras, and xgboost models. See the docstrings of each class for additional implementation notes.
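The best-of-best idea can be sketched without any ML library: search each model's own grid, keep its best score, then compare the per-model winners. The toy models, grids, and the `grid_search` helper below are illustrative assumptions and do not reflect the JLpyUtils implementation or its signatures. For brevity the sketch scores on a single validation split rather than with cross-validation.

```python
from itertools import product
from statistics import mean

# Toy 1-D regression data: y is roughly 2 * x.
X_train, y_train = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
X_val, y_val = [7, 8], [14.0, 16.1]


def fit_constant(X, y, offset):
    """Toy model: always predict the training mean plus a fixed offset."""
    base = mean(y)
    return lambda x: base + offset


def fit_line_through_origin(X, y, shrink):
    """Toy model: least-squares slope through the origin, scaled by `shrink`."""
    slope = shrink * sum(a * b for a, b in zip(X, y)) / sum(a * a for a in X)
    return lambda x: slope * x


def mse(model, X, y):
    """Mean squared error of `model` on (X, y); lower is better."""
    return mean((model(a) - b) ** 2 for a, b in zip(X, y))


def grid_search(fit, grid):
    """Return (best_params, best_score) for one model over its parameter grid."""
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mse(fit(X_train, y_train, **params), X_val, y_val)
        if best is None or score < best[1]:
            best = (params, score)
    return best


# Each model type has its own fit function and its own grid.
models = {
    "constant": (fit_constant, {"offset": [0.0, 5.0]}),
    "line": (fit_line_through_origin, {"shrink": [0.5, 1.0]}),
}
results = {name: grid_search(fit, grid) for name, (fit, grid) in models.items()}

# Best-of-best: the model type whose best configuration has the lowest error.
best_of_best = min(results, key=lambda name: results[name][1])
print(best_of_best, results[best_of_best])
```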
JLpyUtils.ML.NeuralNet
Sub-modules/functions/classes for streamlining common neural-net architectures implemented in tensorflow/keras.
The most notable sub-modules are the DenseNet and Conv2D modules, which provide keras implementations of a general dense neural network and a 2D convolutional neural network. The depth and general architecture of each network are defined by generic hyperparameters, so that one can easily perform a grid search across multiple neural network architectures.
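One way to read "architecture defined by generic hyperparameters" is that a handful of scalar values determine every layer's width, so a grid over those scalars is a grid over architectures. The sketch below illustrates this with hypothetical hyperparameter names (`n_layers`, `initial_units`, `scaling_factor`); it does not reproduce the DenseNet module's actual parameters.

```python
from itertools import product


def dense_layer_units(n_layers, initial_units, scaling_factor):
    """Return the width of each hidden layer, never dropping below one unit."""
    return [
        max(1, round(initial_units * scaling_factor ** i)) for i in range(n_layers)
    ]


# A small grid over architecture hyperparameters (values are illustrative).
param_grid = {"n_layers": [1, 3], "initial_units": [64], "scaling_factor": [0.5, 1.0]}

# One list of layer widths per point in the grid.
architectures = [
    dense_layer_units(**dict(zip(param_grid, values)))
    for values in product(*param_grid.values())
]
for units in architectures:
    print(units)
```

Each resulting list could then be turned into a stack of `keras.layers.Dense` layers, one per entry.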
JLpyUtils.ML.inspection
Functions to inspect features and/or models after training
JLpyUtils.ML.postprocessing
Helper functions for postprocessing ML model outputs
JLpyUtils.plot
This module contains helper functions related to common plotting operations via matplotlib.
The most notable functions are:
- JLpyUtils.plot.corr_matrix(): Plot a correlation matrix chart
- JLpyUtils.plot.ccorr_pareto(): Plot a pareto bar chart for one label of interest within a correlation dataframe
- JLpyUtils.plot.hist_or_bar(): Iterate through each column in a dataframe and plot a histogram or bar chart of its data
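The data behind a pareto chart of correlations is the correlation of every feature with the label of interest, ordered by absolute value. A small standard-library sketch under that assumption follows; the `pearson` and `corr_pareto_data` helpers and the sample columns are hypothetical and are not the module's functions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def corr_pareto_data(columns, label):
    """Return (feature, correlation) pairs sorted by descending |correlation|."""
    pairs = [
        (name, pearson(values, columns[label]))
        for name, values in columns.items()
        if name != label
    ]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Illustrative data: "price" is the label of interest.
columns = {
    "price": [10, 20, 30, 40, 50],
    "age": [5, 3, 4, 1, 2],
    "noise": [1, -1, 0, -1, 1],
    "area": [1, 2, 3, 4, 5],
}
ranked = corr_pareto_data(columns, "price")
for name, corr in ranked:
    print(f"{name}: {corr:+.2f}")
```

Sorting by absolute value keeps strongly negative correlations near the top, which is usually what one wants when screening features.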
JLpyUtils.img
This module contains functions/classes related to image analysis, most of which wrap scikit-image functions in some way.
The most notable functions are:
- JLpyUtils.img.auto_crop.use_edges(): Use the skimage.feature.canny method to find edges in the given image and auto-crop on the outermost edges
- JLpyUtils.img.decompose_video_to_img(): Use cv2 to pull out image frames from a video and save them as png files
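Cropping on the outermost edges amounts to taking the bounding box of all edge pixels. The sketch below shows that step on plain nested lists, with a boolean mask standing in for the output of an edge detector such as `skimage.feature.canny`. The `crop_to_edges` helper is hypothetical and is not the `use_edges` implementation.

```python
def crop_to_edges(image, edge_mask):
    """Crop `image` to the bounding box of truthy pixels in `edge_mask`."""
    rows = [r for r, row in enumerate(edge_mask) if any(row)]
    cols = [c for c in range(len(edge_mask[0])) if any(row[c] for row in edge_mask)]
    if not rows:  # no edges found: return the image unchanged
        return image
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


image = [
    [0, 0, 0, 0, 0],
    [0, 7, 8, 0, 0],
    [0, 9, 6, 5, 0],
    [0, 0, 0, 0, 0],
]
# Stand-in mask: nonzero pixels play the role of detected edges.
edge_mask = [[pixel > 0 for pixel in row] for row in image]

cropped = crop_to_edges(image, edge_mask)
for row in cropped:
    print(row)
```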
JLpyUtils.kaggle
This module contains functions for interacting with kaggle. The simplest and most useful function is:
JLpyUtils.kaggle.competition_download_files(competition)
where competition is the competition name, such as "home-credit-default-risk".
JLpyUtils.file_utils
This module contains simple but extremely useful helper functions to save and load standard file types, including 'hdf', 'csv', 'json', and 'dill'. Essentially, the save and load functions take care of the boilerplate operations related to saving or loading the file types specified above.
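The boilerplate being wrapped is essentially a dispatch on file type. A minimal sketch of that pattern for two of the listed formats, using only the standard library, follows. The `save` and `load` signatures here are assumptions for illustration and may differ from those in JLpyUtils.file_utils.

```python
import csv
import json
import os
import tempfile


def save(obj, path):
    """Save `obj` to `path`, choosing the writer from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".json":
        with open(path, "w") as f:
            json.dump(obj, f)
    elif ext == ".csv":
        # Expects a list of rows, each row a list of values.
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(obj)
    else:
        raise ValueError(f"Unsupported file type: {ext!r}")


def load(path):
    """Load the object stored at `path`, choosing the reader from the extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".json":
        with open(path) as f:
            return json.load(f)
    if ext == ".csv":
        # csv.reader returns every value as a string.
        with open(path, newline="") as f:
            return list(csv.reader(f))
    raise ValueError(f"Unsupported file type: {ext!r}")


with tempfile.TemporaryDirectory() as folder:
    json_path = os.path.join(folder, "params.json")
    save({"max_depth": 3}, json_path)
    loaded = load(json_path)

print(loaded)
```

Keeping the dispatch in one place means callers only pass an object and a path, and support for a new format is a single extra branch.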
Example Notebooks
Basic notebook examples can be found in the notebooks folder.