
General utilities to streamline data science and machine learning routines in python


Python Data Science Library (pyDSlib)

Author: John T. Leonard
Repo: pyDSlib

Custom modules/classes/methods for various data science, computer vision, and machine learning operations in python

Installing & Importing

In your command line interface (CLI):

$ pip install --upgrade pyDSlib

After this, the package can be imported into a Jupyter notebook or any Python session via the command: import pyDSlib



Modules Overview

Below, we highlight several of the most interesting modules in more detail.


Machine learning module for python focusing on streamlining and wrapping sklearn, xgboost, dask_ml, & tensorflow/keras functions

pyDSlib.ML Sub-Modules:


The sub-modules within pyDSlib.ML are summarized below:


Functions related to preprocessing/feature engineering for machine learning

The main class of interest is the pyDSlib.ML.preprocessing.Preprocessing_pipe class, which iterates through a standard preprocessing sequence and saves the resulting engineered data. The standard sequence is:

  1. LabelEncode.categorical_features
  2. Scale.continuous_features
    • for Scaler_ID in Scalers_dict.keys():
  3. Impute.categorical_features
    • for Imputer_cat_ID in Imputer_categorical_dict.keys():
      • for Imputer_iter_class_ID in Imputer_categorical_dict[Imputer_cat_ID].keys():
  4. Impute.continuous_features
    • for Imputer_cont_ID in Imputer_continuous_dict.keys():
      • for Imputer_iter_reg_ID in Imputer_continuous_dict[Imputer_cont_ID].keys():
  5. OneHotEncode
  6. CorrCoeffThreshold
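
The sequence above can be sketched with plain pandas and sklearn. This is only an illustration of the six steps, not the Preprocessing_pipe implementation: the function name preprocess, its arguments, and the single scaler/imputer choices are assumptions, whereas the real class iterates over dictionaries of scalers and imputers.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def preprocess(df, categorical, continuous, corr_threshold=0.95):
    """Hypothetical stand-in for the six-step sequence (not pyDSlib's code)."""
    df = df.copy()

    # 1. Label-encode categoricals, keeping missing values as NaN.
    for col in categorical:
        codes = df[col].astype("category").cat.codes.astype(float)
        df[col] = codes.replace(-1.0, np.nan)

    # 2. Scale continuous features (StandardScaler ignores NaN when fitting).
    df[continuous] = StandardScaler().fit_transform(df[continuous])

    # 3. Impute categoricals with the most frequent code.
    cat_imputer = SimpleImputer(strategy="most_frequent")
    df[categorical] = cat_imputer.fit_transform(df[categorical])

    # 4. Impute continuous features with the median.
    cont_imputer = SimpleImputer(strategy="median")
    df[continuous] = cont_imputer.fit_transform(df[continuous])

    # 5. One-hot encode the categorical codes.
    df = pd.get_dummies(df, columns=categorical, dtype=float)

    # 6. Drop one feature from each pair whose |correlation| exceeds the threshold.
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return df.drop(columns=to_drop)
```

Imputing after label encoding and scaling mirrors the order of the list; the correlation threshold runs last so that it also sees the one-hot columns.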


Functions/classes for running hyperparameter searches across multiple types of models & comparing those models

The main classes of interest are pyDSlib.ML.model_selection.GridSearchCV and pyDSlib.ML.model_selection.BayesianSearchCV. Each runs its hyperparameter search (grid or Bayesian) across different types of models and compares the results, so that one can find the best-of-best (BoB) model. The .fit functions of both classes can evaluate sklearn, tensorflow/keras, and xgboost models. See the docstring of each class for additional notes on implementation.
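
The best-of-best idea can be sketched with sklearn's own GridSearchCV: run one search per model type, then keep the winner. The helper best_of_best and the models dictionary layout are assumptions made for this illustration and are not the pyDSlib API.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def best_of_best(models, X, y, cv=3):
    """Run one grid search per model type and return the overall winner."""
    results = {}
    for name, (estimator, param_grid) in models.items():
        search = GridSearchCV(estimator, param_grid, cv=cv)
        search.fit(X, y)
        results[name] = search
    # Compare the best cross-validated score of each model type.
    best_name = max(results, key=lambda n: results[n].best_score_)
    return best_name, results[best_name].best_estimator_, results


# Toy data and two model families, each with its own parameter grid.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
models = {
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "forest": (RandomForestClassifier(random_state=0), {"n_estimators": [10, 50]}),
}
best_name, best_model, results = best_of_best(models, X, y)
print(best_name, results[best_name].best_params_)
```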


Sub-modules/functions/classes for streamlining common neural-net architectures implemented in tensorflow/keras.

The most notable sub-modules are the DenseNet and Conv2D modules, which provide keras implementations of a general dense neural network and a 2D convolutional neural network. The depth and general architecture of each network are defined by generic hyperparameters, so one can easily perform a grid search across multiple neural network architectures.
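
As a rough sketch of what "architecture defined by generic hyperparameters" can look like, the helper below derives the hidden-layer widths from three numbers and feeds them to a keras Sequential model. The names dense_layer_units and build_dense_net, and the hyperparameters n_layers, initial_units, and scaling_factor, are hypothetical and may not match the arguments of the DenseNet module.

```python
def dense_layer_units(n_layers, initial_units, scaling_factor):
    """Units per hidden layer: initial_units * scaling_factor**i, at least 1."""
    return [
        max(1, int(round(initial_units * scaling_factor ** i)))
        for i in range(n_layers)
    ]


def build_dense_net(n_features, n_outputs, n_layers=3, initial_units=64,
                    scaling_factor=0.5, activation="relu"):
    """Build an uncompiled keras model from the generic hyperparameters."""
    # Imported lazily so dense_layer_units can be used without tensorflow.
    from tensorflow import keras

    model = keras.Sequential()
    model.add(keras.Input(shape=(n_features,)))
    for units in dense_layer_units(n_layers, initial_units, scaling_factor):
        model.add(keras.layers.Dense(units, activation=activation))
    model.add(keras.layers.Dense(n_outputs))
    return model


# Enumerate a small grid of architectures without building any model.
grid = [dense_layer_units(n, 64, s) for n in (2, 3) for s in (0.5, 1.0)]
```

Because every architecture is described by a few scalars, the same scalars can be placed directly in a hyperparameter grid.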


Functions to inspect features and/or models after training


ML model outputs postprocessing helper functions


This module contains helper functions related to common plotting operations via matplotlib.

The most notable functions are:

pyDSlib.plot.corr_matrix(): Plot a correlation matrix chart

pyDSlib.plot.ccorr_pareto(): Plot a pareto bar-chart for 1 label of interest within a correlation dataframe

pyDSlib.plot.hist_or_bar(): Iterate through each column in a dataframe and plot the histogram or bar chart for the data.
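
For readers unfamiliar with a correlation pareto chart, here is a minimal sketch in pandas and matplotlib: take one label's column of the correlation matrix, sort it by absolute value, and draw it as bars. The function corr_pareto_sketch is a hypothetical illustration and does not reproduce the signature or styling of the pyDSlib.plot functions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def corr_pareto_sketch(df, label):
    """Bar chart of |correlation| between `label` and every other column."""
    corr = df.corr()[label].drop(label).abs().sort_values(ascending=False)
    fig, ax = plt.subplots()
    ax.bar(corr.index, corr.values)
    ax.set_ylabel(f"|correlation| with {label}")
    ax.tick_params(axis="x", rotation=90)
    fig.tight_layout()
    return corr, fig


# Toy data: "target" depends strongly on "a" and only weakly on "b".
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
df = pd.DataFrame({"a": a, "b": b, "target": 3 * a + 0.1 * b})
corr, fig = corr_pareto_sketch(df, "target")
```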


This module contains functions/classes related to image analysis, most of which wrap SciKit image functions in some way.

The most notable functions are:

pyDSlib.img.auto_crop.use_edges(): Use skimage.feature.canny method to find edges in the image passed and autocrop on the outermost edges

pyDSlib.img.decompose_video_to_img(): Use cv2 to pull out image frames from a video and save them as png files
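
The edge-based auto-crop idea can be sketched as follows: run skimage.feature.canny, then slice the image to the bounding box of the detected edges. The function crop_to_edges is a hypothetical name for this illustration, handles only 2D grayscale input, and is not the code behind auto_crop.use_edges().

```python
import numpy as np
from skimage.feature import canny


def crop_to_edges(img, sigma=1.0):
    """Crop a 2D grayscale image to the bounding box of its Canny edges."""
    edges = canny(img, sigma=sigma)  # boolean edge mask, same shape as img
    rows = np.flatnonzero(edges.any(axis=1))
    cols = np.flatnonzero(edges.any(axis=0))
    if rows.size == 0 or cols.size == 0:
        return img  # no edges found: leave the image untouched
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


# Toy image: a bright rectangle on a dark background.
img = np.zeros((100, 100), dtype=float)
img[30:70, 40:80] = 1.0
cropped = crop_to_edges(img)
print(img.shape, "->", cropped.shape)
```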


This module contains functions for interacting with kaggle. The simplest and most useful function is:


where competition is the competition name, such as "home-credit-default-risk"


This module contains simple helper functions to save and load standard file types, including 'hdf', 'csv', 'json', and 'dill'. The save and load functions take care of the boilerplate operations for saving or loading the file types listed above.
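
The pattern behind such helpers is usually a dispatch on the file extension. The sketch below covers only 'csv' and 'json' and leaves out 'hdf' and 'dill'; the function names save and load and their signatures are assumptions for illustration, not the pyDSlib functions.

```python
import json
from pathlib import Path

import pandas as pd


def save(obj, path):
    """Write `obj` using the format implied by the file extension."""
    path = Path(path)
    if path.suffix == ".csv":
        obj.to_csv(path, index=False)  # expects a pandas DataFrame
    elif path.suffix == ".json":
        path.write_text(json.dumps(obj))  # expects JSON-serializable data
    else:
        raise ValueError(f"unsupported file type: {path.suffix}")


def load(path):
    """Read a file back using the format implied by its extension."""
    path = Path(path)
    if path.suffix == ".csv":
        return pd.read_csv(path)
    if path.suffix == ".json":
        return json.loads(path.read_text())
    raise ValueError(f"unsupported file type: {path.suffix}")
```

Keeping the format decision in one place means callers only ever pass an object and a path.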

Example Notebooks

Basic notebook examples can be found in the notebooks folder of the repo.

