
diver is a series of tools to speed up common feature-set investigation, conditioning and encoding for common ML algorithms



Diver is the Dataset Inspector, Visualiser and Encoder library, automating and codifying common data science project steps as standardised and reusable methods.

See example-notebooks/house-price-demo.ipynb for a full walkthrough.


A set of functions that check for common dataset issues which can degrade machine learning model performance.
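As a rough illustration of the kind of check described above, the sketch below summarises three common issues with pandas: missing values, constant columns and duplicate rows. The function name `inspect_dataset` and the choice of checks are assumptions made for this example; they are not taken from diver's API.

```python
import pandas as pd


def inspect_dataset(df: pd.DataFrame) -> dict:
    """Hypothetical helper (not part of diver): summarise a few common issues."""
    return {
        # Fraction of missing values in each column
        "missing_fraction": df.isna().mean().to_dict(),
        # Columns holding at most one distinct non-null value carry no signal
        "constant_columns": [
            col for col in df.columns if df[col].nunique(dropna=True) <= 1
        ],
        # Number of rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy data, invented for this example
df = pd.DataFrame(
    {
        "area": [50.0, 72.0, None, 50.0],
        "city": ["A", "B", "B", "A"],
        "country": ["UK", "UK", "UK", "UK"],
    }
)
report = inspect_dataset(df)
print(report)
```

Returning a plain dictionary keeps the report easy to log or assert against before any encoding step runs.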

Figure: inspector flow chart


A scikit-learn-style module that performs several data-type encodings in a single pass and saves the attributes learned from the train-set encoding for reuse on a test set:

  • The .fit_transform method learns the encoding attributes (feature means and variances; categorical feature elements; shown in yellow in the flow chart below) and then applies the encodings to the training feature set
  • The .transform method applies the train-set encodings to a test set
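The fit_transform / transform split described above can be sketched as follows. This is a minimal illustration of the pattern, not diver's encoder: the class name `SketchEncoder`, its arguments and the specific encodings (standard scaling and one-hot encoding) are assumptions chosen for the example.

```python
import pandas as pd


class SketchEncoder:
    """Illustrative only (not diver's encoder): learn on train, reuse on test."""

    def fit_transform(self, df, numeric_cols, categorical_cols):
        self.numeric_cols = list(numeric_cols)
        # Learn train-set statistics for scaling
        self.means_ = df[self.numeric_cols].mean()
        self.stds_ = df[self.numeric_cols].std(ddof=0)
        # Learn the categorical elements seen in the train set
        self.categories_ = {
            col: sorted(df[col].dropna().unique()) for col in categorical_cols
        }
        return self.transform(df)

    def transform(self, df):
        # Scale numeric features with the stored train-set means and deviations
        parts = [(df[self.numeric_cols] - self.means_) / self.stds_]
        for col, cats in self.categories_.items():
            # Fix the category list so train and test get identical columns;
            # values unseen during fitting become all-zero rows
            as_cat = df[col].astype(pd.CategoricalDtype(categories=cats))
            parts.append(pd.get_dummies(as_cat, prefix=col, dtype=float))
        return pd.concat(parts, axis=1)


# Toy data, invented for this example
train = pd.DataFrame({"area": [50.0, 70.0, 90.0], "city": ["A", "B", "A"]})
test = pd.DataFrame({"area": [70.0], "city": ["C"]})

encoder = SketchEncoder()
train_encoded = encoder.fit_transform(train, ["area"], ["city"])
test_encoded = encoder.transform(test)
print(test_encoded)
```

Storing the learned statistics on the object is what lets the test set be encoded with exactly the same columns and scaling as the train set, which avoids leaking test-set information into the encoding.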

Figure: fit_transform flow chart


Functions for visualising aspects of the dataset

Correlation analysis

  • Display the correlation matrix for the n features most strongly correlated with the dependent variable (n is specified by the user), with the dependent variable in the bottom row of the matrix
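One way to build such a matrix with pandas is sketched below. The function name `top_n_correlation_matrix` is hypothetical and the ranking by absolute Pearson correlation is an assumption; diver's own function and ranking rule may differ.

```python
import pandas as pd


def top_n_correlation_matrix(df: pd.DataFrame, target: str, n: int) -> pd.DataFrame:
    """Hypothetical helper (not diver's API): top-n correlates, target last."""
    # Pearson correlation between all numeric columns
    corr = df.select_dtypes("number").corr()
    # Rank features by absolute correlation with the target
    ranked = corr[target].drop(target).abs().sort_values(ascending=False)
    # Keep the n strongest features and place the target in the last row/column
    order = list(ranked.index[:n]) + [target]
    return corr.loc[order, order]


# Toy data, invented for this example
df = pd.DataFrame(
    {
        "noise": [2, 1, 2, 1, 2],
        "rooms": [1, 3, 2, 5, 4],
        "area": [1, 2, 3, 4, 5],
        "price": [2, 4, 6, 8, 10],
    }
)
matrix = top_n_correlation_matrix(df, target="price", n=2)
print(matrix.round(2))
```

The returned DataFrame can be passed to any heatmap plotting function for display; keeping the selection separate from the plotting makes the ranking logic easy to test.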


Future Work


  • Option for instances where there are no categorical features



  • Create a function to do this


  • is_public_holiday : bool
  • Update above diagram

Remove warnings

Make robust to non-consecutive indices in input df

Unit test all functions

Extreme values

PCA option?

Label balanced class checker (for classification problems)

Distribution and correlation analysis

  • Display the correlation matrix for the top n correlates, with the target in the bottom row
  • Display a pairplot for the top n correlates, with the target in the bottom row
  • Alternatively, select correlates by a cumulative-variance threshold instead of taking the top n
  • Option to drop weaker correlates (those below a threshold) if desired

