Diver is a set of tools that speeds up common feature-set investigation, conditioning and encoding for common ML algorithms.

Project description


Diver is the Dataset Inspector, Visualiser and Encoder library, automating and codifying common data science project steps as standardised and reusable methods.

See example-notebooks/house-price-demo.ipynb for a full walkthrough.


A set of functions that check for common dataset issues which can degrade machine learning model performance.
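
As a rough illustration of the kind of checks meant here, the sketch below uses plain pandas. The function name `basic_checks` and the three checks it runs are assumptions made for this example; they are not taken from diver's API.

```python
import pandas as pd

def basic_checks(df: pd.DataFrame) -> dict:
    """Hypothetical helper (not diver's API): report a few common dataset issues."""
    missing = df.isna().sum()
    return {
        # Columns containing at least one missing value, with their counts
        "missing_values": missing[missing > 0].to_dict(),
        # Columns with a single distinct value carry no information
        "constant_columns": [c for c in df.columns if df[c].nunique(dropna=False) <= 1],
        # Number of rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Small made-up dataset, for illustration only
df = pd.DataFrame({
    "area": [50.0, 72.0, None, 50.0],
    "city": ["A", "B", "B", "A"],
    "country": ["UK", "UK", "UK", "UK"],
})
report = basic_checks(df)
print(report)
```

Returning a plain dictionary keeps the report easy to log or assert on in a pipeline.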

[Figure: inspector flow]


A scikit-learn-style module that performs several data-type encodings in one pass and saves the attributes learned from the train set so that they can be reused when encoding a test set:

  • The .fit_transform method learns the encodings from the train set (feature means and variances, and the elements of each categorical feature; shown in yellow in the flow chart below) and then applies them to the train-set features
  • The .transform method applies the encodings learned from the train set to a test set
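
The fit-on-train, reuse-on-test pattern described above can be sketched with pandas alone. The class `SimpleEncoder`, its argument names and its choice of encodings (standardisation and one-hot) are hypothetical stand-ins for illustration; diver's own class and signatures may differ.

```python
import pandas as pd

class SimpleEncoder:
    """Hypothetical stand-in for illustration; not diver's actual class."""

    def fit_transform(self, df, numeric_cols, categorical_cols):
        self.numeric_cols = list(numeric_cols)
        self.categorical_cols = list(categorical_cols)
        # Learn train-set statistics used for standardisation
        self.means_ = df[self.numeric_cols].mean()
        self.stds_ = df[self.numeric_cols].std(ddof=0)
        # Learn the category levels seen in the train set
        self.levels_ = {
            c: sorted(df[c].dropna().unique()) for c in self.categorical_cols
        }
        return self.transform(df)

    def transform(self, df):
        # Standardise with the stored train-set means and standard deviations
        out = (df[self.numeric_cols] - self.means_) / self.stds_
        for col, levels in self.levels_.items():
            # Fixing the categories keeps the one-hot columns identical
            # across train and test; unseen levels become all-zero rows
            cat = pd.Series(pd.Categorical(df[col], categories=levels), index=df.index)
            out = pd.concat([out, pd.get_dummies(cat, prefix=col)], axis=1)
        return out

# Made-up data for illustration
train = pd.DataFrame({"area": [50.0, 70.0, 90.0], "city": ["A", "B", "A"]})
test = pd.DataFrame({"area": [70.0], "city": ["C"]})

enc = SimpleEncoder()
train_enc = enc.fit_transform(train, ["area"], ["city"])
test_enc = enc.transform(test)
```

Because the category levels are stored at fit time, the test set gets the same columns as the train set even though city "C" never appeared in training.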

[Figure: fit_transform flow]


Functions for visualising aspects of the dataset

Correlation analysis

  • Display the correlation matrix for the n features most correlated with the dependent variable (n is specified by the user), with the dependent variable in the bottom row of the matrix
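
A minimal sketch of how such a matrix could be assembled with pandas is shown below. The helper name `top_n_correlation_matrix`, the ranking by absolute Pearson correlation and the synthetic data are all assumptions for this example, not diver's implementation.

```python
import numpy as np
import pandas as pd

def top_n_correlation_matrix(df, target, n):
    """Hypothetical helper: correlation matrix of the n features most
    correlated (in absolute value) with `target`, with `target` last."""
    corr = df.select_dtypes("number").corr()
    # Rank features by absolute correlation with the target
    ranked = corr[target].drop(target).abs().sort_values(ascending=False)
    cols = list(ranked.index[:n]) + [target]
    # Reorder rows and columns so the target ends up in the bottom row
    return corr.loc[cols, cols]

# Synthetic data: "area" tracks the target closely, "rooms" loosely, "noise" not at all
rng = np.random.default_rng(0)
price = rng.normal(size=200)
df = pd.DataFrame({
    "noise": rng.normal(size=200),
    "rooms": price + rng.normal(size=200),
    "area": price + 0.1 * rng.normal(size=200),
    "price": price,
})
matrix = top_n_correlation_matrix(df, target="price", n=2)
print(matrix.round(2))
```

The resulting frame can be passed to any heatmap plotting function for display.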


Future Work


  • Option for instances where there are no categorical features



  • Create a function to do this


  • is_public_holiday : bool
  • Update above diagram

Remove warnings

Make robust to non-consecutive indices in input df
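
To illustrate why non-consecutive indices can cause trouble, the sketch below shows one common failure mode in pandas. It is an assumption about the kind of problem meant here, not a description of diver's internals.

```python
import pandas as pd

# Index with gaps, e.g. left over after filtering rows out of a larger frame
df = pd.DataFrame({"area": [50.0, 70.0, 90.0]}, index=[0, 2, 5])

# Encoded columns built from raw values get a fresh index 0, 1, 2
encoded = pd.DataFrame({"city_A": [1, 0, 1]})

# concat aligns on index labels, so the rows no longer line up
misaligned = pd.concat([df, encoded], axis=1)

# Reusing the input index before concatenating keeps the rows paired
encoded.index = df.index
aligned = pd.concat([df, encoded], axis=1)
```

Calling `reset_index(drop=True)` on the input is an alternative when the original index labels do not need to be preserved.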

Unit test all functions

Extreme values

PCA option?

Label balanced class checker (for classification problems)
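
One possible shape for such a checker is sketched below. The function name `check_class_balance` and the `min_share` threshold are hypothetical choices for this example.

```python
import pandas as pd

def check_class_balance(labels, min_share=0.1):
    """Hypothetical helper: return the classes whose share of the labels
    falls below `min_share`, mapped to that share."""
    shares = pd.Series(labels).value_counts(normalize=True)
    return shares[shares < min_share].to_dict()

# Made-up labels: 95 of class "a", 5 of class "b"
labels = ["a"] * 95 + ["b"] * 5
rare = check_class_balance(labels, min_share=0.1)
print(rare)
```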

Distribution and correlation analysis

  • Display correlation matrix for top n correlates alongside target at the bottom
  • Display pairplot for top n correlates alongside target at the bottom
  • Or, instead of the top n correlates, use a threshold on cumulative variance
  • Option to drop lower correlates (below the threshold) if desired

Useful reading

Download files

Source distribution: diver-0.2.3.tar.gz (29.4 kB)

Built distribution: diver-0.2.3-py3-none-any.whl (33.0 kB, Python 3)