Package for ML model analysis


Pytolemaic

What is Pytolemaic

The Pytolemaic package analyzes your model and dataset and measures their quality.

The package supports classification/regression models built for tabular datasets (e.g. sklearn's regressors/classifiers), and also supports custom-made models as long as they implement sklearn's API.

The package is intended for personal use and comes with no guarantees. I hope you will find it useful, and I would appreciate any feedback you have.

Supported features

The package contains the following functionalities:

On model creation

  • Sensitivity Analysis: Calculates feature importance for a given model, via either sensitivity to feature values or sensitivity to missing values.
  • Vulnerability report: Based on the feature sensitivity, measures the model's vulnerability with respect to imputation, leakage, and number of features.
  • Scoring report: Reports the model's score on test data with a confidence interval.
  • Separation quality: Measures whether train and test data come from the same distribution.
  • Overall quality: Provides overall quality measures.

On prediction

  • Prediction uncertainty: Provides an uncertainty measure for a given model's prediction.
  • Lime explanation: Provides a Lime explanation for a sample of interest.

How to use:

Examples on toy datasets can be found in /examples/toy_examples/, and examples on 'real-life' datasets can be found in /examples/interesting_examples/.

Output examples:

Sensitivity Analysis:

  • The sensitivity of each feature ([0,1], normalized to sum of 1):
 'sensitivity_report': {
    'method': 'shuffled',
    'sensitivities': {
        'age': 0.12395,
        'capital-gain': 0.06725,
        'capital-loss': 0.02465,
        'education': 0.05769,
        'education-num': 0.13765,
        ...
      }
  }
  • Simple statistics on the feature sensitivity:
'shuffle_stats_report': {
     'n_features': 14,
     'n_low': 1,
     'n_zero': 0
}
  • Naive vulnerability scores ([0,1], lower is better):

    • Imputation: sensitivity of the model to missing values.
    • Leakage: likelihood that the model has leaking features.
    • Too many features: whether the model is based on too many features.
'vulnerability_report': {
     'imputation': 0.35,
     'leakage': 0,
     'too_many_features': 0.14
}  
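The 'shuffled' sensitivity method can be illustrated with a minimal permutation-importance sketch: shuffle one feature column at a time, measure how much the score drops, and normalize the drops so they sum to 1. The function names, the clipping of negative drops to zero, and the 0.01 threshold used for n_low are assumptions for illustration; this is not pytolemaic's own implementation.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = List[float]


def shuffle_sensitivity(predict: Callable[[Sequence[Row]], List[float]],
                        score: Callable[[Sequence[float], Sequence[float]], float],
                        x: Sequence[Row], y: Sequence[float],
                        feature_names: Sequence[str], seed: int = 0) -> Dict[str, float]:
    """Score drop after shuffling each column, normalized to sum to 1 (higher score = better)."""
    rng = random.Random(seed)
    base = score(y, predict(x))
    drops = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in x]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by its shuffled values.
        x_shuffled = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(x, column)]
        # Assumption: a shuffle that happens to improve the score counts as zero sensitivity.
        drops[name] = max(0.0, base - score(y, predict(x_shuffled)))
    total = sum(drops.values())
    if total == 0:
        return {name: 0.0 for name in feature_names}
    return {name: d / total for name, d in drops.items()}


def shuffle_stats(sensitivities: Dict[str, float], low_threshold: float = 0.01) -> Dict[str, int]:
    """Simple counts over the sensitivities; low_threshold is a hypothetical cut-off."""
    values = list(sensitivities.values())
    return {
        'n_features': len(values),
        'n_low': sum(1 for v in values if v < low_threshold),
        'n_zero': sum(1 for v in values if v == 0.0),
    }
```

For example, a model that only looks at its first feature gets all of the sensitivity on that feature, and the unused feature is counted as both low and zero.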

Scoring report

For a given metric, the score and its confidence interval (CI) are calculated:

'recall': {
    'ci_high': 0.763,
    'ci_low': 0.758,
    'ci_ratio': 0.023,
    'metric': 'recall',
    'value': 0.760,
},
'auc': {
    'ci_high': 0.909,
    'ci_low': 0.907,
    'ci_ratio': 0.022,
    'metric': 'auc',
    'value': 0.907
}    
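One common way to attach a confidence interval to a score is the percentile bootstrap: resample the test set with replacement, recompute the metric each time, and take the empirical percentiles. The sketch below assumes that technique and computes only value, ci_low and ci_high; how pytolemaic derives its CI and ci_ratio is not shown here, and bootstrap_score/recall are hypothetical helper names.

```python
import random
from typing import Callable, Dict, Sequence


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary recall; returns 0.0 if a resample happens to contain no positives."""
    positives = sum(1 for t in y_true if t == 1)
    if positives == 0:
        return 0.0
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives


def bootstrap_score(metric: Callable[[Sequence[int], Sequence[int]], float],
                    y_true: Sequence[int], y_pred: Sequence[int],
                    n_rounds: int = 1000, alpha: float = 0.05,
                    seed: int = 0) -> Dict[str, float]:
    """Point estimate plus a (1 - alpha) percentile-bootstrap confidence interval."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_rounds):
        # Resample row indices with replacement and recompute the metric.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    ci_low = stats[int((alpha / 2) * (n_rounds - 1))]
    ci_high = stats[int((1 - alpha / 2) * (n_rounds - 1))]
    return {'value': metric(y_true, y_pred), 'ci_low': ci_low, 'ci_high': ci_high}
```

A narrower interval (ci_high close to ci_low) indicates that the test set is large enough for the reported score to be stable.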

Additionally, the separation quality measures the quality of the score based on the separability (AUC score) between the train and test sets.

A value of 1 means the test set has the same distribution as the train set; a value of 0 means the test set has a fundamentally different distribution.

'separation_quality': 0.00611         
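The idea behind separation quality can be sketched as follows: score every train and test row with a discriminator trained to tell the two sets apart, compute the AUC of those scores, and map an AUC of 0.5 (indistinguishable) to 1 and an AUC of 0 or 1 (perfectly separable) to 0. The linear mapping 1 - 2·|AUC - 0.5| and the function names are assumptions for illustration, not necessarily the package's formula; the discriminator itself is left out.

```python
from typing import Sequence


def auc(neg_scores: Sequence[float], pos_scores: Sequence[float]) -> float:
    """AUC as P(pos score > neg score), counting ties as 1/2 (Mann-Whitney form)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def separation_quality(train_scores: Sequence[float], test_scores: Sequence[float]) -> float:
    """1 when train/test are indistinguishable (AUC 0.5), 0 when fully separable."""
    a = auc(train_scores, test_scores)
    return max(0.0, 1.0 - 2.0 * abs(a - 0.5))
```

Under this mapping, a value such as 0.006 corresponds to a discriminator AUC very close to 1 (or 0), i.e. the train and test rows are almost perfectly separable.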

Combining the above measures into a single number, we provide the overall quality of the model/dataset.

A higher quality value ([0,1]) means a better dataset/model.

'quality_report': {
    'model_quality_report': {
        'model_loss': 0.24,
        'model_quality': 0.41,
        'vulnerability_report': {...}},

    'test_quality_report': {
        'ci_ratio': 0.023,
        'separation_quality': 0.006,
        'test_set_quality': 0},

    'train_quality_report': {
        'train_set_quality': 0.85,
        'vulnerability_report': {...}}
}

Prediction uncertainty

The module can be used to yield an uncertainty measure for predictions.

    # pytrust: the package's analysis object, built from the trained model and its data
    uncertainty_model = pytrust.create_uncertainty_model(method='confidence')
    predictions = uncertainty_model.predict(x_pred)  # same as model.predict(x_pred)
    uncertainty = uncertainty_model.uncertainty(x_pred)
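To illustrate what a confidence-based uncertainty can look like, the sketch below wraps any classifier that follows sklearn's predict/predict_proba API and defines uncertainty as 1 minus the highest class probability. The class name ConfidenceUncertaintyModel and this exact formula are assumptions for illustration; they are not taken from pytolemaic's source.

```python
from typing import Any, List, Sequence


class ConfidenceUncertaintyModel:
    """Hypothetical wrapper mirroring the predict/uncertainty interface shown above."""

    def __init__(self, model: Any):
        # `model` is assumed to follow sklearn's classifier API (predict, predict_proba).
        self.model = model

    def predict(self, x: Sequence) -> Any:
        # Predictions are passed through unchanged from the wrapped model.
        return self.model.predict(x)

    def uncertainty(self, x: Sequence) -> List[float]:
        # 1 - highest class probability: 0 for a fully confident prediction,
        # larger when the probability mass is spread across classes.
        return [1.0 - max(row) for row in self.model.predict_proba(x)]
```

With this definition, a row predicted with probabilities [0.9, 0.1] gets an uncertainty of about 0.1, while a [0.5, 0.5] row gets 0.5, the maximum for a binary classifier.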

Lime explanation

The module can be used to produce Lime explanations for a sample of interest.

    explainer = pytrust.create_lime_explainer()
    explainer.explain(sample)  # returns a dictionary
    explainer.plot(sample)  # produces a graphical explanation
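The core idea of LIME is to perturb the sample, weight each perturbation by its proximity to the original sample, and fit a weighted linear surrogate whose coefficients serve as the local explanation. The from-scratch sketch below shows a simplified continuous variant (Gaussian perturbations, an exponential kernel, a small ridge term); it is not the `lime` library's algorithm in full (which, for tabular data, discretizes features by default) and not pytolemaic's code, and lime_like_explanation is a hypothetical name.

```python
import math
import random
from typing import Callable, List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_like_explanation(predict: Callable[[Sequence[float]], float],
                          sample: Sequence[float], scale: float = 1.0,
                          n_samples: int = 500, kernel_width: float = 1.0,
                          ridge: float = 1e-3, seed: int = 0) -> List[float]:
    """Per-feature local slopes of `predict` around `sample` (weighted ridge fit)."""
    rng = random.Random(seed)
    d = len(sample)
    p = d + 1  # feature offsets plus an intercept column
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for _ in range(n_samples):
        offsets = [rng.gauss(0.0, scale) for _ in range(d)]
        z = [s + o for s, o in zip(sample, offsets)]
        # Exponential kernel: perturbations close to the sample get more weight.
        w = math.exp(-sum(o * o for o in offsets) / kernel_width ** 2)
        row = offsets + [1.0]
        y = predict(z)
        # Accumulate the weighted normal equations X^T W X and X^T W y.
        for i in range(p):
            xty[i] += w * row[i] * y
            for j in range(p):
                xtx[i][j] += w * row[i] * row[j]
    for i in range(d):  # regularize the slopes, not the intercept
        xtx[i][i] += ridge
    return _solve(xtx, xty)[:d]
```

For a model that is exactly linear, such as f(z) = 3·z0 - 2·z1 + 1, the recovered local slopes are approximately [3, -2], which is the sanity check a local explainer should pass.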



Download files


Source Distribution

pytolemaic-0.8.11.tar.gz (37.0 kB)

Uploaded Source

Built Distribution

pytolemaic-0.8.11-py3-none-any.whl (57.1 kB)

Uploaded Python 3

File details

Details for the file pytolemaic-0.8.11.tar.gz.

File metadata

  • Download URL: pytolemaic-0.8.11.tar.gz
  • Upload date:
  • Size: 37.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.1

File hashes

Hashes for pytolemaic-0.8.11.tar.gz
Algorithm Hash digest
SHA256 76e9646253a40e6c47400cc5bc7aacfd3496a73852b85a59d6a1cd1e4eee2cd2
MD5 f9956f90a053296545d84e737d1be2e4
BLAKE2b-256 b0d451ee702208911abae9d6397e00bf8e33b24021edc86441f9cd915a235aa8



File details

Details for the file pytolemaic-0.8.11-py3-none-any.whl.

File metadata

  • Download URL: pytolemaic-0.8.11-py3-none-any.whl
  • Upload date:
  • Size: 57.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.1

File hashes

Hashes for pytolemaic-0.8.11-py3-none-any.whl
Algorithm Hash digest
SHA256 edae13b32efcd4be4ad3c5cd40b7491b011885eab75aa18ab3d6ef18fba17fff
MD5 5f63df5e599067cdcdb9f7fa45fb9dea
BLAKE2b-256 27cf7a096171d68a59c2d86e2ad9e7b85d55786f5c1a5e40eedfd10e311614ae


