Package for ML model analysis

Pytolemaic

What is Pytolemaic

The Pytolemaic package analyzes your model and dataset and measures their quality.

The package supports classification and regression models built for tabular datasets (e.g. sklearn's regressors/classifiers). It also supports custom-made models, as long as they implement sklearn's API.

The package is intended for personal use and comes with no guarantees. I hope you find it useful, and I would appreciate any feedback you have.

Install

The Pytolemaic package can be installed using pip:

pip install pytolemaic

Supported features

The package contains the following functionalities:

On model creation

Dataset Analysis: Analysis aimed at detecting issues in the dataset.
Sensitivity Analysis: Calculation of feature importance for a given model, either via sensitivity to feature values or sensitivity to missing values.
Vulnerability report: Based on the feature sensitivity, we measure the model's vulnerability with respect to imputation, leakage, and number of features.
Scoring report: Reports the model's score on test data with a confidence interval.
Separation quality: Measures whether train and test data come from the same distribution.
Overall quality: Provides overall quality measures.

On prediction

Prediction uncertainty: Provides an uncertainty measure for a given model's prediction.
Lime explanation: Provides a Lime explanation for a sample of interest.

How to use:

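   from pprint import pprint

   from pytolemaic import PyTrust

   # estimator follows sklearn's API; xtrain/ytrain and xtest/ytest are the train and test sets
   pytrust = PyTrust(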
       model=estimator,
       xtrain=xtrain, ytrain=ytrain,
       xtest=xtest, ytest=ytest)

   # dataset analysis report
   dataset_analysis_report = pytrust.dataset_analysis_report

   # feature sensitivity report
   sensitivity_report = pytrust.sensitivity_report

   # model's performance report
   scoring_report = pytrust.scoring_report

   # overall model's quality report
   quality_report = pytrust.quality_report

   for report in [dataset_analysis_report, sensitivity_report, scoring_report, quality_report]:
       report.plot() # plot graphs
       pprint(report.to_dict(printable=True)) # export report as a dictionary
       pprint(report.to_dict_meaning()) # print documentation for above dictionary


   # Insights & issues discovered in your data/model
   insights = pytrust.insights

   # estimate uncertainty of a prediction
   uncertainty_model = pytrust.create_uncertainty_model()

   # explain a prediction with Lime
   lime_explainer = pytrust.create_lime_explainer()

Examples on toy datasets can be found in /examples/toy_examples/.
Examples on 'real-life' datasets can be found in /examples/interesting_examples/.

Output examples:

Sensitivity Analysis:

The sensitivity of each feature (in [0,1], normalized so that the sensitivities sum to 1):

 'sensitivity_report': {
    'method': 'shuffled',
    'sensitivities': {
        'age': 0.12395,
        'capital-gain': 0.06725,
        'capital-loss': 0.02465,
        'education': 0.05769,
        'education-num': 0.13765,
        ...
      }
  }
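To make the 'shuffled' method more concrete, here is a minimal sketch of shuffle-based sensitivity. It is not Pytolemaic's implementation: the function `shuffle_sensitivity`, the toy model `toy_predict`, and the use of mean absolute prediction change as the raw score are all assumptions made for illustration. Only the normalization to a sum of 1 is taken from the description above.

```python
import random


def shuffle_sensitivity(predict, rows, feature_names, seed=0):
    """Return per-feature sensitivities normalized to sum to 1.

    predict: callable mapping a list of row dicts to a list of numbers.
    rows: list of dicts, one per sample (hypothetical data layout).
    """
    rng = random.Random(seed)
    baseline = predict(rows)
    raw = {}
    for name in feature_names:
        # Shuffle one column while leaving every other column untouched.
        column = [row[name] for row in rows]
        rng.shuffle(column)
        shuffled_rows = [dict(row, **{name: value})
                         for row, value in zip(rows, column)]
        shuffled = predict(shuffled_rows)
        # Raw score: mean absolute change in prediction caused by the shuffle.
        raw[name] = sum(abs(a - b) for a, b in zip(baseline, shuffled)) / len(rows)
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in feature_names}
    return {name: value / total for name, value in raw.items()}


# Toy model: strong dependence on 'age', weak on 'hours', none on 'noise'.
def toy_predict(rows):
    return [3.0 * r["age"] + 0.5 * r["hours"] for r in rows]


data_rng = random.Random(42)
rows = [{"age": data_rng.random(),
         "hours": data_rng.random(),
         "noise": data_rng.random()} for _ in range(200)]
sensitivities = shuffle_sensitivity(toy_predict, rows, ["age", "hours", "noise"])
```

A feature the model ignores, such as 'noise' here, gets a sensitivity of exactly 0, because shuffling it cannot change any prediction.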

Simple statistics on the feature sensitivity:

'shuffle_stats_report': {
     'n_features': 14,
     'n_low': 1,
     'n_zero': 0
}
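Statistics of this kind can be derived directly from the sensitivity dictionary. The sketch below is an assumption, not the package's code: the helper `sensitivity_stats` and the `low_threshold` value of 0.05 are hypothetical, and the actual cutoff Pytolemaic uses for a "low" sensitivity is not stated here. The input values are the five features shown in the sensitivity report above.

```python
def sensitivity_stats(sensitivities, low_threshold=0.05):
    """Count features, low-sensitivity features and zero-sensitivity features.

    low_threshold is a hypothetical cutoff chosen for this example only.
    Zero-sensitivity features are counted separately from low ones.
    """
    values = list(sensitivities.values())
    return {
        "n_features": len(values),
        "n_low": sum(1 for v in values if 0 < v < low_threshold),
        "n_zero": sum(1 for v in values if v == 0),
    }


stats = sensitivity_stats({
    "age": 0.12395,
    "capital-gain": 0.06725,
    "capital-loss": 0.02465,
    "education": 0.05769,
    "education-num": 0.13765,
})
```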

Naive vulnerability scores ([0,1], lower is better):
- Imputation: sensitivity of the model to missing values.
- Leakage: chance that the model has leaking features.
- Too many features: whether the model is based on too many features.

'vulnerability_report': {
     'imputation': 0.35,
     'leakage': 0,
     'too_many_features': 0.14
}

Scoring report:

For a given metric, the score and its confidence interval (CI) are calculated:

'recall': {
    'ci_high': 0.763,
    'ci_low': 0.758,
    'ci_ratio': 0.023,
    'metric': 'recall',
    'value': 0.760,
},
'auc': {
    'ci_high': 0.909,
    'ci_low': 0.907,
    'ci_ratio': 0.022,
    'metric': 'auc',
    'value': 0.907
}
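One common way to obtain a confidence interval for a test-set score is the percentile bootstrap. The sketch below uses that technique as an assumption; this page does not say how Pytolemaic computes its CI, and the helper names `recall` and `bootstrap_ci` are hypothetical. The `ci_ratio` field is left out because its definition is not given here.

```python
import random


def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted as 1."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos if actual_pos else 0.0


def bootstrap_ci(metric, y_true, y_pred, n_resamples=500, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for metric(y_true, y_pred)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample the test set with replacement and re-score it.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    scores.sort()
    low = scores[int((alpha / 2) * n_resamples)]
    high = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return {"value": metric(y_true, y_pred), "ci_low": low, "ci_high": high}


# Toy labels: 300 positives, of which 228 are predicted correctly.
y_true = [1] * 300 + [0] * 300
y_pred = [1] * 228 + [0] * 72 + [0] * 300
report = bootstrap_ci(recall, y_true, y_pred)
```

The interval narrows as the test set grows, which is why a small CI is a sign of a more reliable score.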

Additionally, separation quality measures the quality of the score based on the separability (AUC score) between the train and test sets.

A value of 1 means the test set has the same distribution as the train set. A value of 0 means the test set has a fundamentally different distribution.

'separation_quality': 0.00611
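The idea of turning train/test separability into a [0,1] quality can be sketched as follows. This is an illustration under stated assumptions: the scores stand in for the output of a classifier trained to tell train rows from test rows, and the mapping `1 - 2 * |AUC - 0.5|` is a hypothetical choice that matches the endpoints described above (1 for inseparable sets, 0 for perfectly separable ones). The formula Pytolemaic actually uses may differ.

```python
def auc(scores_neg, scores_pos):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, which is the usual AUC convention.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def separation_quality(train_scores, test_scores):
    """Map separability to [0, 1]; the mapping is an assumption of this sketch."""
    return 1.0 - 2.0 * abs(auc(train_scores, test_scores) - 0.5)


# Identical score distributions cannot be separated (AUC = 0.5).
same = separation_quality([1, 2, 3], [1, 2, 3])
# Disjoint score ranges are perfectly separable (AUC = 1.0).
different = separation_quality([0.1, 0.2, 0.3], [0.7, 0.8, 0.9])
```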

Combining the above measures into a single number, we provide the overall quality of the model/dataset.

A higher quality value (in [0,1]) means a better dataset/model.

'quality_report': {
    'model_quality_report': {
        'model_loss': 0.24,
        'model_quality': 0.41,
        'vulnerability_report': {...}},

    'test_quality_report': {
        'ci_ratio': 0.023,
        'separation_quality': 0.006,
        'test_set_quality': 0},

    'train_quality_report': {
        'train_set_quality': 0.85,
        'vulnerability_report': {...}}
}

Prediction uncertainty:

The module can be used to produce an uncertainty measure for predictions.

    uncertainty_model = pytrust.create_uncertainty_model(method='confidence')
    predictions = uncertainty_model.predict(x_pred) # same as model.predict(x_pred)
    uncertainty = uncertainty_model.uncertainty(x_pred)
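For intuition about what an uncertainty value can look like, here is a generic probability-based measure for classifiers. It is a swapped-in, well-known technique and is not claimed to be what `method='confidence'` computes; the function `probability_uncertainty` is hypothetical. It rescales `1 - max(probabilities)` so that a uniform prediction gives 1 and a fully confident prediction gives 0.

```python
def probability_uncertainty(probabilities):
    """Map each row of class probabilities to an uncertainty in [0, 1].

    probabilities: list of rows, each summing to 1 (e.g. predict_proba output).
    """
    n_classes = len(probabilities[0])
    # The largest class probability can never drop below 1 / n_classes.
    floor = 1.0 / n_classes
    return [(1.0 - max(row)) / (1.0 - floor) for row in probabilities]


uncertainties = probability_uncertainty([
    [1.0, 0.0],   # fully confident prediction
    [0.9, 0.1],   # fairly confident prediction
    [0.5, 0.5],   # maximally uncertain prediction
])
```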

Lime explanation:

The module can be used to produce Lime explanations for a sample of interest.

    explainer = pytrust.create_lime_explainer()
    explainer.explain(sample) # returns a dictionary
    explainer.plot(sample) # produces a graphical explanation

Version 0.11.14, released Feb 22, 2021