
INGOT-DR (INterpretable GrOup Testing for Drug Resistance)

Project description

INGOT-DR

INGOT-DR (INterpretable GrOup Testing for Drug Resistance) is an interpretable rule-based predictive model based on Group Testing and Boolean Compressed Sensing. For more details and citation, please see the INGOT-DR paper. To access the scripts used to produce the results in the paper, please visit the INGOT-DR Project. To access the data used in the paper, please visit/cite the M.tuberculosis dataset for drug resistance.


Installation

INGOT-DR can be installed from PyPI.

pip install ingotdr

Usage

INGOT-DR is implemented as a scikit-learn classifier, so it is compatible with most scikit-learn tools (e.g. cross-validation and hyper-parameter tuning). The following sections provide some usage examples:

Arguments

ingot.INGOTClassifier( w_weight=1, lambda_p=1, lambda_z=1, lambda_e=1, false_positive_rate_upper_bound=None,
                       false_negative_rate_upper_bound=None, max_rule_size=None, rounding_threshold=1e-5,
                       lp_relaxation=False, only_slack_lp_relaxation=False, lp_rounding_threshold=0,
                       is_it_noiseless=False, solver_name='PULP_CBC_CMD', solver_options=None)
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| w_weight | vector, float | A vector or float providing prior weights for w. | 1.0 |
| lambda_p | float | Regularization coefficient for positive labels. | 1.0 |
| lambda_z | float | Regularization coefficient for negative/zero labels. | 1.0 |
| lambda_e | float | Regularization coefficient for all slack variables. | 1.0 |
| false_positive_rate_upper_bound | float | False positive rate (FPR) upper bound. | None |
| false_negative_rate_upper_bound | float | False negative rate (FNR) upper bound. | None |
| max_rule_size | int | Maximum rule size. | None |
| rounding_threshold | float | Threshold for rounding ILP solutions to 0 and 1. | 1e-5 |
| lp_relaxation | bool | A flag to use the LP-relaxed version. | False |
| only_slack_lp_relaxation | bool | A flag to LP-relax only the slack variables. | False |
| lp_rounding_threshold | float | Threshold for rounding LP solutions to 0 and 1. Ranges from 0 to 1. | 0.0 |
| is_it_noiseless | bool | A flag to specify whether the problem is noisy or noiseless. | False |
| solver_name | str | Solver name, as provided by PuLP. | 'PULP_CBC_CMD' |
| solver_options | dict | Solver options, as provided by PuLP. | None |
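To see what a rate bound like false_positive_rate_upper_bound=0.1 means in practice: since FPR = FP / (number of negative samples), the bound caps the number of training false positives the solver may allow. A minimal illustrative sketch (the helper max_false_positives is hypothetical, not part of the package):

```python
import math

def max_false_positives(fpr_upper_bound, y):
    """Cap on training false positives implied by an FPR upper bound.

    FPR = FP / N_negative, so FP <= floor(fpr_upper_bound * N_negative).
    """
    n_negative = sum(1 for label in y if label == 0)
    return math.floor(fpr_upper_bound * n_negative)

labels = [0] * 50 + [1] * 30             # 50 susceptible, 30 resistant isolates
print(max_false_positives(0.1, labels))  # at most 5 false positives allowed
```

The analogous calculation with the number of positive samples applies to false_negative_rate_upper_bound.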

Methods

| Method | Description |
| --- | --- |
| fit(X, y) | Fit the model to the given data. |
| get_params_dictionary(variable_type='w') | Return a dictionary of variables and their values obtained by the decoder. variable_type can be 'w', 'ep' or 'en'. |
| solution() | Return a binary feature-importance vector, i.e. 1 if the feature is used in the model, 0 otherwise. |
| predict(X) | Return the predicted labels for X. |
| score(X, y) | Return the accuracy of self.predict(X) with respect to y. |
| learned_rule(return_type='feature_name') | Return the list of rules. return_type can be 'feature_name' or 'feature_id'. |
| write(fileType='mps', **kwargs) | Write the problem to a file. fileType can be 'mps', 'lp', 'json' or 'display'; 'display' shows the ILP/LP problem on screen. |
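In the group-testing formulation, the learned rule is disjunctive: a sample is predicted resistant if it carries at least one of the SNPs selected by learned_rule() (a logical OR). A pure-Python sketch of how such a rule yields predictions (predict_with_rule and the sample dictionaries are illustrative, not package code):

```python
def predict_with_rule(rule_features, sample):
    """Apply a disjunctive (OR) rule: resistant (1) if any selected
    feature is present in the sample, susceptible (0) otherwise."""
    return int(any(sample.get(feature, 0) == 1 for feature in rule_features))

rule = ['7570, C, T', '7581, G, T']       # e.g. output of clf.learned_rule()
sample_a = {'7570, C, T': 1, '7581, G, T': 0}
sample_b = {'7570, C, T': 0, '7581, G, T': 0}
print(predict_with_rule(rule, sample_a))  # 1 (resistant)
print(predict_with_rule(rule, sample_b))  # 0 (susceptible)
```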

Training and evaluation

Example: The following trains a classifier to predict resistance to the second-line drug Ciprofloxacin in TB isolates. The feature matrix indicates the presence/absence of SNPs in the isolates, and the label vector represents the drug resistance phenotype. Sample data is available here.

from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score
import pandas as pd
import ingot

feature_matrix = 'SNPsMatrix_ciprofloxacin.csv'
label_vector =  'ciprofloxacinLabel.csv'

X = pd.read_csv(feature_matrix, index_col=0)
y = pd.read_csv(label_vector, index_col=0).to_numpy().ravel()

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33, test_size=0.2, stratify=y)

clf = ingot.INGOTClassifier(lambda_p=10, lambda_z=0.01, false_positive_rate_upper_bound=0.1,
                            max_rule_size=20, solver_name='CPLEX_PY')
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

print("Balanced accuracy: {}".format(balanced_accuracy_score(y_test, y_pred)))
print("Accuracy: {}".format(clf.score(X_test, y_test)))
print("Features in the learned rule: {}".format(clf.learned_rule()))

Output:

Note: Results may slightly vary for different solvers. Please see Choosing the solver.

Balanced accuracy: 0.8449477351916377
Accuracy: 0.9550561797752809
Features in the learned rule: ['7570, C, T', '7572, T, C', '7581, G, T', '7582, A, C', '7582, A, G']
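Balanced accuracy is the mean of sensitivity (TPR) and specificity (TNR), which is why it is noticeably lower than plain accuracy here: resistance phenotypes are typically imbalanced, and accuracy is dominated by the majority (susceptible) class. A quick pure-Python check of the relationship, using hypothetical confusion-matrix counts (not the counts from the run above):

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Balanced accuracy = (sensitivity + specificity) / 2."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return (sensitivity + specificity) / 2

# Hypothetical counts: 24 resistant and 70 susceptible test isolates.
print(balanced_accuracy(tp=18, fn=6, tn=67, fp=3))
```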

Hyper-parameter tuning

Hyper-parameter tuning via scikit-learn Grid Search CV:

Example:

from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score
import pandas as pd
import ingot
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer

feature_matrix = 'SNPsMatrix_ciprofloxacin.csv'
label_vector =  'ciprofloxacinLabel.csv'

X = pd.read_csv(feature_matrix, index_col=0)
y = pd.read_csv(label_vector, index_col=0).to_numpy().ravel()

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33, test_size=0.2, stratify=y)

clf = ingot.INGOTClassifier(false_positive_rate_upper_bound=0.1, max_rule_size=20, solver_name='CPLEX_PY',
                            solver_options={'timeLimit': 1800})

scoring = dict(Accuracy='accuracy', balanced_accuracy=make_scorer(balanced_accuracy_score))
param_grid = {'lambda_p': [1, 10, 100], 'lambda_z': [0.01, 0.1, 1]}
grid = GridSearchCV(estimator=clf, param_grid=param_grid, scoring=scoring, cv=5,
                    refit='balanced_accuracy', n_jobs=-1, verbose=3)
grid.fit(X_train, y_train)

y_pred = grid.predict(X_test)

print("Balanced accuracy: {}".format(balanced_accuracy_score(y_test, y_pred)))
print('Best params: {}'.format(grid.best_params_))

Output:

Balanced accuracy: 0.8449477351916377
Best params: {'lambda_p': 10, 'lambda_z': 0.01}

Optimizing for a different target metric

Note: w_weight and lambda_e are not part of the main ILP (Eq (11)) defined in the INGOT-DR paper. These two variables provide extra freedom when optimizing for a different target metric (Section 1.4). The complete objective function with these two variables is:

(See the complete objective function in the INGOT-DR paper.)

Example: The classifier corresponding to Eq (16), with maximum rule size k = 20 and specificity lower bound t = 90%, can be defined as follows:

clf = ingot.INGOTClassifier(w_weight=0, lambda_z=0, false_positive_rate_upper_bound=0.1, max_rule_size=20,
                            solver_name='CPLEX_PY')

The following table shows the combinations of arguments needed to define some of the ILPs in the paper:

| lp_relaxation | only_slack_lp_relaxation | is_it_noiseless | Equation number in the paper |
| --- | --- | --- | --- |
| False | False | False | Eq (11) |
| False | True | True | Eq (3) |
| False | True | False | Eq (4) with objective function of Eq (11) |
| False | False | True | Eq (3) |
| True | True | False | LP relaxation of Eq (4) with objective function of Eq (11) |
| True | False | False | LP relaxation of Eq (4) with objective function of Eq (11) |
| True | False | True | LP relaxation of Eq (3) |
| True | True | True | LP relaxation of Eq (3) |

Note: A True value of lp_relaxation or is_it_noiseless overrides only_slack_lp_relaxation, i.e. if either of them is True, the value of only_slack_lp_relaxation does not matter.
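The table and the override rule above can be summarized by a small helper that maps the three flags to the formulation they select. This is an illustrative sketch mirroring the table, not part of the package:

```python
def selected_formulation(lp_relaxation, only_slack_lp_relaxation, is_it_noiseless):
    """Which paper formulation a flag combination selects.

    lp_relaxation and is_it_noiseless override only_slack_lp_relaxation.
    """
    if is_it_noiseless:
        base = 'Eq (3)'
    elif lp_relaxation or only_slack_lp_relaxation:
        base = 'Eq (4) with objective function of Eq (11)'
    else:
        base = 'Eq (11)'
    return f'LP relaxation of {base}' if lp_relaxation else base

print(selected_formulation(False, False, False))  # Eq (11)
print(selected_formulation(True, False, True))    # LP relaxation of Eq (3)
```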

Note: To recreate and work with Eq (4), use the combination in row 3 and use or tune lambda_e instead of lambda_p and lambda_z. For example:

param_grid = {'lambda_e': [0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(estimator=clf, param_grid=param_grid, scoring=scoring, cv=5,
                    refit='balanced_accuracy', n_jobs=-1, verbose=3)

Choosing the solver

INGOT-DR supports a variety of solvers through the PuLP application programming interface (API), including GLPK, COIN-OR CLP/CBC, CPLEX, GUROBI, MOSEK, XPRESS, CHOCO, MIPCL and SCIP.

To list the available solvers on your machine:

import pulp as pl
solver_list = pl.listSolvers(onlyAvailable=True)
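If you want to prefer a commercial solver when it is installed but fall back to PuLP's bundled CBC otherwise, a small helper over the list returned by pl.listSolvers can do the selection (pick_solver is a hypothetical helper, shown in pure Python):

```python
def pick_solver(available, preferred=('CPLEX_PY', 'GUROBI', 'PULP_CBC_CMD')):
    """Return the first preferred solver present in `available`,
    falling back to PuLP's default CBC solver."""
    for name in preferred:
        if name in available:
            return name
    return 'PULP_CBC_CMD'

print(pick_solver(['GLPK_CMD', 'PULP_CBC_CMD']))  # PULP_CBC_CMD
print(pick_solver(['CPLEX_PY', 'PULP_CBC_CMD']))  # CPLEX_PY
```

The returned name can be passed directly as solver_name when constructing the classifier.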

The solver's name and options can be specified via solver_name and solver_options, e.g.:

clf = ingot.INGOTClassifier(solver_name='CPLEX_PY', solver_options={'timeLimit': 1800})

In the INGOT-DR paper, 'CPLEX_PY' was the main solver. Results may vary slightly between solvers. IBM CPLEX is available for academic use here.

Citation:

For general use, please cite our paper: INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis. (bibtex)


Download files

Download the file for your platform.

Source Distribution

ingotdr-0.0.5.tar.gz (11.5 kB view details)

Uploaded Source

Built Distribution


ingotdr-0.0.5-py3-none-any.whl (9.5 kB view details)

Uploaded Python 3

File details

Details for the file ingotdr-0.0.5.tar.gz.

File metadata

  • Download URL: ingotdr-0.0.5.tar.gz
  • Upload date:
  • Size: 11.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.5.0.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for ingotdr-0.0.5.tar.gz
Algorithm Hash digest
SHA256 620452c8b6f2250f7e035d95a656d4a604382c5dc3848c9d6783f35266a04714
MD5 3795349056ef68fce0cc4cfcab514b96
BLAKE2b-256 038f02ae30aca0b786fdb07f9762cdde5547ebc1278673f1f19e6761dde3bba6


File details

Details for the file ingotdr-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: ingotdr-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 9.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.5.0.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for ingotdr-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 9c278818835a0bc1b6e80612f82ade26d6dec1fa62521698e9164e5cd47f5a59
MD5 8af24a7ab0cafe7c39fd4d02b18f9680
BLAKE2b-256 f42b0454dfce1c15c39f10acbcf7b540d288afe26e596f62e4bebc3e487f8596

