Tools to compute and visualize cohort-attention tied entropy, including CATSensitivity, CATSpecificity, and their combination CATmean
Project description
This package computes cohort-attention tied entropy (CATE) in three forms: CATSensitivity, CATSpecificity, and their combination, CATmean, given the particular cohorts of interest. CATSensitivity and CATSpecificity are, in effect, enhancements of Sensitivity and Specificity. The package also reports the detailed measure with respect to each individual cohort.
Cohort-attention Tied Entropy
CATSensitivity and CATSpecificity are both bounded scores on [0, 1]:
- Minimum = 0 when all samples are incorrectly identified
- Maximum = 1 when all samples are correctly identified
- Midpoint = 0.5 when 50% of samples are correctly predicted and all samples are treated identically
The TPR and TNR with respect to a cohort become entropy-based accuracies here, where:
- alpha is a user-defined cohort weight, 0.5 by default
- the accuracy with respect to each individual
- the accuracy with respect to each individual's samples only
CATmean
with alpha = 0.5 by default
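The CATmean formula itself did not survive extraction here. As a hedged sketch, assuming CATmean is simply the alpha-weighted mean of the two component scores (consistent with the alpha = 0.5 default), it could look like:

```python
def cat_mean(cat_sensitivity, cat_specificity, alpha=0.5):
    """Alpha-weighted mean of CATSensitivity and CATSpecificity.

    This weighting is an assumption for illustration; the package's
    exact formula may differ.
    """
    return alpha * cat_sensitivity + (1 - alpha) * cat_specificity

# With the default alpha = 0.5 this reduces to the plain average:
cat_mean(0.8, 0.6)  # 0.7
```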
Example
# The result file 'results.csv' looks as below
ID, tied_ID, cohort, true_label, pred_label, pred_proba
1, A, c1, 1, 1, 0.8
2, A, c1, 1, 0, 0.2
3, B, c1, 0, 0, 0.3
4, C, c2, 1, 1, 0.9
5, C, c2, 1, 1, 0.7
6, C, c2, 1, 1, 0.6
7, D, c2, 0, 1, 0.8
8, D, c2, 0, 0, 0.3
9, E, c3, 1, 1, 0.6
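Before calling the package, the plain (non-entropy) sensitivity and specificity per cohort can be checked by hand from the table above. A minimal pure-Python sketch, which does not reproduce the entropy weighting that CATE adds:

```python
# Rows from results.csv above: (ID, tied_ID, cohort, true_label, pred_label, pred_proba)
rows = [
    (1, 'A', 'c1', 1, 1, 0.8),
    (2, 'A', 'c1', 1, 0, 0.2),
    (3, 'B', 'c1', 0, 0, 0.3),
    (4, 'C', 'c2', 1, 1, 0.9),
    (5, 'C', 'c2', 1, 1, 0.7),
    (6, 'C', 'c2', 1, 1, 0.6),
    (7, 'D', 'c2', 0, 1, 0.8),
    (8, 'D', 'c2', 0, 0, 0.3),
    (9, 'E', 'c3', 1, 1, 0.6),
]

def cohort_sen_spe(rows, cohort):
    """Plain TPR (sensitivity) and TNR (specificity) within one cohort."""
    tp = sum(1 for r in rows if r[2] == cohort and r[3] == 1 and r[4] == 1)
    fn = sum(1 for r in rows if r[2] == cohort and r[3] == 1 and r[4] == 0)
    tn = sum(1 for r in rows if r[2] == cohort and r[3] == 0 and r[4] == 0)
    fp = sum(1 for r in rows if r[2] == cohort and r[3] == 0 and r[4] == 1)
    sen = tp / (tp + fn) if tp + fn else None
    spe = tn / (tn + fp) if tn + fp else None
    return sen, spe

print(cohort_sen_spe(rows, 'c1'))  # (0.5, 1.0)
print(cohort_sen_spe(rows, 'c2'))  # (1.0, 0.5)
```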
# sig: the cohorts of interest, e.g., c1 and c2, which are two values of the cohort column
# alpha is the user-defined weight for sig, alpha=0.5 by default
# Code:
from cate.metrics import CAT
import pandas as pd
df = pd.read_csv('results.csv')
cols = ['ID', 'tied_ID', 'cohort', 'true_label', 'pred_label', 'pred_proba']
col_ID, col_tied_ID, col_cohort, col_true_label, col_pred_label, col_pred_proba = cols
# cohorts of our interest
sig = ['c1','c2']
cat = CAT(df,
          col_ID=col_ID,
          col_tied_ID=col_tied_ID,
          col_cohort=col_cohort,
          col_true_label=col_true_label,
          col_pred_label=col_pred_label,
          col_pred_proba=col_pred_proba,
          sig=sig,
          alpha=0.7)
# pred_proba: predicted probabilities
pred_proba = df['pred_proba']
# set predicted probability
cat.set_proba(pred_proba)
# compute AUC
cat.get_auc()
# compute Sensitivity and Specificity based on the cutoff
# cut_proba: probability cutoff
cut_proba = 0.5
cat.get_sen_spe(cut_proba)
# compute the CATE
cat.score()
# visualize predicted probability (grouped by samples' true label)
cat.plot_proba()
# summary statistics of the predicted probabilities
cat.stat_proba()
# visualize the statistics for tied samples, i.e., those that appear twice or more
cat.plot_score(col_true_label, num_occur=2)
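The AUC returned by cat.get_auc() (its exact return format is not documented here) can be sanity-checked against a pure-Python pairwise computation over the same labels and probabilities:

```python
def auc_pairwise(y_true, y_proba):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half), i.e., the normalized Mann-Whitney U statistic."""
    pos = [p for t, p in zip(y_true, y_proba) if t == 1]
    neg = [p for t, p in zip(y_true, y_proba) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Labels and probabilities taken from results.csv above
y_true = [1, 1, 0, 1, 1, 1, 0, 0, 1]
y_proba = [0.8, 0.2, 0.3, 0.9, 0.7, 0.6, 0.8, 0.3, 0.6]
print(auc_pairwise(y_true, y_proba))  # 11.5 / 18 ≈ 0.639
```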