Tools to compute and visualize cohort-attention tied entropy, including CATSensitivity, CATSpecificity, and their combination CATmean

Project description

This package computes cohort-attention tied entropy (CATE) in three forms: CATSensitivity, CATSpecificity, and their combination, CATmean, given the particular cohorts of interest. CATSensitivity and CATSpecificity are, in effect, enhancements of sensitivity and specificity. The package also reports the detailed measures for each cohort.

Cohort-attention Tied Entropy

CATSensitivity is given by

and CATSpecificity by

  • Minimum = 0 when all samples are incorrectly identified
  • Maximum = 1 when all samples are correctly identified
  • Midpoint = 0.5 when 50% of samples are correctly predicted and all samples are treated identically

With respect to a cohort, the TPR and TNR are taken here to be the entropy-based accuracy:

  • alpha is the user-defined weight of the cohorts of interest, 0.5 by default
  • is the accuracy wrt individual
  • is the accuracy wrt individual 's samples only
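For orientation, the plain (unweighted) TPR and TNR of each cohort can be computed as in the sketch below. This is not the package's entropy-based formula, which is not reproduced here; it only shows the ordinary per-cohort rates that CATSensitivity and CATSpecificity build on. The helper name `rates_by_cohort` is hypothetical, and the triples are copied from the sample table in the Example section.

```python
from collections import defaultdict

# (cohort, true_label, pred_label) triples copied from the sample table
samples = [
    ("c1", 1, 1), ("c1", 1, 0), ("c1", 0, 0),
    ("c2", 1, 1), ("c2", 1, 1), ("c2", 1, 1), ("c2", 0, 1), ("c2", 0, 0),
    ("c3", 1, 1),
]


def rates_by_cohort(rows):
    """Return {cohort: (TPR, TNR)}; a rate is None when the cohort
    has no samples of that class. Hypothetical helper, not part of the package."""
    counts = defaultdict(lambda: {"tp": 0, "pos": 0, "tn": 0, "neg": 0})
    for cohort, true_label, pred_label in rows:
        c = counts[cohort]
        if true_label == 1:
            c["pos"] += 1
            c["tp"] += int(pred_label == 1)
        else:
            c["neg"] += 1
            c["tn"] += int(pred_label == 0)
    return {
        cohort: (
            c["tp"] / c["pos"] if c["pos"] else None,
            c["tn"] / c["neg"] if c["neg"] else None,
        )
        for cohort, c in counts.items()
    }


rates = rates_by_cohort(samples)
for cohort, (tpr, tnr) in sorted(rates.items()):
    print(cohort, tpr, tnr)
```

Cohort c3 has no negative samples, so its TNR is undefined; the sketch returns None there rather than guessing a value.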

CATmean

by default

Example

# The result file 'results.csv' looks like this:

ID, tied_ID, cohort, true_label, pred_label, pred_proba
1, A, c1, 1, 1, 0.8
2, A, c1, 1, 0, 0.2
3, B, c1, 0, 0, 0.3
4, C, c2, 1, 1, 0.9
5, C, c2, 1, 1, 0.7
6, C, c2, 1, 1, 0.6
7, D, c2, 0, 1, 0.8
8, D, c2, 0, 0, 0.3
9, E, c3, 1, 1, 0.6
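As a quick sanity check on this table, the sketch below parses it with the standard-library `csv` module and computes the plain accuracy of each individual (`tied_ID`) over that individual's own samples. It does not use the package and is not its entropy-based measure; the variable names are illustrative. Note that the file has a space after every comma, so the reader is told to skip it.

```python
import csv
import io
from collections import defaultdict

CSV_TEXT = """ID, tied_ID, cohort, true_label, pred_label, pred_proba
1, A, c1, 1, 1, 0.8
2, A, c1, 1, 0, 0.2
3, B, c1, 0, 0, 0.3
4, C, c2, 1, 1, 0.9
5, C, c2, 1, 1, 0.7
6, C, c2, 1, 1, 0.6
7, D, c2, 0, 1, 0.8
8, D, c2, 0, 0, 0.3
9, E, c3, 1, 1, 0.6
"""

# skipinitialspace=True drops the blank after each comma, so the header
# reads 'tied_ID' rather than ' tied_ID'
rows = list(csv.DictReader(io.StringIO(CSV_TEXT), skipinitialspace=True))

# Collect, per individual, whether each of their samples was predicted correctly
hits = defaultdict(list)
for row in rows:
    hits[row["tied_ID"]].append(row["true_label"] == row["pred_label"])

accuracy_by_individual = {tid: sum(h) / len(h) for tid, h in hits.items()}
print(accuracy_by_individual)
# {'A': 0.5, 'B': 1.0, 'C': 1.0, 'D': 0.5, 'E': 1.0}
```

Individuals A, C and D appear more than once, which is what makes their samples "tied".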

# sig: the cohorts of interest, e.g., c1 and c2, which are two of the values in the cohort column

# alpha is the user-defined weight for sig, alpha=0.5 by default

# Code:


from cate.metrics import CAT
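import pandas as pd

# skipinitialspace=True strips the blank after each comma so the column names match `cols`
df = pd.read_csv('results.csv', skipinitialspace=True)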

cols = ['ID','tied_ID','cohort', 'true_label', 'pred_label', 'pred_proba']
col_ID, col_tied_ID, col_cohort, col_true_label, col_pred_label, col_pred_proba = cols

# cohorts of our interest
sig = ['c1','c2']

cat = CAT(df,
	col_ID = col_ID, 
	col_tied_ID = col_tied_ID, 
	col_cohort = col_cohort, 
	col_true_label = col_true_label, 
	col_pred_label = col_pred_label, 
	col_pred_proba = col_pred_proba,
	sig = sig,
	alpha = 0.7)

# pred_proba: predicted probabilities
pred_proba = df['pred_proba']

# set predicted probability
cat.set_proba(pred_proba)

# compute AUC
cat.get_auc()

# compute Sensitivity and Specificity based on the cutoff
# cut_proba: probability cutoff, e.g., 0.5
cut_proba = 0.5
cat.get_sen_spe(cut_proba)

# compute the CATE
cat.score()

# visualize predicted probability (grouped by samples' true label)
cat.plot_proba()

# statistic of predicted probability
cat.stat_proba()

# visualize the statistics about tied samples - those who appear twice or more
cat.plot_cohort_score()
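To cross-check the cutoff-based sensitivity and specificity and the AUC by hand, the following standalone sketch recomputes them for the sample table without the package. It assumes a sample is predicted positive when its probability is greater than or equal to the cutoff; whether `get_sen_spe` uses `>=` or `>` is not stated in this description. The helper names `sen_spe` and `auc` are hypothetical.

```python
# true_label and pred_proba columns copied from the sample table, in row order
true_label = [1, 1, 0, 1, 1, 1, 0, 0, 1]
pred_proba = [0.8, 0.2, 0.3, 0.9, 0.7, 0.6, 0.8, 0.3, 0.6]


def sen_spe(y_true, proba, cutoff):
    """Sensitivity and specificity after thresholding at `cutoff`
    (assumption: probability >= cutoff counts as a positive prediction)."""
    pred = [int(p >= cutoff) for p in proba]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return tp / pos, tn / neg


def auc(y_true, proba):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a win."""
    pos = [p for t, p in zip(y_true, proba) if t == 1]
    neg = [p for t, p in zip(y_true, proba) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


sensitivity, specificity = sen_spe(true_label, pred_proba, cutoff=0.5)
area = auc(true_label, pred_proba)
print(sensitivity, specificity, area)
```

With a cutoff of 0.5, five of the six positive samples and two of the three negative samples are classified correctly, so the sensitivity is 5/6 and the specificity is 2/3.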



Built Distribution

catie-0.2.3-py3-none-any.whl (6.3 kB)
