Tools to compute and visualize cohort-attention tied entropy, including CATSensitivity, CATSpecificity, and their combination, CATmean

Project description

This package computes cohort-attention tied entropy (CATE) in three forms: CATSensitivity, CATSpecificity, and their combination, CATmean, for the particular cohorts of interest. It also reports AUC, specificity, sensitivity, and the detailed measures for each cohort. Sensitivity and specificity are in effect a particular case of CATSensitivity and CATSpecificity.

Cohort-attention Tied Entropy

CATSensitivity is given by

and CATSpecificity by

  • Excellent: CATSensitivity = CATSpecificity = 1 when all samples are correctly identified
  • Bad: CATSensitivity = CATSpecificity = 0 when all are incorrectly identified
  • No fit: CATSensitivity = CATSpecificity = 0.5 when 50% of samples are correctly predicted if they are identically treated (i.e., alpha = 0.5)

The TPR and TNR with respect to a cohort can be taken as the entropy-based accuracy:

  • alpha is a user-defined cohort weight, 0.5 by default
  • is the accuracy wrt individual
  • is the accuracy wrt individual 's samples only

CATmean

by default

install

pip3 install catie

upgrade to the latest version

pip3 install catie -U

Usage

Example: the result file 'results.csv' looks like this

ID, tied_ID, cohort, true_label, pred_label, pred_proba
1, A, c1, 1, 1, 0.8
2, A, c1, 1, 0, 0.2
3, B, c1, 0, 0, 0.3
4, C, c2, 1, 1, 0.9
5, C, c2, 1, 1, 0.7
6, C, c2, 1, 1, 0.6
7, D, c2, 0, 1, 0.8
8, D, c2, 0, 0, 0.3
9, E, c3, 1, 1, 0.6
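
As a quick check of the layout above, the following sketch parses the same rows with pandas and lists the tied samples, meaning rows whose tied_ID occurs twice or more. It uses only pandas, not the catie API, and the names raw, tied_counts, and tied are illustrative assumptions.

```python
import io

import pandas as pd

# The sample rows from 'results.csv', inlined so the sketch is self-contained.
raw = """ID, tied_ID, cohort, true_label, pred_label, pred_proba
1, A, c1, 1, 1, 0.8
2, A, c1, 1, 0, 0.2
3, B, c1, 0, 0, 0.3
4, C, c2, 1, 1, 0.9
5, C, c2, 1, 1, 0.7
6, C, c2, 1, 1, 0.6
7, D, c2, 0, 1, 0.8
8, D, c2, 0, 0, 0.3
9, E, c3, 1, 1, 0.6
"""

# skipinitialspace=True drops the blank after each comma, so names and
# values come out as 'tied_ID' and 'c1' rather than ' tied_ID' and ' c1'.
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# For every row, count how many rows share its tied_ID.
tied_counts = df.groupby('tied_ID')['ID'].transform('count')

# Keep only the rows whose tied_ID appears twice or more.
tied = df[tied_counts >= 2]
print(tied[['ID', 'tied_ID', 'cohort']])
```

In this sample, tied_ID values B and E occur once each, so their rows are not tied.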

Code

import pandas as pd
from metrics import CAT

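# load results; skipinitialspace=True strips the blank after each comma
# in the sample file so column names and cohort values match exactly
df = pd.read_csv('results.csv', skipinitialspace=True)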


# column names in the results file
cols = ['ID', 'tied_ID', 'cohort', 'true_label', 'pred_label', 'pred_proba']
col_ID, col_tied_ID, col_cohort, col_true_label, col_pred_label, col_pred_proba = cols

# the cohorts of interest, e.g., c1 and c2, two values of the cohort column
sig = ['c1','c2']

# user-defined weight for the cohorts in sig (0.7 in this example)
alpha = 0.7

# cutoff for the predicted probability (predicted positive if proba > 0.5, negative otherwise)
cut_proba = 0.5

cat = CAT(
	col_ID = col_ID, 
	col_tied_ID = col_tied_ID, 
	col_cohort = col_cohort, 
	col_true_label = col_true_label, 
	col_pred_label = col_pred_label, 
	col_pred_proba = col_pred_proba,
	sig = sig,
	alpha = alpha)

# pred_proba: predicted probabilities
pred_proba = df['pred_proba']

# initialize dataframe
cat.init_data(df)

# set predicted probability
cat.set_proba(pred_proba)

# compute AUC
cat.get_auc()

# compute Sensitivity and Specificity based on the cutoff
cat.get_sen_spe(cut_proba)

# compute the CATE
cat.score()

# visualize predicted probability (grouped by samples' true label)
cat.plot_proba()

# summary statistics of the predicted probability
cat.stat_proba()

# visualize the statistics about tied samples, i.e., those that appear twice or more
cat.plot_cohort_score()
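
For orientation, plain sensitivity and specificity, which the description above calls a particular case of CATSensitivity and CATSpecificity, can be computed directly with pandas. The sketch below does this for the sample data, overall and per cohort, using the same cutoff rule (positive if proba > 0.5). It does not reproduce the CAT formulas or the package internals; the helper sen_spe and the variables overall and per_cohort are hypothetical names introduced here.

```python
import pandas as pd

# The sample rows from 'results.csv' (pred_label is recomputed from the cutoff).
df = pd.DataFrame({
    'ID': range(1, 10),
    'tied_ID': list('AABCCCDDE'),
    'cohort': ['c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c2', 'c2', 'c3'],
    'true_label': [1, 1, 0, 1, 1, 1, 0, 0, 1],
    'pred_proba': [0.8, 0.2, 0.3, 0.9, 0.7, 0.6, 0.8, 0.3, 0.6],
})


def sen_spe(frame, cut_proba=0.5):
    """Return (sensitivity, specificity) of frame at the given cutoff.

    A metric is NaN when the frame has no positives (or no negatives).
    """
    pred = (frame['pred_proba'] > cut_proba).astype(int)
    pos = frame['true_label'] == 1
    neg = ~pos
    sen = (pred[pos] == 1).mean() if pos.any() else float('nan')
    spe = (pred[neg] == 0).mean() if neg.any() else float('nan')
    return sen, spe


# Metrics over all samples, then separately for each cohort.
overall = sen_spe(df)
per_cohort = {name: sen_spe(group) for name, group in df.groupby('cohort')}

print('overall:', overall)
for name, (sen, spe) in per_cohort.items():
    print(name, sen, spe)
```

Cohort c3 has no negative samples in this file, so its specificity is undefined and is reported as NaN in this sketch.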


Built Distribution

catie-1.2.0-py3-none-any.whl (6.4 kB)
