Tools to compute cohort-attention tied entropy (CATE), including CATSensitivity, CATSpecificity, and their combination, CATmean
This package computes cohort-attention tied entropy (CATE) in three forms, CATSensitivity, CATSpecificity, and their combination CATmean, for a given set of cohorts of interest. It also reports AUC, specificity, sensitivity, and the detailed measures for each cohort. Ordinary sensitivity and specificity are, in effect, special cases of CATSensitivity and CATSpecificity.
Cohort-attention Tied Entropy
CATSensitivity is given by
and CATSpecificity by
- Excellent: CATSensitivity = CATSpecificity = 1 when all samples are correctly identified
- Bad: CATSensitivity = CATSpecificity = 0 when all are incorrectly identified
- No fit: CATSensitivity = CATSpecificity = 0.5 when 50% of samples are correctly predicted and all cohorts are treated identically (i.e., alpha = 0.5)
The TPR and TNR with respect to a cohort are accuracies computed on an entropy basis:
- alpha is a user-defined weight for the cohorts of interest, 0.5 by default
- is the accuracy wrt individual
- is the accuracy wrt individual 's samples only
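For orientation, the ordinary (unweighted, non-entropy) TPR and TNR within each cohort can be computed directly with pandas. This is only a sketch of the plain per-cohort rates, not the package's entropy-based definition. The helper name `cohort_rates` is hypothetical, and the rows are copied from the example results file shown under Usage.

```python
import numpy as np
import pandas as pd

# (cohort, true_label, pred_label) rows copied from the example results file
rows = [
    ("c1", 1, 1), ("c1", 1, 0), ("c1", 0, 0),
    ("c2", 1, 1), ("c2", 1, 1), ("c2", 1, 1), ("c2", 0, 1), ("c2", 0, 0),
    ("c3", 1, 1),
]
df = pd.DataFrame(rows, columns=["cohort", "true_label", "pred_label"])


def cohort_rates(frame):
    """Plain TPR and TNR of one group of samples (NaN when a class is absent)."""
    pos = frame[frame["true_label"] == 1]
    neg = frame[frame["true_label"] == 0]
    tpr = (pos["pred_label"] == 1).mean() if len(pos) else np.nan
    tnr = (neg["pred_label"] == 0).mean() if len(neg) else np.nan
    return tpr, tnr


# one (TPR, TNR) pair per cohort, plus the overall sensitivity and specificity
rates = {cohort: cohort_rates(group) for cohort, group in df.groupby("cohort")}
overall = cohort_rates(df)

for cohort, (tpr, tnr) in rates.items():
    print(cohort, tpr, tnr)
print("overall", overall)
```

Cohort c3 has no negative samples in the example, so its TNR is left as NaN rather than forced to 0 or 1.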
CATmean
by default
Install
pip3 install catie
Upgrade to the latest version
pip3 install catie -U
Usage
Example: the results file 'results.csv' looks like this:
ID, tied_ID, cohort, true_label, pred_label, pred_proba
1, A, c1, 1, 1, 0.8
2, A, c1, 1, 0, 0.2
3, B, c1, 0, 0, 0.3
4, C, c2, 1, 1, 0.9
5, C, c2, 1, 1, 0.7
6, C, c2, 1, 1, 0.6
7, D, c2, 0, 1, 0.8
8, D, c2, 0, 0, 0.3
9, E, c3, 1, 1, 0.6
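The table above has a blank after every comma, which pandas would otherwise keep as part of the column names and string values. A minimal sketch of parsing it cleanly is given here; the `EXAMPLE` constant and the `save_example` helper are hypothetical names introduced for illustration.

```python
import io

import pandas as pd

EXAMPLE = """ID, tied_ID, cohort, true_label, pred_label, pred_proba
1, A, c1, 1, 1, 0.8
2, A, c1, 1, 0, 0.2
3, B, c1, 0, 0, 0.3
4, C, c2, 1, 1, 0.9
5, C, c2, 1, 1, 0.7
6, C, c2, 1, 1, 0.6
7, D, c2, 0, 1, 0.8
8, D, c2, 0, 0, 0.3
9, E, c3, 1, 1, 0.6
"""

# skipinitialspace=True drops the blank after each comma, so the columns are
# named 'tied_ID', 'cohort', ... rather than ' tied_ID', ' cohort', ...
df = pd.read_csv(io.StringIO(EXAMPLE), skipinitialspace=True)


def save_example(path="results.csv"):
    """Write the example table to disk without the padding blanks."""
    df.to_csv(path, index=False)


print(df.dtypes)
```

Calling `save_example()` produces a 'results.csv' that can be read back with a plain `pd.read_csv('results.csv')`.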
Code
import os
import sys
import numpy as np
import pandas as pd
import math
from metrics import CAT
# load results; skipinitialspace strips the blank after each comma in the example file
df = pd.read_csv('results.csv', skipinitialspace=True)
cols = ['ID', 'tied_ID', 'cohort', 'true_label', 'pred_label', 'pred_proba']
col_ID, col_tied_ID, col_cohort, col_true_label, col_pred_label, col_pred_proba = cols
# the cohorts of interest, e.g., c1 and c2, two values of the cohort column
sig = ['c1', 'c2']
# user-defined weight for the cohorts in sig (0.7 in this example)
alpha = 0.7
# cutoff on the predicted probability (positive if proba > 0.5, negative otherwise)
cut_proba = 0.5
cat = CAT(
    col_ID=col_ID,
    col_tied_ID=col_tied_ID,
    col_cohort=col_cohort,
    col_true_label=col_true_label,
    col_pred_label=col_pred_label,
    col_pred_proba=col_pred_proba,
)
# pred_proba: predicted probabilities
pred_proba = df['pred_proba']
# initialize dataframe
cat.init_data(df)
# set predicted probability
cat.set_proba(pred_proba)
# compute AUC
cat.get_auc()
# identify positive versus negative based on the cutoff
cat.dichotomize(cut_proba)
# compute Sensitivity and Specificity
cat.get_sen_spe()
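# beta is passed to score() but never defined in the original snippet;
# 0.5 is an assumed placeholder value, check the package for its meaning and default
beta = 0.5
# compute the CATE
cat.score(sig, alpha, beta)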
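As an independent sanity check on the dichotomization and AUC steps, the same quantities can be computed by hand with NumPy on the example data. This sketch does not use the package and makes no claim about how `cat.get_auc()` or `cat.dichotomize()` are implemented. The helper `pairwise_auc` is a hypothetical name, and it uses the standard pairwise definition of AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as one half).

```python
import numpy as np

# true_label and pred_proba columns of the example results file
true_label = np.array([1, 1, 0, 1, 1, 1, 0, 0, 1])
pred_proba = np.array([0.8, 0.2, 0.3, 0.9, 0.7, 0.6, 0.8, 0.3, 0.6])
cut_proba = 0.5

# positive if proba > cutoff, negative otherwise
pred_label = (pred_proba > cut_proba).astype(int)


def pairwise_auc(y, p):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = p[y == 1]
    neg = p[y == 0]
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


auc = pairwise_auc(true_label, pred_proba)
print(pred_label.tolist(), auc)
```

With the 0.5 cutoff, the dichotomized labels reproduce the pred_label column of the example file.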