EquiMed_DSS

A Comprehensive Python Library for Clinical AI Fairness Assessment

Evaluate reliability, equity, governance, and intersectionality in clinical AI systems using 37 metrics across five domains


Overview

EquiMed_DSS (Equitable Medical Decision Support System) provides a systematic framework for evaluating clinical AI systems across multiple dimensions of fairness, reliability, and governance. The library implements 37 metrics across five domains specifically designed for healthcare applications where equity and safety are paramount.

Key Features

| Feature | Description |
| --- | --- |
| 37 Metrics | Five domains (reliability, equity, governance, representation/robustness, technical-supplement fairness) plus geographic and advanced-appendix metrics |
| Clinical AI Focus | Designed specifically for healthcare applications |
| Statistical Analyses | HLM, Mediation Analysis, Network Statistics |
| Publication-Ready Visualizations | 6 manuscript-quality figure generators |
| Multi-Format Data Support | MySQL, CSV, TSV, JSON with automatic standardization |
| Intersectional Analysis | Detect bias across demographic combinations |
| Geographic Equity | BEMI and GCC measure evidence-burden mismatch and regional concentration |
| Tidy Reporting Tables | export_table renders metric results as markdown, LaTeX, or HTML |

Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Install from Source

# Clone the repository
git clone https://github.com/johnmuteba/EquiMed_DSS.git
cd EquiMed_DSS

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package with dependencies
pip install -e .

Install via pip

pip install equimed_dss

Installing inside Jupyter or conda (read this if you hit ModuleNotFoundError)

The most common installation problem is installing into a different Python than the one your notebook or environment actually runs. You see Successfully installed equimed_dss in a terminal, then ModuleNotFoundError: No module named 'equimed_dss' in Jupyter. This is an environment mismatch, not a package problem. Install into the running interpreter:

In a Jupyter notebook cell (this installs into the active kernel; restart the kernel afterwards):

%pip install equimed_dss

From a terminal, target a specific interpreter explicitly:

python -m pip install equimed_dss          # uses THIS python
# conda example:
conda activate myenv && python -m pip install equimed_dss

Confirm the install is visible to your interpreter:

import sys; print(sys.executable)          # which python is running
import equimed_dss; print(equimed_dss.__version__)
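
The check above raises ModuleNotFoundError when the package is missing. As a minimal sketch, the same diagnosis can be done without raising, using only the standard library. The helper name describe_install is hypothetical and is not part of EquiMed_DSS:

```python
import importlib.util
import sys
from importlib import metadata


def describe_install(package: str) -> dict:
    """Report whether `package` is importable from the running interpreter."""
    spec = importlib.util.find_spec(package)  # None if the import would fail
    try:
        version = metadata.version(package)   # installed distribution version
    except metadata.PackageNotFoundError:
        version = None
    return {
        "interpreter": sys.executable,         # the Python that is actually running
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),
        "version": version,
    }


print(describe_install("equimed_dss"))
```

If "importable" is False while pip reports a successful install, the interpreter path printed here is the one to pass to `python -m pip install`.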

Dependencies

numpy>=1.20.0
pandas>=1.3.0
scipy>=1.7.0
scikit-learn>=1.0.0
matplotlib>=3.5.0
seaborn>=0.11.0
networkx>=2.6.0
statsmodels>=0.13.0

Quick Start

Generate Sample Data

from equimed_dss.utils import SampleDataGenerator

# Generate synthetic clinical AI evaluation data
generator = SampleDataGenerator(random_state=42)
data = generator.generate_fairness_data(n_samples=1000)

print(f"Generated {len(data)} samples with columns: {list(data.columns)}")
# Output: Generated 1000 samples with columns: ['id', 'race', 'gender', 'age_group', 'prediction', 'actual', 'confidence']

Calculate Fairness Metrics

import numpy as np
from equimed_dss.domain2 import HierarchicalEquityRatio, HarmAdjustedFairnessGap

# Example: Calculate Hierarchical Equity Ratio across racial groups
her_metric = HierarchicalEquityRatio()
group_performance = {
    'White': 0.85,
    'Black': 0.78,
    'Hispanic': 0.80,
    'Asian': 0.87
}

her_scores = her_metric.calculate_her(group_performance)
gini = her_metric.calculate_bias_gini(list(group_performance.values()))

print(f"Equity Ratios: {her_scores}")
print(f"Bias-Gini Index: {gini:.4f}")
# Interpretation: Gini < 0.2 indicates low dispersion (good)
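
To make the two quantities concrete, here is a standalone sketch of a group ratio check and a Gini coefficient in plain Python. It is not the library's implementation: the choice of 'White' as reference group, the 0.8 to 1.25 acceptance band (taken from the Domain 2 table), and the function names equity_ratios and gini are assumptions made for illustration.

```python
def equity_ratios(performance, reference):
    """Ratio of each group's performance to the reference group's."""
    ref = performance[reference]
    return {group: value / ref for group, value in performance.items()}


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


group_performance = {'White': 0.85, 'Black': 0.78, 'Hispanic': 0.80, 'Asian': 0.87}

ratios = equity_ratios(group_performance, reference='White')
# Groups whose ratio falls outside the 4/5ths-style band
flagged = [g for g, r in ratios.items() if not 0.8 <= r <= 1.25]

print({g: round(r, 3) for g, r in ratios.items()})
print("Flagged groups:", flagged)
print(f"Gini: {gini(group_performance.values()):.4f}")
```

With these four values every ratio stays inside the band and the Gini coefficient is well below the 0.2 threshold quoted above.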

Analyze Distributional Fairness

import numpy as np
from equimed_dss.appendix import JensenShannonDivergence, WassersteinDistance

# Compare prediction distributions between groups
group_a_predictions = np.array([0.9, 0.85, 0.78, 0.92, 0.88])
group_b_predictions = np.array([0.75, 0.70, 0.68, 0.72, 0.65])

# Jensen-Shannon Divergence (between two aggregate distributions: no underlying
# per-observation sample, so it prints "CI unavailable")
jsd = JensenShannonDivergence()
jsd_result = jsd.calculate_jsd(group_a_predictions, group_b_predictions)
print(jsd_result)                   # JSD = ... :: 95% CI unavailable

# Wasserstein Distance (the two inputs are treated as samples -> bootstrap CI)
wd = WassersteinDistance()
wd_result = wd.calculate_wd(group_a_predictions, group_b_predictions)
print(wd_result)                    # WD = ... :: 95% CI [...] (bootstrap)
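
For readers who want to see what the two distances measure, the following is a minimal standard-library sketch. It assumes equal-sized samples for the Wasserstein distance and already-binned probability vectors for the Jensen-Shannon divergence (base 2, so the range is [0, 1]). The function names are illustrative and the library's own binning and bootstrap logic may differ.

```python
import math


def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between two equal-sized samples:
    the mean absolute gap between their sorted values."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits between two discrete
    distributions defined over the same bins."""
    def kl(u, v):
        return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


group_a = [0.9, 0.85, 0.78, 0.92, 0.88]
group_b = [0.75, 0.70, 0.68, 0.72, 0.65]

print(f"W1  = {wasserstein_1d(group_a, group_b):.3f}")
print(f"JSD = {jensen_shannon([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]):.3f}")
```

The Wasserstein distance keeps the units of the predictions, while the divergence depends on how the predictions are binned, which is why the two are reported separately.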

Data Format

Required Data Structure

EquiMed_DSS expects data in a standardized format. Use the built-in utilities to convert your data:

from equimed_dss.utils import CorpusLoader, DemographicProcessor

# Load and standardize your data
loader = CorpusLoader()

# From CSV
df = loader.load_from_csv('your_data.csv', text_column='clinical_notes')

# Validate format
validation = loader.validate_format(df)
print(validation)

Expected Columns

| Column | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Unique identifier |
| content | string | For text analysis | Clinical text/notes |
| prediction | float | For fairness metrics | Model prediction (0-1) |
| actual | int | For fairness metrics | Ground truth (0 or 1) |
| race | string | For demographic analysis | Racial/ethnic group |
| gender | string | For demographic analysis | Gender identity |
| age_group | string | For demographic analysis | Age category |

Sample Data Schema

{
  "id": "patient_001",
  "content": "Patient presents with chest pain...",
  "prediction": 0.85,
  "actual": 1,
  "race": "Black",
  "gender": "Female",
  "age_group": "Middle Age"
}
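
As a rough illustration of the schema above, a single record can be checked by hand before loading. This sketch is independent of CorpusLoader.validate_format, whose exact checks are not documented here; validate_record and EXPECTED_TYPES are hypothetical names.

```python
EXPECTED_TYPES = {
    "id": str,
    "content": str,
    "prediction": (int, float),
    "actual": int,
    "race": str,
    "gender": str,
    "age_group": str,
}


def validate_record(record):
    """Return a list of problems found in one record (empty list = valid)."""
    problems = []
    if "id" not in record:  # the only column marked as always required
        problems.append("missing required column: id")
    for column, expected in EXPECTED_TYPES.items():
        if column in record and not isinstance(record[column], expected):
            problems.append(f"{column}: unexpected type {type(record[column]).__name__}")
    prediction = record.get("prediction")
    if isinstance(prediction, (int, float)) and not 0 <= prediction <= 1:
        problems.append("prediction: outside [0, 1]")
    if "actual" in record and record["actual"] not in (0, 1):
        problems.append("actual: must be 0 or 1")
    return problems


sample = {
    "id": "patient_001",
    "content": "Patient presents with chest pain...",
    "prediction": 0.85,
    "actual": 1,
    "race": "Black",
    "gender": "Female",
    "age_group": "Middle Age",
}
print(validate_record(sample))
```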

Metrics Overview

Domain 1: Reliability & Calibration

| Metric | Abbreviation | Range | Ideal | Description |
| --- | --- | --- | --- | --- |
| Decision Flip Rate | DFR | [0, 1] | close to 0 | Decision instability under counterfactual inputs |
| Embedding Consistency Score | ECS | [0, 1] | higher | Embedding stability under perturbation |
| Inter-Rater Reliability (ICC 2,1) | ICC | [0, 1] | > 0.75 | Agreement across judges |

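Every metric result prints its value alongside a 95% confidence interval, or "CI unavailable" when no interval can be computed (for example, aggregate inputs with no per-observation sample to resample). Call print(result) to see both; the result is still a plain dict, so lookups such as result['flip_rate'] keep working: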

import numpy as np
from equimed_dss.domain1 import DecisionFlipRate, EmbeddingConsistencyScore, InterRaterReliability

# Decision Flip Rate (DFR): how often decisions flip under counterfactual inputs
dfr = DecisionFlipRate()
print(dfr.calculate_dfr(
    original_decisions=['ACS', 'ACS', 'non-cardiac', 'ACS'],
    counterfactual_decisions=['ACS', 'non-cardiac', 'non-cardiac', 'ACS'],
))
# DFR = 0.250 :: 95% CI [0.046; 0.699] (Wilson score)

# Embedding Consistency Score (ECS): stability of embeddings under perturbation
ecs = EmbeddingConsistencyScore()
original = np.random.RandomState(0).rand(10, 8)
perturbed = original + np.random.RandomState(1).normal(0, 0.05, (10, 8))
print(ecs.calculate_ecs(original, perturbed))
# ECS = 0.003 :: 95% CI [0.002; 0.005] (bootstrap)

# Inter-Rater Reliability (ICC 2,1): agreement across judges (subjects x raters)
icc = InterRaterReliability()
print(icc.calculate_icc_2_1(np.array([[3, 4, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]])))
# ICC(2,1) = 0.750 :: 95% CI [-0.500; 0.816] (bootstrap (over items))
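
The DFR interval and the ICC point estimate printed above can be reproduced by hand. The sketch below uses the textbook Wilson score interval and the two-way random-effects ICC(2,1) formula from ANOVA mean squares, with only the standard library. The function names are illustrative; this is not the library's code, and the bootstrap interval for the ICC is not reproduced.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def icc_2_1(ratings):
    """ICC(2,1) for a subjects x raters table (two-way random, single rater)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)                      # between-subject mean square
    ms_cols = ss_cols / (k - 1)                      # between-rater mean square
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


low, high = wilson_interval(successes=1, n=4)   # 1 flip out of 4 decisions
print(f"DFR = 0.250, Wilson 95% CI [{low:.3f}; {high:.3f}]")
print(f"ICC(2,1) = {icc_2_1([[3, 4, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]]):.3f}")
```

The wide Wilson interval is a reminder that four decisions are far too few to estimate a flip rate with any precision.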

Domain 2: Fairness, Equity & Ethics

| Metric | Abbreviation | Range | Ideal | Description |
| --- | --- | --- | --- | --- |
| Hierarchical Equity Ratio | HER | [0, ∞) | 0.8-1.25 | Group equity (4/5ths rule) |
| Bias-Gini Index | BGI | [0, 1] | < 0.2 | Performance dispersion |
| Harm-Adjusted Fairness Gap | HAFG | [0, ∞) | < 0.1 | Clinical harm-weighted disparity |
| Ethical Risk Index | ERI | [0, ∞) | < 0.05 | Aggregated ethical violations |
| Intersectional Bias Score | IBS | varies | low | Subgroup outlier detection |

from equimed_dss.domain2 import HierarchicalEquityRatio, HarmAdjustedFairnessGap, EthicalRiskIndex

# Harm-Adjusted Fairness Gap. Pass per-case error labels (group*_cases) to get a
# bootstrap CI; with only aggregate counts the result prints "CI unavailable".
hafg = HarmAdjustedFairnessGap()
result = hafg.calculate_hafg(
    group1_errors={'fn': 5, 'fp': 10},
    group2_errors={'fn': 2, 'fp': 5},
    group1_cases=['fn'] * 5 + ['fp'] * 10 + ['tn'] * 85,
    group2_cases=['fn'] * 2 + ['fp'] * 5 + ['tn'] * 93,
)
print(result)
# HAFG = 0.562 :: 95% CI [...] (bootstrap)

# Ethical Risk Index (CHR-style: value + Wilson SVR + bootstrap ERI CI)
eri = EthicalRiskIndex()
result = eri.calculate_eri(
    violations=[{'severity': 2.5}, {'severity': 1.0}, {'severity': 5.0}],
    n_total_outputs=100
)
print(result)
# ERI = 0.085 :: 95% CI [...] (bootstrap)

Domain 3: Governance & Transparency

| Metric | Abbreviation | Range | Ideal | Description |
| --- | --- | --- | --- | --- |
| Temporal Fairness Drift | TFD | varies | stable | Fairness degradation over time |
| Audit Traceability Score | ATS | [0, 1] | > 0.9 | Audit trail completeness |
| Governance Compliance Index | GCI | [0, 1] | 1.0 | Regulatory compliance |

from equimed_dss.domain3 import TemporalFairnessDrift, AuditTraceabilityScore, GovernanceComplianceIndex

# Temporal Fairness Drift
tfd = TemporalFairnessDrift()
result = tfd.calculate_drift([0.85, 0.84, 0.86, 0.83, 0.75, 0.84])
print(result)                       # TFD = mean PDI :: 95% CI [...] (bootstrap)
print("Drift detected:", result['drift_detected'])

# Audit Traceability Score: fraction of audit records that are fully traceable
ats = AuditTraceabilityScore()
result = ats.calculate_ats(n_traceable=8, n_total=10)
print(result)                       # ATS = 0.800 :: 95% CI [...] (Wilson score)

Domain 4: Representation & Robustness

| Metric | Abbreviation | Range | Ideal | Description |
| --- | --- | --- | --- | --- |
| Semantic Parity Gap | SPG | [0, ∞) | close to 0 | Embedding-centroid distance between identical cases differing only by a protected attribute |
| Clinical Hallucination Rate | CHR | [0, 1] | close to 0 | Fraction of response claims unsupported by the retrieved context (NLI entailment) |
| Instructional Vulnerability Index | IVI | [0, 1] | close to 0 | Proportion of decisions that flip under a biased/leading instruction |
| Geographic Representation Index | GRI | [0, 1] | higher | Set-based non-Western variety of represented locations |

import numpy as np
from equimed_dss.domain4 import (
    SemanticParityGap, ClinicalHallucinationRate,
    InstructionalVulnerabilityIndex, GeographicRepresentationIndex,
)

# Clinical Hallucination Rate (CHR): unsupported claims at entailment threshold tau
chr_ = ClinicalHallucinationRate()
result = chr_.calculate_chr(support_scores=[0.9, 0.2, 0.4, 0.8, 0.1], tau=0.5)
print(result)                       # CHR = 0.600 :: 95% CI [...] (Wilson score)

# Instructional Vulnerability Index (IVI): decision flips under a biased prompt
ivi = InstructionalVulnerabilityIndex()
result = ivi.calculate_ivi(neutral_outputs=['acs', 'non_cardiac', 'acs', 'other'],
                           biased_outputs=['acs', 'acs', 'acs', 'other'])
print(result)                       # IVI = 0.250 :: 95% CI [...] (Wilson score)

# Semantic Parity Gap (SPG): centroid distance between two demographic embedding clusters
spg = SemanticParityGap()
rng = np.random.RandomState(0)
result = spg.calculate_spg(privileged_embeddings=rng.rand(20, 16),
                           marginalized_embeddings=rng.rand(20, 16) + 0.1)
print(result)                       # SPG = ... :: 95% CI [...] (bootstrap)
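
The three point estimates above follow directly from the table descriptions. As a sketch under stated assumptions (a claim counts as unsupported when its score is strictly below tau, and the centroid distance is Euclidean), they can be written in a few lines of plain Python. The function names are illustrative and are not the library's API.

```python
import math


def hallucination_rate(support_scores, tau=0.5):
    """Fraction of claims whose entailment score falls below tau."""
    unsupported = sum(1 for score in support_scores if score < tau)
    return unsupported / len(support_scores)


def flip_rate(neutral_outputs, biased_outputs):
    """Proportion of paired decisions that differ between two prompts."""
    flips = sum(a != b for a, b in zip(neutral_outputs, biased_outputs))
    return flips / len(neutral_outputs)


def centroid_distance(group_a, group_b):
    """Euclidean distance between the mean vectors of two embedding sets."""
    def centroid(rows):
        return [sum(column) / len(rows) for column in zip(*rows)]

    return math.dist(centroid(group_a), centroid(group_b))


print(hallucination_rate([0.9, 0.2, 0.4, 0.8, 0.1], tau=0.5))
print(flip_rate(['acs', 'non_cardiac', 'acs', 'other'],
                ['acs', 'acs', 'acs', 'other']))
print(centroid_distance([[0.0, 0.0], [2.0, 0.0]], [[1.0, 4.0], [1.0, 4.0]]))
```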

Domain 5: Technical-supplement Fairness

| Metric | Abbreviation | Range | Ideal | Description |
| --- | --- | --- | --- | --- |
| Intersectional Calibration Error | ICE | [0, 1] | close to 0 | Population-weighted calibration error across intersectional groups; dICE = max gap |
| Weighted Clinical Harm-Adjusted Fairness Gap | wHAFG | [0, 1] | close to 0 | Severity-weighted harm gap across groups |
| Lexical Diversity Disparity Index | LDDI | [0, ∞) | close to 0 | Root type-token-ratio gap across groups |
| Recommendation Entropy Gap | REG | [0, ∞) bits | close to 0 | Max-min recommendation-distribution entropy across groups |
| Counterfactual Parity Score | CPS | [0, 1] | 1.0 | Mean response similarity under a demographic swap (CFU = 1 − CPS) |
| Clinical Information Density Ratio | CIDR | [0, 1] | 1.0 | Group concept-density relative to the richest group |
| Diagnostic Completeness Index | DCI | [0, 1] | higher | Coverage of a reference differential set; dDCI = max gap |
| Uncertainty Quantification Gap | UQG | [0, ∞) | close to 0 | Hedging-density disparity across groups |
| Geographic Representation Bias Index | GRBI | [0, ∞) nats | close to 0 | KL divergence of corpus geography from disease burden |
| Healthcare System Stratified Fairness | HSSF | [0, 1] | close to 0 | Population-weighted within-system demographic gap |
| Intersectional Shapley Fairness Value | ISFV | varies | low | Shapley attribution of disparity to each attribute + interactions |
| Semantic Robustness Parity Index | SRPI | [0, 1] | 1.0 | Min/max paraphrase robustness across groups |

from equimed_dss.domain5 import (
    CounterfactualParityScore, GeographicRepresentationBiasIndex,
    IntersectionalShapleyFairnessValue,
)

# Counterfactual Parity Score (CPS) and counterfactual unfairness (CFU)
cps = CounterfactualParityScore()
result = cps.calculate_cps([0.95, 0.88, 0.91, 0.86])
print(result)                       # CPS = 0.900 :: 95% CI [...] (bootstrap)

# Geographic Representation Bias Index (GRBI): KL of corpus geography from burden.
# Pass corpus_records (one region label per evidence record) for a bootstrap CI;
# with only aggregate counts the result prints "CI unavailable".
grbi = GeographicRepresentationBiasIndex()
result = grbi.calculate_grbi(
    corpus_counts={'AMRO': 78, 'EURO': 11, 'WPRO': 8, 'EMRO': 3, 'AFRO': 0, 'SEARO': 0},
    burden_shares={'AMRO': 0.11, 'EURO': 0.19, 'WPRO': 0.10, 'EMRO': 0.23, 'AFRO': 0.15, 'SEARO': 0.21},
    corpus_records=(['AMRO'] * 78 + ['EURO'] * 11 + ['WPRO'] * 8 + ['EMRO'] * 3),
)
print(result)                       # GRBI = ... :: 95% CI [...] (bootstrap)

# Intersectional Shapley Fairness Value (ISFV): attribute attribution of disparity
isfv = IntersectionalShapleyFairnessValue()
result = isfv.calculate_isfv(
    attributes={'race': ['A', 'A', 'B', 'B'], 'sex': ['M', 'F', 'M', 'F']},
    outcomes=[0.8, 0.7, 0.5, 0.6],
)
print(result)                       # ISFV = total disparity :: 95% CI [...] (bootstrap)
print("By attribute:", result['shapley_by_attribute'])
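
To show what a KL divergence in nats looks like on the GRBI inputs, here is a minimal standard-library sketch. It assumes the direction KL(corpus || burden), renormalizes the burden shares (the values above sum to 0.99), and treats regions with zero corpus share as contributing nothing. The library may use a different direction or smoothing, so the number printed here is not claimed to match its output.

```python
import math


def kl_divergence_nats(counts, reference_shares):
    """KL(P || Q) in nats, where P comes from counts and Q from reference shares.

    Terms with P = 0 contribute 0. A reference share of 0 for a region that
    has evidence would make the divergence infinite; this sketch does not
    handle that case.
    """
    total = sum(counts.values())
    ref_total = sum(reference_shares.values())
    divergence = 0.0
    for region, count in counts.items():
        p = count / total
        q = reference_shares[region] / ref_total
        if p > 0:
            divergence += p * math.log(p / q)
    return divergence


corpus_counts = {'AMRO': 78, 'EURO': 11, 'WPRO': 8, 'EMRO': 3, 'AFRO': 0, 'SEARO': 0}
burden_shares = {'AMRO': 0.11, 'EURO': 0.19, 'WPRO': 0.10,
                 'EMRO': 0.23, 'AFRO': 0.15, 'SEARO': 0.21}

print(f"KL(corpus || burden) = {kl_divergence_nats(corpus_counts, burden_shares):.3f} nats")
```

Almost all of the divergence comes from the single AMRO term, which is the over-representation the metric is designed to expose.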

Appendix: Advanced Metrics

These 9 additional metrics provide deeper statistical analysis:

| Metric | Class | Range | Threshold | Description |
| --- | --- | --- | --- | --- |
| Bootstrap Confidence Intervals | BootstrapConfidenceIntervals | varies | CI width < 0.05 | Robust uncertainty estimation |
| Statistical Power Analysis | StatisticalPowerAnalysis | [0, 1] | ≥ 0.8 | Sample size adequacy |
| Bias Concentration Index | BiasConcentrationIndex | [0, 1] | > 0.7 | Bias distribution across groups |
| Mutual Information Content | MutualInformationContent | [0, ∞) | < 0.1 | Demographic information leakage |
| Jensen-Shannon Divergence | JensenShannonDivergence | [0, 1] | < 0.1 | Distributional similarity |
| Wasserstein Distance | WassersteinDistance | [0, ∞) | < 0.1 | Optimal transport distance |
| Network Modularity | NetworkModularity | [-1, 1] | > 0.3 | Metric clustering structure |
| Transparency Score | TransparencyScore | [0, 1] | > 0.7 | Explanation quality |
| Robustness Certification | RobustnessCertificationScore | [0, 1] | > 0.8 | Perturbation stability |

from equimed_dss.appendix import (
    BootstrapConfidenceIntervals,
    StatisticalPowerAnalysis,
    BiasConcentrationIndex,
    MutualInformationContent,
    JensenShannonDivergence,
    WassersteinDistance,
    NetworkModularity,
    TransparencyScore,
    RobustnessCertificationScore
)
import numpy as np

# Bootstrap Confidence Intervals
bci = BootstrapConfidenceIntervals(n_bootstrap=1000, random_state=42)
data = np.random.normal(0.85, 0.05, 100)
result = bci.calculate_bci(data)
print(result)                       # BCI = observed stat :: 95% CI [...] (bootstrap)

# Statistical Power Analysis (an analytic design quantity: prints "CI unavailable")
spa = StatisticalPowerAnalysis()
result = spa.calculate_sample_size(effect_size=0.5, power=0.8)
print(result)                       # SampleSize = N per group :: 95% CI unavailable

# Bias Concentration Index
bci_metric = BiasConcentrationIndex()
result = bci_metric.calculate_bci([0.3, 0.25, 0.25, 0.2])  # Bias proportions
print(result)                       # BiasConcentration = ... :: 95% CI [...] (bootstrap)

# Mutual Information Content
mic = MutualInformationContent()
demographics = np.array([0, 0, 1, 1, 2, 2, 0, 1])  # Encoded demographics
outcomes = np.array([1, 1, 0, 0, 1, 0, 1, 0])      # Model outcomes
result = mic.calculate_mic(demographics, outcomes)
print(result)                       # MIC = ... :: 95% CI [...] (bootstrap)

# Network Modularity
nm = NetworkModularity()
adjacency = np.array([[0, 0.8, 0.3], [0.8, 0, 0.4], [0.3, 0.4, 0]])
result = nm.calculate_modularity(adjacency)
print(result)                       # NM = ... :: 95% CI [...] (bootstrap)

# Transparency Score
ts = TransparencyScore()
explanations = [
    {'explanation_quality': 0.8, 'feature_importance': 0.75, 'interpretability': 0.9},
    {'explanation_quality': 0.7, 'feature_importance': 0.8, 'interpretability': 0.85}
]
result = ts.calculate_ts(explanations)
print(result)                       # TS = ... :: 95% CI [...] (bootstrap)

# Robustness Certification Score
rcs = RobustnessCertificationScore()
original = np.array([1, 1, 0, 1, 0])
perturbed = [np.array([1, 1, 0, 1, 0]), np.array([1, 0, 0, 1, 0])]
result = rcs.calculate_rcs(original, perturbed)
print(result)                       # RCS = ... :: 95% CI [...] (bootstrap)
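
Many of the results above report a "(bootstrap)" interval. As a minimal sketch of the idea, here is a percentile bootstrap written with the standard library only. The function name, the default of 1000 resamples, and the percentile method are assumptions for illustration; BootstrapConfidenceIntervals may use a different variant.

```python
import random
from statistics import mean


def percentile_bootstrap_ci(sample, statistic=mean, n_bootstrap=1000,
                            confidence=0.95, seed=42):
    """Percentile bootstrap CI: resample with replacement, recompute the
    statistic each time, and read the interval off the sorted estimates."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_bootstrap)
    )
    alpha = (1 - confidence) / 2
    lower = estimates[int(alpha * n_bootstrap)]
    upper = estimates[min(n_bootstrap - 1, int((1 - alpha) * n_bootstrap))]
    return lower, upper


# Synthetic scores centred on 0.85, mirroring the example above
data_rng = random.Random(0)
scores = [data_rng.gauss(0.85, 0.05) for _ in range(100)]

lower, upper = percentile_bootstrap_ci(scores)
print(f"mean = {mean(scores):.3f}, 95% CI [{lower:.3f}; {upper:.3f}]")
```

The approach needs per-observation data to resample, which is why results computed from aggregate counts alone report "CI unavailable".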

Geographic Equity (v1.1.0)

equimed_dss.geographic quantifies how well the evidence base reflects the global disease burden:

  • BurdenEvidenceMismatch (BEMI): total-variation distance between the regional evidence distribution and the regional disease-burden distribution. Range [0, 1]; 0 means evidence tracks burden exactly, 1 means the two distributions are completely disjoint.
  • GeographicConcentration (GCC): sample-corrected Gini coefficient (G*) and normalized Shannon entropy (H_norm) for the regional distribution of included studies. G* = 0 and H_norm = 1 both indicate even coverage; G* = 1 and H_norm = 0 indicate single-region concentration. Note that Gini and entropy run in opposite directions.
  • WHO_REGION_IHD_BURDEN: bundled reference constant of normalized IHD DALY shares from Roth GA et al., 2020 (GBD). AFRO and SEARO together carry about 36% of global IHD burden.

from equimed_dss.geographic import BurdenEvidenceMismatch, GeographicConcentration, WHO_REGION_IHD_BURDEN

evidence = {"AFRO": 5, "AMRO": 40, "EURO": 30, "SEARO": 3, "WPRO": 10, "EMRO": 2}

# Pass per-study region labels (evidence_records / region_records) for a bootstrap
# CI; with only aggregate counts the result prints "CI unavailable".
records = [r for r, n in evidence.items() for _ in range(n)]

bemi = BurdenEvidenceMismatch()
bemi_result = bemi.calculate_bemi(
    evidence_counts=evidence,
    burden_shares=WHO_REGION_IHD_BURDEN,
    evidence_records=records,
)
print(bemi_result)                  # BEMI = ... :: 95% CI [...] (bootstrap)

gcc = GeographicConcentration()
gcc_result = gcc.calculate_gcc(evidence, region_records=records)
print(gcc_result)                   # GCC = G* :: 95% CI [...] (bootstrap)
print("H_norm:", gcc_result['entropy_normalized'])

Reporting Tables (v1.1.0)

equimed_dss.reporting converts metric results into tidy DataFrames and exports them:

from equimed_dss.geographic import (
    BurdenEvidenceMismatch, GeographicConcentration, WHO_REGION_IHD_BURDEN,
)
from equimed_dss.reporting import geographic_table, export_table

evidence = {"AFRO": 5, "AMRO": 40, "EURO": 30, "SEARO": 3, "WPRO": 10, "EMRO": 2}
bemi_result = BurdenEvidenceMismatch().calculate_bemi(
    evidence_counts=evidence, burden_shares=WHO_REGION_IHD_BURDEN)
gcc_result = GeographicConcentration().calculate_gcc(evidence)

df = geographic_table(bemi_result, gcc_result)
print(df)                                    # show the table in the console
print(export_table(df, fmt="markdown"))      # render it as markdown to the screen

# To also save to files, pass a path (returns None, so do not wrap these in print):
export_table(df, fmt="markdown", path="results/geographic.md")
export_table(df, fmt="latex",    path="results/geographic.tex")
export_table(df, fmt="html",     path="results/geographic.html")

Output:

                       metric  value
0                        BEMI   0.48
1                  Gini* (G*)  0.613
2                      H_norm  0.742
3  concentration (1 - H_norm)  0.258
4     most_underserved_region   EMRO
5                   n_regions      6
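
The values in this table can be checked by hand from the definitions in the Geographic Equity section. The sketch below uses only the standard library. It assumes the burden shares shown in the geographic dumbbell example stand in for WHO_REGION_IHD_BURDEN (the bundled constant's exact values are not listed in this README), and it uses the standard Gini formula with an n/(n-1) small-sample correction.

```python
import math

evidence = {"AFRO": 5, "AMRO": 40, "EURO": 30, "SEARO": 3, "WPRO": 10, "EMRO": 2}
# Assumed burden shares, copied from the dumbbell example in this README
burden = {"AMRO": 0.114, "SEARO": 0.211, "AFRO": 0.150,
          "EMRO": 0.230, "EURO": 0.195, "WPRO": 0.100}

total = sum(evidence.values())
shares = {region: count / total for region, count in evidence.items()}

# BEMI: total-variation distance between evidence and burden distributions
bemi = 0.5 * sum(abs(shares[r] - burden[r]) for r in shares)
most_underserved = max(shares, key=lambda r: burden[r] - shares[r])

# G*: Gini coefficient of the regional counts with the n/(n-1) correction
xs = sorted(evidence.values())
n = len(xs)
gini = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)
g_star = gini * n / (n - 1)

# H_norm: Shannon entropy divided by its maximum, log(n)
entropy = -sum(p * math.log(p) for p in shares.values() if p > 0)
h_norm = entropy / math.log(n)

print(f"BEMI = {bemi:.2f}, G* = {g_star:.3f}, H_norm = {h_norm:.3f}")
print("Most underserved region:", most_underserved)
```

Under these assumptions the sketch gives BEMI 0.48, G* 0.613, and H_norm 0.742, with EMRO as the most underserved region, in agreement with the table.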

Visualizations

All plot helpers in equimed_dss.utils return a Matplotlib figure (they do not call plt.show()), and write to save_path when given, so they compose cleanly in scripts, notebooks, and report pipelines.

Equity radar — one normalized score per domain for an at-a-glance audit:

from equimed_dss.utils import plot_equity_radar

fig = plot_equity_radar(
    {"Reliability": 0.16, "Fairness": 0.95, "Governance": 0.80,
     "Representation": 0.33, "Robustness": 0.73},
    reference=0.8,                      # optional acceptability-target ring
    save_path="equity_radar.png",
)

Geographic dumbbell — disease burden vs evidence share per region (reads the burden-evidence mismatch, BEMI, far more clearly than a bubble plot):

from equimed_dss.utils import plot_geographic_dumbbell

fig = plot_geographic_dumbbell(
    burden_shares={"AMRO": 0.114, "SEARO": 0.211, "AFRO": 0.150,
                   "EMRO": 0.230, "EURO": 0.195, "WPRO": 0.100},
    evidence_shares={"AMRO": 0.780, "SEARO": 0.0, "AFRO": 0.002,
                     "EMRO": 0.037, "EURO": 0.105, "WPRO": 0.077},
    save_path="geographic_dumbbell.png",
)

Other helpers: plot_bland_altman, plot_control_chart, plot_correlation_matrix, plot_her_heatmap, plot_metric_distribution, plot_network_graph, and the six manuscript figures plot_figure2…7.


Statistical Analyses

EquiMed_DSS includes advanced statistical methods:

Hierarchical Linear Modeling (HLM)

from equimed_dss.statistics import HierarchicalLinearModeling

hlm = HierarchicalLinearModeling()
# Decompose outcome variance into between-group (e.g. hospital) vs within-group
# components; the ICC reports the between-group share.

Mediation Analysis

from equimed_dss.statistics import MediationAnalysis

mediation = MediationAnalysis(n_bootstrap=1000)
# Decompose a total effect into direct and indirect (mediated) pathways with a
# bootstrap CI for the indirect effect.

Network Statistics

from equimed_dss.statistics import NetworkStatistics

network = NetworkStatistics()
# Calculate centrality measures, clustering coefficients

Reliability Analysis

from equimed_dss.statistics import ReliabilityAnalysis

reliability = ReliabilityAnalysis()
# Cronbach's Alpha, Bland-Altman analysis
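
For reference, Cronbach's alpha can be computed from a subjects x items score table with the textbook formula. This standard-library sketch is independent of ReliabilityAnalysis, whose method names are not shown in this README; cronbach_alpha is an illustrative name.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects x items table of scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Same 4 x 3 table used in the ICC example (4 subjects, 3 raters/items)
ratings = [[3, 4, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]]
print(f"Cronbach's alpha = {cronbach_alpha(ratings):.3f}")
```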

Manuscript Figures

Generate publication-ready figures (Figures 2-7 from manuscript):

Each plot_figure* function takes a structured dict of inputs (the exact keys are documented in each function's docstring). Use generate_figure_data() for ready-to-run sample inputs, then swap in your own data using the same keys:

from equimed_dss.utils import (
    generate_figure_data,
    plot_figure2_reliability_dashboard,
    plot_figure3_corpus_comparison,
    plot_figure4_temporal_robustness,
    plot_figure5_ethics_governance,
    plot_figure6_metric_networks,
    plot_figure7_intersectional_heatmap,
)

figs = generate_figure_data()  # sample inputs for every figure

plot_figure2_reliability_dashboard(figs["fig2"], save_path="figures/fig2.png")
plot_figure3_corpus_comparison(figs["fig3"], save_path="figures/fig3.png")
plot_figure4_temporal_robustness(figs["fig4"], save_path="figures/fig4.png")
plot_figure5_ethics_governance(figs["fig5"], save_path="figures/fig5.png")
plot_figure6_metric_networks(figs["fig6"], save_path="figures/fig6.png")
plot_figure7_intersectional_heatmap(figs["fig7"], save_path="figures/fig7.png")

Examples

The examples/ directory contains comprehensive usage examples:

# Domain examples
python examples/example_domain1.py  # Reliability metrics
python examples/example_domain2.py  # Fairness & Ethics metrics
python examples/example_domain3.py  # Governance metrics
python examples/example_appendix.py # Advanced metrics

# Advanced examples
python examples/example_advanced_metrics.py  # All 9 new metrics
python examples/example_dataset.py           # Sample data generation
python examples/example_advanced_network.py  # Network analysis

For an end-to-end tutorial with data loading, metric calculations, statistical analyses, and visualization examples, see the EquiMed-DSS vignette (VIGNETTE.md in the docs/ directory).


Project Structure

EquiMed_DSS/
├── equimed_dss/
│   ├── domain1/              # Reliability & Calibration (3 metrics)
│   │   ├── dfr.py            # Decision Flip Rate (DFR)
│   │   ├── ecs.py            # Embedding Consistency Score (ECS)
│   │   └── icc.py            # Inter-Rater Reliability (ICC)
│   ├── domain2/              # Fairness, Equity & Ethics (4 metrics)
│   │   ├── her.py            # Hierarchical Equity Ratio + Bias-Gini
│   │   ├── hafg.py           # Harm-Adjusted Fairness Gap
│   │   ├── eri.py            # Ethical Risk Index
│   │   └── ibs.py            # Intersectional Bias Score
│   ├── domain3/              # Governance & Transparency (3 metrics)
│   │   ├── tfd.py            # Temporal Fairness Drift
│   │   ├── ats.py            # Audit Traceability Score
│   │   └── gci.py            # Governance Compliance Index
│   ├── domain4/              # Representation & Robustness (4 metrics)
│   │   ├── spg.py            # Semantic Parity Gap (SPG)
│   │   ├── chr.py            # Clinical Hallucination Rate (CHR)
│   │   ├── ivi.py            # Instructional Vulnerability Index (IVI)
│   │   └── gri.py            # Geographic Representation Index (GRI)
│   ├── domain5/              # Technical-supplement fairness (12 metrics)
│   │   ├── calibration.py    # Intersectional Calibration Error (ICE)
│   │   ├── harm.py           # Weighted Clinical Harm-Adjusted Fairness Gap (wHAFG)
│   │   ├── text.py           # LDDI, REG, CIDR, DCI, UQG
│   │   ├── counterfactual.py # Counterfactual Parity Score (CPS), SRPI
│   │   ├── geographic_bias.py # Geographic Representation Bias Index (GRBI)
│   │   ├── system.py         # Healthcare System Stratified Fairness (HSSF)
│   │   └── shapley.py        # Intersectional Shapley Fairness Value (ISFV)
│   ├── geographic/           # Geographic equity (2 metrics + reference)
│   │   ├── burden_evidence.py # Burden-Evidence Mismatch (BEMI)
│   │   ├── concentration.py  # Geographic Concentration of Coverage (GCC)
│   │   └── reference_data.py # WHO_REGION_IHD_BURDEN reference shares
│   ├── appendix/             # Advanced Metrics (9 metrics)
│   │   └── advanced_metrics.py
│   ├── reporting/            # Tidy result tables + export
│   │   ├── tables.py         # hierarchical/mediation/network/geographic tables
│   │   └── export.py         # export_table (markdown/LaTeX/HTML)
│   ├── statistics/           # Statistical Analyses
│   │   ├── hierarchical.py   # Hierarchical Linear Modeling
│   │   ├── mediation.py      # Mediation Analysis
│   │   ├── network_stats.py  # Network Statistics
│   │   └── reliability_stats.py # Reliability Analysis (Cronbach's alpha, Bland-Altman)
│   └── utils/                # Utilities
│       ├── data_formatters.py # Data loading & conversion
│       ├── visualization.py   # Publication-ready figures
│       └── sample_data.py     # Sample data generation
├── examples/                 # Usage examples
├── tests/                    # Test suite (124 tests)
├── docs/                     # Documentation (incl. VIGNETTE.md)
└── pyproject.toml

API Reference

Core Classes

| Class | Module | Description |
| --- | --- | --- |
| DecisionFlipRate | domain1 | Decision instability under counterfactual inputs |
| EmbeddingConsistencyScore | domain1 | Representation stability under perturbation |
| InterRaterReliability | domain1 | Inter-rater reliability (ICC(2,1)) |
| HierarchicalEquityRatio | domain2 | Group equity ratios + Bias-Gini |
| HarmAdjustedFairnessGap | domain2 | Clinical harm gaps |
| EthicalRiskIndex | domain2 | Ethical violations |
| IntersectionalBiasScore | domain2 | Subgroup bias detection |
| TemporalFairnessDrift | domain3 | Fairness over time |
| AuditTraceabilityScore | domain3 | Audit completeness |
| GovernanceComplianceIndex | domain3 | Regulatory compliance |
| SemanticParityGap | domain4 | Latent demographic sensitivity (SPG) |
| ClinicalHallucinationRate | domain4 | Unsupported-claim rate (CHR) |
| InstructionalVulnerabilityIndex | domain4 | Susceptibility to bias-priming (IVI) |
| GeographicRepresentationIndex | domain4 | Non-Western location share (GRI) |
| IntersectionalCalibrationError | domain5 | Intersectional calibration gap (ICE) |
| WeightedClinicalHarmAdjustedFairnessGap | domain5 | Severity-weighted harm gap (wHAFG) |
| LexicalDiversityDisparityIndex | domain5 | Vocabulary-richness disparity (LDDI) |
| RecommendationEntropyGap | domain5 | Recommendation-entropy gap (REG) |
| CounterfactualParityScore | domain5 | Counterfactual response parity (CPS) |
| ClinicalInformationDensityRatio | domain5 | Clinical-concept density ratio (CIDR) |
| DiagnosticCompletenessIndex | domain5 | Guideline-differential coverage (DCI) |
| UncertaintyQuantificationGap | domain5 | Hedging-density gap (UQG) |
| GeographicRepresentationBiasIndex | domain5 | KL geography-vs-burden divergence (GRBI) |
| HealthcareSystemStratifiedFairness | domain5 | System-stratified fairness (HSSF) |
| IntersectionalShapleyFairnessValue | domain5 | Shapley disparity attribution (ISFV) |
| SemanticRobustnessParityIndex | domain5 | Cross-group paraphrase robustness (SRPI) |
| BootstrapConfidenceIntervals | appendix | Uncertainty quantification |
| StatisticalPowerAnalysis | appendix | Sample size planning |
| BiasConcentrationIndex | appendix | Bias distribution |
| MutualInformationContent | appendix | Information leakage |
| JensenShannonDivergence | appendix | Distribution divergence |
| WassersteinDistance | appendix | Optimal transport |
| NetworkModularity | appendix | Community structure |
| TransparencyScore | appendix | Explanation quality |
| RobustnessCertificationScore | appendix | Perturbation stability |
| BurdenEvidenceMismatch | geographic | Evidence-burden mismatch (BEMI) |
| GeographicConcentration | geographic | Regional concentration (GCC) |
| WHO_REGION_IHD_BURDEN | geographic | IHD DALY burden reference shares |
| hierarchical_coefficients_table | reporting | HLM results as tidy DataFrame |
| mediation_effects_table | reporting | Mediation results as tidy DataFrame |
| network_centrality_table | reporting | Network centrality as tidy DataFrame |
| geographic_table | reporting | BEMI/GCC results as tidy DataFrame |
| export_table | reporting | Render DataFrame to markdown/LaTeX/HTML |

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone and install in development mode
git clone https://github.com/johnmuteba/EquiMed_DSS.git
cd EquiMed_DSS
pip install -e ".[dev]"

# Run tests
pytest tests/ -v --cov=equimed_dss

# Code quality
black equimed_dss tests examples
isort equimed_dss tests examples
mypy equimed_dss

Citation

If you use EquiMed_DSS in your research, please cite:

@software{muteba_equimed_dss_2025,
  title={EquiMed_DSS: A Comprehensive Library for Clinical AI Fairness Assessment},
  author={Muteba Mwamba, John},
  year={2025},
  url={https://github.com/johnmuteba/EquiMed_DSS},
  note={37 metrics for reliability, equity, governance, representation, and robustness in clinical AI}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

  • Developed for advancing equity in clinical AI systems
  • Built with support from the research community
  • Statistical methods based on peer-reviewed literature
