A library for evaluating ML model performance across subgroups with stratified metrics and bootstrap confidence intervals
Model Auditor
A Python library for evaluating machine learning model performance across subgroups with support for stratified metrics, bootstrap confidence intervals, and hierarchical visualizations.
Installation
pip install model-auditor
Features
- Stratified Evaluation: Evaluate model metrics across different subgroups (e.g., by age, gender, region)
- Bootstrap Confidence Intervals: Calculate 95% confidence intervals for all supported metrics
- Comprehensive Metrics: Built-in support for classification metrics including:
  - Sensitivity, Specificity, Precision, Recall, F1 Score
  - AUROC, AUPRC
  - Matthews Correlation Coefficient (MCC)
  - F-beta Score (configurable beta)
  - TPR, TNR, FPR, FNR
  - Count metrics (N, TP, TN, FP, FN, Positive, Negative)
- Threshold Optimization: Automatic threshold selection using the Youden index
- Hierarchical Visualization: Generate data structures for sunburst/treemap plots
- Extensible Design: Protocol-based architecture for custom metrics
Quick Start
from model_auditor import Auditor
from model_auditor.metrics import Sensitivity, Specificity, AUROC, F1Score
# Initialize the auditor
auditor = Auditor()
# Add your data
auditor.add_data(df)
# Define stratification features
auditor.add_feature(name="age_group", label="Age Group")
auditor.add_feature(name="gender", label="Gender")
# Define the score column and threshold
auditor.add_score(name="risk_score", label="Risk Score", threshold=0.5)
# Define the outcome column
auditor.add_outcome(name="diagnosis", mapping={"positive": 1, "negative": 0})
# Set metrics to evaluate
auditor.set_metrics([
    Sensitivity(),
    Specificity(),
    AUROC(),
    F1Score(),
])
# Run evaluation with bootstrap confidence intervals
results = auditor.evaluate(score_name="risk_score", n_bootstraps=1000)
# Convert results to a DataFrame
results_df = results.to_dataframe()
print(results_df)
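To make the bootstrap confidence intervals above more concrete, here is a minimal percentile-bootstrap sketch in plain Python. It is not the library's implementation: the resampling scheme used by `evaluate` is not described in this README, so the helper names `sensitivity` and `bootstrap_ci`, the percentile method, and the toy data are all assumptions made for illustration.

```python
import math
import random


def sensitivity(pairs):
    """TP / (TP + FN) over (y_true, y_pred) pairs; NaN if there are no positives."""
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def bootstrap_ci(pairs, metric, n_bootstraps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (hypothetical helper, not part of model_auditor)."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = []
    for _ in range(n_bootstraps):
        # Resample n rows with replacement and recompute the metric.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        value = metric(sample)
        if not math.isnan(value):
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


# Toy subgroup: 40 TP, 10 FN, 45 TN, 5 FP.
pairs = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 0)] * 45 + [(0, 1)] * 5
point = sensitivity(pairs)
lower, upper = bootstrap_ci(pairs, sensitivity)
print(f"sensitivity = {point:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

In a stratified evaluation the same procedure would be repeated within each subgroup, which is why small subgroups tend to produce wide intervals.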
Threshold Optimization
Find the optimal decision threshold using the Youden index:
auditor = Auditor()
auditor.add_data(df)
auditor.add_score(name="risk_score")
auditor.add_outcome(name="label")
# Find optimal threshold
optimal_threshold = auditor.optimize_score_threshold(score_name="risk_score")
# Output: Optimal threshold for 'risk_score' found at: 0.423
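For reference, the Youden index is J = sensitivity + specificity - 1, and the optimal threshold is the one that maximizes J. The sketch below illustrates that search in plain Python. It is not the library's code: the function name `youden_threshold`, the convention that a score at or above the threshold counts as positive, and the toy data are assumptions.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Assumes a prediction is positive when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # keep the first threshold on ties
            best_j, best_t = j, t
    return best_t, best_j


scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
best_t, best_j = youden_threshold(scores, labels)
print(f"threshold = {best_t}, J = {best_j:.2f}")
```

Only observed score values are tried as candidate thresholds here; a real implementation may instead use the points of the ROC curve.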
Available Metrics
Classification Metrics
| Metric | Class | Description |
|---|---|---|
| Sensitivity | Sensitivity() | TP / (TP + FN) |
| Specificity | Specificity() | TN / (TN + FP) |
| Precision | Precision() | TP / (TP + FP) |
| Recall | Recall() | TP / (TP + FN) |
| F1 Score | F1Score() | Harmonic mean of precision and recall |
| F-beta | FBetaScore(beta=2.0) | Weighted harmonic mean of precision and recall |
| MCC | MatthewsCorrelationCoefficient() | Matthews Correlation Coefficient |
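The table gives F-beta and MCC only by name, so the standard textbook definitions are sketched below in terms of confusion-matrix counts. The function names `fbeta` and `mcc` are hypothetical and the code is independent of the library's classes; returning NaN for an undefined value is an assumption.

```python
import math


def fbeta(tp, fp, fn, beta=1.0):
    """F-beta = (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else float("nan")


def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else float("nan")


tp, tn, fp, fn = 40, 45, 5, 10
f1 = fbeta(tp, fp, fn, beta=1.0)  # beta = 1 weighs precision and recall equally
f2 = fbeta(tp, fp, fn, beta=2.0)  # beta > 1 weighs recall more heavily
m = mcc(tp, tn, fp, fn)
print(f"F1 = {f1:.3f}, F2 = {f2:.3f}, MCC = {m:.3f}")
```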
Ranking Metrics
| Metric | Class | Description |
|---|---|---|
| AUROC | AUROC() | Area Under ROC Curve |
| AUPRC | AUPRC() | Area Under Precision-Recall Curve |
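Unlike the classification metrics, ranking metrics do not depend on a threshold. As an illustration, AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way; the function name `auroc` is hypothetical, and the library may well compute the area differently (for example by integrating the ROC curve).

```python
def auroc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when one class is missing
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
area = auroc(scores, labels)
print(f"AUROC = {area:.3f}")
```

This pairwise form costs O(n_pos * n_neg) comparisons, so it is meant for understanding rather than for large datasets.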
Rate Metrics
| Metric | Class | Description |
|---|---|---|
| TPR | TPR() | True Positive Rate |
| TNR | TNR() | True Negative Rate |
| FPR | FPR() | False Positive Rate |
| FNR | FNR() | False Negative Rate |
Count Metrics
| Metric | Class | Description |
|---|---|---|
| N | nData() | Sample size |
| TP | nTP() | True positive count |
| TN | nTN() | True negative count |
| FP | nFP() | False positive count |
| FN | nFN() | False negative count |
| Positive | nPositive() | Positive class count |
| Negative | nNegative() | Negative class count |
Custom Metrics
Create custom metrics by implementing the AuditorMetric protocol:
import pandas as pd
from model_auditor.metrics import AuditorMetric, Sensitivity

class AccuracyMetric(AuditorMetric):
    name = "accuracy"
    label = "Accuracy"
    inputs = ["tp", "tn", "fp", "fn"]
    ci_eligible = True

    def data_call(self, data: pd.DataFrame) -> float:
        # Sum the per-row confusion-matrix indicator columns
        tp = data["tp"].sum()
        tn = data["tn"].sum()
        fp = data["fp"].sum()
        fn = data["fn"].sum()
        return (tp + tn) / (tp + tn + fp + fn)

# Use with the auditor
auditor.set_metrics([AccuracyMetric(), Sensitivity()])
Hierarchical Visualization
Generate data for hierarchical plots (sunburst, treemap):
from model_auditor.plotting import HierarchyPlotter
plotter = HierarchyPlotter()
plotter.set_data(df)
plotter.set_features(["region", "age_group", "gender"])
plotter.set_score(name="risk_score")
plotter.set_aggregator("median") # or "mean", or a custom function
# Compile plot data
plot_data = plotter.compile(container="All Patients")
# Use with Plotly
import plotly.graph_objects as go
fig = go.Figure(go.Sunburst(
    labels=plot_data.labels,
    ids=plot_data.ids,
    parents=plot_data.parents,
    values=plot_data.values,
    marker=dict(colors=plot_data.colors),
))
fig.show()
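A sunburst or treemap trace expects parallel lists in which every node names its parent. The sketch below shows one way such lists could be built from row-level data. It is only an illustration of the data shape: the function `build_sunburst`, the use of row counts as values, the median score as colour, and the "/"-joined ids are assumptions, not a description of what `HierarchyPlotter.compile` actually produces.

```python
from statistics import median


def build_sunburst(rows, features, score, container="All"):
    """Return parallel lists (ids, labels, parents, values, colors)."""
    groups = {}  # path tuple -> scores of all rows under that node
    for row in rows:
        path = (container,)
        groups.setdefault(path, []).append(row[score])
        for feat in features:
            path = path + (str(row[feat]),)
            groups.setdefault(path, []).append(row[score])

    ids, labels, parents, values, colors = [], [], [], [], []
    for path, scores in groups.items():
        ids.append("/".join(path))
        labels.append(path[-1])
        parents.append("/".join(path[:-1]))  # "" marks the root node
        values.append(len(scores))           # node size: number of rows
        colors.append(median(scores))        # node colour: aggregated score
    return ids, labels, parents, values, colors


rows = [
    {"region": "Urban", "gender": "F", "risk_score": 0.2},
    {"region": "Urban", "gender": "M", "risk_score": 0.6},
    {"region": "Rural", "gender": "F", "risk_score": 0.8},
    {"region": "Urban", "gender": "F", "risk_score": 0.4},
]
ids, labels, parents, values, colors = build_sunburst(
    rows, ["region", "gender"], "risk_score", container="All Patients"
)
for node in zip(ids, parents, values, colors):
    print(node)
```

Because each parent's value here is the sum of its children, lists built this way would pair with `branchvalues="total"` in Plotly.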
Custom Hierarchies
Define complex hierarchies with conditional features:
from model_auditor.plotting.schemas import Hierarchy, HLevel, HItem
hierarchy = Hierarchy(levels=[
    HLevel([HItem(name="region")]),
    HLevel([
        HItem(name="urban_category", query="region == 'Urban'"),
        HItem(name="rural_category", query="region == 'Rural'"),
    ]),
    HLevel([HItem(name="age_group")]),
])
plotter.set_features(hierarchy)
Disabling Confidence Intervals
For faster evaluation without confidence intervals:
results = auditor.evaluate(score_name="risk_score", n_bootstraps=None)
Output Format
Results are returned as nested dataclass objects that can be converted to DataFrames:
# Get results as DataFrame
df = results.to_dataframe(n_decimals=3, metric_labels=True)
# Access specific feature results
gender_results = results.features["gender"].to_dataframe()
# Access specific level results
male_results = results.features["gender"].levels["Male"].to_dataframe()
Controlling Feature Level Order
By default, feature levels appear in the order they were encountered in the
data. To control the row order in exported DataFrames, assign the feature
column a pd.Categorical dtype with an explicit categories list before
passing the data to the auditor:
import pandas as pd
from model_auditor import Auditor
from model_auditor.metrics import Sensitivity, Specificity
# Declare the desired display order for the 'age_group' column.
# Categories not present in the data still appear as rows (with NaN values).
df["age_group"] = pd.Categorical(
    df["age_group"],
    categories=["<30", "30-50", "50-70", ">70"],
    ordered=True,
)
auditor = Auditor()
auditor.add_data(df)
auditor.add_feature(name="age_group")
auditor.add_score(name="risk_score", threshold=0.5)
auditor.add_outcome(name="outcome")
auditor.set_metrics([Sensitivity(), Specificity()])
results = auditor.evaluate(score_name="risk_score", n_bootstraps=None)
# Rows appear in the declared order: <30, 30-50, 50-70, >70.
# If no rows belong to a declared category (e.g. '>70' is absent from the
# data), that category still appears as a row with NaN metric values.
df_out = results.features["age_group"].to_dataframe()
The same order is preserved in style_dataframe() and in the score-level
ScoreEvaluation.to_dataframe() / ScoreEvaluation.style_dataframe() exports.
Non-categorical feature columns are unaffected.
Notebook Styling
For Jupyter notebooks, style_dataframe(...) returns a pandas Styler that colours cells by relative performance tier within each metric column.
# Colour all levels in a feature by relative tier (default: performance metrics only)
display(results.features['age_group'].style_dataframe(n_decimals=3, metric_labels=True))
# Also colour count columns (N, TP, TN, …)
display(results.features['gender'].style_dataframe(include_count_metrics=True))
# Opt into custom colours
display(results.style_dataframe(
    low_color="#ffd6d6",
    medium_color="#fff9c4",
    high_color="#d0f0d0",
))
Tier assignment
| Tier | Default colour | Meaning |
|---|---|---|
| High | #d4edda (green) | Top third of values in the column |
| Medium | #fff3cd (yellow) | Middle third |
| Low | #f8d7da (red) | Bottom third |
Tiers are computed per metric column across all rows in the table. Lower-is-better metrics (fpr, fnr) are inverted: a lower value receives the high (green) tier.
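The tier rule can be pictured with a small rank-based sketch. This is an assumption about the mechanics, not the library's code: the README does not say whether "thirds" are cut by rank or by value range, nor how ties are handled, so the function `assign_tiers` below simply splits the sorted values into three equal-sized rank groups.

```python
def assign_tiers(values, lower_is_better=False):
    """Map each value to 'low', 'medium', or 'high' by rank tercile."""
    n = len(values)
    # Order indices from worst to best; for lower-is-better metrics
    # the largest value is the worst, so the sort is reversed.
    order = sorted(range(n), key=lambda i: values[i], reverse=lower_is_better)
    tiers = [None] * n
    for rank, i in enumerate(order):
        third = rank * 3 // n  # 0, 1, or 2
        tiers[i] = ("low", "medium", "high")[third]
    return tiers


sens_tiers = assign_tiers([0.90, 0.75, 0.60, 0.82, 0.70, 0.95])
fpr_tiers = assign_tiers([0.05, 0.20, 0.10], lower_is_better=True)
print(sens_tiers)
print(fpr_tiers)
```

In the second call the smallest false positive rate (0.05) lands in the high tier, which matches the inversion described above.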
Parameters
| Parameter | Default | Description |
|---|---|---|
| n_decimals | 3 | Decimal places for numeric display |
| metric_labels | False | Use metric labels as column headers instead of names |
| include_count_metrics | False | Also style count columns (N, TP, TN, FP, FN, Pos., Neg.) |
| low_color | "#f8d7da" | Background colour for low-tier cells |
| medium_color | "#fff3cd" | Background colour for medium-tier cells |
| high_color | "#d4edda" | Background colour for high-tier cells |
License
MIT License
Author
Beatrice BM