
Universal auditor model that suppresses unreliable AI predictions in human-AI systems. Works with sklearn, PyTorch, HuggingFace, OpenAI, Anthropic, and any custom model.


AuditorAI

Stop your AI model from showing wrong predictions to users.


The problem: Your AI model gets predictions wrong sometimes, and those wrong predictions reach your users — causing bad decisions, lost trust, and real harm.

The solution: AuditorAI adds a second model (the "auditor") that learns when your primary model is likely wrong and suppresses those predictions before they reach the user. The suppressed cases get routed to a human instead.

The result: Only confident, reliable predictions are shown. Wrong predictions are caught and handled by humans, improving your overall system accuracy.

Quick numbers

Dataset        AI alone            With AuditorAI  Auditor AUROC  Flag rate
Breast Cancer  RandomForest        +auditor        0.93           2%
Wine           GradientBoosting    +auditor        0.75           3%
Digits         LogisticRegression  +auditor        0.93           4%

Across these three datasets the auditor reaches an AUROC of 0.75–0.93 while flagging only 2–4% of cases for human review.


Install

pip install auditorai
What you need          Command
Base (sklearn models)  pip install auditorai
PyTorch models         pip install "auditorai[pytorch]"
HuggingFace models     pip install "auditorai[hf]"
OpenAI models          pip install "auditorai[openai]"
Anthropic models       pip install "auditorai[anthropic]"
Everything             pip install "auditorai[all]"

Quickstart

With a sklearn model

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from auditorai import AuditorSystem, wrap

# 1. Load data and train your model as usual
X, y = load_breast_cancer(return_X_y=True)
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 2. Wrap your model and train the auditor (3 lines)
system = AuditorSystem(wrap(model))
system.train(X_val, y_val)       # uses held-out data
system.auto_tune(X_val, y_val)   # finds best threshold

# 3. Get audited predictions
result = system.predict(X_test)
print(result["show_mask"])       # True = safe to show
print(result["suppress_mask"])   # True = let human decide
print(result["p_wrong"])         # estimated probability that each prediction is wrong

With a PyTorch model

from auditorai import AuditorSystem, wrap
# pip install "auditorai[pytorch]"

adapter = wrap(your_torch_model,
               adapter_type="pytorch",
               n_classes=3)
system = AuditorSystem(adapter)
system.train(X_val, y_val)
result = system.predict(X_test)

With an OpenAI model

from auditorai import AuditorSystem, wrap
# pip install "auditorai[openai]"

def parse_response(text):
    # Parse your model output -> (class_index, confidence).
    # The fixed 0.85 is a placeholder; return a real confidence value if your model provides one.
    return int(text.strip()), 0.85

adapter = wrap("gpt-4o-mini",
               adapter_type="openai",
               parse_response=parse_response,
               n_classes=2)
system = AuditorSystem(adapter)
system.train(texts_val, y_val)
result = system.predict(texts_test)

How it works

Step 1:  Your model makes a prediction
            ↓
Step 2:  AuditorAI scores it (0 = confident, 1 = likely wrong)
            ↓
Step 3:  Score ≥ threshold? → Suppress (human decides)
         Score < threshold? → Show prediction to user
            ↓
Step 4:  Only confident predictions reach your users

Key insight: The auditor trains on held-out validation data — data your primary model has never seen during training. This means it learns your model's real-world failure patterns, not memorized ones. The auditor uses uncertainty signals like confidence, entropy, and margin between top predictions to detect when your model is likely wrong.
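
To make the uncertainty signals concrete, here is a minimal sketch of how confidence, entropy, and margin can be computed from a matrix of class probabilities. The function name `uncertainty_signals` is hypothetical and this is not AuditorAI's internal feature code; it only illustrates the three quantities named above.

```python
import numpy as np

def uncertainty_signals(proba: np.ndarray) -> dict:
    """Compute per-sample uncertainty signals from class probabilities.

    proba: array of shape (n_samples, n_classes), each row summing to 1.
    This helper is an illustration, not part of the AuditorAI API.
    """
    proba = np.asarray(proba, dtype=float)
    # Sort each row ascending so the last two columns hold the top-2 probabilities.
    sorted_p = np.sort(proba, axis=1)
    confidence = sorted_p[:, -1]                 # highest class probability
    margin = sorted_p[:, -1] - sorted_p[:, -2]   # gap between the top two classes
    # Shannon entropy in nats; clip to avoid log(0).
    clipped = np.clip(proba, 1e-12, 1.0)
    entropy = -np.sum(proba * np.log(clipped), axis=1)
    return {"confidence": confidence, "margin": margin, "entropy": entropy}

proba = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.40, 0.35, 0.25],   # uncertain prediction
])
signals = uncertainty_signals(proba)
```

Low confidence, a small margin, and high entropy all point the same way: the primary model is unsure, which is the kind of pattern an auditor can learn to associate with errors.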


CLI

AuditorAI also works from the command line, with no Python code required:

# Run on a built-in dataset
auditorai run --data breast_cancer --report

# Run on your own CSV (last column = label)
auditorai run --data mydata.csv --model-type gradient_boosting

# Sweep thresholds to find the best one
auditorai sweep --data breast_cancer --steps 20

# Validate a saved auditor on new data
auditorai validate --adapter-path outputs/models --data breast_cancer

# All options
auditorai run --help
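
The CSV form of `auditorai run` expects the label in the last column. A minimal sketch of writing such a file with the standard library follows. The column names and values are made up, and whether the CLI expects a header row is not stated here, so the header is an assumption.

```python
import csv
import os
import tempfile

# Hypothetical feature names and rows, for illustration only.
header = ["feature_a", "feature_b", "label"]
rows = [
    [5.1, 3.5, 0],
    [6.2, 2.9, 1],
    [4.7, 3.2, 0],
]

# Write to a temporary directory; in practice use your own path.
csv_path = os.path.join(tempfile.mkdtemp(), "mydata.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)   # header row is an assumption; label is the last column
    writer.writerows(rows)

# Read the file back and confirm the last column holds the labels.
with open(csv_path, newline="") as f:
    read_rows = list(csv.reader(f))
labels = [int(r[-1]) for r in read_rows[1:]]
```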

What you get

1. Evaluation report (printed to terminal)

==================================================
  AUDITOR SYSTEM - EVALUATION REPORT
==================================================
  AI-only accuracy:         94.7%
  Joint system accuracy:    94.2%
  Auditor AUROC:            0.512
  Suppression rate:         1.8%
  Cases shown:              112
  Cases suppressed:         2
  Auditor precision:        50.0%
  Auditor recall:           16.7%
==================================================
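
The suppression rate, precision, and recall in a report like this can be reproduced from two boolean arrays. The sketch below assumes the usual definitions (precision = suppressed errors / all suppressed, recall = suppressed errors / all errors) and uses made-up arrays chosen to be consistent with the sample report: 114 cases, 6 AI errors, 2 suppressed, 1 of them a real error. It does not call AuditorAI.

```python
import numpy as np

# Illustrative arrays, not library output.
n = 114
ai_wrong = np.zeros(n, dtype=bool)
ai_wrong[:6] = True                 # cases 0-5 are AI errors
suppress_mask = np.zeros(n, dtype=bool)
suppress_mask[[0, 6]] = True        # one real error and one correct case suppressed

ai_accuracy = 1.0 - ai_wrong.mean()
suppression_rate = suppress_mask.mean()
caught = (suppress_mask & ai_wrong).sum()
# Precision: of the suppressed cases, how many were real errors?
precision = caught / suppress_mask.sum()
# Recall: of the real errors, how many were suppressed?
recall = caught / ai_wrong.sum()

print(f"AI-only accuracy:  {ai_accuracy:.1%}")
print(f"Suppression rate:  {suppression_rate:.1%}")
print(f"Auditor precision: {precision:.1%}")
print(f"Auditor recall:    {recall:.1%}")
```

Joint system accuracy is left out because it also depends on how suppressed cases are resolved by the human.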

2. Prediction dict (in Python)

result = system.predict(X_test)

# For each sample in X_test:
result["show_mask"]       # bool array — True means show this prediction
result["suppress_mask"]   # bool array — True means suppress this prediction
result["p_wrong"]         # float array — probability this prediction is wrong
result["ai_predictions"]  # the actual predicted class labels
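
One way to act on these arrays is to split the batch into predictions to display and cases for a human review queue. The sketch below uses a hand-built stand-in for the result dict so it runs on its own. The values and the 0.5 threshold are made up, and the masks are rebuilt from `p_wrong` only for that reason.

```python
import numpy as np

# Stand-in for the dict returned by system.predict(); values are made up.
result = {
    "ai_predictions": np.array([1, 0, 1, 1, 0]),
    "p_wrong":        np.array([0.05, 0.10, 0.80, 0.20, 0.65]),
}
threshold = 0.5  # hypothetical; in practice this comes from tuning
result["suppress_mask"] = result["p_wrong"] >= threshold
result["show_mask"] = ~result["suppress_mask"]

# Predictions safe to display, keyed by sample index.
shown = {int(i): int(result["ai_predictions"][i])
         for i in np.flatnonzero(result["show_mask"])}

# Samples to hand to a human reviewer, most suspicious first.
review_idx = np.flatnonzero(result["suppress_mask"])
review_queue = review_idx[np.argsort(-result["p_wrong"][review_idx])].tolist()
```

Ordering the queue by `p_wrong` is a design choice of this sketch, so reviewers see the most doubtful cases first.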

3. Plots saved to outputs/

File                 What it shows
score_dist.png       Auditor score distribution (correct vs. wrong predictions)
threshold_sweep.png  Accuracy gain vs. suppression threshold curve
breakdown.png        Shown vs. suppressed × correct vs. error breakdown

Supported models

Model type            Adapter                Extra install
Any sklearn model     SklearnAdapter         (included)
XGBoost / LightGBM    SklearnAdapter         (included)
PyTorch nn.Module     PyTorchAdapter         pip install "auditorai[pytorch]"
HuggingFace pipeline  HuggingFaceAdapter     pip install "auditorai[hf]"
OpenAI (GPT-4o etc)   APIAdapter             pip install "auditorai[openai]"
Anthropic (Claude)    APIAdapter             pip install "auditorai[anthropic]"
Any custom model      Subclass ModelAdapter  (included)

Custom adapter pattern

from auditorai import ModelAdapter, AuditorSystem
import numpy as np

class MyAdapter(ModelAdapter):
    def __init__(self, model):
        self.model = model

    def predict(self, X) -> np.ndarray:
        return self.model.my_predict(X)

    def predict_proba(self, X) -> np.ndarray:
        scores = self.model.my_scores(X)
        return np.column_stack([1 - scores, scores])  # binary case; each row must sum to 1.0

system = AuditorSystem(MyAdapter(my_model))
system.train(X_val, y_val)
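
Before handing a custom adapter to the auditor, it can help to sanity-check what `predict_proba` returns. The helper below is a hypothetical sketch (`check_proba` is not an AuditorAI function) that verifies shape, range, and row sums with NumPy.

```python
import numpy as np

def check_proba(proba, atol: float = 1e-6) -> np.ndarray:
    """Validate a predict_proba-style output and return it as a float array.

    Hypothetical helper for illustration; not part of AuditorAI.
    """
    proba = np.asarray(proba, dtype=float)
    if proba.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {proba.shape}")
    if np.any(proba < 0) or np.any(proba > 1):
        raise ValueError("probabilities must lie in [0, 1]")
    if not np.allclose(proba.sum(axis=1), 1.0, atol=atol):
        raise ValueError("each row must sum to 1.0")
    return proba

scores = np.array([0.2, 0.9, 0.5])   # positive-class scores from a binary model
proba = check_proba(np.column_stack([1 - scores, scores]))
```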

FAQ

Q: Does my model need to support predict_proba?
No. sklearn models without predict_proba (like SVC) are automatically wrapped with CalibratedClassifierCV to produce calibrated probabilities. No extra code needed.

Q: What data do I pass to system.train()?
Validation data, meaning data your primary model has NOT trained on. This is critical. Passing training data produces an unreliable auditor that can't detect real errors.

Q: What does suppress_mask=True mean for my application?
It means AuditorAI is not confident in that prediction. What you do with it is up to you: show a warning, route to a human reviewer, or request more information from the user.

Q: How do I choose the threshold?
Call system.auto_tune(X_val, y_val), which picks the threshold that maximizes joint accuracy automatically. Or use auditorai sweep from the CLI to see the full tradeoff curve and pick manually.

Q: Will this work on text / images / tabular data?
Yes. The auditor works on your model's probability outputs, not the raw inputs. As long as your adapter returns valid probabilities (rows sum to 1.0), the data type does not matter.
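
On the threshold question, the idea of picking the threshold that maximizes joint accuracy can be sketched as a simple grid search. This is not the code behind `auto_tune` or `auditorai sweep`. The function `sweep_thresholds`, the fixed `human_accuracy`, and the sample arrays are all assumptions made for illustration.

```python
import numpy as np

def sweep_thresholds(p_wrong, ai_correct, human_accuracy=0.9, steps=20):
    """Return (best_threshold, best_joint_accuracy) over an evenly spaced grid.

    Assumes suppressed cases are resolved by a human who is right with
    probability `human_accuracy` (a hypothetical constant).
    """
    p_wrong = np.asarray(p_wrong, dtype=float)
    ai_correct = np.asarray(ai_correct, dtype=bool)
    best_t, best_acc = 1.0, -1.0
    for t in np.linspace(0.0, 1.0, steps + 1):
        suppress = p_wrong >= t
        # Shown cases score 1 when the AI is right; suppressed cases score
        # the assumed human accuracy in expectation.
        joint = np.where(suppress, human_accuracy, ai_correct.astype(float)).mean()
        if joint > best_acc:   # strict comparison keeps the lowest best threshold
            best_t, best_acc = float(t), float(joint)
    return best_t, best_acc

# Made-up auditor scores and correctness flags for five validation samples.
p_wrong = np.array([0.12, 0.22, 0.31, 0.74, 0.93])
ai_correct = np.array([True, True, True, False, False])
best_t, best_acc = sweep_thresholds(p_wrong, ai_correct, human_accuracy=0.9, steps=10)
```

A threshold that is too low sends correct predictions to the human for no gain, and one that is too high lets errors through, so the best value sits between the scores of correct and wrong predictions.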


Project structure

auditorai/
├── adapters/
│   ├── base.py                ← ModelAdapter ABC + wrap() function
│   ├── sklearn_adapter.py     ← wraps any sklearn model
│   ├── pytorch_adapter.py     ← wraps PyTorch nn.Module
│   ├── huggingface_adapter.py ← wraps HF pipelines and models
│   └── api_adapter.py         ← wraps OpenAI / Anthropic / custom HTTP
├── core/
│   ├── auditor.py             ← AuditorModel: trains on primary errors
│   ├── router.py              ← threshold sweep and routing logic
│   ├── system.py              ← AuditorSystem: main entry point
│   └── evaluate.py            ← reports and plots
├── cli/
│   └── main.py                ← auditorai run / sweep / validate
└── utils/
    ├── data.py                ← load_any() smart data loader
    └── logging.py             ← shared logger

Contributing

git clone https://github.com/Apurva0614/Auditorai.git
cd Auditorai
pip install -e ".[dev]"
pytest tests/ -v

See CONTRIBUTING.md for full guidelines.


Research

This implementation is based on the auditor model framework for human-AI decision systems. The core idea — training a second model to predict when the primary AI is wrong, then suppressing those predictions to let a human decide — was formalized in Auditor Models for Efficient Human-AI Collaboration (De-Arteaga, M. et al., 2025, medRxiv). AuditorAI makes this research practical by providing a drop-in library that works with any ML framework.


License

MIT — use it freely, commercially or otherwise.
