Conformal Model Moderation & Human-in-the-Loop Routing Python Library

Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

commCP 🛡️

commCP (Conformal Model Moderation & Human-in-the-Loop Routing) is a post-training wrapper for binary classification estimators. It combines Conformal Prediction (to enforce statistical reliability guarantees) and LLM Refereeing to decide when a prediction can be auto-accepted vs. when it should be escalated for human review.

Inspired by stats-centric tools like MAPIE, commCP bridges the gap between statistical guarantees and LLM verification for the AI era.

Features

- Statistical Coverage Guarantees: Targets a coverage level of $1 - \alpha$ (an error rate of at most $\alpha$) via conformal calibration.
- Selective Prediction / HITL: Automatically routes predictions into auto_decided or escalated queues.
- LLM-as-a-Referee: Mediates ensemble disagreements and conformal "gray-zone" uncertainties dynamically.
- Cost-Optimized: Bypasses the LLM completely for obvious acceptances and for low-confidence/high-risk rejections, keeping API costs to a minimum.
- Seamless sklearn Compatibility: Works with any estimator exposing a predict_proba method (e.g., LogisticRegression, RandomForest, XGBoost).

Installation

```bash
# Install from PyPI
pip install commcp

# Or install from a source checkout
pip install .
```

How to Prepare Your Data

Conformal prediction requires a held-out calibration set that the model was not exposed to during training. Before wrapping your model, split your dataset into three distinct partitions:

- Training Set (e.g., 60% of data): Used to train your base classifier (e.g., RandomForest).
- Calibration Set (e.g., 20% of data): Used by commCP to calculate safety thresholds. Crucial: calibrating on data the model was trained on invalidates the statistical guarantees.
- Test Set (e.g., 20% of data): Stands in for incoming data; used for predictions, routing, and evaluation.
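The 60/20/20 split described above can be sketched with the standard library alone. This is a minimal illustration, not part of commCP: `three_way_split` and the toy dataset are hypothetical names introduced here. In practice, calling scikit-learn's `train_test_split` twice achieves the same result.

```python
import random


def three_way_split(rows, train_frac=0.6, calib_frac=0.2, seed=0):
    """Shuffle rows and cut them into train / calibration / test partitions."""
    if train_frac <= 0 or calib_frac <= 0 or train_frac + calib_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_calib = round(len(shuffled) * calib_frac)
    train = shuffled[:n_train]
    calib = shuffled[n_train:n_train + n_calib]
    test = shuffled[n_train + n_calib:]  # whatever remains
    return train, calib, test


# Toy dataset of (features, label) pairs, purely illustrative.
data = [([i, i % 7], i % 2) for i in range(100)]
train, calib, test = three_way_split(data)

X_train, y_train = [x for x, _ in train], [y for _, y in train]
X_calib, y_calib = [x for x, _ in calib], [y for _, y in calib]
X_test, y_test = [x for x, _ in test], [y for _, y in test]

print(len(X_train), len(X_calib), len(X_test))
```

The key property is that the three partitions are disjoint, so the calibration scores are computed on samples the classifier has never seen.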

Quick Start Guide

1. Train Your Classifier

```python
from sklearn.ensemble import RandomForestClassifier
from commcp import CommCP

# Train a standard sklearn classifier on the training partition only
model = RandomForestClassifier()
model.fit(X_train, y_train)
```

2. Wrap and Calibrate commCP

Initialize CommCP with your trained estimator, along with a task description and class labels (which are required to build high-accuracy prompts for the LLM Referee). Pass a held-out calibration set to establish the conformal threshold.

```python
# Initialize the CommCP wrapper (significance level alpha=0.05 -> 95% target coverage)
ccp = CommCP(
    estimator=model,
    task_description="Predict whether a patient has heart disease based on clinical features",
    class_labels={0: "Healthy", 1: "Heart Disease Present"},
    alpha=0.05,
    llm_provider="groq",  # Supports "groq" or "openai"
    verify_margin=0.15,   # Trigger LLM verification on predictions within 15% of the threshold
)

# Calibrate on the held-out calibration set
ccp.calibrate(X_calib, y_calib)
```
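To give a feel for what "establishing the conformal threshold" means, here is a sketch of textbook split conformal calibration for classification. It is not taken from commCP's source, and the library's internals may differ. The names `conformal_quantile` and `prediction_set`, the score `1 - p(true class)`, and the sample probabilities are all assumptions made for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Rank of the order statistic used by split conformal prediction.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label is kept.
        return float("inf")
    return sorted(scores)[rank - 1]


def prediction_set(probs, q_hat):
    """Labels whose nonconformity score 1 - p stays within the threshold."""
    return [label for label, p in enumerate(probs) if 1 - p <= q_hat]


# Hypothetical probabilities the model assigned to the TRUE class
# of nine calibration samples.
true_class_probs = [0.9, 0.8, 0.95, 0.6, 0.7, 0.85, 0.99, 0.55, 0.75]
scores = [1 - p for p in true_class_probs]  # nonconformity scores

q_hat = conformal_quantile(scores, alpha=0.2)

confident = prediction_set([0.1, 0.9], q_hat)    # one label survives
ambiguous = prediction_set([0.45, 0.55], q_hat)  # no label is confident enough
print(q_hat, confident, ambiguous)
```

A singleton prediction set corresponds to a case that can be decided automatically. An empty set or a set containing both labels signals the kind of uncertainty that would call for a referee or a human.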

3. Predict & Moderate

Predict outcomes for test data. CommCP will execute conformal gating, query the LLM referee on borderline cases, and partition predictions.

```python
# Run predictions
results = ccp.predict(
    X_test,
    text_dossiers=text_descriptions,  # Optional natural language dossiers for LLM inspection
)

# Get automation and routing results
print(f"Automation rate: {results.automation_rate:.2%}")

# Access lists of auto-decided and escalated records
auto_cases = results.auto_decided  # list of dicts
human_queue = results.escalated    # list of dicts
```

Understanding the Results Structure

Each record inside results.auto_decided and results.escalated is a dictionary with the following schema:

- sample_index (int): The index of the sample in the test dataset.
- model_prediction (int): The raw output prediction of your base classifier.
- confidence (float): The probability score assigned to the predicted class by the model.
- route (str): The final decision route. It will be one of:
  - "ACCEPT": Auto-accepted directly by Conformal Prediction (high confidence).
  - "LLM_VERIFIED": Evaluated by the LLM Referee (due to borderline confidence or ensemble disagreement) and the LLM agreed with the classifier.
  - "HUMAN_REVIEW": Escalated to a human reviewer (due to low confidence or LLM disagreement).
- route_reason (str): A detailed description explaining why this route was selected.
- llm_prediction (int or None): The decision made by the LLM Referee (0 or 1 if called, otherwise None).
- llm_reasoning (str or None): The short reasoning sentence written by the LLM Referee (if called, otherwise None).
- final_prediction (int or None): The final prediction if the case was automated. If the case was escalated to a human, this is None (requiring human resolution).
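A short sketch of how records with this schema might be consumed downstream. The records below are hand-written for illustration (their values and reason strings are made up), and the variable names are not part of the library.

```python
from collections import Counter

# Hypothetical records following the schema above.
records = [
    {"sample_index": 0, "model_prediction": 1, "confidence": 0.97,
     "route": "ACCEPT", "route_reason": "confidence above conformal cutoff",
     "llm_prediction": None, "llm_reasoning": None, "final_prediction": 1},
    {"sample_index": 1, "model_prediction": 0, "confidence": 0.78,
     "route": "LLM_VERIFIED", "route_reason": "borderline confidence, LLM agreed",
     "llm_prediction": 0, "llm_reasoning": "Features fit the healthy class.",
     "final_prediction": 0},
    {"sample_index": 2, "model_prediction": 1, "confidence": 0.74,
     "route": "HUMAN_REVIEW", "route_reason": "LLM disagreed with classifier",
     "llm_prediction": 0, "llm_reasoning": "Evidence for disease is weak.",
     "final_prediction": None},
    {"sample_index": 3, "model_prediction": 0, "confidence": 0.52,
     "route": "HUMAN_REVIEW", "route_reason": "low confidence",
     "llm_prediction": None, "llm_reasoning": None, "final_prediction": None},
]

# How many cases took each route.
route_counts = Counter(r["route"] for r in records)

# Review queue, least confident cases first.
human_queue = sorted(
    (r for r in records if r["route"] == "HUMAN_REVIEW"),
    key=lambda r: r["confidence"],
)

# Automated decisions keyed by sample index.
automated = {
    r["sample_index"]: r["final_prediction"]
    for r in records
    if r["final_prediction"] is not None
}

print(route_counts)
print([r["sample_index"] for r in human_queue])
print(automated)
```

Sorting the review queue by confidence is one possible prioritization; a real deployment might order by business risk instead.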

How Statistics are Calculated

When you call results.stats(y_true=y_test), the library calculates the following metrics under the hood:

Automation Rate:

Automation Rate = (ACCEPT Cases + LLM_VERIFIED Cases) / Total Samples

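Empirical Conformal Coverage: The accuracy of the automated decisions against the true labels. Conformal prediction targets a coverage of at least 1 - alpha, but the guarantee is marginal: it holds on average over random draws of the calibration and test data and assumes they are exchangeable, so the value measured on a single finite test set can land slightly below 1 - alpha.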
Human-in-the-Loop (HITL) System Accuracy:
```
HITL System Accuracy = (Correct Automated Predictions + Total Human Escalated Cases) / Total Samples
```
(Note: This calculation assumes the human reviewer acts as a ground-truth oracle and corrects any escalated case to the right label).
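The formulas above can be worked through on a small example. This sketch recomputes them from record dictionaries rather than calling the library; the helper names and the ten hand-made records are assumptions for illustration only.

```python
def automation_rate(records):
    """(ACCEPT + LLM_VERIFIED) / total samples."""
    automated = [r for r in records if r["route"] in ("ACCEPT", "LLM_VERIFIED")]
    return len(automated) / len(records)


def automated_accuracy(records, y_true):
    """Accuracy of automated decisions only (escalated cases are excluded)."""
    automated = [r for r in records if r["final_prediction"] is not None]
    correct = sum(r["final_prediction"] == y_true[r["sample_index"]] for r in automated)
    return correct / len(automated)


def hitl_system_accuracy(records, y_true):
    """(correct automated + all escalated) / total, treating the human as an oracle."""
    correct_auto = sum(
        r["final_prediction"] is not None
        and r["final_prediction"] == y_true[r["sample_index"]]
        for r in records
    )
    escalated = sum(r["route"] == "HUMAN_REVIEW" for r in records)
    return (correct_auto + escalated) / len(records)


# Ten hypothetical samples: 7 automated (6 correct, 1 wrong), 3 escalated.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
routes = ["ACCEPT"] * 5 + ["LLM_VERIFIED"] * 2 + ["HUMAN_REVIEW"] * 3
finals = [1, 0, 1, 1, 0, 0, 0, None, None, None]  # sample 6 is a wrong automated call
records = [
    {"sample_index": i, "route": route, "final_prediction": final}
    for i, (route, final) in enumerate(zip(routes, finals))
]

print(f"Automation rate:      {automation_rate(records):.2%}")
print(f"Automated accuracy:   {automated_accuracy(records, y_true):.2%}")
print(f"HITL system accuracy: {hitl_system_accuracy(records, y_true):.2%}")
```

Here the single automated mistake is the only error left in the system, because every escalated case is counted as correct under the oracle assumption.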

4. Evaluate Guarantees

Check whether the empirical coverage meets your target:

```python
empirical_coverage = results.coverage(y_test)
print(f"Empirical Coverage: {empirical_coverage:.2%}")  # Target: about 95% or higher (1 - alpha)
```

Examine system performance details:

```python
print(results.stats(y_test))
```

Customizing Gating Logic

CommCP dynamically adjusts its gating based on your model architecture:

- Single Models: Uses Gray-Zone Gating. Calls the LLM referee only when confidence is close to, but below, the conformal cutoff.
- Ensembles: Uses Disagreement Gating. Automatically inspects ensemble consensus and calls the LLM to referee conflicting model predictions.
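The two gating modes can be pictured with the following sketch. It is an interpretation of the description above, not commCP's actual code: the function names, the intermediate "ASK_LLM" label, and the reading of verify_margin as an absolute distance below the cutoff are all assumptions.

```python
def route_single(confidence, threshold, verify_margin):
    """Gray-zone gating for a single model."""
    if confidence >= threshold:
        return "ACCEPT"        # clearly above the conformal cutoff
    if confidence >= threshold - verify_margin:
        return "ASK_LLM"       # close to, but below, the cutoff
    return "HUMAN_REVIEW"      # too uncertain; skip the LLM entirely


def route_ensemble(member_predictions, confidence, threshold, verify_margin):
    """Disagreement gating: conflicting members always go to the referee."""
    if len(set(member_predictions)) > 1:
        return "ASK_LLM"
    return route_single(confidence, threshold, verify_margin)


def resolve_llm(model_prediction, llm_prediction):
    """Final route once the referee has answered."""
    if llm_prediction == model_prediction:
        return "LLM_VERIFIED"
    return "HUMAN_REVIEW"


# Hypothetical cutoff and margin, chosen only for the example.
threshold, verify_margin = 0.80, 0.15

for confidence in (0.9, 0.7, 0.4):
    print(confidence, route_single(confidence, threshold, verify_margin))

print(route_ensemble([1, 0, 1], 0.9, threshold, verify_margin))
print(resolve_llm(model_prediction=1, llm_prediction=0))
```

Keeping the referee out of both the clear-accept and the clear-reject branches is what limits LLM calls to the narrow band where a second opinion can change the outcome.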

License

Licensed under the MIT License.

Release history

- 1.0.8: Jun 12, 2026
- 1.0.7: Jun 12, 2026
- 1.0.6: Jun 11, 2026
- 1.0.5: Jun 11, 2026
- 1.0.4: Jun 11, 2026
- 1.0.3: Jun 11, 2026
- 1.0.2: Jun 11, 2026
- 1.0.1: Jun 11, 2026 (this version)
- 1.0.0: Jun 11, 2026

Download files

Source Distribution

commcp-1.0.1.tar.gz (14.9 kB)

Uploaded Jun 11, 2026 Source

Built Distribution

commcp-1.0.1-py3-none-any.whl (13.3 kB)

Uploaded Jun 11, 2026 Python 3

File details

Details for the file commcp-1.0.1.tar.gz.

File metadata

Download URL: commcp-1.0.1.tar.gz
Upload date: Jun 11, 2026
Size: 14.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for commcp-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`1b23bc61bb3ea5fb2a2f7af7a5396111b128d33c17bb454ff7d83476ca305c2e`
MD5	`55a0accf7efbac13561b966a6089e68a`
BLAKE2b-256	`4504969b95837a3253981507386de58b1aabc7f7e78d55efe518104effc630ff`

File details

Details for the file commcp-1.0.1-py3-none-any.whl.

File metadata

Download URL: commcp-1.0.1-py3-none-any.whl
Upload date: Jun 11, 2026
Size: 13.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for commcp-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ef648fe140ec74b048e89bfd336c0e1b1ec447bae20c4444fa8ca87b32544e2f`
MD5	`1d8a5bcecf0e1afffb08b7fe1b8b32cb`
BLAKE2b-256	`870bd1bab775bebefa68f8231a319bc0b9e11e202ae143c2e0b300bd352b3386`
