Audit ML models beyond accuracy — calibration, fairness, latent health, and deployment verdicts.
TrustLens
Your model has 92% accuracy. It's still not safe for deployment.
Accuracy measures what went right. TrustLens measures what can go wrong — in production, on subgroups, and at high confidence.
Why TrustLens
Standard evaluation stops at accuracy. Silent failures happen when:
- A model is overconfident — "90% sure" but right only 60% of the time
- Performance collapses on subgroups — gender, age, or region hidden inside a good aggregate score
- The model is confidently wrong — high-confidence errors that indicate systemic risk
- Latent representations overlap — classes bleed together where the model can't tell them apart
TrustLens surfaces all four with a single audit, and outputs a machine-readable deployment verdict.
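To make the first and third failure modes concrete, here is a minimal, standard-library sketch of a binned expected calibration error (ECE) and a high-confidence error rate. The function names, the 10-bin layout, and the 0.9 threshold are illustrative assumptions, not TrustLens internals.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


def high_confidence_error_rate(confidences, correct, threshold=0.9):
    """Share of predictions at or above `threshold` confidence that are wrong."""
    flagged = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    if not flagged:
        return 0.0
    return sum(1 for ok in flagged if not ok) / len(flagged)


# Ten predictions, each made with 90% confidence, only six of them correct.
confidences = [0.9] * 10
correct = [True] * 6 + [False] * 4

ece = expected_calibration_error(confidences, correct)
hce = high_confidence_error_rate(confidences, correct)
print(f"ECE: {ece:.2f}")
print(f"High-confidence error rate: {hce:.2f}")
```

On this toy input the ECE equals the gap between stated confidence (0.9) and observed accuracy (0.6), which is the "90% sure but right only 60% of the time" case from the list above.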
Supported Frameworks
TrustLens uses a Prediction Resolver Architecture to automatically handle different ML frameworks:
- scikit-learn: full support for all ClassifierMixin estimators.
- XGBoost: native support for XGBClassifier and raw Booster objects.
- Planned: LightGBM, CatBoost, PyTorch, TensorFlow/Keras.
TrustLens automatically detects your model's framework. You don't need to change your code when switching from sklearn to XGBoost.
Quickstart
pip install trustlens
# Extended visualization support
pip install trustlens[full]
Run a one-line audit to see why 94% accuracy isn't the full story:
from trustlens import quick_analyze
quick_analyze(dataset="breast_cancer")
TRUST SCORE: 68/100 [D]
Assessment : Low Trust — Blocked by high diagnostic risk
Base Score : 76
Penalties Applied : -7.7 (Failure Risk)
Final Score : 68
→ Model shows high failure risk and is NOT ready for deployment.
How It Works
TrustLens runs four diagnostic modules and combines them into a single Trust Score (0–100) with a CI/CD-ready deployment verdict.
| Module | What It Catches |
|---|---|
| Calibration | Confidence vs. correctness mismatch, overconfidence, ECE |
| Fairness | Subgroup performance gaps, equalized-odds violations |
| Representation | Latent space health, class separation, overlap detection |
| Decision Engine | Composite Trust Score + Ready / Blocked verdict |
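As a rough illustration of what a subgroup performance gap looks like, the sketch below computes per-group accuracy and the largest gap between groups using only the standard library. The helper names and toy data are hypothetical; the Fairness module's actual metrics (including its equalized-odds checks) are not reproduced here.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for each distinct value in `groups`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {g: hits[g] / totals[g] for g in totals}


def max_accuracy_gap(per_group):
    """Largest difference in accuracy between any two subgroups."""
    return max(per_group.values()) - min(per_group.values())


# Toy data: 80% aggregate accuracy hides a much weaker subgroup "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "a", "a", "b", "b", "b", "b"]

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = max_accuracy_gap(per_group)
print(per_group, gap)
```

Group "a" is classified perfectly while group "b" is right only half the time, so the aggregate score alone would not reveal the problem.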
Full Audit
Automatic Detection (Sklearn / XGBoost)
from trustlens import analyze
from xgboost import XGBClassifier
model = XGBClassifier().fit(X_train, y_train)
# TrustLens automatically detects XGBoost and resolves predictions
report = analyze(
    model=model,
    X=X_test,
    y_true=y_test,
    sensitive_features={"gender": gender_test},
)
report.show()
Manual Prediction Override
For external inference systems or unsupported frameworks, you can pass predictions directly:
report = analyze(
    model=None,  # optional when passing y_pred/y_prob
    X=X_test,
    y_true=y_test,
    y_pred=external_preds,
    y_prob=external_probs,
)
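External systems often return only class probabilities. A minimal sketch of deriving matching hard labels follows, assuming one row per sample and one column per class. The helper `labels_from_probabilities` is hypothetical, and the exact array shapes `analyze` accepts for `y_pred` and `y_prob` are not confirmed here, so check the docs before relying on this layout.

```python
def labels_from_probabilities(probs, classes=None):
    """Pick the most probable class for each row of per-class probabilities."""
    labels = []
    for row in probs:
        best = max(range(len(row)), key=lambda i: row[i])
        labels.append(classes[best] if classes is not None else best)
    return labels


# Hypothetical output of an external inference service:
# one row per sample, one column per class.
external_probs = [
    [0.10, 0.90],
    [0.80, 0.20],
    [0.45, 0.55],
]
external_preds = labels_from_probabilities(external_probs)
print(external_preds)
```

Deriving labels from the same probabilities keeps `y_pred` and `y_prob` consistent with each other, which matters for any calibration analysis.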
Audit Metadata & Provenance
Every report tracks its own backend provenance for auditability:
print(report.metadata["framework"]) # "xgboost"
print(report.metadata["backend"]) # {'resolver': 'xgboost', 'framework_version': '2.0.3', ...}
Save & Export
# Save as a unified JSON artifact (best for experiment trackers)
report.save("report.json")
# Save as a full directory bundle (best for human review)
report.save("trust_report/")
Output artifacts (Directory Bundle)
trust_report/
├── trust_score.json ← deployment verdict & composite score
├── report.json ← raw diagnostic metrics
├── metadata.json ← environment, version, backend provenance
├── report.txt ← human-readable summary
└── visuals/ ← per-module diagnostic plots (PNG)
CI/CD gating
Gate model promotion on trust_score.json — no custom scripting needed:
{
  "score": 68,
  "grade": "D",
  "verdict": "Low Trust — Blocked by high failure risk",
  "is_blocked": true
}
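No custom scripting is required, but if a pipeline step needs an exit code, a few lines of standard-library Python are enough. The sketch below assumes only the four fields shown above; `gate_exit_code` is a hypothetical helper, not part of TrustLens, and the verdict string in the demo payload is abbreviated.

```python
import json
import tempfile
from pathlib import Path


def gate_exit_code(score_path):
    """Return 1 if the audit blocked the model, else 0."""
    verdict = json.loads(Path(score_path).read_text(encoding="utf-8"))
    print(f"Trust score {verdict['score']} [{verdict['grade']}]: {verdict['verdict']}")
    return 1 if verdict["is_blocked"] else 0


# Demo on a temporary copy of the sample payload. In CI, point the function
# at trust_report/trust_score.json and pass the result to sys.exit().
sample = {
    "score": 68,
    "grade": "D",
    "verdict": "Low Trust (blocked by high failure risk)",
    "is_blocked": True,
}
with tempfile.TemporaryDirectory() as tmp:
    score_file = Path(tmp) / "trust_score.json"
    score_file.write_text(json.dumps(sample), encoding="utf-8")
    exit_code = gate_exit_code(score_file)
```

Keying the gate on `is_blocked` rather than on a score threshold keeps the promotion policy inside the audit itself.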
Diagnostics in Practice
| Diagnostic | Question It Answers |
|---|---|
| Calibration | Does confidence align with correctness? |
| Fairness & Bias | Are subgroups treated equally? |
| Latent Space Health | Is class separation clean? |
| Deployment Verdict | Is this model safe to ship? |
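For intuition on "clean class separation", one simple heuristic is the distance between class centroids divided by the mean within-class spread. The sketch below is an illustrative stand-in, not the Representation module's actual method; the function names and the 2-D embeddings are made up, and `math.dist` requires Python 3.8 or later.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def separation_ratio(class_a, class_b):
    """Centroid distance divided by the mean within-class spread."""
    ca, cb = centroid(class_a), centroid(class_b)
    between = math.dist(ca, cb)
    spreads = [math.dist(p, ca) for p in class_a]
    spreads += [math.dist(p, cb) for p in class_b]
    within = sum(spreads) / len(spreads)
    return between / within if within > 0 else math.inf


# Hypothetical 2-D embeddings: class B far from A, then class B overlapping A.
class_a = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
far_b = [(10.0, 0.0), (10.0, 2.0), (12.0, 0.0), (12.0, 2.0)]
near_b = [(1.0, 0.0), (1.0, 2.0), (3.0, 0.0), (3.0, 2.0)]

clean = separation_ratio(class_a, far_b)
overlapping = separation_ratio(class_a, near_b)
print(f"clean: {clean:.2f}, overlapping: {overlapping:.2f}")
```

As a rule of thumb under this heuristic, a ratio well above 1 suggests the classes sit apart in the embedding space, while a ratio near or below 1 suggests they bleed together.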
Demo
15-minute walkthrough: diagnostics, trust scoring, fairness analysis, and visual dashboards.
Want a deeper look at the architecture and design decisions? → Interactive Project Showcase
Run the Full Demo
python demo.py
Generates multi-model comparisons, fairness deep-dives, latent space projections, JSON audits, and visual dashboards across all modules.
Contributing
All contributions welcome — new metrics, diagnostic plugins, and visualizations.
→ Contributing Guide · Open an Issue · Docs
Citation
@software{trustlens2026,
  author = {Shahid Ul Islam},
  title  = {TrustLens: Audit ML models beyond accuracy},
  year   = {2026},
  url    = {https://github.com/Khanz9664/TrustLens}
}