
Debug your ML models beyond accuracy.


TrustLens

Your model has 92% accuracy. But can you trust it?

TrustLens is the open-source library that answers the questions accuracy never does.



The Problem Nobody Talks About

You trained a model. It hits 92% accuracy on your validation set.

So you ship it.

Three months later:

  • A minority-class user gets consistently wrong predictions
  • The model is 90% confident on its worst mistakes
  • A regulator asks "why did it make that decision?" and you have no answer

Accuracy tells you how often your model is right. It tells you nothing about when it fails, why it fails, or whom it fails.

TrustLens fixes that. In one function call.


Quick Analyze (Zero-Friction Start)

Try TrustLens instantly without bringing your own data or models. We provide a zero-friction entry point:

from trustlens import quick_analyze

# Automatically loads the breast cancer dataset, trains a baseline logistic
# regression model, and runs the full analysis, returning a TrustReport and
# rendering the dashboard.
report = quick_analyze(dataset="breast_cancer")

Quick Usage with Custom Models

pip install trustlens

from trustlens import analyze

report = analyze(
    model,          # any sklearn-compatible model
    X_val,          # validation features
    y_val,          # ground truth
    y_prob=proba,   # predicted probabilities
)

print(report.trust_score)
report.show()

Example output:

==================================================================
  TrustLens Report
==================================================================
 Timestamp : 2026-04-16T15:43:02Z
 Model     : RandomForestClassifier
 Samples   : 2,500
 Classes   : 2

==================================================================
 TRUST SCORE: 61/100 [B]
 Assessment: Good Trust - minor issues to address
==================================================================

 Key Observations:
  * Calibration needs improvement (ECE > 0.1).
  * Model is overconfident on incorrect predictions (low confidence gap).

==================================================================
 Dimension breakdown:
  calibration       52.3/100
  failure           74.1/100
  bias              41.2/100
  representation    68.5/100

Your failure score is passable. Your bias score is not. TrustLens just saved you a PR disaster.


The Summary Dashboard

One line. One picture. Everything you need.

report.summary_plot()

The presentation-ready 6-panel dashboard shows:

  • Trust Score gauge: Your model's overall trustworthiness at a glance
  • Reliability diagram: Is your model overconfident or underconfident?
  • Confidence gap: Does high confidence actually mean high accuracy?
  • Error rate by class: Which classes are being failed?
  • Class distribution: Is your training data biased?
  • Sub-score breakdown: Which dimension needs the most work?

The Trust Score

A single, actionable number: 0 to 100.

Computed from four dimensions, each independently interpretable:

Dimension        What it measures                           Weight
Calibration      Do probabilities reflect reality?          35%
Failure          Does confidence correlate with accuracy?   30%
Bias             Are all groups treated equally?            25%
Representation   Is the embedding space well-structured?    10%

Score    Grade   Recommendation
80-100   A       Production-ready
60-79    B       Good - fix flagged issues first
40-59    C       Investigate before deployment
0-39     D       Do not deploy
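Under the hood this is a weighted average of the four sub-scores. A minimal sketch, using the weights and grade thresholds from the tables above (the actual TrustLens aggregation may apply additional adjustments, so this need not reproduce the sample report exactly):

```python
# Weights from the dimension table above.
WEIGHTS = {"calibration": 0.35, "failure": 0.30, "bias": 0.25, "representation": 0.10}

def trust_score(sub_scores):
    """Weighted average of the four 0-100 dimension sub-scores."""
    return sum(WEIGHTS[dim] * sub_scores[dim] for dim in WEIGHTS)

def grade(score):
    """Grade thresholds from the table above."""
    if score >= 80:
        return "A"
    if score >= 60:
        return "B"
    if score >= 40:
        return "C"
    return "D"

subs = {"calibration": 52.3, "failure": 74.1, "bias": 41.2, "representation": 68.5}
print(round(trust_score(subs), 1), grade(trust_score(subs)))  # → 57.7 C
```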

The Failure Showcase

Find your model's most dangerous mistakes in one line:

report.show_failures(top_k=5)

Output:

==================================================================
 TOP 5 CRITICAL FAILURES
 GradientBoostingClassifier | 58 total errors / 700 samples (8.3%)
==================================================================
 #   Sample   True   Pred   Confidence   Danger
 ------------------------------------------------------
 1   412      1      0      97.4%        CRITICAL
 2   88       0      1      95.1%        CRITICAL
 3   301      1      0      91.8%        HIGH
 4   556      0      1      89.2%        HIGH

 Insights:
   Mean confidence on top failures: 93.4%
   These are high-confidence mistakes - the model is
   certain it is right, but it is wrong.
   Overconfidence detected - consider calibration.
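The ranking idea is simple to reason about. A minimal sketch (not TrustLens internals): take every misclassified sample and sort by the model's confidence in its wrong prediction, most dangerous first.

```python
import numpy as np

def top_failures(y_true, y_pred, y_prob, top_k=5):
    """Return (index, true, pred, confidence) for the worst mistakes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    conf = np.asarray(y_prob)[np.arange(len(y_pred)), y_pred]  # confidence in predicted class
    errors = np.flatnonzero(y_pred != y_true)                  # indices of all mistakes
    ranked = errors[np.argsort(-conf[errors])]                 # highest-confidence mistakes first
    return [(int(i), int(y_true[i]), int(y_pred[i]), float(conf[i]))
            for i in ranked[:top_k]]

y_true = [1, 0, 1, 0, 1]
y_pred = [0, 0, 0, 1, 1]
y_prob = [[0.97, 0.03], [0.6, 0.4], [0.55, 0.45], [0.2, 0.8], [0.1, 0.9]]
print(top_failures(y_true, y_pred, y_prob, top_k=3))
# → [(0, 1, 0, 0.97), (3, 0, 1, 0.8), (2, 1, 0, 0.55)]
```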

Real-World Use Cases

Medical AI

A diagnostic model with 94% accuracy has an ECE of 0.18 - dangerously overconfident on edge cases. TrustLens surfaces it before deployment.
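For reference, ECE (Expected Calibration Error) bins predictions by confidence and averages the gap between accuracy and confidence per bin. A sketch of the standard equal-width-bin definition, assuming `y_prob` is an (n_samples, n_classes) probability array (TrustLens's exact estimator may differ):

```python
import numpy as np

def ece(y_true, y_prob, n_bins=10):
    """Expected Calibration Error with equal-width confidence bins."""
    y_true = np.asarray(y_true)
    conf = np.max(y_prob, axis=1)               # confidence = top predicted probability
    pred = np.argmax(y_prob, axis=1)
    correct = (pred == y_true).astype(float)
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():                          # weight each bin by its share of samples
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return total

probs = np.array([[0.9, 0.1], [0.9, 0.1]])
print(ece([0, 1], probs))  # → 0.4 (accuracy 0.5 vs confidence 0.9 in a single bin)
```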

Fraud Detection

Your model's confidence gap is 0.04 - it's almost as confident on fraud it misses as on fraud it catches. That's your false-negative problem, quantified.
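A sketch under an assumed definition (mean confidence on correct predictions minus mean confidence on incorrect ones; TrustLens may compute it differently):

```python
import numpy as np

def confidence_gap(y_true, y_prob):
    """Mean confidence when right minus mean confidence when wrong."""
    y_true = np.asarray(y_true)
    conf = np.max(y_prob, axis=1)
    correct = np.argmax(y_prob, axis=1) == y_true
    return float(conf[correct].mean() - conf[~correct].mean())

probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.8, 0.2]])
print(confidence_gap([0, 1, 1], probs))  # ≈ 0.0 - wrong answers as confident as right ones
```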

Hiring, Loan, and Insurance

Subgroup analysis reveals a 23% accuracy gap between applicant demographics. You have a fairness problem. Now you know before a regulator tells you.
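A subgroup accuracy gap takes only a few lines to compute. This sketch (not the TrustLens API) takes a group label per sample and reports per-group accuracy plus the spread between the best- and worst-served groups:

```python
import numpy as np

def accuracy_gap(y_true, y_pred, groups):
    """Per-group accuracy and the max-min accuracy spread."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    accs = {str(g): float((y_pred[groups == g] == y_true[groups == g]).mean())
            for g in np.unique(groups)}
    return accs, max(accs.values()) - min(accs.values())

accs, gap = accuracy_gap([1, 0, 1, 0], [1, 0, 1, 1], ["a", "a", "b", "b"])
print(accs, gap)  # → {'a': 1.0, 'b': 0.5} 0.5
```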

Research

Use CKA to compare representation quality across model architectures. Use faithfulness testing to benchmark explanation methods honestly.
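Linear CKA itself is compact. A sketch for two representation matrices computed on the same n inputs (whether TrustLens uses the linear or a kernelized variant, the idea is the same):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between X (n x d1) and Y (n x d2) for the same n inputs."""
    X = X - X.mean(axis=0)                       # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2   # cross-covariance alignment
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))                   # e.g. features from one architecture
print(linear_cka(X, X))                          # → 1.0 (identical representations)
print(linear_cka(X, 2.0 * X + 1.0))              # ≈ 1.0 (invariant to scale and shift)
```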

Repository Structure

TrustLens/
├── assets/
│   ├── banner.png
│   └── logo.png
├── docs/
│   ├── DESIGN_PRINCIPLES.md
│   ├── FUTURE_EXTENSIONS.md
│   ├── GITHUB_ISSUES.md
│   ├── POSITIONING.md
│   └── REWRITTEN_ISSUES.md
├── examples/
│   ├── calibration_deep_dive.py
│   ├── cnn_vs_vit_trustlens.py
│   ├── custom_plugin_demo.py
│   ├── quickstart.py
│   └── trustlens_demo.ipynb
├── .github/workflows/
│   └── ci.yml
├── tests/
│   ├── test_api.py
│   ├── test_bias.py
│   ├── test_calibration.py
│   ├── test_failure.py
│   ├── test_output_formatting.py
│   ├── test_plugins.py
│   ├── test_representation.py
│   └── test_trust_score.py
├── trustlens/
│   ├── explainability/
│   │   ├── faithfulness.py
│   │   └── gradcam.py
│   ├── metrics/
│   │   ├── bias.py
│   │   ├── calibration.py
│   │   ├── failure.py
│   │   ├── faithfulness.py
│   │   └── representation.py
│   ├── plugins/
│   │   ├── base.py
│   │   └── registry.py
│   ├── visualization/
│   │   ├── bias_plots.py
│   │   ├── calibration_plots.py
│   │   ├── failure_plots.py
│   │   ├── representation_plots.py
│   │   └── summary_plot.py
│   ├── api.py
│   ├── report.py
│   ├── trust_score.py
│   └── utils.py
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── Makefile
├── pyproject.toml
├── README.md
├── requirements.txt
└── ROADMAP.md

Contributing

TrustLens is designed to grow with the community. Adding a new metric takes just four simple steps:

  1. Write a pure function my_metric(y_true, y_pred) -> float
  2. Add it to the appropriate module (metrics/calibration.py, etc.)
  3. Export it from metrics/__init__.py
  4. Write a test in tests/test_<module>.py

Testing Policy: Current test coverage is 67% to ensure core stability. It will be incrementally improved toward 85%+ as advanced modules (e.g., explainability, visualization) receive additional tests. All new contributions must maintain or improve this baseline.
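As an illustration of steps 1 and 4, a metric and its test might look like this (error_rate is a hypothetical example name, not an existing TrustLens function):

```python
import numpy as np

# Step 1: a pure function in the metrics style.
def error_rate(y_true, y_pred):
    """Fraction of samples where the prediction disagrees with the label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true != y_pred).mean())

# Step 4: a matching test, e.g. in tests/test_failure.py.
def test_error_rate():
    assert error_rate([0, 1, 1, 0], [0, 1, 0, 0]) == 0.25

test_error_rate()
```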

See CONTRIBUTING.md for the full guide including instructions on adding plugins and explainability methods. Review docs/GITHUB_ISSUES.md for open tasks ready to be developed.


Citation

@software{trustlens2026,
  author = {Shahid Ul Islam},
  title  = {TrustLens: Debug your ML models beyond accuracy},
  year   = {2026},
  url    = {https://github.com/Khanz9664/TrustLens},
}

Author & Maintainer

Shahid Ul Islam


If TrustLens saved you from a bad deployment, star it. It helps other engineers find it before they make the same mistake.

GitHub | Portfolio | LinkedIn | Discussions



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trustlens-0.1.1.tar.gz (54.3 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

trustlens-0.1.1-py3-none-any.whl (52.9 kB)

Uploaded Python 3

File details

Details for the file trustlens-0.1.1.tar.gz.

File metadata

  • Download URL: trustlens-0.1.1.tar.gz
  • Upload date:
  • Size: 54.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for trustlens-0.1.1.tar.gz:

  Algorithm    Hash digest
  SHA256       a29572c781dc362f2de7d0738e5b260717997af6e2d7924446df58e9a46449df
  MD5          0ff3775d8bf938618de2aa92293aadb5
  BLAKE2b-256  a8e93136fef53fdd9083adc5d294fb3e5ba2fd66c41178ea716614793fb22555

See more details on using hashes here.

File details

Details for the file trustlens-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: trustlens-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 52.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for trustlens-0.1.1-py3-none-any.whl:

  Algorithm    Hash digest
  SHA256       6f627b9b8cd3a8bc609c41910d91d07c680f4b63513cb14e17c91c943c15ce25
  MD5          530e2c9b9311306d308f8f0c281620ca
  BLAKE2b-256  866d365c6fee84dbf5f767162ece576b9720216f7c5428b332038cd7c7b6a857

See more details on using hashes here.
