TrustLens

Audit ML models beyond accuracy: calibration, fairness, latent health, and deployment verdicts.


License: MIT




Your model has 92% accuracy. It's still not safe for deployment.

Accuracy measures what went right. TrustLens measures what can go wrong — in production, on subgroups, and at high confidence.


Why TrustLens

Standard evaluation stops at accuracy. Silent failures happen when:

  • A model is overconfident — "90% sure" but right only 60% of the time
  • Performance collapses on subgroups — gender, age, or region hidden inside a good aggregate score
  • The model is confidently wrong — high-confidence errors that indicate systemic risk
  • Latent representations overlap — classes bleed together where the model can't tell them apart

TrustLens surfaces all four with a single audit, and outputs a machine-readable deployment verdict.
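To make the first failure mode concrete, here is a minimal sketch of expected calibration error (ECE) using equal-width confidence bins. It is a generic textbook formulation, not TrustLens's internal implementation; the function name `expected_calibration_error` and the bin count are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin.

    Hypothetical helper for illustration; not part of the TrustLens API.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Ten predictions made at 90% confidence, of which only six are correct.
confidences = [0.9] * 10
correct = [True] * 6 + [False] * 4
ece = expected_calibration_error(confidences, correct)
print(f"ECE = {ece:.2f}")
```

In this toy case every prediction lands in one bin, so the ECE is simply the gap between stated confidence (0.9) and observed accuracy (0.6).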


Supported Frameworks

TrustLens uses a Prediction Resolver Architecture to automatically handle different ML frameworks:

  • scikit-learn — Full support for all ClassifierMixin estimators.
  • XGBoost — Native support for XGBClassifier and raw Booster objects.
  • Planned — LightGBM, CatBoost, PyTorch, TensorFlow/Keras.

TrustLens automatically detects your model's framework. You don't need to change your code when switching from sklearn to XGBoost.
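One common way to build this kind of framework detection is a registry keyed on the top-level module of the model's class. The sketch below illustrates that pattern only; the names `RESOLVERS`, `register`, and `resolve_probabilities` are hypothetical and do not describe TrustLens's actual resolver code.

```python
RESOLVERS = {}  # hypothetical registry: framework name -> prediction function


def register(framework):
    """Decorator that stores a resolver under the given framework name."""
    def decorator(fn):
        RESOLVERS[framework] = fn
        return fn
    return decorator


@register("sklearn")
def _resolve_sklearn(model, X):
    # scikit-learn classifiers that support probabilities expose predict_proba.
    return model.predict_proba(X)


def resolve_probabilities(model, X):
    """Pick a resolver from the top-level package that defines the model class."""
    root = type(model).__module__.split(".")[0]
    try:
        resolver = RESOLVERS[root]
    except KeyError:
        raise TypeError(f"No resolver registered for framework {root!r}") from None
    return resolver(model, X)


class _StubModel:
    """Stand-in so the sketch runs without scikit-learn installed."""
    __module__ = "sklearn.stub"  # pretend this class lives in scikit-learn

    def predict_proba(self, X):
        return [[0.2, 0.8] for _ in X]


probs = resolve_probabilities(_StubModel(), [[0.0], [1.0]])
print(probs)
```

With this design, supporting another framework means registering one more function, and the calling code stays unchanged.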


Quickstart

pip install trustlens
# Extended visualization support (the quotes keep shells such as zsh from expanding the brackets)
pip install "trustlens[full]"

Run a one-line audit to see why 94% accuracy isn't the full story:

from trustlens import quick_analyze

quick_analyze(dataset="breast_cancer")

Example output:

TRUST SCORE: 68/100 [D]
Assessment : Low Trust — Blocked by high diagnostic risk

  Base Score        : 76
  Penalties Applied : -7.7 (Failure Risk)
  Final Score       : 68

→ Model shows high failure risk and is NOT ready for deployment.

How It Works

TrustLens runs the modules below and combines their results into a single Trust Score (0–100) with a CI/CD-ready deployment verdict.

Module            What It Catches
Calibration       Confidence vs. correctness mismatch, overconfidence, ECE
Fairness          Subgroup performance gaps, equalized-odds violations
Representation    Latent space health, class separation, overlap detection
Decision Engine   Composite Trust Score + Ready / Blocked verdict
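As an illustration of what a subgroup performance gap looks like, the sketch below computes per-group accuracy on a toy dataset with plain Python. The helper `subgroup_accuracy` and the data are invented for this example, and TrustLens's own fairness metrics may be defined differently.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Return accuracy per group label (hypothetical helper, not TrustLens API)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {group: hits[group] / totals[group] for group in totals}


# Toy data: the aggregate accuracy hides a much weaker subgroup.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
gender = ["f", "f", "f", "f", "m", "m", "m", "m"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = subgroup_accuracy(y_true, y_pred, gender)
gap = max(per_group.values()) - min(per_group.values())

print(f"overall={overall:.2f} per_group={per_group} gap={gap:.2f}")
```

Here the overall accuracy is 0.75, yet one group is at 1.0 and the other at 0.5, which is the kind of gap an aggregate score conceals.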

Full Audit

Automatic Detection (scikit-learn / XGBoost)

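from trustlens import analyze
from xgboost import XGBClassifier

# Assumes X_train, y_train, X_test, y_test, and gender_test
# (the sensitive attribute for the test rows) are already defined.
model = XGBClassifier().fit(X_train, y_train)

# TrustLens automatically detects XGBoost and resolves predictions
report = analyze(
    model=model,
    X=X_test,
    y_true=y_test,
    sensitive_features={"gender": gender_test}
)

# Display the audit report
report.show()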

Manual Prediction Override

For external inference systems or unsupported frameworks, you can pass predictions directly:

# external_preds and external_probs are predictions computed outside TrustLens
report = analyze(
    model=None,  # optional when passing y_pred/y_prob
    X=X_test,
    y_true=y_test,
    y_pred=external_preds,
    y_prob=external_probs
)

Audit Metadata & Provenance

Every report tracks its own backend provenance for auditability:

print(report.metadata["framework"])  # "xgboost"
print(report.metadata["backend"])    # {'resolver': 'xgboost', 'framework_version': '2.0.3', ...}

Save & Export

# Save as a unified JSON artifact (best for experiment trackers)
report.save("report.json")

# Save as a full directory bundle (best for human review)
report.save("trust_report/")

Output artifacts (Directory Bundle)

trust_report/
├── trust_score.json    ← deployment verdict & composite score
├── report.json         ← raw diagnostic metrics
├── metadata.json       ← environment, version, backend provenance
├── report.txt          ← human-readable summary
└── visuals/            ← per-module diagnostic plots (PNG)

CI/CD gating

Gate model promotion on trust_score.json — no custom scripting needed:

{
  "score": 68,
  "grade": "D",
  "verdict": "Low Trust — Blocked by high failure risk",
  "is_blocked": true
}
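If a pipeline does need an explicit step, a gate can be as small as reading the `is_blocked` field and turning it into an exit code. The sketch below assumes only the fields shown in the example above; `gate_exit_code` is a hypothetical helper, not something shipped with TrustLens, and the sample omits the `verdict` string for brevity.

```python
import json


def gate_exit_code(trust_score_text):
    """Map the contents of trust_score.json to a process exit code.

    Returns 1 when the verdict is blocked, 0 otherwise.
    """
    verdict = json.loads(trust_score_text)
    return 1 if verdict["is_blocked"] else 0


# Abbreviated copy of the example artifact shown above.
example = '{"score": 68, "grade": "D", "is_blocked": true}'
exit_code = gate_exit_code(example)
print(exit_code)
```

In a real job, the text would come from the saved file (for example via `pathlib.Path(...).read_text()`), and the result would be passed to `sys.exit` so a blocked model fails the build.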

Diagnostics in Practice

  • Calibration: Does confidence align with correctness?
  • Fairness & Bias: Are subgroups treated equally?
  • Latent Space Health: Is class separation clean?
  • Deployment Verdict: Is this model safe to ship?

Demo

15-minute walkthrough: diagnostics, trust scoring, fairness analysis, and visual dashboards.

Want a deeper look at the architecture and design decisions? → Interactive Project Showcase


Run the Full Demo

python demo.py

Generates multi-model comparisons, fairness deep-dives, latent space projections, JSON audits, and visual dashboards across all modules.


Contributing

All contributions welcome — new metrics, diagnostic plugins, and visualizations.




Citation

@software{trustlens2026,
  author = {Shahid Ul Islam},
  title  = {TrustLens: Audit ML models beyond accuracy},
  year   = {2026},
  url    = {https://github.com/Khanz9664/TrustLens}
}

Built by Shahid Ul Islam
