Skip to main content

The trust and evaluation layer for AI systems.

Project description

TruthLens

The trust and evaluation layer for AI systems.

PyPI version Python 3.10+ License: MIT

TruthLens automatically scores any AI response for groundedness, faithfulness, hallucination risk, and overall trustworthiness — in 2 lines of code.


Install

pip install truthlens

Requires Ollama for local evaluation (free), or any LLM API key.


Quick Start

from truthlens import TruthLens

# Local — no API key needed
tl = TruthLens(provider="ollama", model="llama3")
response = tl.chat(
    "Who created Python?",
    sources=["Python was created by Guido van Rossum, first released in 1991."]
)

print(response.content)             # AI's answer
print(response.trust_score)         # 95.0
print(response.hallucination_risk)  # Low
print(response.summary())
# [TruthLens] Trust: 95/100 | Ground: 96% | Faith: 94% | Hallucination: ✓ Low

Supported Providers

# OpenAI
tl = TruthLens(provider="openai",     model="gpt-4o",              api_key="sk-...")
# Anthropic
tl = TruthLens(provider="anthropic",  model="claude-sonnet-4-6",   api_key="sk-ant-...")
# Gemini
tl = TruthLens(provider="gemini",     model="gemini-1.5-flash",    api_key="AI...")
# Local Ollama (free)
tl = TruthLens(provider="ollama",     model="llama3")

What Gets Scored

Metric Description
Trust Score Composite 0–100
Groundedness How strongly the answer is backed by sources
Faithfulness Whether the answer accurately reflects sources
Citation Accuracy Whether references are valid
Hallucination Risk Low / Medium / High

Claim-Level Verification

tl = TruthLens(provider="ollama", model="llama3", include_claims=True)
response = tl.chat("Tell me about Python", sources=["..."])

for claim in response.claims:
    print(f"{claim['verdict']}: {claim['text']}")
# Supported:    Python was created by Guido van Rossum
# Unsupported:  Python was initially called ABC

RAG Evaluation

from truthlens import evaluate_rag

report = evaluate_rag(
    question="What is photosynthesis?",
    answer=generated_answer,
    retrieved_chunks=your_chunks,
)
print(report.rag_score)  # 90.1

Benchmark Runner

from truthlens import run_benchmark, generate_sample_dataset

report = run_benchmark(generate_sample_dataset(), model="llama3")
print(report.stats.avg_trust_score)  # 87.3

REST API (Proxy Server)

truthlens proxy

Then from any language:

const r = await fetch("http://localhost:8001/chat", {
  method: "POST",
  body: JSON.stringify({
    provider: "openai", model: "gpt-4o", api_key: "sk-...",
    messages: [{ role: "user", content: "Who created Python?" }],
    sources: ["Python was created by Guido van Rossum in 1991."]
  })
})
const data = await r.json()
console.log(data.trust_score)  // 95.0

Dashboard

truthlens start

Opens at http://localhost:5173 — 10 pages covering evaluate, claims, RAG, agent, benchmark, leaderboard, proxy analytics, and paper generation.


CLI

truthlens setup                          # check dependencies
truthlens start                          # start everything + open browser
truthlens proxy                          # start proxy server
truthlens evaluate -q "..." -a "..." -s "..."   # evaluate from terminal
truthlens benchmark --sample             # run built-in benchmark

Research

TruthLens is designed to support AI trustworthiness research.

Research questions:

  • RQ1: Can multi-metric evaluation predict factual reliability better than single metrics?
  • RQ2: Does claim-level verification correlate with human judgments?
  • RQ3: How does hallucination rate vary across domains?
  • RQ4: Does model size correlate with trust scores?

See paper/PAPER.md for the research paper draft.


Project Structure

truthlens/
├── truthlens/          ← Core evaluation library
│   ├── evaluator.py    ← 5-metric trust scoring
│   ├── claims.py       ← Claim-level verification
│   ├── rag.py          ← RAG pipeline evaluation
│   ├── agent.py        ← Agent trace evaluation
│   ├── benchmark.py    ← Benchmark runner
│   ├── leaderboard.py  ← Multi-model leaderboard
│   └── paper_generator.py ← Research paper generator
├── proxy/              ← Middleware proxy layer
│   ├── sdk.py          ← Python SDK (2-line integration)
│   ├── server.py       ← FastAPI proxy (port 8001)
│   ├── providers.py    ← OpenAI/Anthropic/Gemini/Ollama
│   └── database.py     ← SQLite evaluation logs
├── api/                ← Main API server (port 8000)
├── dashboard/          ← React dashboard (port 5173)
├── tests/              ← 36 unit tests
├── paper/              ← Research paper draft
├── start.bat           ← Windows: one-click start
├── start.sh            ← Mac/Linux: one-click start
└── GETTING_STARTED.md  ← Full setup guide

Contributing

  1. Fork the repo
  2. Create your branch: git checkout -b feature/my-feature
  3. Commit: git commit -m "Add my feature"
  4. Push: git push origin feature/my-feature
  5. Open a Pull Request

License

MIT © TruthLens Contributors

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

truthlens_ai-1.0.0.tar.gz (42.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

truthlens_ai-1.0.0-py3-none-any.whl (44.0 kB view details)

Uploaded Python 3

File details

Details for the file truthlens_ai-1.0.0.tar.gz.

File metadata

  • Download URL: truthlens_ai-1.0.0.tar.gz
  • Upload date:
  • Size: 42.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for truthlens_ai-1.0.0.tar.gz
Algorithm Hash digest
SHA256 e2fd09c6a966c4e65c4a30f3f4e99a4ce4b72e027780d10e0fe601a2525f656e
MD5 38d49c31cf0555a30a1100be5946c542
BLAKE2b-256 16bf9c795a21645d2cbd3716c54b2df96fe5af6d572a2e5e3225e5f71cc30723

See more details on using hashes here.

File details

Details for the file truthlens_ai-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: truthlens_ai-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 44.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for truthlens_ai-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5221d0286f191dd138dc9f540261d1b5684bc2922adfd04c1475023a50c1dbdc
MD5 3889ebb958833c802d6d434df2c2a639
BLAKE2b-256 abb68b8d1fcfd48d25991dceaca53632b24ea0b513ee4fea5c08152c9fb45d8f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page