The trust and evaluation layer for AI systems.
Project description
TruthLens
The trust and evaluation layer for AI systems.
TruthLens automatically scores any AI response for groundedness, faithfulness, hallucination risk, and overall trustworthiness — in 2 lines of code.
Install
pip install truthlens
Requires Ollama for local evaluation (free), or any LLM API key.
Quick Start
from truthlens import TruthLens
# Local — no API key needed
tl = TruthLens(provider="ollama", model="llama3")
response = tl.chat(
"Who created Python?",
sources=["Python was created by Guido van Rossum, first released in 1991."]
)
print(response.content) # AI's answer
print(response.trust_score) # 95.0
print(response.hallucination_risk) # Low
print(response.summary())
# [TruthLens] Trust: 95/100 | Ground: 96% | Faith: 94% | Hallucination: ✓ Low
Supported Providers
# OpenAI
tl = TruthLens(provider="openai", model="gpt-4o", api_key="sk-...")
# Anthropic
tl = TruthLens(provider="anthropic", model="claude-sonnet-4-6", api_key="sk-ant-...")
# Gemini
tl = TruthLens(provider="gemini", model="gemini-1.5-flash", api_key="AI...")
# Local Ollama (free)
tl = TruthLens(provider="ollama", model="llama3")
What Gets Scored
| Metric | Description |
|---|---|
| Trust Score | Composite 0–100 |
| Groundedness | How strongly the answer is backed by sources |
| Faithfulness | Whether the answer accurately reflects sources |
| Citation Accuracy | Whether references are valid |
| Hallucination Risk | Low / Medium / High |
Claim-Level Verification
tl = TruthLens(provider="ollama", model="llama3", include_claims=True)
response = tl.chat("Tell me about Python", sources=["..."])
for claim in response.claims:
print(f"{claim['verdict']}: {claim['text']}")
# Supported: Python was created by Guido van Rossum
# Unsupported: Python was initially called ABC
RAG Evaluation
from truthlens import evaluate_rag
report = evaluate_rag(
question="What is photosynthesis?",
answer=generated_answer,
retrieved_chunks=your_chunks,
)
print(report.rag_score) # 90.1
Benchmark Runner
from truthlens import run_benchmark, generate_sample_dataset
report = run_benchmark(generate_sample_dataset(), model="llama3")
print(report.stats.avg_trust_score) # 87.3
REST API (Proxy Server)
truthlens proxy
Then from any language:
const r = await fetch("http://localhost:8001/chat", {
method: "POST",
body: JSON.stringify({
provider: "openai", model: "gpt-4o", api_key: "sk-...",
messages: [{ role: "user", content: "Who created Python?" }],
sources: ["Python was created by Guido van Rossum in 1991."]
})
})
const data = await r.json()
console.log(data.trust_score) // 95.0
Dashboard
truthlens start
Opens at http://localhost:5173 — 10 pages covering evaluate, claims, RAG, agent, benchmark, leaderboard, proxy analytics, and paper generation.
CLI
truthlens setup # check dependencies
truthlens start # start everything + open browser
truthlens proxy # start proxy server
truthlens evaluate -q "..." -a "..." -s "..." # evaluate from terminal
truthlens benchmark --sample # run built-in benchmark
Research
TruthLens is designed to support AI trustworthiness research.
Research questions:
- RQ1: Can multi-metric evaluation predict factual reliability better than single metrics?
- RQ2: Does claim-level verification correlate with human judgments?
- RQ3: How does hallucination rate vary across domains?
- RQ4: Does model size correlate with trust scores?
See paper/PAPER.md for the research paper draft.
Project Structure
truthlens/
├── truthlens/ ← Core evaluation library
│ ├── evaluator.py ← 5-metric trust scoring
│ ├── claims.py ← Claim-level verification
│ ├── rag.py ← RAG pipeline evaluation
│ ├── agent.py ← Agent trace evaluation
│ ├── benchmark.py ← Benchmark runner
│ ├── leaderboard.py ← Multi-model leaderboard
│ └── paper_generator.py ← Research paper generator
├── proxy/ ← Middleware proxy layer
│ ├── sdk.py ← Python SDK (2-line integration)
│ ├── server.py ← FastAPI proxy (port 8001)
│ ├── providers.py ← OpenAI/Anthropic/Gemini/Ollama
│ └── database.py ← SQLite evaluation logs
├── api/ ← Main API server (port 8000)
├── dashboard/ ← React dashboard (port 5173)
├── tests/ ← 36 unit tests
├── paper/ ← Research paper draft
├── start.bat ← Windows: one-click start
├── start.sh ← Mac/Linux: one-click start
└── GETTING_STARTED.md ← Full setup guide
Contributing
- Fork the repo
- Create your branch:
git checkout -b feature/my-feature - Commit:
git commit -m "Add my feature" - Push:
git push origin feature/my-feature - Open a Pull Request
License
MIT © TruthLens Contributors
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file truthlens_ai-1.0.0.tar.gz.
File metadata
- Download URL: truthlens_ai-1.0.0.tar.gz
- Upload date:
- Size: 42.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e2fd09c6a966c4e65c4a30f3f4e99a4ce4b72e027780d10e0fe601a2525f656e
|
|
| MD5 |
38d49c31cf0555a30a1100be5946c542
|
|
| BLAKE2b-256 |
16bf9c795a21645d2cbd3716c54b2df96fe5af6d572a2e5e3225e5f71cc30723
|
File details
Details for the file truthlens_ai-1.0.0-py3-none-any.whl.
File metadata
- Download URL: truthlens_ai-1.0.0-py3-none-any.whl
- Upload date:
- Size: 44.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5221d0286f191dd138dc9f540261d1b5684bc2922adfd04c1475023a50c1dbdc
|
|
| MD5 |
3889ebb958833c802d6d434df2c2a639
|
|
| BLAKE2b-256 |
abb68b8d1fcfd48d25991dceaca53632b24ea0b513ee4fea5c08152c9fb45d8f
|