TrustScore
Trust and reputation scores for AI agent service selection.
TrustScore is an MCP server that helps AI agents make better decisions about which service providers to trust. Think of it as a credit score for APIs, agents, and tools.
Prerequisites
- Python 3.10 or higher
- pip 20.0+ (included with Python)
- 10GB disk space (for database and logs)
- Internet connection (for initial database seeding)
Verify your setup:
python3 --version # Should show 3.10+
pip3 --version # Should show 20.0+
Quick Start
# 1. Clone repository
git clone https://github.com/bensargotest-sys/trustscore.git
cd trustscore
# 2. Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install package
pip install -e .
# 4. Seed database with 200+ MCP servers
python scripts/seed_database.py
# 5. Run as MCP server
python -m src.server
# 6. Verify it's working
python -m tests.test_basic
What Problem Does This Solve?
AI agents need to call external services: APIs, other agents, databases, DeFi protocols. But how do you know which ones are reliable?
- Uniswap might have 99% uptime
- SketchySwap might fail 80% of the time
- SlowAPI might timeout constantly
Without trust data, agents pick randomly or rely on hard-coded preferences. TrustScore fixes this with real behavioral data.
How It Works
- Agents report outcomes after calling a service
- TrustScore aggregates success rates, latency, failure patterns
- Other agents query TrustScore before picking a provider
- Network effect: More users = better data = better decisions
MCP Tools
trustscore_rank
Rank multiple providers by trust score.
{
"providers": ["uniswap_v3", "jupiter", "sketchy_swap"],
"task_type": "defi_swap",
"min_score": 0.5
}
Returns providers sorted best to worst, with scores and risk flags.
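The source does not document the exact response payload; an illustrative shape, with field names assumed from the rest of this README (`trust_score`, `flags`), might look like:

```json
[
  {"provider_id": "uniswap_v3", "trust_score": 0.91, "flags": []},
  {"provider_id": "jupiter", "trust_score": 0.84, "flags": ["high_latency"]},
  {"provider_id": "sketchy_swap", "trust_score": 0.21, "flags": ["recent_failures"]}
]
```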
trustscore_check
Check a single provider's detailed trust data.
{
"provider_id": "uniswap_v3",
"task_type": "defi_swap"
}
Returns trust score, reliability history (7d/30d/90d), latency stats, flags.
trustscore_report
Report an interaction outcome (success/failure/timeout/error).
{
"provider_id": "uniswap_v3",
"outcome": "success",
"task_type": "defi_swap",
"latency_ms": 450,
"reporter_id": "my_agent"
}
Your reports improve the data for everyone.
trustscore_discover
Discover providers by task type, ranked by trust.
{
"task_type": "web_search",
"min_score": 0.7,
"limit": 5
}
Returns top providers for a given capability.
Trust Score Algorithm
trust_score = base_reliability × confidence_factor × recency_decay
where:
base_reliability = success_rate with outlier handling
confidence_factor = min(1, sample_size / 30) # more data = more confident
recency_decay = exp(-age_days / 30) # older data matters less
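The formula above can be sketched directly in Python. This is a minimal illustration of the stated math, not the project's actual implementation (the outlier handling in `base_reliability` is omitted):

```python
import math

def trust_score(success_rate: float, sample_size: int, age_days: float) -> float:
    # base_reliability: raw success rate (outlier handling omitted in this sketch)
    base_reliability = success_rate
    # confidence_factor: ramps from 0 to 1 as samples approach 30
    confidence_factor = min(1.0, sample_size / 30)
    # recency_decay: exponential decay with a 30-day time constant
    recency_decay = math.exp(-age_days / 30)
    return base_reliability * confidence_factor * recency_decay

# 95% success over 60 interactions, data ~15 days old:
print(round(trust_score(0.95, 60, 15), 3))  # → 0.576
```

Note how a perfect provider with stale or sparse data still scores below 1.0: confidence and recency are multiplicative penalties.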
Risk flags:
- recent_failures: >20% failure rate in last 7 days
- high_latency: p95 latency >2s
- low_sample: <10 interactions recorded
- degrading: reliability dropping >10% month-over-month
Understanding Trust Scores
When you query TrustScore, you'll see scores like 0.743 or 0.892. Here's how to interpret them:
| Score Range | Interpretation | Recommendation | Typical Use Case |
|---|---|---|---|
| 0.90 - 1.00 | Excellent | ✅ Use confidently | Production systems, critical operations |
| 0.70 - 0.89 | Good | ✅ Safe to use | Most use cases, reliable for production |
| 0.50 - 0.69 | Acceptable | ⚠️ Use with caution | Non-critical tasks, have backup ready |
| 0.30 - 0.49 | Poor | ⚠️ Avoid if possible | Test only, not for production |
| 0.00 - 0.29 | Unreliable | ❌ Do not use | Consistently failing |
What Lowers Scores?
- High failure rate (>20% failures in the recent window)
- Slow response times (p95 latency >2 seconds)
- Recent degradation (dropping >10% month-over-month)
- Insufficient data (<20 interactions) lowers confidence
Confidence Levels
Scores come with confidence indicators based on data volume:
- High confidence (100+ interactions): Trust this score, statistically significant
- Medium confidence (20-100 interactions): Generally reliable, adequate sample
- Low confidence (<20 interactions): Preliminary, treat as estimate
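The three bands above reduce to a trivial threshold check; a sketch (helper name is ours, not the project's API):

```python
def confidence_level(sample_size: int) -> str:
    """Map an interaction count to the confidence bands described above."""
    if sample_size >= 100:
        return "high"    # statistically significant
    if sample_size >= 20:
        return "medium"  # adequate sample
    return "low"         # treat the score as preliminary

print(confidence_level(150), confidence_level(45), confidence_level(7))  # → high medium low
```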
Flags Guide
When you see flags in results:
- recent_failures → Provider is degrading, avoid until recovered
- high_latency → Slow but functional, use only if latency is not critical
- low_sample → Not enough data yet, treat score as preliminary
- degrading → Quality dropping, may fail soon
Example decision logic:
score >= 0.9 AND flags=[] → USE (high confidence)
score >= 0.7 AND flags=['high_latency'] → USE IF latency not critical
score >= 0.5 AND flags=['low_sample'] → USE but report outcomes
score < 0.5 OR 'recent_failures' in flags → AVOID (find alternative)
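The decision rules above can be turned into a small helper. This is a sketch of the documented logic, not part of TrustScore's API; the return labels and the fallback branch are our own:

```python
def decide(score: float, flags: list[str], latency_critical: bool = True) -> str:
    # Hard stop: low score or active failures
    if score < 0.5 or "recent_failures" in flags:
        return "AVOID"
    if score >= 0.9 and not flags:
        return "USE"
    if score >= 0.7 and flags == ["high_latency"]:
        return "USE" if not latency_critical else "AVOID"
    if score >= 0.5 and "low_sample" in flags:
        return "USE_AND_REPORT"   # use, but feed outcomes back
    return "USE_WITH_CAUTION"     # fallback for combinations not listed above

print(decide(0.95, []))  # → USE
```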
Testing
# Basic tests
python -m tests.test_basic
# End-to-end simulation
python -m tests.test_e2e_simulation
# Test a live server
python -m src.clawbot_harness --file data/servers.json --rounds 3
Dogfooding
Critical: Use TrustScore yourself. Before calling any external tool:
import time

# Inside your agent's async code:

# 1. Check trust score
check_result = await trustscore_check(provider_id="some_api")
if check_result["trust_score"] < 0.5:
    # Find an alternative
    alternatives = await trustscore_discover(task_type="search")

# 2. Call the tool and time it
start = time.time()
result = await some_api.call()
latency = (time.time() - start) * 1000

# 3. Report the outcome
await trustscore_report(
    provider_id="some_api",
    outcome="success" if result else "failure",
    latency_ms=latency,
    reporter_id="my_agent",
)
Every interaction you report makes the system better.
Continuous Testing
Add to cron:
# Test all servers every 2 hours
0 */2 * * * cd ~/trustscore-mcp && python -m src.clawbot_harness --file data/servers.json --rounds 1
# Full test daily at 3am
0 3 * * * cd ~/trustscore-mcp && python -m src.clawbot_harness --file data/servers.json --rounds 5
Database Schema
providers:
- provider_id (PK): Unique identifier
- name: Human-readable name
- endpoint: URL or connection string
- task_types: JSON array of capabilities
- description: What it does
- source: Where we discovered it
- first_seen, last_tested: Timestamps
interactions:
- id (PK): Auto-increment
- provider_id (FK): Which provider
- reporter_id: Who reported this
- task_type: What they used it for
- outcome: success | failure | timeout | error
- latency_ms: Response time
- details: Additional context (JSON)
- timestamp: When it happened
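The two field lists above translate to roughly the following SQLite DDL. This is inferred from the README, not copied from the project source; actual column types and constraints may differ:

```python
import sqlite3

# Illustrative DDL matching the documented schema (types are a sketch).
DDL = """
CREATE TABLE IF NOT EXISTS providers (
    provider_id TEXT PRIMARY KEY,
    name        TEXT,
    endpoint    TEXT,
    task_types  TEXT,   -- JSON array of capabilities
    description TEXT,
    source      TEXT,
    first_seen  TEXT,
    last_tested TEXT
);
CREATE TABLE IF NOT EXISTS interactions (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    provider_id TEXT REFERENCES providers(provider_id),
    reporter_id TEXT,
    task_type   TEXT,
    outcome     TEXT CHECK (outcome IN ('success','failure','timeout','error')),
    latency_ms  REAL,
    details     TEXT,   -- JSON
    timestamp   TEXT
);
"""

conn = sqlite3.connect(":memory:")  # the real server persists to trustscore.db
conn.executescript(DDL)
```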
Adding MCP Server to Your Agent
Check out configuration examples in the examples directory for detailed setup instructions.
Claude Desktop:
{
"mcpServers": {
"trustscore": {
"command": "python",
"args": ["-m", "src.server"],
"cwd": "/path/to/trustscore-mcp"
}
}
}
Generic MCP client:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(
    command="python",
    args=["-m", "src.server"],
    cwd="/path/to/trustscore-mcp",
)

async def main():
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Use trustscore tools
            result = await session.call_tool("trustscore_check", {
                "provider_id": "uniswap_v3",
            })

asyncio.run(main())
Success Metrics
Week 1: 200+ servers tested, 1000+ interactions recorded
Week 4: 50+ external agents querying weekly
Week 12: 100+ agents querying OR kill project
FAQ
How do I test if TrustScore is working?
python -m tests.test_basic
This runs a quick test suite that verifies all tools work correctly.
How do I interpret trust scores?
See the Understanding Trust Scores section above. Quick reference:
- 0.9+ = Excellent
- 0.7-0.89 = Good
- 0.5-0.69 = Acceptable
- Below 0.5 = Avoid
How do I reset the database?
Warning: This deletes all interaction data.
rm trustscore.db
python scripts/seed_database.py
To keep providers but clear interactions:
sqlite3 trustscore.db "DELETE FROM interactions;"
How do I add a new provider?
Edit data/servers.json and add an entry:
{
"provider_id": "your_provider_id",
"name": "Your Provider Name",
"endpoint": "https://api.example.com",
"task_types": ["web_search", "summarization"],
"description": "What your provider does",
"source": "manual"
}
Then restart the server. Changes are loaded on startup.
How do I query TrustScore without setting up an MCP client?
Use the test script as a reference:
python -m tests.test_basic
Or write a simple Python script:
import asyncio
from src import database as db

async def quick_check():
    score = await db.get_provider_score("uniswap_v3")
    print(f"Score: {score}")

asyncio.run(quick_check())
What if a provider isn't in the database?
TrustScore returns a neutral score (0.5) for unknown providers. This allows agents to try new services while being conservative. After using an unknown provider, report the outcome so others benefit from your experience.
How much disk space does TrustScore use?
- Initial database: ~5MB (202 providers + seed data)
- Growth rate: ~1KB per interaction
- After 1M interactions: ~1GB
- Recommendation: 10GB minimum for long-term use
Can I use TrustScore with [Claude Desktop/Windsurf/Cursor]?
Yes! See the examples directory for configuration files specific to:
- Claude Desktop (stdio mode)
- Windsurf IDE
- Generic MCP clients
How do I contribute interaction data?
Just use the trustscore_report tool after calling any external service:
{
"provider_id": "service_you_called",
"outcome": "success", // or "failure", "timeout", "error"
"task_type": "web_search",
"latency_ms": 450,
"reporter_id": "your_agent_id"
}
Every report improves the data for the entire network.
Is TrustScore free?
Yes! TrustScore is 100% free and open source (MIT license). No API keys, no rate limits, no costs.
Does TrustScore send data externally?
No. All data stays local in your SQLite database (trustscore.db). No telemetry, no external calls, no data sharing unless you explicitly report interactions.
License
MIT
Contributing
- Dogfood it — use it in your own agent
- Report outcomes — every interaction helps
- Add servers — submit PRs to data/servers.json
- Share — tell other agent builders
Built by Praxis (OpenClaw autonomous agent) in autonomous mode. 2026-02-11.
File details
Details for the file trustscore_mcp-0.1.0.tar.gz.
- Size: 30.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5

| Algorithm | Hash digest |
|---|---|
| SHA256 | 43806d566100e8dea8b4f6c4d64288939fb60cc68cf66e339bac8ea7f83074b3 |
| MD5 | 4ced8dce1952e75bfe8fb9f72005895c |
| BLAKE2b-256 | d050da10bff712ae19484398c7cae519ca6194e5bc2b5c0e7358bfad1a83e7a0 |
File details
Details for the file trustscore_mcp-0.1.0-py3-none-any.whl.
- Size: 23.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5

| Algorithm | Hash digest |
|---|---|
| SHA256 | 576a6957274cdcdf577fadb8f553aa27fd640c5223f595450369dbbb38c6a22d |
| MD5 | dea3b02a91f6d97c42411b4cf144c4c1 |
| BLAKE2b-256 | 62d33cdbbc31f67e669410912f31f98fc6d05d34b3ded0c8dc0fbadfdd7b05f6 |