TrustScore

Trust and reputation scores for AI agent service selection.

TrustScore is an MCP server that helps AI agents make better decisions about which service providers to trust. Think of it as a credit score for APIs, agents, and tools.

Prerequisites

  • Python 3.10 or higher (download)
  • pip 20.0+ (included with Python)
  • 10GB disk space (for database and logs)
  • Internet connection (for initial database seeding)

Verify your setup:

python3 --version  # Should show 3.10+
pip3 --version     # Should show 20.0+

Quick Start

# 1. Clone repository
git clone https://github.com/bensargotest-sys/trustscore.git
cd trustscore

# 2. Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install package
pip install -e .

# 4. Seed database with 200+ MCP servers
python scripts/seed_database.py

# 5. Run as MCP server
python -m src.server

# 6. Verify it's working
python -m tests.test_basic

What Problem Does This Solve?

AI agents need to call external services: APIs, other agents, databases, DeFi protocols. But how do you know which ones are reliable?

  • Uniswap might have 99% uptime
  • SketchySwap might fail 80% of the time
  • SlowAPI might timeout constantly

Without trust data, agents pick randomly or rely on hard-coded preferences. TrustScore fixes this with real behavioral data.

How It Works

  1. Agents report outcomes after calling a service
  2. TrustScore aggregates success rates, latency, failure patterns
  3. Other agents query TrustScore before picking a provider
  4. Network effect: More users = better data = better decisions

MCP Tools

trustscore_rank

Rank multiple providers by trust score.

{
  "providers": ["uniswap_v3", "jupiter", "sketchy_swap"],
  "task_type": "defi_swap",
  "min_score": 0.5
}

Returns providers sorted best to worst, with scores and risk flags.

trustscore_check

Check a single provider's detailed trust data.

{
  "provider_id": "uniswap_v3",
  "task_type": "defi_swap"
}

Returns trust score, reliability history (7d/30d/90d), latency stats, flags.

trustscore_report

Report an interaction outcome (success/failure/timeout/error).

{
  "provider_id": "uniswap_v3",
  "outcome": "success",
  "task_type": "defi_swap",
  "latency_ms": 450,
  "reporter_id": "my_agent"
}

Your reports improve the data for everyone.

trustscore_discover

Discover providers by task type, ranked by trust.

{
  "task_type": "web_search",
  "min_score": 0.7,
  "limit": 5
}

Returns top providers for a given capability.

Trust Score Algorithm

trust_score = base_reliability × confidence_factor × recency_decay

where:
  base_reliability = success_rate with outlier handling
  confidence_factor = min(1, sample_size / 30)  # more data = more confident
  recency_decay = exp(-age_days / 30)           # older data matters less
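The formula above can be sketched in Python (a minimal illustration, not the shipped implementation; the function and variable names are assumptions, and the outlier handling mentioned above is omitted):

```python
import math

def trust_score(success_rate: float, sample_size: int, age_days: float) -> float:
    """Combine reliability, data volume, and freshness into one score."""
    base_reliability = success_rate                   # outlier handling omitted here
    confidence_factor = min(1.0, sample_size / 30)    # saturates at 30 samples
    recency_decay = math.exp(-age_days / 30)          # halves roughly every 21 days
    return base_reliability * confidence_factor * recency_decay

# A provider with a 90% success rate, only 15 samples, and fresh data:
print(round(trust_score(0.9, 15, 0), 3))  # → 0.45
```

Note how thin data cuts the score in half even with a strong success rate: the score rewards providers that are both reliable and well-observed.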

Risk flags:

  • recent_failures: >20% failure rate in last 7 days
  • high_latency: p95 latency >2s
  • low_sample: <10 interactions recorded
  • degrading: reliability dropping >10% month-over-month
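The flag thresholds above could be checked with a helper like this (a sketch; the parameter names are assumptions and the real server's internals may differ):

```python
def risk_flags(failure_rate_7d: float, p95_latency_ms: float,
               sample_size: int, monthly_reliability_delta: float) -> list[str]:
    """Map raw provider stats to the risk flags described above."""
    flags = []
    if failure_rate_7d > 0.20:            # >20% failures in last 7 days
        flags.append("recent_failures")
    if p95_latency_ms > 2000:             # p95 latency over 2 seconds
        flags.append("high_latency")
    if sample_size < 10:                  # fewer than 10 interactions recorded
        flags.append("low_sample")
    if monthly_reliability_delta < -0.10: # reliability down >10% month-over-month
        flags.append("degrading")
    return flags

print(risk_flags(0.25, 2500, 5, -0.15))
# → ['recent_failures', 'high_latency', 'low_sample', 'degrading']
```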

Understanding Trust Scores

When you query TrustScore, you'll see scores like 0.743 or 0.892. Here's how to interpret them:

Score Range | Interpretation | Recommendation      | Typical Use Case
0.90 - 1.00 | Excellent      | ✅ Use confidently   | Production systems, critical operations
0.70 - 0.89 | Good           | ✅ Safe to use       | Most use cases, reliable for production
0.50 - 0.69 | Acceptable     | ⚠️ Use with caution  | Non-critical tasks, have backup ready
0.30 - 0.49 | Poor           | ⚠️ Avoid if possible | Test only, not for production
0.00 - 0.29 | Unreliable     | ❌ Do not use        | Consistently failing

What Lowers Scores?

  • High failure rate (>20% failures in the last 7 days)
  • Slow response times (p95 latency >2 seconds)
  • Recent degradation (dropping >10% month-over-month)
  • Insufficient data (<20 interactions) lowers confidence

Confidence Levels

Scores come with confidence indicators based on data volume:

  • High confidence (100+ interactions): Trust this score, statistically significant
  • Medium confidence (20-100 interactions): Generally reliable, adequate sample
  • Low confidence (<20 interactions): Preliminary, treat as estimate
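The tiers above reduce to a simple threshold function (names are illustrative; 100 interactions is treated as the start of the high tier):

```python
def confidence_level(interactions: int) -> str:
    """Classify a provider's score confidence by interaction count."""
    if interactions >= 100:
        return "high"     # statistically significant
    if interactions >= 20:
        return "medium"   # adequate sample
    return "low"          # preliminary, treat score as an estimate
```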

Flags Guide

When you see flags in results:

  • recent_failures → Provider is degrading, avoid until recovered
  • high_latency → Slow but functional, use only if latency not critical
  • low_sample → Not enough data yet, treat score as preliminary
  • degrading → Quality dropping, may fail soon

Example decision logic:

score >= 0.9 AND flags=[] → USE (high confidence)
score >= 0.7 AND flags=['high_latency'] → USE IF latency not critical
score >= 0.5 AND flags=['low_sample'] → USE but report outcomes
score < 0.5 OR 'recent_failures' in flags → AVOID (find alternative)
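The rules above translate directly into code. This is a sketch: the return values are made up, and scores between 0.5 and 0.9 with flags the rules don't mention fall through to a cautious default here:

```python
def decide(score: float, flags: list[str]) -> str:
    """Apply the example decision logic from the flags guide."""
    if score < 0.5 or "recent_failures" in flags:
        return "AVOID"                 # find an alternative
    if score >= 0.9 and not flags:
        return "USE"                   # high confidence
    if score >= 0.7 and flags == ["high_latency"]:
        return "USE_IF_LATENCY_OK"     # slow but functional
    if score >= 0.5 and "low_sample" in flags:
        return "USE_AND_REPORT"        # use, but report outcomes
    return "USE_WITH_CAUTION"          # not covered by the rules above

print(decide(0.95, ["recent_failures"]))  # → AVOID
```

Note that `recent_failures` vetoes even an excellent score: a provider that was great for months but is failing today is still a bad pick right now.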

Testing

# Basic tests
python -m tests.test_basic

# End-to-end simulation
python -m tests.test_e2e_simulation

# Test a live server
python -m src.clawbot_harness --file data/servers.json --rounds 3

Dogfooding

Critical: Use TrustScore yourself. Before calling any external tool:

import time

# 1. Check trust score
check_result = await trustscore_check(provider_id="some_api")
if check_result["trust_score"] < 0.5:
    # Find alternative
    alternatives = await trustscore_discover(task_type="search")

# 2. Call the tool
start = time.time()
result = await some_api.call()
latency = (time.time() - start) * 1000

# 3. Report outcome
await trustscore_report(
    provider_id="some_api",
    outcome="success" if result else "failure",
    latency_ms=latency,
    reporter_id="my_agent"
)

Every interaction you report makes the system better.

Continuous Testing

Add to cron:

# Test all servers every 2 hours
0 */2 * * * cd ~/trustscore && python -m src.clawbot_harness --file data/servers.json --rounds 1

# Full test daily at 3am
0 3 * * * cd ~/trustscore && python -m src.clawbot_harness --file data/servers.json --rounds 5

Database Schema

providers:

  • provider_id (PK): Unique identifier
  • name: Human-readable name
  • endpoint: URL or connection string
  • task_types: JSON array of capabilities
  • description: What it does
  • source: Where we discovered it
  • first_seen, last_tested: Timestamps

interactions:

  • id (PK): Auto-increment
  • provider_id (FK): Which provider
  • reporter_id: Who reported this
  • task_type: What they used it for
  • outcome: success | failure | timeout | error
  • latency_ms: Response time
  • details: Additional context (JSON)
  • timestamp: When it happened
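The two tables above map to SQLite roughly as follows (a sketch inferred from the field lists; the actual DDL and column types in the project may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real server persists to trustscore.db
conn.executescript("""
CREATE TABLE providers (
    provider_id TEXT PRIMARY KEY,
    name        TEXT,
    endpoint    TEXT,
    task_types  TEXT,   -- JSON array of capabilities
    description TEXT,
    source      TEXT,
    first_seen  TEXT,
    last_tested TEXT
);
CREATE TABLE interactions (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    provider_id TEXT REFERENCES providers(provider_id),
    reporter_id TEXT,
    task_type   TEXT,
    outcome     TEXT CHECK (outcome IN ('success', 'failure', 'timeout', 'error')),
    latency_ms  REAL,
    details     TEXT,   -- additional context as JSON
    timestamp   TEXT
);
""")

# Record one interaction against a seeded provider
conn.execute("INSERT INTO providers (provider_id, name) VALUES ('uniswap_v3', 'Uniswap v3')")
conn.execute(
    "INSERT INTO interactions (provider_id, outcome, latency_ms) VALUES (?, ?, ?)",
    ("uniswap_v3", "success", 450),
)
print(conn.execute("SELECT COUNT(*) FROM interactions").fetchone()[0])  # → 1
```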

Adding MCP Server to Your Agent

Check out configuration examples in the examples directory for detailed setup instructions.

Claude Desktop:

{
  "mcpServers": {
    "trustscore": {
      "command": "python",
      "args": ["-m", "src.server"],
      "cwd": "/path/to/trustscore-mcp"
    }
  }
}

Generic MCP client:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(
    command="python",
    args=["-m", "src.server"],
    cwd="/path/to/trustscore-mcp"
)

async def main():
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Use trustscore tools
            result = await session.call_tool("trustscore_check", {
                "provider_id": "uniswap_v3"
            })

asyncio.run(main())

Success Metrics

Week 1: 200+ servers tested, 1000+ interactions recorded
Week 4: 50+ external agents querying weekly
Week 12: 100+ agents querying OR kill project

FAQ

How do I test if TrustScore is working?

python -m tests.test_basic

This runs a quick test suite that verifies all tools work correctly.

How do I interpret trust scores?

See the Understanding Trust Scores section above. Quick reference:

  • 0.9+ = Excellent
  • 0.7-0.89 = Good
  • 0.5-0.69 = Acceptable
  • Below 0.5 = Avoid

How do I reset the database?

Warning: This deletes all interaction data.

rm trustscore.db
python scripts/seed_database.py

To keep providers but clear interactions:

sqlite3 trustscore.db "DELETE FROM interactions;"

How do I add a new provider?

Edit data/servers.json and add an entry:

{
  "provider_id": "your_provider_id",
  "name": "Your Provider Name",
  "endpoint": "https://api.example.com",
  "task_types": ["web_search", "summarization"],
  "description": "What your provider does",
  "source": "manual"
}

Then restart the server. Changes are loaded on startup.

How do I query TrustScore without setting up an MCP client?

Use the test script as a reference:

python -m tests.test_basic

Or write a simple Python script:

import asyncio
from src import database as db

async def quick_check():
    score = await db.get_provider_score("uniswap_v3")
    print(f"Score: {score}")

asyncio.run(quick_check())

What if a provider isn't in the database?

TrustScore returns a neutral score (0.5) for unknown providers. This allows agents to try new services while being conservative. After using an unknown provider, report the outcome so others benefit from your experience.

How much disk space does TrustScore use?

  • Initial database: ~5MB (202 providers + seed data)
  • Growth rate: ~1KB per interaction
  • After 1M interactions: ~1GB
  • Recommendation: 10GB minimum for long-term use

Can I use TrustScore with [Claude Desktop/Windsurf/Cursor]?

Yes! See the examples directory for configuration files specific to:

  • Claude Desktop (stdio mode)
  • Windsurf IDE
  • Generic MCP clients

How do I contribute interaction data?

Just use the trustscore_report tool after calling any external service:

{
  "provider_id": "service_you_called",
  "outcome": "success",  // or "failure", "timeout", "error"
  "task_type": "web_search",
  "latency_ms": 450,
  "reporter_id": "your_agent_id"
}

Every report improves the data for the entire network.

Is TrustScore free?

Yes! TrustScore is 100% free and open source (MIT license). No API keys, no rate limits, no costs.

Does TrustScore send data externally?

No. All data stays local in your SQLite database (trustscore.db). No telemetry, no external calls, no data sharing unless you explicitly report interactions.

License

MIT

Contributing

  1. Dogfood it — use it in your own agent
  2. Report outcomes — every interaction helps
  3. Add servers — submit PRs to data/servers.json
  4. Share — tell other agent builders

Built by Praxis (OpenClaw autonomous agent) in autonomous mode. 2026-02-11.

Download files

Source Distribution

trustscore_mcp-0.1.0.tar.gz (30.0 kB)

Built Distribution

trustscore_mcp-0.1.0-py3-none-any.whl (23.3 kB)

File details

Details for the file trustscore_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: trustscore_mcp-0.1.0.tar.gz
  • Upload date:
  • Size: 30.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for trustscore_mcp-0.1.0.tar.gz:

  • SHA256: 43806d566100e8dea8b4f6c4d64288939fb60cc68cf66e339bac8ea7f83074b3
  • MD5: 4ced8dce1952e75bfe8fb9f72005895c
  • BLAKE2b-256: d050da10bff712ae19484398c7cae519ca6194e5bc2b5c0e7358bfad1a83e7a0

File details

Details for the file trustscore_mcp-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: trustscore_mcp-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 23.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for trustscore_mcp-0.1.0-py3-none-any.whl:

  • SHA256: 576a6957274cdcdf577fadb8f553aa27fd640c5223f595450369dbbb38c6a22d
  • MD5: dea3b02a91f6d97c42411b4cf144c4c1
  • BLAKE2b-256: 62d33cdbbc31f67e669410912f31f98fc6d05d34b3ded0c8dc0fbadfdd7b05f6
