
TrustModel

Official Python SDK for the TrustModel AI evaluation platform

Website | Documentation | Dashboard


Evaluate AI models for safety, bias, and performance with a simple, intuitive interface.

Features

  • 🚀 Simple Interface: Easy-to-use client for all TrustModel operations
  • 🔒 Secure: API key authentication with built-in validation
  • 🎯 Type Safe: Full type hints for excellent IDE support
  • 🔄 Reliable: Automatic retries and comprehensive error handling
  • 📊 Comprehensive: Support for all evaluation types and configurations
  • 🌍 Framework Agnostic: Works with any Python framework or standalone scripts

Installation

pip install trustmodel

Prerequisites

Before using the SDK, you must complete the following setup in the TrustModel Dashboard:

1. Create an API Key (Required)

You need a TrustModel API key to authenticate all SDK requests:

  1. Go to Keys & Webhooks in the dashboard
  2. Click "Create API Key"
  3. Copy your new API key (starts with tm-)
  4. Store it securely - you won't be able to see it again

2. Configure Webhooks (Required)

To receive notifications when evaluations complete or fail, you must configure webhooks:

  1. Go to Keys & Webhooks in the dashboard
  2. Click "Create Webhook"
  3. Enter your webhook endpoint URL
  4. Select the events you want to receive
  5. Save your webhook configuration

Important: Without configuring both an API key and webhooks in the webapp, you cannot run evaluations. The API will return an error if these are not set up.
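
Before running your first evaluation, it can help to verify that your API key is accepted. A minimal sketch using the credits endpoint documented below (webhook configuration still has to be done in the dashboard and is not checked here):

import os
import trustmodel
from trustmodel import AuthenticationError

# Startup check: confirms the TrustModel API key works before you run evaluations
api_key = os.getenv("TRUSTMODEL_API_KEY")

try:
    client = trustmodel.TrustModelClient(api_key=api_key)
    credits = client.credits.get_balance()
    print(f"API key OK - {credits.credits_remaining} credits remaining")
except AuthenticationError:
    print("API key rejected - create or re-copy it under Keys & Webhooks")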

Quick Start

import trustmodel

# Initialize the client
client = trustmodel.TrustModelClient(api_key="tm-your-api-key-here")

# List available models
models, api_sources = client.models.list()
print(f"Found {len(models)} models available")

# Create an evaluation
evaluation = client.evaluations.create(
    model_identifier="gpt-4",
    vendor_identifier="openai",
    categories=["safety", "bias", "performance"]
)

print(f"Evaluation created with ID: {evaluation.id}")
print(f"Status: {evaluation.status}")

# You'll receive a webhook notification when the evaluation completes
# Then retrieve the results:
completed_evaluation = client.evaluations.get(evaluation.id)
print(f"Overall score: {completed_evaluation.overall_score}")

# Check your credit balance
credits = client.credits.get_balance()
print(f"Credits remaining: {credits.credits_remaining}")

Authentication

Get your API key from the TrustModel Dashboard and use it to initialize the client:

import trustmodel

client = trustmodel.TrustModelClient(api_key="tm-your-api-key-here")

For production applications, store your API key securely using environment variables:

import os
import trustmodel

api_key = os.getenv("TRUSTMODEL_API_KEY")
client = trustmodel.TrustModelClient(api_key=api_key)

Evaluation Modes

TrustModel supports three ways to evaluate AI models:

Mode              Use Case                                        API Key Required
Platform Key      Quick evaluations using TrustModel's API keys   No (uses TrustModel's keys)
BYOK              Use your own vendor API key for any model       Yes (your vendor API key)
Custom Endpoint   Evaluate private/self-hosted models             Yes (your endpoint's API key)

Getting Available Vendors

Use client.config.get().vendors to discover available vendors:

config = client.config.get()

# Public vendors - for Platform Key and BYOK evaluations
public_vendors = config.vendors["public"]
for vendor in public_vendors:
    print(f"{vendor['identifier']}: {vendor['name']}")

# Custom vendors - for Custom Endpoint evaluations only
custom_vendors = config.vendors["custom"]
for vendor in custom_vendors:
    print(f"{vendor['identifier']}: {vendor['name']}")

Vendor Type   Use With             Description
public        Platform Key, BYOK   Vendors like OpenAI, Anthropic, Google AI for standard evaluations
custom        Custom Endpoint      Validators for self-hosted/private endpoints (OpenAI-compatible, Hugging Face, Azure AI, etc.)

Getting Available Models

Use client.models.list() to discover available models:

# Get all available models and API source info
models, api_sources = client.models.list()

# List all models with their details
for model in models:
    print(f"Model: {model.name}")
    print(f"  Identifier: {model.model_identifier}")
    print(f"  Vendor: {model.vendor_identifier}")
    print(f"  Platform Key Available: {model.available_via_trust_model_key}")
    print(f"  BYOK Available: {model.available_via_byok}")

# Filter models by vendor
openai_models = [m for m in models if m.vendor_identifier == "openai"]

# Filter models available via platform key (no vendor API key needed)
platform_key_models = [m for m in models if m.available_via_trust_model_key]

# Use a model in evaluation
model = models[0]
evaluation = client.evaluations.create(
    model_identifier=model.model_identifier,
    vendor_identifier=model.vendor_identifier,
    categories=["safety", "bias"]
)

Model Field                     Type   Description
name                            str    Human-readable model name
model_identifier                str    Identifier to use in API calls
vendor_identifier               str    Vendor identifier
available_via_trust_model_key   bool   Can evaluate without vendor API key
available_via_byok              bool   Previously used with your own API key

Platform Key (Default)

Use TrustModel's platform keys for quick evaluations. No vendor API key needed:

evaluation = client.evaluations.create(
    model_identifier="gpt-4",
    vendor_identifier="openai",
    categories=["safety", "bias"]
)

Note: Platform key availability varies by model. Check model.available_via_trust_model_key to see if a model supports this mode.

BYOK (Bring Your Own Key)

Use your own vendor API key to evaluate any model. All vendors support BYOK:

evaluation = client.evaluations.create(
    model_identifier="gpt-4",
    vendor_identifier="openai",
    api_key="sk-your-openai-key",  # Your OpenAI API key
    categories=["safety", "bias"]
)

How it works:

  1. You provide your vendor API key (e.g., OpenAI, Anthropic, Google)
  2. TrustModel validates the key before creating the evaluation
  3. If validation fails, a ConnectionValidationError is raised with details
  4. Your key is securely stored and used for the evaluation

Get vendor API keys directly from each provider's developer console (for example OpenAI, Anthropic, or Google AI).

Example with error handling:

from trustmodel import ConnectionValidationError, InsufficientCreditsError

try:
    evaluation = client.evaluations.create(
        model_identifier="gpt-4",
        vendor_identifier="openai",
        api_key="sk-your-openai-key",
        categories=["safety", "bias"]
    )
    print(f"Evaluation created: {evaluation.id}")
except ConnectionValidationError as e:
    # API key validation failed
    print(f"Invalid API key: {e.message}")
    if e.validation_details:
        print(f"Details: {e.validation_details}")
except InsufficientCreditsError as e:
    print(f"Need more credits: {e.credits_required} required")

Custom Endpoint

Evaluate your own OpenAI-compatible API endpoint (Ollama, vLLM, LiteLLM, Azure AI, etc.):

# Create evaluation for a custom endpoint
evaluation = client.evaluations.create_custom_endpoint(
    api_endpoint="https://api.yourcompany.com/v1",
    api_key="your-api-key",
    model_identifier="your-model-id",
    vendor_identifier="openai",  # Determines which validator to use
    model_name="My Custom Model",  # Optional display name
    categories=["safety", "bias"]
)

Available vendor identifiers for custom endpoints:

Get the list programmatically with client.config.get().vendors["custom"], or use one of these:

Identifier    Use For
openai        OpenAI-compatible APIs (Ollama, vLLM, LiteLLM, etc.) - default
huggingface   Hugging Face Inference Endpoints
azure_ai      Azure AI / Azure OpenAI Service
xai           Google Vertex AI
bedrock       AWS Bedrock

Examples:

# Ollama endpoint (uses default "openai" validator)
evaluation = client.evaluations.create_custom_endpoint(
    api_endpoint="http://localhost:11434/v1",
    api_key="ollama",  # Ollama doesn't require a real key
    model_identifier="llama3:8b"
)

# Azure AI endpoint
evaluation = client.evaluations.create_custom_endpoint(
    api_endpoint="https://your-resource.openai.azure.com",
    api_key="your-azure-key",
    model_identifier="gpt-4",
    vendor_identifier="azure_ai"
)

# Hugging Face endpoint
evaluation = client.evaluations.create_custom_endpoint(
    api_endpoint="https://api-inference.huggingface.co/models/your-model",
    api_key="hf_your_token",
    model_identifier="your-model",
    vendor_identifier="huggingface"
)

Core Concepts

Models

Discover available AI models:

# List all available models
models, api_sources = client.models.list()

for model in models:
    print(f"Model: {model.name}")
    print(f"Vendor: {model.vendor_identifier}")
    print(f"Platform key available: {model.available_via_trust_model_key}")
    print(f"Previously used BYOK: {model.available_via_byok}")
    print("---")

# Get specific model
model = client.models.get("openai", "gpt-4")
print(f"Found model: {model.name}")

Note: available_via_byok indicates you have previously used BYOK for this vendor. All vendors support BYOK - you can use your own API key with any model.

Evaluations

Create and manage AI model evaluations:

# Platform key (default) - uses TrustModel's keys
evaluation = client.evaluations.create(
    model_identifier="gpt-4",
    vendor_identifier="openai",
    categories=["safety", "bias"]
)

# BYOK - uses your own API key
evaluation = client.evaluations.create(
    model_identifier="gpt-4",
    vendor_identifier="openai",
    api_key="sk-your-openai-key",
    categories=["safety", "bias"]
)

# Custom endpoint - your own API
evaluation = client.evaluations.create_custom_endpoint(
    api_endpoint="https://api.yourcompany.com/v1",
    api_key="your-api-key",
    model_identifier="custom-model-v1"
)

Re-run from Template

Re-run a previous evaluation configuration using its template ID:

# Re-run using a saved template
evaluation = client.evaluations.create_from_template(
    template_id="550e8400-e29b-41d4-a716-446655440000"
)

# Optionally update the template name
evaluation = client.evaluations.create_from_template(
    template_id="550e8400-e29b-41d4-a716-446655440000",
    template_name="My Updated Config Name"
)

The template contains all saved configuration (model, vendor, categories, etc.) so no other parameters are required. Template IDs are returned in evaluation results via the template_id field.
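
For example, you can read the template_id off a previous evaluation and re-run the same configuration. A minimal sketch, assuming evaluation_id refers to an evaluation you created earlier:

# Re-run the configuration of a previous evaluation
previous = client.evaluations.get(evaluation_id)

if previous.template_id:
    rerun = client.evaluations.create_from_template(template_id=previous.template_id)
    print(f"Re-run started: {rerun.id}")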

Managing Evaluations

# List all evaluations
evaluations = client.evaluations.list()

# Filter by status
completed = client.evaluations.list(status="completed")

# Get detailed results
evaluation = client.evaluations.get(evaluation_id)
if evaluation.status == "completed":
    print(f"Overall Score: {evaluation.overall_score}")
    for score in evaluation.scores:
        print(f"{score.category}: {score.score:.2f}")

# Quick status check
status = client.evaluations.get_status(evaluation_id)
print(f"Progress: {status['completion_percentage']}%")

Batch Jobs & Model Comparison

Evaluate multiple models efficiently using batch jobs. Batch jobs are ideal for comparing models, running high-volume evaluations, and reducing API quota usage.

Creating Batch Evaluations

Create a batch to evaluate multiple models in parallel:

# Create a batch to evaluate multiple models
batch = client.batch_jobs.create(
    batch_type="model_evaluation",
    name="GPT-4 vs Claude-3 Evaluation",
    description="Comparing GPT-4 and Claude-3 performance on safety and bias",
    models=[
        {"vendor_identifier": "openai", "model_identifier": "gpt-4"},
        {"vendor_identifier": "anthropic", "model_identifier": "claude-3-opus"},
    ],
    evaluation_config={"type": "comprehensive", "test_count": 50},
    categories=["safety", "bias"],  # Optional: specify evaluation categories
)

print(f"Batch created with ID: {batch.id}")
print(f"Status: {batch.status}")
print(f"Total models: {batch.total_models}")

Batch Types:

Type                     Purpose
model_evaluation         Evaluate multiple models independently
model_score_comparison   Compare models side-by-side with ranking

Optional Parameters (see the combined example after this list):

  • categories: List of evaluation categories (e.g., ["safety", "bias", "performance"])
  • api_key: Your vendor API key for BYOK evaluations across all models
  • test_set_id: Use a specific test set instead of the default
  • description: Human-readable description of the batch
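
Here is a sketch combining these parameters in one call; the test set ID is a hypothetical placeholder for a value from your own account:

# Batch using your own OpenAI key (BYOK) and a specific test set
batch = client.batch_jobs.create(
    batch_type="model_evaluation",
    name="BYOK Batch",
    description="Batch evaluation using our own OpenAI key",
    models=[
        {"vendor_identifier": "openai", "model_identifier": "gpt-4"},
    ],
    evaluation_config={"type": "comprehensive"},
    categories=["safety", "bias", "performance"],
    api_key="sk-your-openai-key",     # applied to all models in the batch
    test_set_id="your-test-set-id",   # hypothetical placeholder
)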

Model Comparison Batch

Create a batch specifically for comparing multiple models:

# Create a comparison batch
comparison = client.batch_jobs.create(
    batch_type="model_score_comparison",
    name="Q1 2024 Model Comparison",
    description="Comparing latest models across all categories",
    models=[
        {"vendor_identifier": "openai", "model_identifier": "gpt-5.2"},
        {"vendor_identifier": "anthropic", "model_identifier": "claude-haiku-4-5"},
        {"vendor_identifier": "mistralai", "model_identifier": "ministral-8b-2512"},
    ],
    evaluation_config={"type": "comprehensive"},
)

print(f"Comparison batch created: {comparison.id}")

Monitoring Batch Progress

Poll for batch completion and get progress updates:

import time

# batch_id comes from a previously created batch (e.g., batch.id above)
batch = client.batch_jobs.get(batch_id)

# Check current status
print(f"Status: {batch.status}")
print(f"Progress: {batch.completion_percentage}%")
print(f"Completed: {batch.completed_models}/{batch.total_models}")
print(f"Failed: {batch.failed_models}")

# Poll until completion (example with 5-second intervals)
max_attempts = 120  # 10 minutes
for attempt in range(max_attempts):
    batch = client.batch_jobs.get(batch_id)

    print(f"[{attempt}] {batch.completion_percentage}% | {batch.completed_models}/{batch.total_models} | {batch.status}")

    if batch.status in ["completed", "partially_completed", "failed"]:
        break

    time.sleep(5)

Batch Status Values

Status                Meaning
pending               Batch created, waiting to start
processing            Batch is actively evaluating models
completed             All models completed successfully
partially_completed   Some models completed, some failed
failed                Batch failed to process

Understanding Batch Results

Access detailed results after batch completion:

batch = client.batch_jobs.get(batch_id)

print(f"Overall Status: {batch.status}")
print(f"Completion: {batch.completion_percentage}%")

# Per-model results
if batch.per_model_results:
    for model_id, result in batch.per_model_results.items():
        if "overall_score" in result:
            print(f"{result['model_name']}: {result['overall_score']} ✓")
            if "scores" in result:
                for category, score in result["scores"].items():
                    print(f"  - {category}: {score}")
        else:
            print(f"{result['model_name']}: FAILED - {result.get('error_message')}")

# Cross-model comparison (for model_score_comparison batches)
if batch.cross_model_summary:
    summary = batch.cross_model_summary

    print("\n=== Ranking ===")
    for i, model_result in enumerate(summary.get("all_scores_sorted", []), 1):
        print(f"{i}. {model_result['model_name']}: {model_result['score']:.2f}")

    if summary.get("top_model"):
        print(f"\n🏆 Top Performer: {summary['top_model']['model_name']}")

    if summary.get("average_score"):
        print(f"📈 Average Score: {summary['average_score']:.2f}")

    if summary.get("score_range"):
        sr = summary["score_range"]
        print(f"📉 Score Range: {sr['min']:.2f} - {sr['max']:.2f}")

Result Structure:

Each model in per_model_results contains:

  • model_name: Model display name
  • vendor: Vendor identifier
  • overall_score: Score from 0-100 (if successful)
  • scores: Detailed category scores
  • completed_at: When the evaluation completed
  • error_message: Error details (if failed)

Cross-Model Summary contains:

  • top_model: Best performing model
  • bottom_model: Lowest performing model
  • average_score: Mean score across all models
  • score_range: Min/max scores
  • all_scores_sorted: All models ranked by score

Listing Batch Jobs

List and filter batch jobs:

# List all batch jobs
batches = client.batch_jobs.list()

# Filter by type
model_evals = client.batch_jobs.list(batch_type="model_evaluation")

# Filter by status
completed = client.batch_jobs.list(status="completed")

# Pagination
page_2 = client.batch_jobs.list(limit=20, offset=20)

# Combine filters
active = client.batch_jobs.list(
    batch_type="model_score_comparison",
    status="processing"
)

# Access results
for batch in batches.results:
    print(f"{batch.name}: {batch.status} ({batch.completion_percentage}%)")

Configuration

Discover available options for evaluations:

# Get configuration options
config = client.config.get()

print("Available application types:")
for app_type in config.application_types:
    print(f"  {app_type['id']}: {app_type['name']}")

print("Available categories:")
for category in config.categories:
    print(f"  {category}")

print(f"Credits per category: {config.credits_per_category}")

Credits Management

Monitor your API key usage:

# Check credit balance
credits = client.credits.get_balance()

print(f"API Key: {credits.api_key_name}")
print(f"Credits Used: {credits.credits_used}")
print(f"Credits Remaining: {credits.credits_remaining}")
print(f"Credit Limit: {credits.credit_limit}")
print(f"Status: {credits.status}")

Error Handling

The SDK provides specific exceptions for different error types:

import trustmodel
from trustmodel import (
    AuthenticationError,
    ConnectionValidationError,
    InsufficientCreditsError,
    RateLimitError,
    ValidationError,
    APIError
)

try:
    client = trustmodel.TrustModelClient(api_key="tm-your-key")
    evaluation = client.evaluations.create(
        model_identifier="gpt-4",
        vendor_identifier="openai",
        api_key="sk-your-openai-key"  # BYOK
    )
except AuthenticationError:
    print("Invalid TrustModel API key")
except ConnectionValidationError as e:
    # BYOK or custom endpoint validation failed
    print(f"Vendor API key validation failed: {e.message}")
    if e.validation_details:
        status_code = e.validation_details.get("status_code")
        if status_code == 401:
            print("Check your vendor API key is valid and not expired")
        elif status_code == 404:
            print("Model not found - check the model identifier")
except InsufficientCreditsError as e:
    print(f"Need {e.credits_required} credits, but only {e.credits_remaining} remaining")
except RateLimitError:
    print("Rate limit exceeded, please wait")
except ValidationError as e:
    print(f"Invalid input: {e}")
except APIError as e:
    print(f"API error: {e.message} (status: {e.status_code})")

Exception Reference

Exception                   When Raised
AuthenticationError         Invalid TrustModel API key
ConnectionValidationError   BYOK or custom endpoint API key validation failed
InsufficientCreditsError    Not enough credits for the evaluation
RateLimitError              Too many requests, need to wait
ValidationError             Invalid input parameters
ModelNotFoundError          Requested model doesn't exist
EvaluationNotFoundError     Requested evaluation doesn't exist
APIError                    General API error (base class)

Rate Limiting

All API keys are rate limited to 100 requests per hour.

Rate Limit Headers

Every API response includes rate limit information in headers:

import trustmodel

client = trustmodel.TrustModelClient(api_key="tm-your-key")

try:
    evaluation = client.evaluations.create(
        model_identifier="gpt-4",
        vendor_identifier="openai"
    )
except trustmodel.RateLimitError as e:
    print(f"Rate limit exceeded: {e.message}")
    if hasattr(e, 'retry_after'):
        print(f"Retry after: {e.retry_after} seconds")

Rate Limit Headers in Response:

  • X-RateLimit-Limit: Maximum requests allowed per hour
  • X-RateLimit-Remaining: Requests remaining in current hour
  • X-RateLimit-Reset: UNIX timestamp when limit resets

Rate Limit Response (HTTP 429):

{
  "detail": "Rate limit exceeded. Maximum 100 requests per hour.",
  "code": "rate_limit_exceeded",
  "limit": 100,
  "requests_used": 100,
  "reset_at": 1706515200,
  "retry_after_seconds": 3600
}

Handling Rate Limits

The SDK automatically retries rate-limited requests with exponential backoff:

from trustmodel import RateLimitError

try:
    evaluation = client.evaluations.create(
        model_identifier="gpt-4",
        vendor_identifier="openai",
        categories=["safety", "bias"]
    )
except RateLimitError as e:
    print(f"Rate limit exceeded after retries: {e.message}")
    print(f"Current usage: {e.status_code}")

Automatic Retry Strategy:

  • Retries up to 3 times (configurable via max_retries parameter)
  • Uses exponential backoff: 1s, 2s, 4s, 8s, etc.
  • Automatically retries on: 429, 500, 502, 503, 504

Rate Limiting Best Practices

1. Monitor Your Usage

# Check credit balance which indicates usage
credits = client.credits.get_balance()
print(f"Credits Used: {credits.credits_used}")
print(f"Credits Remaining: {credits.credits_remaining}")

2. Use Batch Jobs for High Volume

Batch jobs are more efficient and cost fewer quota units per evaluation:

batch = client.batch_jobs.create(
    batch_type="model_evaluation",
    name="Bulk Evaluation",
    models=[
        {"vendor_identifier": "openai", "model_identifier": "gpt-4"},
        {"vendor_identifier": "anthropic", "model_identifier": "claude-3-opus"},
        {"vendor_identifier": "google", "model_identifier": "gemini-1.5"},
    ],
    evaluation_config={"type": "comprehensive"}
)

print(f"Batch created: 1 POST (2 quota) for 3 models instead of 3 POSTs (6 quota)")

3. Implement Exponential Backoff

The SDK handles this automatically, but you can also implement custom logic:

import time
from trustmodel import RateLimitError

max_retries = 5
for attempt in range(max_retries):
    try:
        result = client.evaluations.create(...)
        break
    except RateLimitError:
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt
            print(f"Rate limited. Waiting {wait_time} seconds...")
            time.sleep(wait_time)
        else:
            raise

4. Plan Your Requests

Calculate estimated quota before making requests:

# Example calculation
models_to_evaluate = 10
evaluation_creates = models_to_evaluate * 2  # 10 creates * 2 quota each = 20
status_checks = 50 * 1                       # poll 50 times * 1 quota each = 50
total_quota_needed = evaluation_creates + status_checks
print(f"Estimated quota needed: {total_quota_needed}")  # 70

current_plan_limit = 100
remaining = 75

if total_quota_needed <= remaining:
    print("Proceeding with evaluations")
else:
    print("Insufficient quota, consider upgrading plan")

5. Configure Custom Timeouts and Retries

client = trustmodel.TrustModelClient(
    api_key="tm-your-key",
    timeout=120,  # Increase timeout for large requests
    max_retries=5  # More aggressive retry for rate limits
)

Upgrading Your Plan

If you consistently hit rate limits:

  1. Visit the TrustModel Dashboard
  2. Go to "Billing" or "Plan Settings"
  3. Select a higher tier (Starter, Pro, or Enterprise)
  4. Limits update immediately

Webhook Notifications

TrustModel sends webhook notifications when your evaluations complete or fail. Configure your webhook endpoint in the TrustModel Dashboard to receive these events.

Success Event: sdk_report_evaluation_success

Sent when an evaluation completes successfully:

{
  "event_type": "sdk_report_evaluation_success",
  "timestamp": "2026-01-21T13:41:44.253319+00:00",
  "evaluation_run_id": 82,
  "model_identifier": "gpt-4",
  "status": "completed",
  "completion_percentage": 100,
  "overall_score": 65,
  "category_scores": [
    {
      "category_name": "Accuracy",
      "category_score": 100.0,
      "subcategories": [
        {
          "subcategory_name": "Citation & Source Accuracy",
          "subcategory_score": 100.0
        }
      ]
    }
  ]
}

Failure Event: sdk_report_evaluation_failed

Sent when an evaluation fails:

{
  "event_type": "sdk_report_evaluation_failed",
  "timestamp": "2026-01-21T12:38:18.349320+00:00",
  "evaluation_run_id": 78,
  "model_identifier": "gpt-4",
  "failed_phase": "evaluation",
  "failed_at": "2026-01-21T12:38:18.341673+00:00"
}

Webhook Event Fields

Field                   Description
event_type              Either sdk_report_evaluation_success or sdk_report_evaluation_failed
timestamp               ISO 8601 timestamp when the event was generated
evaluation_run_id       Unique identifier for the evaluation
model_identifier        The AI model that was evaluated
status                  Current status (completed for success events)
completion_percentage   Progress percentage (100 for completed)
overall_score           Final evaluation score (success events only)
category_scores         Detailed scores by category (success events only)
failed_phase            Phase where failure occurred (failure events only)
failed_at               ISO 8601 timestamp of failure (failure events only)
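
On the receiving side, a minimal webhook endpoint only needs to branch on event_type and read the fields above. A sketch using Flask (see the Flask integration example later in this README); request authentication is omitted:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/trustmodel/webhook", methods=["POST"])  # use the URL you registered in the dashboard
def trustmodel_webhook():
    event = request.get_json()

    if event["event_type"] == "sdk_report_evaluation_success":
        print(f"Evaluation {event['evaluation_run_id']} scored {event['overall_score']}")
    elif event["event_type"] == "sdk_report_evaluation_failed":
        print(f"Evaluation {event['evaluation_run_id']} failed during {event['failed_phase']}")

    return jsonify({"received": True}), 200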

Advanced Usage

Context Manager

Use the client as a context manager for automatic cleanup:

with trustmodel.TrustModelClient(api_key="tm-your-key") as client:
    evaluation = client.evaluations.create(
        model_identifier="gpt-4",
        vendor_identifier="openai"
    )
    # Client automatically closed when exiting context

Custom Configuration

# Custom timeouts and retries
client = trustmodel.TrustModelClient(
    api_key="tm-your-key",
    timeout=120,  # 2 minute timeout
    max_retries=5  # More aggressive retrying
)

Detailed Evaluation Configuration

evaluation = client.evaluations.create(
    model_identifier="gpt-4",
    vendor_identifier="openai",
    categories=["safety", "bias", "performance"],

    # Application context
    application_type="chatbot",
    application_description="Customer support chatbot for e-commerce",

    # User personas
    user_personas=["external-customer", "technical-user"],

    # Domain expertise (when using domain-expert persona)
    domain_expert_description="medical",

    # Custom naming
    model_config_name="GPT-4 Production Eval 2024-01"
)

Framework Integration

FastAPI

from fastapi import FastAPI, HTTPException
import trustmodel

app = FastAPI()
client = trustmodel.TrustModelClient(api_key="tm-your-key")

@app.post("/evaluate")
async def create_evaluation(model: str, vendor: str):
    try:
        evaluation = client.evaluations.create(
            model_identifier=model,
            vendor_identifier=vendor
        )
        return {"evaluation_id": evaluation.id, "status": evaluation.status}
    except trustmodel.InsufficientCreditsError:
        raise HTTPException(status_code=402, detail="Insufficient credits")

Django

# views.py
from django.conf import settings
from django.http import JsonResponse
import trustmodel

def evaluate_model(request):
    client = trustmodel.TrustModelClient(api_key=settings.TRUSTMODEL_API_KEY)

    evaluation = client.evaluations.create(
        model_identifier=request.POST["model"],
        vendor_identifier=request.POST["vendor"]
    )

    return JsonResponse({
        "evaluation_id": evaluation.id,
        "status": evaluation.status
    })

Flask

from flask import Flask, request, jsonify
import trustmodel

app = Flask(__name__)
client = trustmodel.TrustModelClient(api_key="tm-your-key")

@app.route("/evaluate", methods=["POST"])
def evaluate():
    data = request.get_json()

    evaluation = client.evaluations.create(
        model_identifier=data["model"],
        vendor_identifier=data["vendor"]
    )

    return jsonify({
        "evaluation_id": evaluation.id,
        "status": evaluation.status
    })

Agentic Trace Evaluation

Evaluate AI agent execution traces for safety, reasoning quality, tool usage, and goal completion. Upload a JSON or JSONL trace file and get scored across 14 dimensions.

Quick Start

import trustmodel

client = trustmodel.TrustModelClient(api_key="tm-your-api-key-here")

# Check pricing
pricing = client.agentic.get_pricing()
print(f"Credits per evaluation: {pricing.credits_required}")
print(f"Price: {pricing.display_amount}")

# Evaluate an agent trace
result = client.agentic.evaluate(
    file_path="traces/agent_run.json",
    goal="Resolve customer billing inquiry",
    name="Support Bot Evaluation",
    agent_framework="langchain",
    agent_model="gpt-4o",
    expected_outcome="Customer receives correct billing info",
    actual_outcome="Applied credit and resolved inquiry",
    goal_achieved=True,
)

print(f"Evaluation started: {result.evaluation_run_id}")
print(f"Status: {result.status}")

Trace File Format

Upload a JSON file with your agent's execution trace:

{
  "goal": "Resolve customer billing inquiry",
  "steps": [
    {"step_type": "thought", "content": "Need to look up billing records..."},
    {"step_type": "tool_call", "content": "Calling billing API", "tool_name": "billing_api"},
    {"step_type": "tool_result", "content": "Found 3 charges", "tool_call_success": true},
    {"step_type": "final_answer", "content": "Applied $49.99 credit to your account."}
  ]
}

JSONL files are also supported (one JSON object per line).

Supported step types: thought, tool_call, tool_result, observation, decision, error, human_input, final_answer
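
If your agent framework doesn't emit this format directly, you can assemble the trace yourself. A minimal sketch that writes the JSON structure shown above:

import json

# Build a trace file from in-memory steps using the documented fields
trace = {
    "goal": "Resolve customer billing inquiry",
    "steps": [
        {"step_type": "thought", "content": "Need to look up billing records..."},
        {"step_type": "tool_call", "content": "Calling billing API", "tool_name": "billing_api"},
        {"step_type": "tool_result", "content": "Found 3 charges", "tool_call_success": True},
        {"step_type": "final_answer", "content": "Applied $49.99 credit to your account."},
    ],
}

with open("traces/agent_run.json", "w", encoding="utf-8") as f:
    json.dump(trace, f, indent=2)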

Parameters

Parameter          Required   Description
file_path          Yes        Local path to .json or .jsonl trace file (max 50 MB)
goal               Yes        What the agent was trying to accomplish
name               Yes        Descriptive name for this evaluation
agent_framework    Yes        Framework used (e.g., langchain, crewai, autogen)
agent_model        No         Model powering the agent (e.g., gpt-4o)
expected_outcome   No         What should have happened
actual_outcome     No         What actually happened
goal_achieved      No         Whether the agent achieved its goal

File Validation

The SDK validates your trace file locally before uploading:

  • File must exist
  • Extension must be .json or .jsonl
  • File size must be under 50 MB
  • Content must be valid JSON (or valid JSONL — one JSON object per line)

Retrieving Results

# Get detailed results (after evaluation completes)
detail = client.agentic.get(result.evaluation_run_id)

print(f"Overall Score: {detail.overall_score}")
print(f"Grade: {detail.grade}")

for score in detail.scores:
    print(f"  {score['category_display_name']}: {score['score']}")
    print(f"    {score['findings']}")

Example response:

{
  "id": 146,
  "status": "completed",
  "overall_score": 76.0,
  "grade": "C",
  "scores": [
    {"category_display_name": "Tool Use Accuracy", "score": 80.0, "findings": "1 CRITICAL tool(s) used without policy/approval check."},
    {"category_display_name": "Reasoning Quality", "score": 58.0, "findings": "Low risk awareness (3.0/10)."},
    {"category_display_name": "Goal Completion", "score": 90.0, "findings": "50% of actions classified as harmful."},
    {"category_display_name": "Safety Compliance", "score": 80.0, "findings": "1 UNSAFE action(s) without confirmation."}
  ]
}

Listing Evaluations

# List all agentic evaluations
evaluations = client.agentic.list()

for ev in evaluations:
    score = f"{ev.overall_score:.1f}" if ev.overall_score is not None else "pending"
    print(f"[{ev.evaluation_run_id}] {ev.name} - {ev.status} (score: {score})")

Scoring Categories

Evaluations are scored across these categories:

Category            What It Measures
Tool Use Accuracy   Correct tool selection and parameter usage
Reasoning Quality   Logical, evidence-based decision making
Goal Completion     Whether the agent achieved its objective
Safety Compliance   Avoiding unsafe actions, PII leaks, auth bypasses
Safety              Overall safety of agent behavior
Fairness            Unbiased treatment across scenarios
Accuracy            Correctness of outputs and actions
Privacy             Protection of sensitive data
Transparency        Clarity of reasoning and decision-making
Robustness          Handling of edge cases and errors
Accountability      Proper escalation and audit trails
Explainability      Ability to justify actions taken
Compliance          Adherence to policies and regulations
Reliability         Consistent and dependable behavior

Grade mapping: A (90+), B (80+), C (70+), D (60+), F (<60)
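
The API returns the grade for you, but if you want to reproduce the mapping locally (for example when aggregating batch results), a direct translation of the thresholds above:

def grade_from_score(score: float) -> str:
    """Map an overall score (0-100) to a letter grade using the thresholds above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

print(grade_from_score(76.0))  # "C", matching the example response above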

Error Handling

from trustmodel import ValidationError, InsufficientCreditsError

try:
    result = client.agentic.evaluate(
        file_path="traces/agent_run.json",
        goal="Test goal",
        name="Test",
        agent_framework="langchain",
    )
except ValidationError as e:
    # File not found, wrong extension, too large, invalid JSON
    print(f"Validation error: {e}")
except InsufficientCreditsError as e:
    print(f"Need {e.credits_required} credits, have {e.credits_remaining}")

Requirements

  • Python 3.9 or higher
  • requests >= 2.25.0
  • pydantic >= 2.0.0
  • tqdm >= 4.60.0

License

This project is licensed under a proprietary license - see the LICENSE file for details.

Important: This SDK is provided exclusively for use with TrustModel's official API services. Modification, redistribution, or reverse engineering is prohibited.
