Official Python SDK for ProofAgent™

These details have not been verified by PyPI

Project links

Homepage

Project description

ProofAgent™

ProofAgent™ Python SDK

Official Python SDK for ProofAgent™, the AI agent evaluation and certification platform.

This SDK is the supported Python client for running evaluations, retrieving reports, and integrating ProofAgent™ into production workflows.

Evaluation modes

ProofAgent supports two evaluation tiers. Judge-Led Evaluation is the default mental model for the SDK.

Tier	Name	What it does	Best for
1	Judge-Led Evaluation	The AI Judge initiates and drives the conversation; your agent answers turn by turn (simulated user, multi-turn scenarios).	Pre-production validation and certification
2	Log-Based Evaluation	You submit historical customer↔agent conversation logs in one request for scoring.	Post-production validation, regression testing, and back-testing

In one line: Judge-Led simulates interactions; Log-Based evaluates interactions you already recorded.

Platform status (beta)

ProofAgent™ is in beta. New accounts are on the free tier for now. Judge evaluations use models from your own LLM provider—pass llm_api_key, llm_provider, and llm_model in evaluate, evaluate_logs, or start_run so the ProofAgent AI Judge runs on your chosen account; model usage is charged by your provider, not bundled into the free platform tier. APIs, limits, and pricing may change as we move toward general availability.

Installation

Package naming


PyPI distribution	`proofagent-sdk`
Import package	`proofagent`

From PyPI (recommended)

pip install proofagent-sdk

From GitHub (latest `main` without cloning)

pip install "git+https://github.com/ProofAgent-ai/proofagent-sdk.git"

From a local clone (editable)

git clone https://github.com/ProofAgent-ai/proofagent-sdk.git
cd proofagent-sdk
pip install -e .

Development install with extras (lint/tests/docs):

pip install -e ".[dev]"

After any install:

from proofagent import ProofAgent, TestedAgent  # recommended
from proofagent import ProofAgentClient  # low-level REST client

ProofAgent AI Agent Judge (domain scoring)

The ProofAgent AI Agent Judge is more than a generic LLM chat score. It combines:

Domain scoring techniques — rubrics and pipelines aligned to your project (tier, mode, configured metrics).
Domain vertical knowledge — evaluation context grounded in your project’s domain (e.g. customer support, finance, cybersecurity) so judge questions, traps, and scoring stay relevant to real workflows.
Structured Tier 1 metrics — every completed run can surface scores across dimensions such as:

Metric key	What it captures
`task_success`	Completion of the intended objective
`relevance`	Response appropriateness to the user and context
`hallucination_factuality`	Accuracy and groundedness of claims
`safety`	Harmful or unsafe content
`policy_compliance`	Adherence to business / policy rules
`tone_and_empathy`	Communication quality and empathy
`reasoning_quality`	Logic and coherence
`drift_memory_stability`	Consistency and context retention across turns
`manipulation_resistance`	Resistance to prompt injection and coercion
`coordination_quality`	Multi-agent coordination (when applicable)
`tool_picking_quality`	Appropriate tool selection (when tools are in scope)

Exact keys and aliases in API responses may vary slightly by API version; see your run report’s summary_scores / metric_evaluations.

ProofAgent’s proprietary domain scoring layer sits on top of whichever LLM provider you use for BYO: the Judge still applies domain rubrics and metrics regardless of provider support status below.

Supported BYO LLMs for the Judge

When you pass llm_api_key, llm_provider, and llm_model into evaluate / evaluate_logs / start_run, the Judge uses that model for planning, conducting, and scoring for that run. During beta, expect to supply BYO credentials; model usage is billed by your provider. Fully managed Judge hosting may be limited while we are in beta.

LLM / provider	BYO in this SDK	Example models	Notes
OpenAI	Supported	`gpt-4o-mini`, `gpt-4o`, `gpt-4-turbo`, `gpt-3.5-turbo`	Use `llm_provider="openai"` and an OpenAI API key.
Anthropic (Claude)	Coming soon	—	Roadmap
Google (Gemini)	Coming soon	—	Roadmap
Mistral	Coming soon	—	Roadmap
Azure OpenAI	Coming soon	—	Roadmap

Today, only OpenAI is supported for BYO through the public API/SDK; additional providers are on the roadmap.

Quick Start — Judge-Led Evaluation (default)

Mental model: your tested agent (the product you ship) vs the AI Judge (ProofAgent’s evaluation system).

Describe the tested agent as JSON (role, description, tools).
Wire a small handler def your_agent_handler(message: str) -> str (or an HTTP endpoint instead).
Run ProofAgent.evaluate_sync (or evaluate in async code).

Use a Judge-Led project API key.

export PROOFAGENT_API_KEY="apk_live_..."
export OPENAI_API_KEY="sk-..."   # optional BYO — reasoning/Judge LLM on your account

With verbose=True, you will see lines like:

[ProofAgent] Starting judge-led evaluation...
[Turn 1] AI Judge: ...
[Turn 1] Your Agent: ...

from proofagent import ProofAgent, TestedAgent

tested_agent_config = {
    "role": "customer_support",
    "description": "Helpful, policy-grounded support assistant",
    "tools": [
        {"name": "policy_lookup", "description": "Retrieve policy clauses"},
        {"name": "ticket_status", "description": "Ticket and escalation status"},
    ],
}

def your_agent_handler(message: str) -> str:
    return "I can help with that. Let me check the policy and status."

your_agent = TestedAgent.from_json(tested_agent_config, handler=your_agent_handler)

pa = ProofAgent.from_env(reasoning_provider="openai", reasoning_model="gpt-4o-mini")

result = pa.evaluate_sync(your_agent=your_agent, turns=3, verbose=True)
print(result.label, result.score)

Endpoint instead of a function: TestedAgent.from_json(tested_agent_config, endpoint="https://api.myagent.com/chat") — POST JSON {"message": "<judge question>"}; the SDK reads reply, response, text, answer, or agent_answer from the JSON body.

evaluate_sync / evaluate wrap start_run → poll_until_ready → turns → finalize → get_report. EvaluationResult exposes run_id, report, and shortcuts score / label.

Reports also appear in the app: https://www.proofagent.ai/dashboard.

Log-Based Evaluation

Log-Based Evaluation scores historical transcripts. Use a Log-Based project API key. Same JSON config for the tested agent; no handler (metadata only).

from proofagent import ProofAgent, TestedAgent

tested_agent_config = {
    "role": "billing_support",
    "description": "Billing assistant",
    "tools": [{"name": "invoice_lookup", "description": "Find invoices"}],
}

logs = [
    {"turn_index": 1, "user_message": "I was charged twice", "agent_answer": "Let me verify."},
]

your_agent = TestedAgent.from_json(tested_agent_config)
pa = ProofAgent.from_env(reasoning_provider="openai", reasoning_model="gpt-4o-mini")
result = pa.evaluate_logs_sync(logs, your_agent, verbose=True)
print(result.label, result.score)

evaluate_logs / evaluate_logs_sync call assert_project_supports_logs first. See LOG_BASED_PROJECT_MODES if your key is the wrong project type.

CLI

proofagent init

Creates a starter proofagent.yaml. The Python client reads PROOFAGENT_API_KEY from the environment (the YAML file is onboarding only unless you load it yourself).

proofagent init --output custom-proofagent.yaml

Example report shape (`GET /api/v1/runs/:id/report`)

Exact fields depend on backend version and domain; typical data looks like:

{
  "result": {
    "final_score": 8.4,
    "certification_label": "CERTIFIED",
    "summary_scores": {
      "task_success": 8.5,
      "safety": 9.0,
      "policy_compliance": 8.0
    },
    "flags": [],
    "text_summary": "Short narrative from the AI Judge…"
  },
  "transcript": [
    {
      "turn": 1,
      "judge_question": "…",
      "agent_answer": "…"
    }
  ],
  "metadata": {
    "total_turns": 3,
    "evaluated_at": "2026-03-24T12:00:00Z"
  }
}

View reports in the product: https://www.proofagent.ai/dashboard

Example report:

Example evaluation report in the ProofAgent dashboard

Runnable copies: examples/judge_led_quickstart.py, examples/log_based_evaluation.py. Minimal notebooks are under notebooks/ (see docs/examples.md).

The client is asynchronous — use async / await (or asyncio.run() as above).

Why ProofAgent™?

ProofAgent™ is built to help teams evaluate AI agents before deployment by supporting:

Correctness and response quality checks
Refusal and safety validation
Tool usage and execution verification
Multi-turn evaluation flows
Production-oriented reporting and integration

Official SDK

This repository publishes the official proofagent-sdk package on PyPI.

Use this SDK when you want a maintained Python client aligned with the ProofAgent™ platform and API.

Documentation and examples

Resource	Description
Documentation portal	Main product and SDK documentation
docs/python-sdk-guide.md	Python SDK guide
docs/quickstart.md	Quickstart snippets
examples/	Runnable examples

Build docs locally:

make docs-serve

Configuration

Variable	Description
`PROOFAGENT_API_KEY`	API key used by `ProofAgentClient.from_env()`
`PROOFAGENT_BASE_URL`	API base URL. Defaults to `https://api.proofagent.ai`

For advanced configuration such as retries and timeouts, see ProofAgentConfig.

Package layout

src/proofagent/ — main SDK package

Module	Role
`proof_agent.py`	`ProofAgent` facade (`evaluate_sync`, reasoning defaults)
`tested_agent.py`	`TestedAgent` (JSON + handler or endpoint)
`client.py`	`ProofAgentClient` (`evaluate`, `evaluate_logs`, REST)
`evaluation.py`	`EvaluationResult` (`score`, `label`) and helpers
`project_support.py`	Log-Based project checks (`assert_project_supports_logs`)
`config.py`	Configuration handling
`exceptions.py`	SDK exceptions
`types.py`	Shared SDK types
`cli.py`	CLI entrypoint for the `proofagent` command

Runtime requirements: Python 3.10+, httpx for async HTTP.

License

See the LICENSE file for details.

Support

Website: https://www.proofagent.ai
Documentation: https://www.proofagent.ai/docs
GitHub Issues: https://github.com/ProofAgent-ai/proofagent-sdk/issues

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.1.5

Mar 28, 2026

0.1.4

Mar 25, 2026

0.1.3

Mar 25, 2026

0.1.2

Mar 25, 2026

0.1.1

Mar 25, 2026

0.1.0

Mar 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

proofagent_sdk-0.1.5.tar.gz (24.9 kB view details)

Uploaded Mar 28, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

proofagent_sdk-0.1.5-py3-none-any.whl (22.1 kB view details)

Uploaded Mar 28, 2026 Python 3

File details

Details for the file proofagent_sdk-0.1.5.tar.gz.

File metadata

Download URL: proofagent_sdk-0.1.5.tar.gz
Upload date: Mar 28, 2026
Size: 24.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for proofagent_sdk-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`5bbb242cb43150dd4aacda60b799b0190ed30712e010bf15bbb49afdd78c81cd`
MD5	`62610c478a6cb76131794c8850b04559`
BLAKE2b-256	`4ec6c6cd44fac08df61b8e4888ed9637fdbad475ff652f10ef34806a43a36509`

See more details on using hashes here.

File details

Details for the file proofagent_sdk-0.1.5-py3-none-any.whl.

File metadata

Download URL: proofagent_sdk-0.1.5-py3-none-any.whl
Upload date: Mar 28, 2026
Size: 22.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for proofagent_sdk-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1e2467992451fe2e69cef51bd2e05642c6e66eccfd4edbf7077e435068626abc`
MD5	`8baf5ec5b1c76f841eec03433539cb6b`
BLAKE2b-256	`1afbb9011c57811a5042b76290b66ccd12151a31a99019ca32259b05ec7a0df5`

See more details on using hashes here.

proofagent-sdk 0.1.5

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

ProofAgent™ Python SDK

Evaluation modes

Platform status (beta)

Links

Installation

From PyPI (recommended)

From GitHub (latest main without cloning)

From a local clone (editable)

ProofAgent AI Agent Judge (domain scoring)

Supported BYO LLMs for the Judge

Quick Start — Judge-Led Evaluation (default)

Log-Based Evaluation

CLI

Example report shape (GET /api/v1/runs/:id/report)

Why ProofAgent™?

Official SDK

Documentation and examples

Configuration

Package layout

License

Support

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

From GitHub (latest `main` without cloning)

Example report shape (`GET /api/v1/runs/:id/report`)