
Python SDK for Agent Diff - test AI agents and train models against replicas of services


Agent Diff Python SDK

Python SDK for testing AI agents against isolated replicas of production services.

Installation

uv add agent-diff
# or
pip install agent-diff

Quick Start

from agent_diff import AgentDiff

# Self-hosted (defaults to http://localhost:8000)
client = AgentDiff()

# With authentication and a custom base URL
client = AgentDiff(
    api_key="your-api-key",
    base_url="https://your-instance.com"
)

# 1. Create an isolated environment
env = client.init_env(
    templateService="slack",
    templateName="slack_default",
    impersonateUserId="U123456",
    ttlSeconds=1800
)

# 2. Take a "before" snapshot of the environment
run = client.start_run(envId=env.environmentId)

# 3. Let your agent act on the replica
#    (send its service API calls to env.environmentUrl)

# 4. Compute the diff
diff = client.diff_run(runId=run.runId)

# Inspect changes
diff.diff['inserts']   # New records
diff.diff['updates']   # Modified records
diff.diff['deletes']   # Deleted records

# 5. Cleanup
client.delete_env(envId=env.environmentId)
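For a quick overview of what the agent changed, you can reduce the diff to per-kind counts. This is a minimal sketch that assumes `diff.diff` is a plain dict whose `inserts`, `updates` and `deletes` entries are lists of records, as the Quick Start comments suggest. `summarize_diff` and `is_unchanged` are hypothetical helpers, not part of the SDK.

```python
def summarize_diff(changes: dict) -> dict:
    """Count records per change kind in an Agent Diff result.

    Assumes `changes` is the `diff.diff` mapping with list values under
    'inserts', 'updates' and 'deletes'; missing or None entries count as zero.
    """
    return {
        kind: len(changes.get(kind) or [])
        for kind in ("inserts", "updates", "deletes")
    }


def is_unchanged(changes: dict) -> bool:
    """True when the agent left the replica untouched (no records changed)."""
    return not any(summarize_diff(changes).values())


# Usage with the Quick Start objects (assumed shape):
# print(summarize_diff(diff.diff))
```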

Environments

Create isolated, ephemeral replicas of services:

env = client.init_env(
    templateService="slack",
    templateName="slack_default",
    impersonateUserId="U123",
    ttlSeconds=3600
)

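# Access environment details
env.environmentId   # ID passed to start_run, delete_env, etc.
env.environmentUrl  # URL for calling the replica's service API
env.expiresAt       # When the environment expires (set by ttlSeconds)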

# Delete when done
client.delete_env(envId=env.environmentId)
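If agent code raises before `delete_env` is reached, the environment presumably stays alive until its TTL runs out. A small context manager guarantees cleanup. This is a sketch built only on `init_env` and `delete_env` as shown above; `isolated_env` is a hypothetical helper, not an SDK function.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def isolated_env(client: Any, **init_kwargs: Any) -> Iterator[Any]:
    """Create an environment and always delete it on exit.

    `init_kwargs` are passed straight to `client.init_env`, e.g.
    templateService="slack", templateName="slack_default".
    """
    env = client.init_env(**init_kwargs)
    try:
        yield env
    finally:
        # Runs even if the code inside the `with` block raises
        client.delete_env(envId=env.environmentId)


# Usage (hypothetical):
# with isolated_env(client, templateService="slack",
#                   templateName="slack_default",
#                   impersonateUserId="U123", ttlSeconds=3600) as env:
#     run = client.start_run(envId=env.environmentId)
#     ...
```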

Test Suites

To run an evaluation over a test suite:

suite = client.get_test_suite("slack-bench")
# Returns: {"tests": [{"id": "...", "prompt": "Send hello to #general"}, ...]}


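import asyncio

from agents import Runner  # OpenAI Agents SDK, as used below; swap in your own agent framework


async def run_suite(client, suite, agent):
    evaluation_results = []

    for test in suite['tests']:
        prompt = test['prompt']
        test_id = test['id']

        env = client.init_env(testId=test_id)
        run = client.start_run(envId=env.environmentId, testId=test_id)
        try:
            # Your LLM/agent call goes here. You must proxy its service requests
            # to the replica endpoint received in env.environmentUrl yourself.
            ...
            response = await Runner.run(agent, prompt)
            ...

            # Returns the runId, status and score (0 or 1)
            evaluation_result = client.evaluate_run(run.runId)
            evaluation_results.append(evaluation_result)
        finally:
            # Delete the environment even if the agent call fails
            client.delete_env(envId=env.environmentId)

    return evaluation_results


# triage_agent: your agent instance
evaluation_results = asyncio.run(run_suite(client, suite, triage_agent))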
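Since each evaluation yields a score of 0 or 1, the collected `evaluation_results` can be reduced to a pass rate. This sketch assumes the score is reachable either as a `score` attribute or as a `'score'` key; the exact return type of `evaluate_run` is not documented here, so adjust `score_of` to match. Both helpers are hypothetical.

```python
from typing import Any, Iterable


def score_of(result: Any) -> int:
    """Read the 0/1 score from one evaluation result.

    Assumption: the score is exposed as a 'score' key (dict) or a
    `score` attribute (object).
    """
    if isinstance(result, dict):
        return int(result["score"])
    return int(result.score)


def pass_rate(results: Iterable[Any]) -> float:
    """Fraction of tests with score 1; returns 0.0 when there are no results."""
    scores = [score_of(r) for r in results]
    return sum(scores) / len(scores) if scores else 0.0


# Usage (hypothetical):
# print(f"pass rate: {pass_rate(evaluation_results):.0%}")
```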

Templates

List and create environment templates:

# List available templates
templates = client.list_templates()

# Create a custom template: populate a live replica with your own data,
# then save it as a template
custom = client.create_template_from_environment(
    environmentId=env.environmentId,
    service="slack",
    name="my_template",
    description="Custom template",
    visibility="private"  # "private" means only you can view the template
)
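Putting these steps together, a helper could start a replica, seed it, save it as a template, and clean up. This is a hedged sketch: `build_template` and its `seed` callback are hypothetical, and it assumes the template is captured when `create_template_from_environment` returns, so the source environment can be deleted afterwards.

```python
from typing import Any, Callable


def build_template(
    client: Any,
    seed: Callable[[str], None],
    *,
    service: str,
    name: str,
    description: str,
    **init_kwargs: Any,
) -> Any:
    """Seed a fresh replica and save it as a private template.

    `seed` receives env.environmentUrl and populates the replica through
    its service API. Extra keyword arguments (e.g. templateName) go to
    `init_env`. The environment is deleted whether or not seeding succeeds.
    """
    env = client.init_env(templateService=service, **init_kwargs)
    try:
        seed(env.environmentUrl)
        return client.create_template_from_environment(
            environmentId=env.environmentId,
            service=service,
            name=name,
            description=description,
            visibility="private",  # only you can view the template
        )
    finally:
        client.delete_env(envId=env.environmentId)
```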

License

MIT License - see LICENSE file for details.
