Contract testing for LLM tool calls.

Project description

callspec

pip install callspec

from callspec import Callspec, ToolCall, ToolCallTrajectory
from callspec.providers.mock import MockProvider

# A mock provider returns a canned response and a fixed list of tool calls,
# so the example runs without contacting a real model.
provider = MockProvider(
    response_fn=lambda p, m: "Booked flight",
    tool_calls=[
        {"name": "search_flights", "arguments": {"origin": "SFO", "dest": "JFK"}},
        {"name": "book_flight", "arguments": {"flight_id": "UA123"}},
    ],
)

v = Callspec(provider)

# Call the provider once and extract the tool calls it made.
response = provider.call("Book me a flight from SFO to JFK")
trajectory = ToolCallTrajectory.from_provider_response(response)

# Chain the contract assertions, then evaluate them with run().
result = (
    v.assert_trajectory(trajectory)
    .calls_tools_in_order(["search_flights", "book_flight"])
    .does_not_call("cancel_flight")
    .argument_not_empty("search_flights", "origin")
    .run()
)
assert result.passed

Your agent calls tools. Those calls are the contract between your code and the model. When you swap models, update a prompt, or change your retrieval pipeline, callspec tells you whether the agent still calls the right tools, in the right order, with the right arguments. No LLM-as-judge. No API calls for evaluation. Deterministic pass/fail that runs in CI.
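
To make "deterministic pass/fail" concrete, here is a minimal sketch in plain Python of the kind of checks described above. It does not use callspec and is not its implementation: the helper functions, the `recorded_calls` data, and the choice to treat "in order" as a subsequence match (other calls may occur in between) are all assumptions made for illustration.

```python
from typing import Any

ToolCallDict = dict[str, Any]


def calls_tools_in_order(calls: list[ToolCallDict], expected: list[str]) -> bool:
    """True if the expected tool names occur in this order (gaps allowed)."""
    names = iter(call["name"] for call in calls)
    # `name in names` consumes the iterator, so each match must come
    # after the previous one.
    return all(name in names for name in expected)


def does_not_call(calls: list[ToolCallDict], tool: str) -> bool:
    """True if the tool never appears in the recorded calls."""
    return all(call["name"] != tool for call in calls)


def argument_not_empty(calls: list[ToolCallDict], tool: str, argument: str) -> bool:
    """True if the tool was called and every call has a non-empty argument."""
    matching = [call for call in calls if call["name"] == tool]
    empty_values = (None, "", [], {})
    return bool(matching) and all(
        call.get("arguments", {}).get(argument) not in empty_values
        for call in matching
    )


# Hypothetical recorded trajectory, in the same shape as the mock tool calls.
recorded_calls = [
    {"name": "search_flights", "arguments": {"origin": "SFO", "dest": "JFK"}},
    {"name": "book_flight", "arguments": {"flight_id": "UA123"}},
]

checks = {
    "order": calls_tools_in_order(recorded_calls, ["search_flights", "book_flight"]),
    "no_cancel": does_not_call(recorded_calls, "cancel_flight"),
    "origin_set": argument_not_empty(recorded_calls, "search_flights", "origin"),
}
passed = all(checks.values())
```

Every check is a pure function of the recorded calls, so the same trajectory always gives the same verdict, which is what lets a check like this gate a CI run without a second model call.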

License

Apache 2.0

Download files

Source Distribution

callspec-0.1.0.tar.gz (124.2 kB)

Uploaded Source

Built Distribution

callspec-0.1.0-py3-none-any.whl (100.0 kB)

Uploaded Python 3

File details

Details for the file callspec-0.1.0.tar.gz.

File metadata

  • Download URL: callspec-0.1.0.tar.gz
  • Upload date:
  • Size: 124.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for callspec-0.1.0.tar.gz
Algorithm Hash digest
SHA256 bb683868f551b10bd1ce858d97c907e40574ffd484a81f4a2511583c52473c05
MD5 5122ad10c43bbf46defb519e2033e63b
BLAKE2b-256 7f74f96898cdb2363631ee3363ba330603f4f29a767b5ef35e9f1f8168d16245

File details

Details for the file callspec-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: callspec-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 100.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for callspec-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ee0c31d891d39a1cabba5fbdef0bd114c174d543ebe37a0b8e6ad1ca8b105a30
MD5 e75258fd1744d141cf3c9030c9722a90
BLAKE2b-256 4973f2eb3dce028361dcebefd904052d3821e1decefd8fb0fc5f1a40ede401d5
