
mcp-llm-behave

MCP server exposing llm-behave behavioral regression testing as callable tools inside Claude Desktop, Claude Code, and any MCP-compatible client.

Runs offline — no API calls, no external services. Uses sentence-transformers for embedding-based similarity.


Tools

Tool                    What it does
run_behavior_test       Assert that a model output matches an expected behavior description
compare_outputs         Detect semantic drift between a baseline and a new LLM output
list_builtin_behaviors  Browse the built-in behavioral checks shipped with llm-behave

Quickstart — Claude Desktop

Add to your claude_desktop_config.json (no install needed, uvx handles it):

{
  "mcpServers": {
    "mcp-llm-behave": {
      "command": "uvx",
      "args": ["mcp-llm-behave"]
    }
  }
}

Config file location:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Restart Claude Desktop after editing. The first run downloads the sentence-transformers model (~80 MB) once and caches it.


Quickstart — Claude Code (CLI)

claude mcp add mcp-llm-behave uvx mcp-llm-behave

Install via pip / uv

pip install mcp-llm-behave
# or
uv add mcp-llm-behave

Run the server directly:

mcp-llm-behave

Tool reference

run_behavior_test

Check whether a model output semantically satisfies an expected behavior.

Arguments

Name               Type  Description
prompt             str   The original prompt sent to the LLM (used for context/logging)
expected_behavior  str   Plain-language description of what the output should do
model_output       str   The actual text returned by the LLM

Returns

{
  "score": 0.82,
  "passed": true,
  "threshold": 0.45
}
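Under the hood, the score is an embedding similarity. Below is a minimal sketch of the pass/fail decision only, using a hand-rolled cosine on toy vectors in place of real sentence-transformers embeddings; the 0.45 default threshold is taken from the example response above and is an assumption about llm-behave's internals, not a documented constant.

```python
import math

DEFAULT_THRESHOLD = 0.45  # assumed default, mirroring the example response above


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def behavior_result(expected_vec, output_vec, threshold=DEFAULT_THRESHOLD):
    """Build a run_behavior_test-shaped response from a pair of embeddings."""
    score = cosine(expected_vec, output_vec)
    return {"score": round(score, 2), "passed": score >= threshold, "threshold": threshold}


# Toy 3-dimensional "embeddings" standing in for sentence-transformers vectors.
print(behavior_result([1.0, 0.2, 0.0], [0.9, 0.3, 0.1]))
```

The key point is the shape of the result: a continuous score plus a boolean derived from a fixed threshold, so CI can gate on `passed` while humans inspect `score`.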

compare_outputs

Detect semantic drift between a known-good baseline and a new output. Useful in CI after prompt or model changes.

Arguments

Name       Type  Description
baseline   str   The reference / previous LLM output
candidate  str   The new LLM output to compare

Returns

{
  "similarity_score": 0.91,
  "drift_detected": false,
  "interpretation": "Outputs are nearly identical — no drift."
}
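A rough sketch of how a similarity score could map to the `drift_detected` flag and `interpretation` text. The 0.8 drift threshold and the band wording are illustrative assumptions, not llm-behave's actual values; only the first message mirrors the example response above.

```python
def drift_result(similarity_score: float, drift_threshold: float = 0.8) -> dict:
    """Build a compare_outputs-shaped response from a precomputed similarity score."""
    drift = similarity_score < drift_threshold
    if similarity_score >= 0.9:
        interpretation = "Outputs are nearly identical — no drift."
    elif not drift:
        interpretation = "Minor wording differences; within tolerance."
    else:
        interpretation = "Semantic drift detected; review the new output."
    return {
        "similarity_score": similarity_score,
        "drift_detected": drift,
        "interpretation": interpretation,
    }


print(drift_result(0.91))
```

In a CI job you would fail the build when `drift_detected` is true and surface `interpretation` in the log.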

list_builtin_behaviors

Returns the catalog of pre-defined behavioral checks available in llm-behave, with method signatures and descriptions.

Returns — list of objects with name, method, and description keys.
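As an illustration of that shape only (the entry below is hypothetical, not an actual item from the llm-behave catalog):

```json
[
  {
    "name": "refuses_politely",
    "method": "semantic_similarity",
    "description": "Output declines the request without being dismissive"
  }
]
```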


Requirements

  • Python 3.10+
  • No API keys needed
  • ~80 MB disk for the sentence-transformers model (downloaded once on first run)

Development

git clone https://github.com/Swanand33/mcp_llm_behave
cd mcp_llm_behave
uv sync
uv run pytest

License

MIT — see LICENSE.
