Pytest plugin for testing chatbots and LLM apps — prompt injection, jailbreaks, system-prompt leaks, hallucinations, brand drift.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

pardamike

These details have not been verified by PyPI

Project links

Homepage

Project description

pytest-wardenbot

Pytest plugin for testing chatbots and LLM apps — prompt injection, jailbreaks, system-prompt leaks, hallucinations, brand drift.

📖 Documentation: pardamike.github.io/pytest-wardenbot

Status: pre-release. v0.1.0 is in active development. APIs may change before the first stable release. See BUILD-PLAN.md for the roadmap and what's coming.

What it does

Run pytest against your chatbot and find out if it leaks its system prompt, complies with known jailbreaks, hallucinates business facts, or drifts from your brand voice.

Black-box. Tests run against your live chatbot via HTTP, OpenAI API, Anthropic API, or any object you write a small adapter for.
Deterministic-first. v0.1 ships 30 tests that need zero LLM API spend — regex, substring, and schema checks. Optional LLM-judge tests (DeepEval) ship as an extra for semantic checks.
Agent-ready failures. When a test fails, the failure message includes a structured Markdown remediation prompt you can paste into Cursor or Claude Code.

What "passing" means (and doesn't)

A green run means your chatbot didn't fail any of the bundled 30 attacks in the most overt way. It's a useful smoke test and a regression detector — if a deploy turns a green test red, that's a real signal to investigate.

A green run does not mean your chatbot is secure. Frontier-grade attacks are multi-turn, novel, and adapted to your specific bot — no fixed corpus catches all of them. Treat the shipped suite as a starter set: pair it with periodic red-team exercises (or our Continuous Monitoring service) for the always-on adversarial coverage CI alone can't provide.

Install

pip install pytest-wardenbot

Optional extras for LLM-judge tests or vendor-native adapters:

pip install "pytest-wardenbot[judge]"        # adds DeepEval for semantic checks
pip install "pytest-wardenbot[openai]"       # adds OpenAIChatAdapter + AsyncOpenAIChatAdapter
pip install "pytest-wardenbot[anthropic]"    # adds AnthropicMessagesAdapter + AsyncAnthropicMessagesAdapter

Quickstart (under 60 seconds)

pip install pytest-wardenbot
pytest --wardenbot-quickstart           # generates conftest.py + test_my_bot.py
export CHATBOT_URL=https://your-chatbot.example.com/chat
export CHATBOT_TOKEN=sk-...              # optional
pytest                                   # runs all shipped tests against your bot

--wardenbot-quickstart accepts an industry template:

pytest --wardenbot-quickstart=ecommerce       # adds refund/shipping fact placeholders
pytest --wardenbot-quickstart=saas-support    # adds plan/trial fact placeholders
pytest --wardenbot-quickstart=generic         # default; minimal placeholders

Then edit conftest.py to replace the TODO placeholders with your real business facts and re-run pytest. Worked examples in examples/ cover the basic HTTP setup, a custom OpenAI adapter, and a GitHub Actions workflow.

Manual setup (if you prefer)

Add this to your project's conftest.py:

import os
import pytest
from pytest_wardenbot.adapters.http import HTTPChatbotAdapter

@pytest.fixture
def chatbot():
    return HTTPChatbotAdapter(
        url="https://your-chatbot.example.com/chat",
        headers={"Authorization": f"Bearer {os.environ['CHATBOT_TOKEN']}"},
        request_field="message",
        response_field="reply",
    )

Then run the shipped tests with pytest --pyargs pytest_wardenbot.tests.

When a test fails, read the failure message, paste the agent-ready Markdown into Cursor / Claude Code, ship the fix.

What's in v0.1

Category	Count	Grading	Requires API key?
Prompt-injection / jailbreak resistance	5 prompts × 2 checks = 10	deterministic	no
System-prompt leak elicitation (dedicated extraction prompts)	3	deterministic	no
Refusal-bypass (roleplay / pretext / hypothetical framings)	3	deterministic	no
Off-topic deflection (scoped bots)	2	deterministic	no
Indirect / cross-prompt injection (XPIA)	4	deterministic	no
Encoded-payload jailbreak (Base64 / ROT13 / leet / hex)	4	deterministic	no
Multi-turn jailbreak (priming + payload, needs session-aware adapter)	3	deterministic	no
Canary-token leak (opt-in; you plant the token)	1	deterministic	no
Business-truth verification (parametrized over your facts)	user-supplied	deterministic	no
Semantic checks via DeepEval (5 factories: equivalence, brand, hallucination, off-policy, refusal quality)	user-supplied	LLM-judge	yes, with `[judge]` extra

That's 30 deterministic tests out-of-the-box (plus the opt-in canary leak test, plus your business-truth and judge lists). Tests run in under a second against a real chatbot with zero LLM API spend unless you've opted into the [judge] extra.

See BUILD-PLAN.md for the full roadmap of what's landing in v0.1 vs. deferred to v0.2 (including RAMPART for tool-using agents).

How it's different from related tools

vs Promptfoo (acquired by OpenAI in Feb 2026): Promptfoo is a developer testing CLI. We're a pytest plugin — same tool your existing test suite uses, same CI integration you already have.
vs DeepEval: DeepEval focuses on evaluation metrics (faithfulness, relevancy). We focus on adversarial security probes (jailbreak, system-prompt leak, refusal-bypass) — different problem, complementary tool. (We use DeepEval under the hood for our optional semantic checks.)
vs Garak / PyRIT: Garak and PyRIT are research-grade attack libraries. We package a curated subset as everyday pytest tests with clear failure messages.

License

Apache 2.0. See LICENSE.md.

Powered by

WardenBot AI — continuous external monitoring for AI chatbots. Powered by WardenBot AI

The pytest plugin is the free, open-source slice of our test corpus. Want continuous monitoring across all your bots with daily probes and a dashboard? Tell us about your setup — we open invites in small batches.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

pardamike

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

0.1.2

May 26, 2026

0.1.1

May 26, 2026

This version

0.1.0

May 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pytest_wardenbot-0.1.0.tar.gz (81.6 kB view details)

Uploaded May 24, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pytest_wardenbot-0.1.0-py3-none-any.whl (72.2 kB view details)

Uploaded May 24, 2026 Python 3

File details

Details for the file pytest_wardenbot-0.1.0.tar.gz.

File metadata

Download URL: pytest_wardenbot-0.1.0.tar.gz
Upload date: May 24, 2026
Size: 81.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for pytest_wardenbot-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`d120843e98d5ac1b4a2cbf211f0c15fa13f2b32ba5c7fced5b043ae7643afac8`
MD5	`59112b86d05f2cb3220b892a11c38122`
BLAKE2b-256	`277cbfc4183b0178ac1490d35b369ec371fa339bd6274117c3e296f0d11a70c9`

See more details on using hashes here.

Provenance

The following attestation bundles were made for pytest_wardenbot-0.1.0.tar.gz:

Publisher: release.yml on pardamike/pytest-wardenbot

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_wardenbot-0.1.0.tar.gz
- Subject digest: d120843e98d5ac1b4a2cbf211f0c15fa13f2b32ba5c7fced5b043ae7643afac8
- Sigstore transparency entry: 1624197046
- Sigstore integration time: May 24, 2026
Source repository:
- Permalink: pardamike/pytest-wardenbot@3aed70ae85c0b8c3fe9c186e6905f2f75d75e5c8
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/pardamike
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3aed70ae85c0b8c3fe9c186e6905f2f75d75e5c8
- Trigger Event: push

File details

Details for the file pytest_wardenbot-0.1.0-py3-none-any.whl.

File metadata

Download URL: pytest_wardenbot-0.1.0-py3-none-any.whl
Upload date: May 24, 2026
Size: 72.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for pytest_wardenbot-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e0dd8525c1db612f38394aa7e08099eab255962c7eb6f8798bdc873b4ebebd3f`
MD5	`baac75adaadf6be0b78a54204641f24e`
BLAKE2b-256	`08534e997c1ac6a21cff7cef99e37f97350c0f8b95f72c0c41e65d98166ca067`

See more details on using hashes here.

Provenance

The following attestation bundles were made for pytest_wardenbot-0.1.0-py3-none-any.whl:

Publisher: release.yml on pardamike/pytest-wardenbot

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_wardenbot-0.1.0-py3-none-any.whl
- Subject digest: e0dd8525c1db612f38394aa7e08099eab255962c7eb6f8798bdc873b4ebebd3f
- Sigstore transparency entry: 1624197128
- Sigstore integration time: May 24, 2026
Source repository:
- Permalink: pardamike/pytest-wardenbot@3aed70ae85c0b8c3fe9c186e6905f2f75d75e5c8
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/pardamike
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3aed70ae85c0b8c3fe9c186e6905f2f75d75e5c8
- Trigger Event: push

pytest-wardenbot 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

pytest-wardenbot

What it does

What "passing" means (and doesn't)

Install

Quickstart (under 60 seconds)

Manual setup (if you prefer)

What's in v0.1

How it's different from related tools

License

Powered by

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance