Skip to main content

Chaos testing for AI apps. 18 extreme personas attack your AI to find edge cases before users do. OWASP LLM Top 10 coverage.

Project description

House Monkey mascot

House Monkey 🐒

Chaos testing for AI apps. 18 extreme personas attack your AI to find edge cases before users do.

MIT License PyPI GitHub stars

pip install housemonkey
housemonkey run --target https://your-api.com/chat --owasp

House Monkey terminal output — 3 OWASP failures detected

One command. 18 extreme personas. OWASP LLM Top 10 coverage. Terminal report in 2 minutes.

What it does

House Monkey attacks your AI app with realistic extreme users:

  • The Jailbreaker — tries to extract your system prompt (OWASP LLM01)
  • The Angry Customer — escalating hostility, demands manager
  • The Confused Grandma — off-topic, misunderstands everything
  • The Hallucination Baiter — asks about things that don't exist (OWASP LLM09)
  • The Permission Escalator — tricks AI into unauthorized actions (OWASP LLM06)
  • The RAG Poisoner — manipulates retrieval context (OWASP LLM08)
  • ...and 12 more

Each persona runs a multi-turn conversation against your API, then an LLM judge evaluates if your AI handled it correctly.

Quick start

# Install
pip install housemonkey

# List all personas
housemonkey list

# Test your AI (needs OpenAI API key for persona generation + judging)
export OPENAI_API_KEY=sk-...
housemonkey run --target https://your-api.com/chat

# Run only OWASP-mapped personas
housemonkey run --target https://your-api.com/chat --owasp

# Run specific personas
housemonkey run --target https://your-api.com/chat --persona jailbreaker oversharer

# Custom API format (non-OpenAI)
housemonkey run --target https://your-api.com/ask --payload-template '{"input": "{{message}}"}'

# Save JSON report
housemonkey run --target https://your-api.com/chat --output report.json

OWASP LLM Top 10 coverage

OWASP ID Vulnerability Persona
LLM01 Prompt Injection The Jailbreaker
LLM02 Sensitive Info Disclosure The Oversharer
LLM05 Improper Output Handling The JSON Breaker
LLM06 Excessive Agency The Permission Escalator
LLM08 Vector/Embedding Weakness The RAG Poisoner
LLM09 Misinformation The Hallucination Baiter
LLM10 Unbounded Consumption The Resource Abuser

How it works

  1. Each persona has a system prompt that simulates an extreme user type
  2. An LLM generates realistic messages as that persona
  3. Messages are sent to your target API
  4. An LLM judge evaluates if your AI handled the persona correctly
  5. Terminal report shows pass/fail with specific failure reasons

Try it on a broken chatbot

# Start the intentionally broken test target (7 built-in flaws)
python test_target.py

# In another terminal, attack it
housemonkey run --target http://127.0.0.1:8888 --owasp

Requirements

  • Python 3.10+
  • OpenAI API key (for persona generation + judging)
  • Your AI app must have an HTTP API endpoint

License

MIT. Powered by ClawClaw Soul open-source persona engine.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

housemonkey-0.1.0.tar.gz (15.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

housemonkey-0.1.0-py3-none-any.whl (16.2 kB view details)

Uploaded Python 3

File details

Details for the file housemonkey-0.1.0.tar.gz.

File metadata

  • Download URL: housemonkey-0.1.0.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for housemonkey-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3aa033f8d8b8a4d5b73f0db0986d988a86bb1bd63c85db3df9ea59b1af717d61
MD5 c748a39b4bd06c9c19d12a6945a1af17
BLAKE2b-256 e8a28200e1054a83289325b45161ed112101cf2bc5f40b496442b7addb0c93ec

See more details on using hashes here.

File details

Details for the file housemonkey-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: housemonkey-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for housemonkey-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8a987ee1693ea8f9296f47b54382ff4d8fca5e370983d222f9f7e8e72e941fdc
MD5 b6b04e09fa71e6ec8ea9f1a05f20c2c1
BLAKE2b-256 7f5721023877d76219d6cc88f7ad4ba88862a7fd529ae0d9e1431807c15aa958

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page