LLM testing framework for validating agent behavior and tool usage

LLM Goose 🪿

LLM-powered testing, traces, and live chat for agent workflows

Goose is a Python library, CLI, and web dashboard for testing and debugging LLM agents.
Scaffold a gooseapp/, point Goose at your real query function, write goose.case(...) tests, then expose tools and live chat when you want the full dashboard loop.

What it looks like

[Screenshot: Goose dashboard overview]

[Screenshot: Goose testing detail]

Why Goose?

  • Natural-language expectations – Describe the behavior you want and let Goose validate it.
  • Tool call assertions – Check what your agent actually did, not just what it said.
  • Full execution traces – Inspect messages, tool calls, tool outputs, and validation results.
  • Live chat for iteration – Try agents in the dashboard while you develop.
  • Hot reload – Re-run against updated code without restarting the app.

Choose your path

If you are starting from zero, the framework-agnostic quickstart is the default path.

Core workflow

  1. Scaffold the app with goose init
  2. Point Goose at query(...) -> AgentResponse in gooseapp/conftest.py
  3. Write cases in gooseapp/tests/ with goose.case(...)
  4. Run the first loop with goose test list and goose test run
  5. Expand into tools, chat, and hot reload through gooseapp/app.py

Install

Required for the first test run:

pip install llm-goose

Optional for the browser UI:

npm install -g @llm-goose/dashboard-cli
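
To confirm that the Python package landed in the active environment, a small check like the one below can help. It is a minimal sketch that uses only the standard library; the helper name installed_version is made up for this example, and the only Goose-specific detail is the distribution name llm-goose from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "llm-goose") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"llm-goose: {found or 'not installed'}")
```

This does not check the optional npm package; it only looks at the Python environment that runs the script.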

goose init creates the scaffold

goose init

gooseapp/
├── README.md
├── __init__.py
├── app.py
├── conftest.py
└── tests/
    ├── __init__.py
    └── test_example.py

  • app.py configures tools, live chat agents, and hot reload
  • conftest.py wires the Goose fixture to your real query function
  • tests/ holds the cases you run from the CLI or dashboard

See docs/goose-init.md for the full scaffold contract.
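
If you want to verify the scaffold from a script (for example in CI), the file list above can be checked directly. This is a sketch, not part of Goose: missing_scaffold_files and EXPECTED_SCAFFOLD are hypothetical names, and the paths are copied from the tree shown above rather than from the scaffold contract in docs/goose-init.md.

```python
from pathlib import Path
from typing import List

# Relative paths copied from the scaffold tree in this README.
EXPECTED_SCAFFOLD = (
    "README.md",
    "__init__.py",
    "app.py",
    "conftest.py",
    "tests/__init__.py",
    "tests/test_example.py",
)


def missing_scaffold_files(root: str = "gooseapp") -> List[str]:
    """Return the expected scaffold files that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_SCAFFOLD if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_scaffold_files()
    print("scaffold complete" if not missing else f"missing: {missing}")
```

An empty list means every file from the tree is present; anything else names what goose init did not create or what was removed afterwards.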

Minimal first test

from goose.testing import Goose


def test_agent_responds(goose: Goose) -> None:
    goose.case(
        query="Hello, what can you help me with?",
        expectations=[
            "Agent responds with a greeting or acknowledgment",
            "Agent describes its capabilities or offers assistance",
        ],
    )

goose is injected from the fixture you register in gooseapp/conftest.py. The full query -> fixture -> test path is documented in docs/getting-started.md.

Key commands

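goose init                      # scaffold gooseapp/
goose test list gooseapp.tests  # list the cases in gooseapp.tests
goose test run gooseapp.tests   # run the cases in gooseapp.tests
goose api                       # start the Goose API server
goose-dashboard                 # open the browser UI (needs the optional npm package)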
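
If you want to drive the first test loop from a script, the two test commands can be chained with the standard library. This is a sketch under assumptions: run_first_loop and FIRST_LOOP are hypothetical names, the goose CLI is assumed to be on PATH, and the early exit relies on the CLI returning a non-zero exit code on failure, which this README does not confirm.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Commands copied from the list above. goose api and goose-dashboard are
# left out because they are assumed to keep running until stopped.
FIRST_LOOP: List[List[str]] = [
    ["goose", "test", "list", "gooseapp.tests"],
    ["goose", "test", "run", "gooseapp.tests"],
]


def run_first_loop(run: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Run each command in order and stop at the first non-zero exit code."""
    for command in FIRST_LOOP:
        code = run(command)
        if code != 0:
            return code
    return 0


if __name__ == "__main__":
    sys.exit(run_first_loop())
```

The run parameter exists so the function can be exercised without the CLI installed, for example by passing a fake runner in a unit test.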

License

MIT License – see LICENSE for full text.
