Quack Test

A plugin for pytest to evaluate non-deterministic agent components.

A rubber duckie taking a test

Installation & Setup

Simply pip install it.

pip install quack-test

To use the LLM judge you will need to add the required config to your .env file.

# OpenAI Provider Configuration
# Set to "OpenAI" or "AzureOpenAI"
OPENAI_PROVIDER=OpenAI

# Unified Configuration
OPENAI_API_KEY=your_api_key_here
OPENAI_MODEL_NAME=gpt-4
OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/

Run tests

Run your tests with pytest (quack-test tests are ordinary pytest tests).

pytest

Writing Tests

Use @nondeterministic_fixture and @nondeterministic_test to mark components that are not deterministic and should be run multiple times. Put everything in a test_*.py file in your test or tests folder.

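import random
from quack_test import nondeterministic_fixture, nondeterministic_test

# The fixture is non-deterministic, so it is sampled n=5 times.
@nondeterministic_fixture(n=5)
def sample_text():
    return f"{random.randint(1, 10)} apples."

# The test must reach the threshold of 0.8 across the runs.
@nondeterministic_test(threshold=0.8)
def test_simple_assertion(sample_text):
    # The first space-separated token is the number of apples.
    apples = float(sample_text.split(" ")[0])
    assert apples > 1, f"Expected more than 1 apple, found {apples}."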

In practice, n=4 or n=5 is recommended, with a threshold of 0.75 or 0.8 respectively, which allows for a single failure. With a larger n you are running a benchmark rather than a test. The assumption is that a feature that is really broken fails more than once, while a feature that works robustly fails at most once.
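The arithmetic behind this recommendation can be sketched in plain Python. This is only an illustration, not part of quack-test: it assumes that a test passes when the fraction of successful runs is at least the threshold and that runs are independent. The function names allowed_failures and pass_probability are hypothetical.

```python
import math


def allowed_failures(n: int, threshold: float) -> int:
    """Number of runs that may fail while successes / n stays >= threshold."""
    # The small epsilon guards against floating point noise in threshold * n.
    required_successes = math.ceil(threshold * n - 1e-9)
    return n - required_successes


def pass_probability(n: int, threshold: float, p: float) -> float:
    """Chance that the whole test passes if each run succeeds with probability p.

    Assumes independent runs, i.e. a binomial model.
    """
    required_successes = n - allowed_failures(n, threshold)
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(required_successes, n + 1)
    )


if __name__ == "__main__":
    for n, threshold in [(4, 0.75), (5, 0.8)]:
        print(n, threshold, "allowed failures:", allowed_failures(n, threshold))
    # Compare a robust feature (p=0.95) with a broken one (p=0.5).
    for p in (0.95, 0.5):
        print(p, "pass probability:", round(pass_probability(5, 0.8, p), 4))
```

Under these assumptions, both recommended settings tolerate exactly one failed run, and with n=5 a component that succeeds only half of the time passes in fewer than one out of five attempts.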

Judging via LLM

In many cases it is hard to check with code whether an answer is actually correct. In those cases, use a judge.

from quack_test import judge, nondeterministic_test
# [...]

@nondeterministic_test(threshold=0.8)
def test_judge_with_criterion(sample_text):
    return judge(sample_text, criterion="More than 1 apple")

@nondeterministic_test(threshold=0.8)
def test_judge_with_gt(sample_text):
    return judge(sample_text, gt="N (0-10) apples")

Class Based Tests

Grouping tests in classes can be beneficial in some use cases, and quack-test does not get in the way.

import random
from quack_test import nondeterministic_test

class TestClassBased:
    @nondeterministic_test(n=3, threshold=0.2)
    def test_assert_only(self):
        # You can also create your own score and explanation
        return random.random(), "The expected value is above 0.2, but was lower."

Negative Tests

Negative tests, which must fail, are important. In quack-test you can specify that a test has to fail via should_fail=True.

@nondeterministic_test(threshold=0.8, should_fail=True)
def test_negative(sample_text):
    return judge(sample_text, criterion="Has only coal")

Showcase for Error Messages

The following example tests fail. The resulting pytest output is shown in the comment inside each test.

@nondeterministic_test(threshold=0.8)
def test_assertion_demo_failure(sample_text):
    assert False, "This message is shown in the pytest summary."
    # -> FAILED test/test_example.py::test_assertion_demo_failure - AssertionError: Test failed to meet success threshold. Score: 0.0 (required: 0.8), Success rate: 0.00% (0/5), This message is shown in the pytest summary.

@nondeterministic_test(threshold=0.8)
def test_judge_demo_failure(sample_text):
    return judge(sample_text, criterion="Has only coal")
    # -> FAILED test/test_example.py::test_judge_demo_failure - AssertionError: Test failed to meet success threshold. Score: 0.0 (required: 0.8), Success rate: 0.00% (0/5), Text: '3 apples.' Criterion: 'Has only coal'

Advanced Judge Setup

If you do not want to set up the judge via the .env file as shown above, you can also set it up in code. This example hardcodes a local Ollama instance. Of course you should NEVER put your API key in the code, but if you manage your secrets some other way than a .env file, this is a convenient way to pass them to the judge.

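# Assumption: configure_judge is importable from quack_test, like judge.
from quack_test import configure_judge

def setup_module():
    # Local Ollama instance via its OpenAI-compatible endpoint.
    configure_judge(
        provider="OpenAI",
        api_key="ollama",
        model_name="qwen3:4b",
        endpoint="http://localhost:11434/v1"
    )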
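To keep secrets out of the code, the settings could be collected from wherever your secret manager exposes them. The following is a minimal sketch that reads them from environment variables. The helper judge_settings_from_env and the MY_JUDGE_* variable names are hypothetical and not part of quack-test; only the four keyword names are taken from the example above.

```python
import os


def judge_settings_from_env(prefix: str = "MY_JUDGE_") -> dict:
    """Collect judge settings from environment variables.

    The variable names (prefix + API_KEY, PROVIDER, MODEL_NAME, ENDPOINT)
    are illustrative assumptions, not names defined by quack-test.
    """
    api_key = os.environ.get(f"{prefix}API_KEY")
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"{prefix}API_KEY is not set")
    return {
        "provider": os.environ.get(f"{prefix}PROVIDER", "OpenAI"),
        "api_key": api_key,
        "model_name": os.environ.get(f"{prefix}MODEL_NAME", "qwen3:4b"),
        "endpoint": os.environ.get(f"{prefix}ENDPOINT", "http://localhost:11434/v1"),
    }
```

Inside setup_module, the result could then be passed on as configure_judge(**judge_settings_from_env()), assuming configure_judge accepts these keyword arguments as in the example above.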

Have fun quack testing your agents!

Project details

Release history

- 0.3.1 (this version): Mar 18, 2026
- 0.3.0: Feb 2, 2026
- 0.2.0: Nov 25, 2025
- 0.1.0: Nov 20, 2025

Download files

Source Distribution

quack_test-0.3.1.tar.gz (7.6 kB view details)

Uploaded Mar 18, 2026 Source

Built Distribution

quack_test-0.3.1-py3-none-any.whl (8.2 kB view details)

Uploaded Mar 18, 2026 Python 3

File details

Details for the file quack_test-0.3.1.tar.gz.

File metadata

Download URL: quack_test-0.3.1.tar.gz
Upload date: Mar 18, 2026
Size: 7.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for quack_test-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`10da6b1043ecaf500c24adc02b5b1058749dfc2c2b5f0b18d5f7d8a689e52f96`
MD5	`62bc9636003cc9bee5576537275fb34c`
BLAKE2b-256	`1ccaf2c8be2d439fb02fb3b62f85d6d2d3c40bdc3f56d3462e94aa23c55cd216`

Provenance

The following attestation bundles were made for quack_test-0.3.1.tar.gz:

Publisher: python-publish.yml on penguinmenac3/quack-test

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: quack_test-0.3.1.tar.gz
- Subject digest: 10da6b1043ecaf500c24adc02b5b1058749dfc2c2b5f0b18d5f7d8a689e52f96
- Sigstore transparency entry: 1126844649
- Sigstore integration time: Mar 18, 2026
Source repository:
- Permalink: penguinmenac3/quack-test@81a48405c9243f7943eafdf498e9197c11768299
- Branch / Tag: refs/tags/V0.3.1
- Owner: https://github.com/penguinmenac3
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@81a48405c9243f7943eafdf498e9197c11768299
- Trigger Event: release

File details

Details for the file quack_test-0.3.1-py3-none-any.whl.

File metadata

Download URL: quack_test-0.3.1-py3-none-any.whl
Upload date: Mar 18, 2026
Size: 8.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for quack_test-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`469b4da4183bceb0c5ab9aa4124aea884e076c863863f81249c31fcaa46088ad`
MD5	`fb9eadfb8a656895f09b212b06eaaa2c`
BLAKE2b-256	`0cac6b6970ed86663926052a4acc2459b3b68ee08e30db63a1637e2a10d57e13`

Provenance

The following attestation bundles were made for quack_test-0.3.1-py3-none-any.whl:

Publisher: python-publish.yml on penguinmenac3/quack-test

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: quack_test-0.3.1-py3-none-any.whl
- Subject digest: 469b4da4183bceb0c5ab9aa4124aea884e076c863863f81249c31fcaa46088ad
- Sigstore transparency entry: 1126844830
- Sigstore integration time: Mar 18, 2026
Source repository:
- Permalink: penguinmenac3/quack-test@81a48405c9243f7943eafdf498e9197c11768299
- Branch / Tag: refs/tags/V0.3.1
- Owner: https://github.com/penguinmenac3
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@81a48405c9243f7943eafdf498e9197c11768299
- Trigger Event: release
