Semantic Unit Testing

What's semantic unit testing?

Semantic unit testing is a testing approach that checks whether a function's implementation matches its documented behavior. An LLM reads the code and judges whether it does what the docstring says it should.

Here's an example of how to use it:

from suite import suite

tester = suite(model_name="openai/o3-mini")

def multiply(x: int, y: int):
    """Multiplies x by y

    Args:
        x (int): value
        y (int): value
    """
    return x + y

result = tester(multiply)
print(result)

# {'reasoning': "The function's docstring states that it should multiply x by y. 
#   However, the implementation returns x + y, which is addition instead of multiplication. 
#   Therefore, the implementation does not correctly fulfill what is described in the docstring.",
# 'passed': False}

In this example, the implementation of multiply contains an error (it uses addition instead of multiplication). When the tester is called with the multiply function, it evaluates the implementation against the docstring, providing feedback on any discrepancies. This process helps ensure that the function behaves as expected and adheres to its documentation.

Why?

  • Comprehensive Coverage: Traditional unit testing focuses on specific inputs and outputs, covering only a small surface of the code. suite, on the other hand, evaluates the semantic correctness of functions by analyzing their implementation against their documentation.
  • No need to write tests by hand: Writing tests by hand can be tiring and non-exhaustive. By using LLMs, we can avoid having to write specific examples one by one. This saves time and lets the evaluation consider a wider range of scenarios and edge cases than a handful of hand-picked inputs would.
  • Enhanced Reasoning with LLMs: By passing code and context to LLMs, suite enables a deeper level of reasoning about the function's behavior than input/output checks alone, which allows for more nuanced evaluations.
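
To make the first point concrete, here is a minimal sketch (plain Python, not part of suite) of how an example-based test can miss the bug from the earlier multiply example. The test cases are hypothetical and chosen to show the failure mode: they pass because addition and multiplication happen to agree on those inputs.

```python
def multiply(x: int, y: int) -> int:
    """Multiplies x by y."""
    return x + y  # bug: addition instead of multiplication


def test_multiply_hand_written() -> None:
    # These hand-picked cases pass even though the implementation is wrong,
    # because 2 + 2 == 2 * 2 and 0 + 0 == 0 * 0.
    assert multiply(2, 2) == 4
    assert multiply(0, 0) == 0


test_multiply_hand_written()  # passes silently despite the bug
print(multiply(3, 4))  # prints 7, whereas the docstring promises 12
```

A check that compares the implementation with the docstring does not depend on which inputs were picked, which is the gap semantic unit testing aims to cover.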

How?

This library uses the llm package by Simon Willison. When testing a function, its source code, its docstring, and information about its dependencies (any other function used by the code under test) are retrieved and passed to an LLM for evaluation. The LLM then decides whether the implementation is correct or not.
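
The context-gathering step described above can be sketched with the standard library alone. This is an illustrative assumption about how such a step might look, not suite's actual implementation: the helper names (find_dependencies, safe_source, build_prompt) and the prompt wording are hypothetical, and the call to the LLM is left out.

```python
import inspect
import textwrap
from typing import Callable


def add(x: int, y: int) -> int:
    """Adds x and y."""
    return x + y


def multiply(x: int, y: int) -> int:
    """Multiplies x by y using repeated addition (y must be non-negative)."""
    total = 0
    for _ in range(y):
        total = add(total, x)
    return total


def find_dependencies(func: Callable) -> list[Callable]:
    """Return module-level functions that `func` refers to by name (hypothetical helper)."""
    deps = []
    for name in func.__code__.co_names:
        candidate = func.__globals__.get(name)
        if inspect.isfunction(candidate) and candidate is not func:
            deps.append(candidate)
    return deps


def safe_source(func: Callable) -> str:
    """Return the source of `func`, or a placeholder when it cannot be read."""
    try:
        return textwrap.dedent(inspect.getsource(func))
    except (OSError, TypeError):
        return f"# source unavailable for {func.__name__}\n"


def build_prompt(func: Callable) -> str:
    """Assemble docstring, source, and dependency sources into one prompt string."""
    deps = find_dependencies(func)
    dep_text = "\n".join(safe_source(d) for d in deps) or "(none)\n"
    return (
        "Does the implementation match its docstring?\n\n"
        f"Docstring:\n{inspect.getdoc(func)}\n\n"
        f"Source:\n{safe_source(func)}\n"
        f"Dependencies:\n{dep_text}"
    )


prompt = build_prompt(multiply)
print(prompt)
```

In this sketch, the resulting string is what would be sent to the model. Including the source of add matters because the correctness of multiply depends on what add actually does.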

Since we're using the llm library, we can use any model it supports. In my experience, reasoning models that support structured outputs work best (e.g. o3-mini).

Usage

To use the suite module, you can create an instance of the suite or async_suite class, depending on your needs. You will then pass the function you want to test, and suite will evaluate its implementation against its docstring, providing feedback on any discrepancies.

There are a couple of examples in the examples folder.

The intended usage of this package is for testing, so you could do something like:

# tests/test_multiply.py

from package import multiply
from suite import suite

tester = suite(model_name="openai/o3-mini")

def test_multiply():
    assert tester(multiply)

Since suite also supports async operations, you can use pytest-asyncio to speed up your tests. They don't need to run sequentially, because the bottleneck is not your laptop but the LLM provider.
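
The reason concurrency helps can be sketched without calling any provider. In the snippet below, fake_async_tester is a hypothetical stand-in that sleeps instead of querying an LLM. It is not async_suite, and its signature and return value are assumptions made for illustration only.

```python
import asyncio
import time


async def fake_async_tester(func_name: str) -> dict:
    """Hypothetical stand-in for an LLM-backed check; sleeps instead of calling a provider."""
    await asyncio.sleep(0.2)  # simulated network latency
    return {"function": func_name, "passed": True}


async def run_all(names: list[str]) -> list[dict]:
    # gather starts every check at once, so the waits overlap instead of adding up.
    return await asyncio.gather(*(fake_async_tester(n) for n in names))


start = time.perf_counter()
results = asyncio.run(run_all(["multiply", "add", "divide"]))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Because each check spends its time waiting on the network rather than computing, the total run time is close to that of the slowest single check rather than the sum of all of them.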
