Semantic testing framework for LLM applications

llm_app_test

A semantic testing framework for LLM applications that uses LLMs to validate semantic equivalence in test outputs.

✨ Test your LLM apps in minutes, not hours

🚀 CI/CD ready out of the box

💰 Cost-effective testing solution

🔧 No infrastructure needed

You can go straight to the docs at https://Shredmetal.github.io/llmtest/.

Important note: This is still being tested prior to release on PyPI. Please refer to the branch feat/find-a-way-to-break-this and look in the tests directory to see where we're at on the dumbest edge cases (e.g. 100 emojis).

What llm_app_test Does

Tests LLM applications (not the LLMs themselves)
Validates system message + prompt template outputs
Ensures semantic equivalence of responses
Tests the parts YOU control in your LLM application
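To make "the parts YOU control" concrete, here is a minimal sketch of an application under test. Every name in it (greeting_app, SYSTEM_MESSAGE, PROMPT_TEMPLATE, stub_llm) is hypothetical and made up for illustration; none of it is part of llm_app_test. The system message and prompt template belong to your app, while the model call is injected so it can be swapped out.

```python
from typing import Callable

# Hypothetical application pieces: these are what your tests are really about.
SYSTEM_MESSAGE = "You are a polite assistant. Always greet the user by name."
PROMPT_TEMPLATE = "Greet the user named {name}."


def greeting_app(name: str, llm: Callable[[str, str], str]) -> str:
    """Application under test: owns the system message and template, not the model."""
    return llm(SYSTEM_MESSAGE, PROMPT_TEMPLATE.format(name=name))


def stub_llm(system: str, prompt: str) -> str:
    """Stand-in for a real provider call so this sketch runs offline."""
    name = prompt.split("named ", 1)[1].rstrip(".")
    return f"Hello {name}, how are you?"


actual_output = greeting_app("Alice", stub_llm)
print(actual_output)
```

The string returned by greeting_app is what you would hand to assert_semantic_match as actual.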

What llm_app_test Doesn't Do

Test LLM model performance (that's the provider's responsibility)
Validate base model capabilities
Test model reliability
Handle model safety features

Screenshots

What if you could just use:

semantic_assert.assert_semantic_match(
    actual=actual_output,
    expected_behavior=expected_behavior
)

and get a pass/fail to test your LLM apps? Well, that's what I'm trying to do. Anyway, seeing is believing, so:

Here's llm_app_test passing a test case:

Here's llm_app_test failing a test case (and providing the reason why it failed):

Finally, here's llm_app_test passing a test case with a complex reasoning chain, given only this simple, natural-language instruction:

A complex, multi-step, scientific explanation.
Must maintain logical consistency across all steps.

Why llm_app_test?

Testing LLM applications is challenging because:

Outputs are non-deterministic
Semantic meaning matters more than exact matches
Traditional testing approaches don't work well
Integration into CI/CD pipelines is complex
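A tiny illustration of the first two points, using made-up outputs: both responses below are perfectly acceptable greetings, but only one of them survives a traditional exact-match check.

```python
# Two responses a greeting app might plausibly produce on different runs.
run_1 = "Hello Alice, how are you?"
run_2 = "Hi Alice! How are you doing today?"

EXPECTED = "Hello Alice, how are you?"


def exact_match(actual: str, expected: str) -> bool:
    """Traditional string-equality check."""
    return actual == expected


print(exact_match(run_1, EXPECTED))  # True
print(exact_match(run_2, EXPECTED))  # False, even though the meaning is the same
```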

llm_app_test solves these challenges by:

Using LLMs to evaluate semantic equivalence
Providing a clean, maintainable testing framework
Offering simple CI/CD integration
Supporting multiple LLM providers
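For intuition, here is a rough sketch of the general "LLM as judge" idea behind the first point. This is not llm_app_test's implementation: the prompt wording, the PASS/FAIL protocol, and the names assert_semantic_match_sketch and fake_judge are all assumptions made for illustration. The judge is passed in as a plain callable so the sketch runs offline with a fake one.

```python
from typing import Callable

# Hypothetical judge prompt; the wording llm_app_test actually uses is not shown here.
JUDGE_PROMPT = (
    "Does the ACTUAL output satisfy the EXPECTED behavior?\n"
    "Answer PASS or FAIL, then give a one-line reason.\n"
    "EXPECTED: {expected}\n"
    "ACTUAL: {actual}\n"
)


def assert_semantic_match_sketch(
    actual: str, expected_behavior: str, judge: Callable[[str], str]
) -> None:
    """Raise AssertionError unless the judge's verdict starts with PASS."""
    prompt = JUDGE_PROMPT.format(expected=expected_behavior, actual=actual)
    verdict = judge(prompt).strip()
    if not verdict.upper().startswith("PASS"):
        raise AssertionError(f"Semantic assertion failed: {verdict}")


def fake_judge(prompt: str) -> str:
    """Stand-in for a real LLM call: only checks whether Alice is addressed."""
    actual_part = prompt.split("ACTUAL: ", 1)[1]
    if "Alice" in actual_part:
        return "PASS: the output greets Alice"
    return "FAIL: the output does not address Alice"


# Passes silently, like any other assertion.
assert_semantic_match_sketch(
    actual="Hi Alice! How are you doing today?",
    expected_behavior="A polite greeting addressing Alice",
    judge=fake_judge,
)
```

Because a failure surfaces as an ordinary AssertionError carrying the judge's reason, a check shaped like this slots into any test runner that understands assertions.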

When to Use llm_app_test

Testing application-level LLM integration
Validating prompt engineering
Testing system message effectiveness
Ensuring consistent response patterns

When Not to Use llm_app_test

Testing base LLM performance
Evaluating model capabilities
Testing model safety features

Quick Example

from llm_app_test.semanticassert.semantic_assert import SemanticAssertion

semantic_assert = SemanticAssertion()
semantic_assert.assert_semantic_match(
    actual="Hello Alice, how are you?",
    expected_behavior="A polite greeting addressing Alice",
)

Installation

pip install git+https://github.com/Shredmetal/llmtest.git

Documentation

Full documentation available at: https://Shredmetal.github.io/llmtest/

Installation Guide
Quick Start Guide
API Reference
Best Practices
CI/CD Integration
Configuration Options

License

MIT

Contributing

This project is at an early stage and aims to be an important testing library for LLM applications.

Want to contribute? Great! Some areas we're looking for help:

Additional LLM provider support
Performance optimizations
Test coverage improvements
Documentation
CI/CD integration examples
Test result caching
Literally anything else you can think of. I'm all out of ideas, and I'm not even sure starting this project was a smart move.

Please:

Fork the repository
Create a feature branch
Submit a Pull Request

For major changes, please open an issue first to discuss what you would like to change, or YOLO in a PR. Bonus points if you can insult me in a way that makes me laugh.

Please adhere to clean code principles and include appropriate tests... or else. 🗡️

Contact

morganj.lee01@gmail.com
