llm_app_test
A semantic testing framework for LLM applications that uses LLMs to validate semantic equivalence in test outputs.
✨ Test your LLM apps in minutes, not hours
🚀 CI/CD ready out of the box
💰 Cost-effective testing solution
🔧 No infrastructure needed
You can go straight to the docs here: https://Shredmetal.github.io/llmtest/
Due to the number of downloads I am seeing on pypistats.org, I am including these instructions in case a beta update breaks something on your end:
Emergency Rollback Instructions
If you experience issues with version 0.1.0b4, you can roll back to the previous stable version (0.1.0b3.post3) using one of these methods:
Method 1: Direct Installation of Previous Version
```bash
pip uninstall llm-app-test
pip install llm-app-test==0.1.0b3.post3
```
Method 2: Force Reinstall (if Method 1 fails)
```bash
pip install --force-reinstall llm-app-test==0.1.0b3.post3
```
Verification
After rolling back, verify the installation:
```python
import llm_app_test
print(llm_app_test.__version__)  # Should show 0.1.0b3.post3
```
What llm_app_test Does
- Tests LLM applications (not the LLMs themselves)
- Validates system message + prompt template outputs
- Ensures semantic equivalence of responses
- Tests the parts YOU control in your LLM application (see the sketch below)
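For example, here's a minimal sketch of testing the application layer. `generate_welcome_email` is a hypothetical stand-in for your own code (your system message, your prompt template, your chain); the assertion API is the one shown in the Quick Example below:

```python
from llm_app_test.semanticassert.semantic_assert import SemanticAssertion

def generate_welcome_email(username: str) -> str:
    """Hypothetical application function: in a real app this would run
    your system message + prompt template through your LLM provider."""
    return f"Hi {username}, welcome aboard! We're glad to have you."

semantic_assert = SemanticAssertion()
semantic_assert.assert_semantic_match(
    actual=generate_welcome_email("Alice"),
    expected_behavior="A friendly welcome email that greets Alice by name",
)
```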
What llm_app_test Doesn't Do
- Test LLM model performance (that's the provider's responsibility)
- Validate base model capabilities
- Test model reliability
- Handle model safety features
Screenshots
What if you could just use:
```python
semantic_assert.assert_semantic_match(
    actual=actual_output,
    expected_behavior=expected_behavior
)
```
and get a pass/fail to test your LLM apps? Well, that's what I'm trying to do. Anyway, seeing is believing so:
Here's llm_app_test passing a test case: [screenshot]
Here's llm_app_test failing a test case (and providing the reason why it failed): [screenshot]
Finally, here's llm_app_test passing a test case involving a complex reasoning chain, given this simple, natural-language instruction:

```
A complex, multi-step, scientific explanation.
Must maintain logical consistency across all steps.
```

[screenshot]
Why llm_app_test?
Testing LLM applications is challenging because:
- Outputs are non-deterministic
- Semantic meaning matters more than exact matches
- Traditional testing approaches don't work well
- Integration into CI/CD pipelines is complex
llm_app_test solves these challenges by:
- Using LLMs to evaluate semantic equivalence (see the sketch after this list)
- Providing a clean, maintainable testing framework
- Offering simple CI/CD integration
- Supporting multiple LLM providers
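To make the non-determinism point concrete: an exact-match assertion breaks whenever the model rephrases, while a semantic assertion keeps passing. A sketch using the same API as the Quick Example below:

```python
from llm_app_test.semanticassert.semantic_assert import SemanticAssertion

# One of many phrasings a non-deterministic LLM might produce:
response = "Hi Alice! Hope you're doing well."

# Brittle: an exact match fails on any harmless rewording.
# assert response == "Hello Alice, how are you?"

# Robust: passes for any phrasing with the expected meaning.
SemanticAssertion().assert_semantic_match(
    actual=response,
    expected_behavior="A polite greeting addressing Alice",
)
```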
When to Use llm_app_test
- Testing application-level LLM integration
- Validating prompt engineering
- Testing system message effectiveness
- Ensuring consistent response patterns
When Not to Use llm_app_test
- Testing base LLM performance
- Evaluating model capabilities
- Testing model safety features
Quick Example
```python
from llm_app_test.semanticassert.semantic_assert import SemanticAssertion

semantic_assert = SemanticAssertion()
semantic_assert.assert_semantic_match(
    actual="Hello Alice, how are you?",
    expected_behavior="A polite greeting addressing Alice",
)
```
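Because `assert_semantic_match` behaves like an ordinary assertion, it drops straight into pytest and therefore into CI. A minimal sketch (the test name is illustrative, and this assumes a failing match surfaces as a normal test failure, as the failing-case screenshot above suggests):

```python
from llm_app_test.semanticassert.semantic_assert import SemanticAssertion

def test_greeting_addresses_alice():
    semantic_assert = SemanticAssertion()
    # In a real suite, `actual` would come from your LLM application.
    actual = "Hello Alice, how are you?"
    semantic_assert.assert_semantic_match(
        actual=actual,
        expected_behavior="A polite greeting addressing Alice",
    )
```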
Installation
```bash
pip install llm-app-test
```
Documentation
Full documentation available at: https://Shredmetal.github.io/llmtest/
- Installation Guide
- Quick Start Guide
- API Reference
- Best Practices
- CI/CD Integration
- Configuration Options
License
MIT
Contributing
This project is at an early stage and aims to be an important testing library for LLM applications.
Want to contribute? Great! Some areas we're looking for help:
- Additional LLM provider support
- Performance optimizations
- Test coverage improvements
- Documentation
- CI/CD integration examples
- Test result caching
- Literally anything else you can think of; I'm all out of ideas (I'm not even sure starting this project was a smart one)
Please:
- Fork the repository
- Create a feature branch
- Submit a Pull Request
For major changes, please open an issue first to discuss what you would like to change, or YOLO in a PR, bonus points if you can insult me in a way that makes me laugh.
Please adhere to clean code principles and include appropriate tests... or else. 🗡️
Reporting Issues
If you encounter issues:
- Create an issue on our GitHub repository
- Include your Python version and environment details
- Describe the problem you encountered with version 0.1.0b4
🆘 Support
- Discord: Join our community
- Issues: GitHub Issues
- Documentation: Full Docs
- Email: morganj.lee01@gmail.com
⚠️ Important Note About Rate Limits - If Running Large Numbers of Tests:
Anthropic Rate Limits:

Tier 1:

| Model | Maximum Requests per minute (RPM) | Maximum Tokens per minute (TPM) | Maximum Tokens per day (TPD) |
|---|---|---|---|
| Claude 3.5 Sonnet 2024-10-22 | 50 | 40,000 | 1,000,000 |
| Claude 3.5 Sonnet 2024-06-20 | 50 | 40,000 | 1,000,000 |
| Claude 3 Opus | 50 | 20,000 | 1,000,000 |

Tier 2:

| Model | Maximum Requests per minute (RPM) | Maximum Tokens per minute (TPM) | Maximum Tokens per day (TPD) |
|---|---|---|---|
| Claude 3.5 Sonnet 2024-10-22 | 1,000 | 80,000 | 2,500,000 |
| Claude 3.5 Sonnet 2024-06-20 | 1,000 | 80,000 | 2,500,000 |
| Claude 3 Opus | 1,000 | 40,000 | 2,500,000 |
OpenAI Rate Limits:

Tier 1:

| Model | RPM | RPD | TPM | Batch Queue Limit |
|---|---|---|---|---|
| gpt-4o | 500 | - | 30,000 | 90,000 |
| gpt-4o-mini | 500 | 10,000 | 200,000 | 2,000,000 |
| gpt-4o-realtime-preview | 100 | 100 | 20,000 | - |
| gpt-4-turbo | 500 | - | 30,000 | 90,000 |

Tier 2:

| Model | RPM | TPM | Batch Queue Limit |
|---|---|---|---|
| gpt-4o | 5,000 | 450,000 | 1,350,000 |
| gpt-4o-mini | 5,000 | 2,000,000 | 20,000,000 |
| gpt-4o-realtime-preview | 200 | 40,000 | - |
| gpt-4-turbo | 5,000 | 450,000 | 1,350,000 |
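If your suite risks hitting these limits, one simple mitigation is pacing the assertions. A minimal sketch; the interval is an assumption you'd tune against your tier, and the fixture name is illustrative:

```python
import time
import pytest
from llm_app_test.semanticassert.semantic_assert import SemanticAssertion

# Assumption: ~40 requests/minute keeps a suite under Tier 1's 50 RPM ceiling.
MIN_SECONDS_BETWEEN_CALLS = 1.5

@pytest.fixture
def semantic_assert():
    # Sleep before each test that uses this fixture so the suite
    # stays under the provider's requests-per-minute limit.
    time.sleep(MIN_SECONDS_BETWEEN_CALLS)
    return SemanticAssertion()

def test_greeting(semantic_assert):
    semantic_assert.assert_semantic_match(
        actual="Hello Alice, how are you?",
        expected_behavior="A polite greeting addressing Alice",
    )
```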