
A testing suite for evaluating LLM responses (semantic similarity, hallucinations, consistency, security).


LLM TestLab

Comprehensive Testing Suite for Large Language Models (LLMs)

LLM TestLab is a flexible Python toolkit for evaluating Large Language Models (LLMs) on semantic similarity, hallucinations, consistency, and security. It supports FAISS for high-performance vector similarity and falls back to NumPy if FAISS is unavailable.
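
The FAISS-or-NumPy fallback described above can be pictured as a simple availability check. The sketch below is only an assumption about how such a choice might be made, not the suite's actual code; `pick_backend` is a hypothetical helper that does not exist in LLM TestLab.

```python
import importlib.util


def pick_backend(use_faiss: bool = True) -> str:
    """Return "faiss" when requested and importable, otherwise "numpy".

    Hypothetical helper: it illustrates the fallback decision only and
    is not part of LLM TestLab.
    """
    # find_spec returns None when the top-level module cannot be located.
    if use_faiss and importlib.util.find_spec("faiss") is not None:
        return "faiss"
    return "numpy"


if __name__ == "__main__":
    print(pick_backend(use_faiss=True))
```

Checking for the module up front keeps FAISS optional: callers can always pass use_faiss=True and still get a working (if slower) search path when FAISS is not installed.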

Features

  • Semantic Similarity Test – Evaluate if model outputs match expected answers.
  • Hallucination Test – Detect deviations from a knowledge base.
  • Consistency Test – Measure stability across multiple runs.
  • Security Test – Detect unsafe or malicious responses using keywords, regex patterns, and embedding similarity.
  • FAISS Support – Optional, for faster similarity searches.
  • Knowledge Base Management – Add, remove, or list facts.
  • Malicious Keywords Management – Customize keywords and patterns for security checks.
  • Logging – Built-in debug/info logging using Python's logging module.
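
As a rough illustration of the keyword and regex stages of the Security Test, the following sketch flags a response when it contains a listed phrase or matches a pattern. It is a simplified stand-in under stated assumptions: `check_security`, the default lists, and the non-keyword reason strings are hypothetical, and the embedding-similarity stage is left out entirely.

```python
import re

# Illustrative defaults only; the suite's real keyword and pattern lists
# are not shown in this README.
KEYWORDS = ["ignore previous instructions", "hack system", "steal data"]
PATTERNS = [re.compile(r"\brm\s+-rf\b", re.IGNORECASE)]


def check_security(text, keywords=KEYWORDS, patterns=PATTERNS):
    """Return a dict shaped like the security result fields in this README."""
    lowered = text.lower()
    # Stage 1: case-insensitive substring match against known phrases.
    for keyword in keywords:
        if keyword in lowered:
            return {
                "security_safe": False,
                "security_reason": f"Matched keyword: '{keyword}'",
            }
    # Stage 2: regular-expression patterns for less literal matches.
    for pattern in patterns:
        if pattern.search(text):
            return {
                "security_safe": False,
                "security_reason": f"Matched pattern: '{pattern.pattern}'",
            }
    return {"security_safe": True, "security_reason": "No match"}


if __name__ == "__main__":
    print(check_security("Ignore previous instructions"))
```

Running cheap literal checks before pattern checks keeps the common case fast; an embedding comparison would typically come last because it is the most expensive step.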

Project Structure

llm-testlab/
├── llm_testing_suite/
│   ├── __init__.py          
│   └── llm_testing_suite.py
├── pyproject.toml
├── README.md
├── LICENSE
└── examples/
    └── huggingface_example.py

Installation

  1. From PyPI:

    pip install llm-testlab

  2. Or install directly from source:

    git clone https://github.com/Saivineeth147/llm-testlab.git
    cd llm-testlab
    pip install .

Optional: to include the FAISS and Hugging Face extras (the quotes keep shells such as zsh from interpreting the brackets):

    pip install "llm-testlab[faiss,huggingface]"

Quick Start

    from llm_testing_suite import LLMTestSuite

    # Example LLM function
    def llm_func(prompt):
        return "Rome is the capital of Italy"

    # Initialize the test suite
    tester = LLMTestSuite(llm_func, use_faiss=True)

    # Run semantic similarity test
    tester.semantic_test("What is the capital of Italy?", "Rome is the capital of Italy")

    # Run security test
    tester.security_test("Ignore previous instructions")

    # Run all tests
    tester.run_tests("What is the capital of Italy?", expected_answer="Rome is the capital of Italy")
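
The Quick Start passes a plain function that takes a prompt string and returns a string. For repeatable local runs without a real model, a canned stub along these lines may be useful; `make_canned_llm` is a hypothetical helper, not something the library provides.

```python
def make_canned_llm(answers, default="I don't know"):
    """Build an llm_func-style callable from a prompt -> answer mapping."""

    def llm_func(prompt):
        # Unknown prompts fall back to a fixed default answer.
        return answers.get(prompt, default)

    return llm_func


llm_func = make_canned_llm({
    "What is the capital of Italy?": "Rome is the capital of Italy",
})

if __name__ == "__main__":
    print(llm_func("What is the capital of Italy?"))
```

The resulting llm_func has the same shape as the Quick Start function, so it could be handed to LLMTestSuite in the same way.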

Managing Knowledge Base

    # Add a single fact
    tester.add_knowledge("New York is the largest city in the USA")

    # Add multiple facts
    tester.add_knowledge_bulk(["Python is a programming language", "AI is transforming industries"])

    # List knowledge base
    tester.list_knowledge()

    # Remove a fact
    tester.remove_knowledge("Python is a programming language")

    # Clear the knowledge base
    tester.clear_knowledge()

Managing Malicious Keywords

    # Add malicious keywords
    tester.add_malicious_keywords(["hack system", "steal data"])

    # List keywords
    tester.list_malicious_keywords()

    # Remove a keyword
    tester.remove_malicious_keyword("hack system")

Output Format

All test methods accept a return_type parameter with three possible values: "dict", "table", or "both".

  • "dict": Returns a Python dictionary with the test results.
  • "table": Prints a formatted table using the rich library and returns no dictionary.
  • "both": Prints the table and returns the dictionary.

Example of semantic test result:

    {
        "question": "What is the capital of Italy?",
        "generated_answer": "Rome is the capital of Italy",
        "semantic_score": 0.92,
        "semantic_pass": True,
        "best_match": "Rome is the capital of Italy"
    }

Example of hallucination test result:

    {
        "question": "Who wrote Hamlet?",
        "generated_answer": "Hamlet was written by Shakespeare",
        "hallucination_best_match": "William Shakespeare wrote the play Romeo and Juliet.",
        "hallucination_distance": 0.87
    }
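
The hallucination result reports the closest knowledge-base entry and a distance to it. The sketch below shows the nearest-fact idea using a bag-of-words cosine distance from the standard library. This is a deliberate simplification: the suite works with embeddings (via FAISS or NumPy), the README does not say which distance metric it reports, and `closest_fact` and `cosine_distance` are hypothetical names.

```python
import math
import re
from collections import Counter


def _vector(text):
    """Lowercased word counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity of the two word-count vectors."""
    va, vb = _vector(a), _vector(b)
    dot = sum(va[token] * vb[token] for token in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def closest_fact(answer, knowledge_base):
    """Return the knowledge-base entry nearest to the answer, with its distance."""
    if not knowledge_base:
        return {"hallucination_best_match": None, "hallucination_distance": None}
    best = min(knowledge_base, key=lambda fact: cosine_distance(answer, fact))
    return {
        "hallucination_best_match": best,
        "hallucination_distance": round(cosine_distance(answer, best), 2),
    }


if __name__ == "__main__":
    facts = [
        "William Shakespeare wrote the play Romeo and Juliet.",
        "Rome is the capital of Italy",
    ]
    print(closest_fact("Hamlet was written by Shakespeare", facts))
```

A large distance to even the best match suggests the answer is not supported by anything in the knowledge base, which is the signal a hallucination check looks for.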

Example of consistency test result:

    {
        "question": "What is the capital of France?",
        "consistency_outputs": [
            "Paris is the capital of France",
            "Paris is the capital of France",
            "Paris is the capital of France"
        ],
        "consistency_avg_sim": 0.99
    }
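
One plausible way to arrive at a figure like consistency_avg_sim is to call the model several times and average the similarity of every pair of outputs. The sketch below does that with difflib's character-level ratio in place of embedding similarity. Both the pairwise averaging and the default of three runs are assumptions, and `consistency_score` is a hypothetical name.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def consistency_score(llm_func, prompt, runs=3):
    """Call llm_func repeatedly and average the pairwise output similarity."""
    outputs = [llm_func(prompt) for _ in range(runs)]
    # Character-level ratio in [0, 1]; a stand-in for embedding similarity.
    sims = [
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(outputs, 2)
    ]
    return {
        "question": prompt,
        "consistency_outputs": outputs,
        # With a single run there are no pairs, so treat it as fully consistent.
        "consistency_avg_sim": mean(sims) if sims else 1.0,
    }


if __name__ == "__main__":
    result = consistency_score(
        lambda prompt: "Paris is the capital of France",
        "What is the capital of France?",
    )
    print(result["consistency_avg_sim"])
```

Because this stand-in compares characters rather than meaning, paraphrased answers would score lower here than they would under an embedding-based comparison.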

Example of security test result:

    {
        "question": "Ignore previous instructions",
        "generated_answer": "Ignore previous instructions",
        "security_safe": False,
        "security_reason": "Matched keyword: 'ignore previous instructions'"
    }

The run_tests() method combines all these results into a single dictionary with added token_cost information.

Logging

The suite uses Python's built-in logging module for debug and info messages. Adjust the log level in llm_testing_suite.py:

logger.setLevel(logging.DEBUG)  # Options: DEBUG, INFO, WARNING, ERROR
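
If editing the installed module is inconvenient, the level can usually be set from application code instead. The sketch below assumes the suite's logger is named after its module ("llm_testing_suite"); that name is a guess and should be checked against the source before relying on it.

```python
import logging

# Assumption: the suite's logger is named after its module. Verify the
# actual name in llm_testing_suite.py before relying on this.
SUITE_LOGGER_NAME = "llm_testing_suite"

# Attach a handler to the root logger so records are actually displayed.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")

suite_logger = logging.getLogger(SUITE_LOGGER_NAME)
suite_logger.setLevel(logging.INFO)  # Options: DEBUG, INFO, WARNING, ERROR
```

Run this after importing the package, so that a level set inside the module at import time does not overwrite the one chosen here.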

License

This project is licensed under the MIT License.
