Library to test prompt quality

Project description

LLM Prompt Evaluation Tool

This tool evaluates how well your LLM responses match an ideal answer by comparing generated questions and answers. You can use it either by calling the comparison functions directly in your code or by passing a CSV file of test cases. It can also plot your test scores on a chart, with results grouped automatically by prompt ID.

Key Parameters

ideal_answer (required):
The reference or "ideal" answer that your LLM response is compared against.
Example:
```
"Blockchain is like a digital ledger that everyone can see but no one can change."
```
llm_response (required):
The response generated by your LLM, which is evaluated against ideal_answer.
optional_params (optional):
A JSON-like dictionary with extra settings for the test. It can contain the following keys:
- prompt: (Optional) The prompt text to use; it overrides the prompt fetched via prompt_id.
- context: (Optional) Additional context for the prompt.
- prompt_id: (Optional) A unique identifier of a prompt to fetch from the Lamoom Service. Test results are grouped by this ID.
Example:
```
{
  "prompt": "Explain blockchain to a beginner.",
  "context": {},
  "prompt_id": "beginner_blockchain"
}
```
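If you build optional_params in code, a small helper can keep the dictionary limited to the keys you actually set. The sketch below is a minimal illustration only: `build_optional_params` is a hypothetical helper, not part of lamoom_cicd, and it assumes that leaving a key out is the right way to signal "not set" (the library's handling of missing versus empty keys is not documented here).

```python
import json
from typing import Any, Optional


def build_optional_params(
    prompt: Optional[str] = None,
    context: Optional[dict[str, Any]] = None,
    prompt_id: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build an optional_params dict, omitting unset keys."""
    params: dict[str, Any] = {}
    if prompt is not None:
        params["prompt"] = prompt
    if context is not None:
        params["context"] = context
    if prompt_id is not None:
        params["prompt_id"] = prompt_id
    return params


params = build_optional_params(
    prompt="Explain blockchain to a beginner.",
    context={},
    prompt_id="beginner_blockchain",
)
print(json.dumps(params))
# {"prompt": "Explain blockchain to a beginner.", "context": {}, "prompt_id": "beginner_blockchain"}
```

The same `json.dumps` output is the form the optional_params column expects when you test from a CSV file.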

Using the Tool

1. Manual Testing

You can call the compare() method directly, passing the required ideal_answer and llm_response and, optionally, optional_params. Each call accumulates its test result under the prompt_id given in optional_params, or under a default prompt_id if none is given.

Example:

```python
import os
import time

from lamoom_cicd import TestLLMResponsePipe

ideal_answer = (
    "Blockchain is like a digital notebook that everyone can see, but no one can secretly change. "
    "Imagine a shared Google Doc where every change is recorded forever, and no one can edit past entries."
)
optional_params = {
    # A timestamp-based ID keeps this run's results grouped separately from other runs.
    "prompt_id": f"test-{int(time.time())}"
}

lamoom_pipe = TestLLMResponsePipe(openai_key=os.environ.get("OPENAI_KEY"))
# Compare your LLM's response against the ideal answer.
result = lamoom_pipe.compare(ideal_answer, "Your LLM response here", optional_params=optional_params)

# Print individual question details
for question in result.questions:
    print(question.to_dict())

# Print overall score details
print(result.score.to_dict())
```

2. Testing with CSV

You can also pass multiple test cases using a CSV file. The CSV file should contain the following columns:

ideal_answer: (Required) The ideal answer text.
llm_response: (Required) The LLM response to compare against the ideal answer.
optional_params: (Optional) A JSON string containing the optional parameters.

Multiple rows can be included, and you can use different prompt_id values to test various prompts.

Example CSV content. IMPORTANT: a JSON value inside a CSV field must be enclosed in double quotes, and every double quote within the JSON must be doubled ("").

```csv
ideal_answer,llm_response,optional_params
"Blockchain is a secure, immutable digital ledger.","Blockchain is like a shared Google Doc that records every change.","{""prompt_id"": ""google_doc_blockchain""}"
```
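Rather than escaping quotes by hand, you can let Python's standard csv module do it. This sketch assumes the three-column layout described above and writes one hypothetical test case to an in-memory buffer. The writer's default QUOTE_MINIMAL mode wraps any field containing a comma or quote and doubles every embedded quote, so the JSON in optional_params comes out correctly escaped.

```python
import csv
import io
import json

# Hypothetical test cases; optional_params is stored as a JSON string.
rows = [
    {
        "ideal_answer": "Blockchain is a secure, immutable digital ledger.",
        "llm_response": "Blockchain is like a shared Google Doc that records every change.",
        "optional_params": json.dumps({"prompt_id": "google_doc_blockchain"}),
    },
]

# Swap the buffer for open("test_data.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["ideal_answer", "llm_response", "optional_params"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Fields without commas or quotes are written unquoted, which is still valid CSV even though it differs cosmetically from the hand-written example.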

Usage Example:

```python
import os

from lamoom_cicd import TestLLMResponsePipe

csv_file_path = "test_data.csv"
lamoom_pipe = TestLLMResponsePipe(openai_key=os.environ.get("OPENAI_KEY"))
accumulated_results = lamoom_pipe.compare_from_csv("test_prompt", csv_file_path)
```
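Before handing a file to compare_from_csv, it can help to catch malformed rows locally, such as a row with more values than the header has columns. The sketch below uses only the standard library. `validate_test_rows` is a hypothetical helper, and the rules it enforces (non-empty ideal_answer and llm_response, JSON-decodable optional_params) are assumptions drawn from the column list above, not the library's own validation.

```python
import csv
import io
import json

REQUIRED_COLUMNS = ("ideal_answer", "llm_response")


def validate_test_rows(csv_text: str) -> list[dict]:
    """Hypothetical pre-flight check for a test-case CSV (not part of lamoom_cicd)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Strip stray spaces in header names, e.g. " optional_params".
    reader.fieldnames = [name.strip() for name in (reader.fieldnames or [])]
    missing = [col for col in REQUIRED_COLUMNS if col not in reader.fieldnames]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    cases = []
    for index, row in enumerate(reader, start=1):
        # DictReader collects surplus values under the key None.
        if None in row:
            raise ValueError(f"data row {index}: more values than header columns")
        if not all(row.get(col) for col in REQUIRED_COLUMNS):
            raise ValueError(f"data row {index}: ideal_answer and llm_response must be non-empty")
        raw = (row.get("optional_params") or "").strip()
        try:
            params = json.loads(raw) if raw else {}
        except json.JSONDecodeError as exc:
            raise ValueError(f"data row {index}: optional_params is not valid JSON") from exc
        cases.append({**row, "optional_params": params})
    return cases


sample = (
    "ideal_answer,llm_response,optional_params\n"
    '"Blockchain is a secure, immutable digital ledger.",'
    '"Blockchain is like a shared Google Doc that records every change.",'
    '"{""prompt_id"": ""google_doc_blockchain""}"\n'
)
cases = validate_test_rows(sample)
print(cases[0]["optional_params"])  # {'prompt_id': 'google_doc_blockchain'}
```

For a file on disk, open it with `open(csv_file_path, newline="")`, read its text, and pass that string in.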

3. Visualizing Test Scores

After running tests (whether manually or using a CSV), the results are automatically accumulated by prompt_id. To see a visual chart of test scores, use the provided visualization function.

Example:

```python
lamoom_pipe.visualize_test_results()
```

This function will generate a line chart with the x-axis representing the test instance number (as integers) and the y-axis representing the score percentage. Each line on the chart corresponds to a different prompt_id.
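The grouping behind such a chart can be sketched with the standard library alone. This is not the library's implementation: the `(prompt_id, score_percent)` input pairs are hypothetical, and numbering each test from 1 within its own prompt_id is an assumption about how the x-axis is counted.

```python
from collections import defaultdict

# Hypothetical (prompt_id, score_percent) pairs, in the order the tests were run.
runs = [
    ("beginner_blockchain", 72.0),
    ("google_doc_blockchain", 85.5),
    ("beginner_blockchain", 80.0),
]


def series_by_prompt_id(results: list[tuple[str, float]]) -> dict[str, list[tuple[int, float]]]:
    """Group scores into one (test_number, score) series per prompt_id."""
    series: defaultdict[str, list[tuple[int, float]]] = defaultdict(list)
    for prompt_id, score in results:
        test_number = len(series[prompt_id]) + 1  # 1-based, counted per prompt_id
        series[prompt_id].append((test_number, score))
    return dict(series)


for prompt_id, points in series_by_prompt_id(runs).items():
    print(prompt_id, points)
# beginner_blockchain [(1, 72.0), (2, 80.0)]
# google_doc_blockchain [(1, 85.5)]
```

Each series could then be drawn as one line, for example with matplotlib's `pyplot.plot(xs, ys, label=prompt_id)`.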

Summary

ideal_answer and llm_response are required parameters.
optional_params is optional and provides extra configuration, such as a custom prompt and a unique prompt_id for tests.
You can compare responses either manually or via CSV (which supports multiple test cases).
The tool accumulates results for each prompt_id across multiple calls.
Use the visualization function to see your test scores on an easy-to-read chart.

Enjoy using the tool to refine and evaluate your LLM prompts!

Project details

Release history

0.1.7: Mar 4, 2025
0.1.6: Feb 28, 2025
0.1.5: Feb 27, 2025
0.1.4 (this version): Feb 24, 2025
0.1.3: Feb 24, 2025
0.1.2: Feb 24, 2025
0.1.1: Feb 24, 2025

Download files

Source Distribution

lamoom_cicd-0.1.4.tar.gz (7.2 kB view details)

Uploaded Feb 24, 2025 Source

Built Distribution

lamoom_cicd-0.1.4-py3-none-any.whl (8.9 kB view details)

Uploaded Feb 24, 2025 Python 3

File details

Details for the file lamoom_cicd-0.1.4.tar.gz.

File metadata

Download URL: lamoom_cicd-0.1.4.tar.gz
Upload date: Feb 24, 2025
Size: 7.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for lamoom_cicd-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`21d868e227663e251befe71b7786776fc67837a0d163a48cd2650f3813ff9643`
MD5	`b16807b2b4b64be99097973068ccb56d`
BLAKE2b-256	`cd99e8e684e9ecc08783656135f99284710e5d51a198c2fa52f592925631c4df`

File details

Details for the file lamoom_cicd-0.1.4-py3-none-any.whl.

File metadata

Download URL: lamoom_cicd-0.1.4-py3-none-any.whl
Upload date: Feb 24, 2025
Size: 8.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for lamoom_cicd-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b04a22e893cb089a098aa05041db37a70fe015e5d19528a2a17b197ad45e64a6`
MD5	`8dd300c5f689cbb69934d2eacf5c8525`
BLAKE2b-256	`0261f359e9420180fff5c14117072074e669724e90980b3e382f25049278a327`
