Testing framework for LLM Plugins
Permutate
Permutate is an automated testing framework for LLM Plugins.
ChatGPT Ignited LLM Plugins
ChatGPT spread like wildfire, but it had some limitations; notably, it couldn’t access private data or systems. The release of OpenAI Plugins resolved this limitation by letting developers connect their favorite applications to ChatGPT. Unfortunately, in the rush to release plugins, quality assurance fell short.
From a software quality perspective, several common problems surfaced:
- Despite the plugin being “installed” in a user’s environment, the plugin wasn’t consistently activated by the user’s text.
- When it was activated, the plugin wasn’t called correctly, leading to undesirable results.
Ultimately, plugin developers chose to remove the bulk of their features just to get basic functions to run correctly. 🙁
Introducing Permutate
Permutate is an automated testing framework for LLM Plugins.
Permutate allows development teams to:
- Define a set of reusable tests for plugins
- Describe the tests using a standard, open format
- Use open source software (Permutate) to execute the tests
- See the results of individual test cases as well as summary statistics
The Permutation Problem
When users give prompts (instructions to an LLM via chat, etc.), they use a variety of ways of describing what they want. Each sentence variation might work or fail. The goal is to get as many of them to succeed as possible.
Some technology (the tool selector) must determine what the intent of the command was (also known as intent detection). Additionally, the command might carry extra data like “in the morning” or “once per week”. This natural language needs to be mapped back to an API. The tool selector must do more than just ‘find the right tool’; it must map language to an API and call it perfectly.
So, here we go. Given J variations of sample input text, and K variations of "installed" plugins, we use a tool selector and evaluate the performance:
- Is the correct plugin selected?
- Is the correct API operation selected?
- Are the API parameters filled in correctly?
- What was the cost to solve?
- And, what was the round-trip latency?
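The checks above can be sketched as a small scoring routine. This is only an illustration under stated assumptions, not Permutate's actual implementation: `Expectation`, `SelectorResult`, `score`, and `pass_rate` are hypothetical names, and the field layout is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Expectation:
    """What a test case expects (hypothetical structure)."""
    plugin: Optional[str]
    operation: Optional[str]
    parameters: dict = field(default_factory=dict)


@dataclass
class SelectorResult:
    """What a tool selector returned for one prompt (hypothetical structure)."""
    plugin: Optional[str]
    operation: Optional[str]
    parameters: dict
    cost_usd: float
    latency_s: float


def score(expected: Expectation, actual: SelectorResult) -> dict:
    """Compare one result against its expectation, one entry per check."""
    return {
        "plugin_ok": actual.plugin == expected.plugin,
        "operation_ok": actual.operation == expected.operation,
        "parameters_ok": actual.parameters == expected.parameters,
        "cost_usd": actual.cost_usd,
        "latency_s": actual.latency_s,
    }


def pass_rate(scores: list) -> float:
    """Fraction of runs where plugin, operation, and parameters all matched."""
    if not scores:
        return 0.0
    passed = sum(
        1 for s in scores
        if s["plugin_ok"] and s["operation_ok"] and s["parameters_ok"]
    )
    return passed / len(scores)
```

Running `score` over every combination of input variation and plugin set, then feeding the results to `pass_rate`, gives one summary number per tool selector, with cost and latency kept alongside for comparison.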
Tool Selectors
To satisfy these concerns, developers will use a Tool Selector service. They pass in the text, and the service identifies the correct plugin to use, the right operations, etc. In some cases, it might return the source code needed to call the API, with all of the parameters filled in.
To make life simple, we created OpenPlugin. Using it is optional, which leaves plugin service providers free to offer their best possible implementation. If an implementation isn’t giving you the accuracy or performance you need, try another. More importantly, it allows you to test plugins using basic CI/CD principles.
Is this just for OpenAI?
No. OpenAI hasn’t (yet) made their tool selector service available to the public. We encourage all vendors to make their tool selector service available. This allows for headless automation testing, and without it, we can anticipate poor plugin quality.
Until OpenAI makes their Tool Selector service available to the public, you have two options:
- Manual Testing
- UI Testing (e.g., Selenium Hell).
Getting started
Installation
To install using pip, run:
pip install permutate
You can verify you have permutate installed by running:
permutate --help
Credentials
Before you run the application, be sure you have credentials configured.
export OPENAI_API_KEY=<your key>   # if you want to use an OpenAI LLM
export COHERE_API_KEY=<your key>   # if you want to use a Cohere LLM
export GOOGLE_APPLICATION_CREDENTIALS=<credential file path, e.g. /usr/app/application_default_credentials.json>   # if you want to use a Google LLM
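Before starting a run, it can help to confirm that the variables you need are actually set. The sketch below is a hypothetical helper, not part of Permutate; the variable names come from the export lines above, while the provider labels and the `missing_credentials` function are assumptions made for illustration.

```python
import os

# Variable names are taken from the export lines above. Which ones you need
# depends on the LLM provider your permutations use.
CREDENTIAL_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Cohere": "COHERE_API_KEY",
    "Google": "GOOGLE_APPLICATION_CREDENTIALS",
}


def missing_credentials(providers, env=None):
    """Return the environment variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [
        CREDENTIAL_VARS[p] for p in providers
        if not env.get(CREDENTIAL_VARS[p])
    ]


if __name__ == "__main__":
    missing = missing_credentials(["OpenAI"])
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All required credentials are set.")
```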
Create your test file
Sample test file: https://raw.githubusercontent.com/LegendaryAI/permutate/main/tests/files/plugin_test.yaml
version: 1.0.0
name: klarna_plugin_test
config:
  use_openplugin_library: true
  langchain_tool_selector: http://localhost:8006/api/langchain/run-plugin
  imprompt_tool_selector: http://localhost:8006/api/imprompt/run-plugin
  auto_translate_to_languages:
    - English
    - Spanish
test_plugin:
  manifest_url: https://www.klarna.com/.well-known/ai-plugin.json
plugin_groups:
  - plugin_group:
      name: my_group1
      plugins:
        - plugin:
            manifest_url: https://www.klarna.com/.well-known/ai-plugin.json
  - plugin_group:
      name: my_group2
      plugins:
        - plugin:
            manifest_url: https://www.klarna.com/.well-known/ai-plugin.json
        - plugin:
            manifest_url: https://api.imprompt.ai/plugin/users/2/blogwriter/.well-known/ai-plugin.json
permutations:
  - permutation:
      name: permutation1
      llm:
        provider: OpenAIChat
        model_name: gpt-3.5-turbo
        temperature: 0
        max_tokens: 1024
        top_p: 1
        frequency_penalty: 0
        presence_penalty: 0
        n: 1
        best_of: 1
      tool_selector:
        provider: Langchain
        pipeline_name: zero-shot-react-description
        plugin_group_name: my_group1
  - permutation:
      name: permutation2
      llm:
        provider: OpenAIChat
        model_name: gpt-3.5-turbo
        temperature: 0
        max_tokens: 1024
        top_p: 1
        frequency_penalty: 0
        presence_penalty: 0
        n: 1
        best_of: 1
      tool_selector:
        provider: Imprompt
        pipeline_name: default
        plugin_group_name: my_group1
test_cases:
  - test_case:
      name: test1
      prompt: Show me 5 T shirts from Klarna
      expected_plugin_used: KlarnaProducts
      expected_api_used: https://www.klarna.com/us/shopping/public/openai/v0/products
      expected_parameters:
        q: t shirt
        size: 5
      expected_response: List of 5 T shirts with URL
  - test_case:
      name: test2
      prompt: Get me 5 oranges
      expected_plugin_used: None
      expected_api_used: None
      expected_parameters:
        num_shirts: None
      expected_response: None
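To see how the parts of the test file relate, the sketch below pairs each permutation with each test case. It rests on two assumptions that the text above does not confirm: that every permutation is run against every test case, and that `name` is nested under each `permutation` and `test_case` key. The dict is a trimmed, hand-written mirror of the sample file (a real script would parse the YAML file instead), and `planned_runs` is a hypothetical helper.

```python
from itertools import product

# Trimmed, hand-written mirror of the sample test file above.
# The nesting shown here is an assumption about the file's structure.
test_file = {
    "permutations": [
        {"permutation": {"name": "permutation1",
                         "tool_selector": {"provider": "Langchain"}}},
        {"permutation": {"name": "permutation2",
                         "tool_selector": {"provider": "Imprompt"}}},
    ],
    "test_cases": [
        {"test_case": {"name": "test1",
                       "prompt": "Show me 5 T shirts from Klarna"}},
        {"test_case": {"name": "test2", "prompt": "Get me 5 oranges"}},
    ],
}


def planned_runs(spec):
    """List (permutation name, test case name) pairs, one per planned run."""
    perms = [p["permutation"]["name"] for p in spec["permutations"]]
    cases = [c["test_case"]["name"] for c in spec["test_cases"]]
    return list(product(perms, cases))


for perm, case in planned_runs(test_file):
    print(f"{perm} x {case}")
```

Under these assumptions, two permutations and two test cases give four runs, so the size of a batch grows with the product of the two lists.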
Run your test file
Usage: permutate run [TEST_FILE_PATH] [OPTIONS]
Run a permutation batch
Arguments:
  test_file_path                        Plugin test setup file.
                                        default: /permutate/workspace/plugin_test.yaml
Options:
  --help                                show this help message and exit
  --save-to-html / --no-save-to-html    Save the results of the permutation run to an HTML file.
                                        default: save-to-html
  --save-to-csv / --no-save-to-csv      Save the results of the permutation run to a CSV file.
                                        default: no-save-to-csv
  --output-directory                    Path to the directory where the output files will be saved.
                                        default: /permutate/workspace/output/
Examples:
This command runs the tests defined in the plugin_test.yaml file and saves the results to a CSV file and an HTML file in the directory given by the --output-directory flag.
permutate run tests/files/plugin_test.yaml --output-directory tests/files/output/ --save-to-csv --save-to-html
This command runs the tests in the sample test file provided in the package and saves the results to an HTML file. Use it to see sample output.
permutate run
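For CI pipelines, the same invocation can be assembled from a script. The sketch below uses only the flags documented above; `build_command` and `run_permutate` are hypothetical helper names, and it assumes `permutate` is on the PATH. Whether the exit code reflects failed test cases is not stated here, so treat the return value as informational.

```python
import subprocess


def build_command(test_file, output_dir, csv=False, html=True):
    """Build a `permutate run` command from the documented flags."""
    cmd = ["permutate", "run", test_file, "--output-directory", output_dir]
    cmd.append("--save-to-csv" if csv else "--no-save-to-csv")
    cmd.append("--save-to-html" if html else "--no-save-to-html")
    return cmd


def run_permutate(test_file, output_dir, csv=False, html=True):
    """Run the batch and return the process exit code."""
    # Assumption: permutate was installed with pip and is on the PATH.
    completed = subprocess.run(build_command(test_file, output_dir, csv, html))
    return completed.returncode
```

Building the argument list separately from running it keeps the command easy to log and to check in a unit test without invoking the tool.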
Docker
docker run -v /LOCALPATH/plugin_test.yaml:/usr/app/plugin_test.yaml -e "OPENAI_KEY=your-key" -e "COHERE_API_KEY=your-key" -e "GOOGLE_APPLICATION_CREDENTIALS=your-file-path" shrikant14/permutate:latest
Output
You can save your permutate run output to:
- HTML Report: You can save your permutation run output to an HTML report that presents the results of the permutation run in a structured and visually appealing format.
  Sample report: https://raw.githubusercontent.com/LegendaryAI/permutate/main/docs/sample_result.html
  Sample screenshot: https://raw.githubusercontent.com/LegendaryAI/permutate/main/docs/sample_result_screenshot.png
- CSV Report: You can save your permutation run output to two CSV files: one for the permutation run summary and one for the permutation run details.
  Sample summary: https://raw.githubusercontent.com/LegendaryAI/permutate/main/docs/sample_result_summary.csv
  Sample details: https://raw.githubusercontent.com/LegendaryAI/permutate/main/docs/sample_result_details.csv
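The CSV reports can be post-processed with Python's standard `csv` module. The sketch below deliberately makes no assumption about column names, since they are not listed here; `read_report` and `load_report` are hypothetical helpers that return whatever header and rows the file contains.

```python
import csv


def read_report(lines):
    """Parse CSV text (any iterable of lines) into (columns, rows)."""
    reader = csv.DictReader(lines)
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def load_report(path):
    """Read a CSV report from disk and return (columns, rows)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return read_report(handle)
```

Calling `load_report` on the summary file and printing the returned column list is a quick way to discover which fields a given Permutate version writes.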
(THIS PROJECT IS NOT RELEASED YET). More docs coming soon!