Permutate

Permutate is an automated testing framework for LLM Plugins.

ChatGPT Ignited LLM Plugins

ChatGPT spread like wildfire, but it had limitations; notably, it couldn’t access private data or systems. OpenAI Plugins resolved this limitation by letting developers connect their favorite applications to ChatGPT. Unfortunately, in the rush to release plugins, quality assurance fell short.

From a software quality perspective, several common problems surfaced:

  • Despite the plugin being “installed” in a user’s environment, the plugin wasn’t consistently activated by the user’s text.
  • When it was activated, the plugin wasn’t called correctly, leading to undesirable results.

Ultimately, plugin developers chose to remove the bulk of their features just to get basic functions to run correctly. 🙁

Introducing Permutate

Permutate is an automated testing framework for LLM Plugins.

Permutate allows development teams to:

  • Define a set of reusable tests for plugins
  • Describe the tests using a standard, open format
  • Use open source software (Permutate) to execute the tests
  • See the results of individual test cases as well as summary statistics

The Permutation Problem

When users give prompts (instructions to an LLM via chat, etc.), they describe what they want in many different ways. Each sentence variation might work or fail. The goal is to get as many of them to succeed as possible.

Some technology (the tool selector) must determine the intent of the command (aka intent detection). Additionally, the command might carry extra data like “in the morning” or “once per week”. This natural language needs to be mapped back to an API. The Tool Selector must do more than just ‘find the right tool’; it must map language to an API and call it perfectly.

So, here we go. Given J variations of sample input text, and K variations of "installed" plugins, we use a tool selector and evaluate the performance:

  1. Is the correct plugin selected?
  2. Is the correct API operation selected?
  3. Are the API parameters filled in correctly?
  4. What was the cost to solve?
  5. And, what was the round-trip latency?
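
The five questions above can be sketched as a small scoring routine. This is only an illustration of the idea, not Permutate's internal code: the `Expected` and `Observed` classes, their field names, and the `enumerate_runs` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Optional


@dataclass
class Expected:
    """What a test case expects the tool selector to produce (hypothetical shape)."""
    plugin: Optional[str]
    api: Optional[str]
    parameters: dict = field(default_factory=dict)


@dataclass
class Observed:
    """What a tool selector actually returned (hypothetical shape)."""
    plugin: Optional[str]
    api: Optional[str]
    parameters: dict
    cost_usd: float
    latency_s: float


def enumerate_runs(prompts, plugin_groups):
    """Pair every prompt variation (J) with every plugin group (K): J * K runs."""
    return list(product(prompts, plugin_groups))


def score(expected: Expected, observed: Observed) -> dict:
    """Answer the five questions for one run."""
    return {
        "plugin_ok": expected.plugin == observed.plugin,               # 1. correct plugin?
        "api_ok": expected.api == observed.api,                        # 2. correct operation?
        "parameters_ok": expected.parameters == observed.parameters,   # 3. correct parameters?
        "cost_usd": observed.cost_usd,                                 # 4. cost to solve
        "latency_s": observed.latency_s,                               # 5. round-trip latency
    }
```

Exact equality on parameters is the strictest possible check; a real evaluator might want to normalize values (for example `"t shirt"` versus `"T-shirt"`) before comparing.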

Tool Selectors

To satisfy these concerns, developers will use a Tool Selector service. They pass in the text, and it identifies the correct plugin to use, the right operations, etc. In some cases, it might return the necessary source code to call the API, with all of the parameters filled in.

To make life simple, we created OpenPlugin. Using it is optional. Keeping the tool selector swappable allows plugin service providers to offer their best possible implementation. If an implementation isn’t giving you the accuracy or performance you need, try another. More importantly, it allows you to test plugins using basic CI/CD principles.
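
One way to picture "try another implementation" is a common interface that every tool selector satisfies. The sketch below is an assumption made for illustration only: the `ToolSelector` protocol, its `select` method, and the toy `KeywordToolSelector` are hypothetical and are not part of OpenPlugin or Permutate.

```python
from typing import Optional, Protocol, Sequence


class ToolSelector(Protocol):
    """Hypothetical interface: given a prompt and installed plugins, pick one."""

    def select(self, prompt: str, manifest_urls: Sequence[str]) -> Optional[dict]:
        ...


class KeywordToolSelector:
    """Toy stand-in: picks the first plugin whose manifest URL contains
    a word (longer than three characters) from the prompt."""

    def select(self, prompt: str, manifest_urls: Sequence[str]) -> Optional[dict]:
        words = {w.lower() for w in prompt.split() if len(w) > 3}
        for url in manifest_urls:
            if any(word in url.lower() for word in words):
                return {"manifest_url": url}
        return None  # no plugin should be activated


def run_selector(selector: ToolSelector, prompt: str, manifest_urls: Sequence[str]):
    """The test harness only depends on the interface, so selectors are swappable."""
    return selector.select(prompt, manifest_urls)
```

Because `run_selector` depends only on the protocol, a more accurate selector can replace the toy one without changing the tests.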

Is this just for OpenAI?

No. OpenAI hasn’t (yet) made their tool selector service available to the public. We encourage all vendors to make their tool selector service available. Doing so allows headless automated testing; without it, we can anticipate poor plugin quality.

Until OpenAI makes their Tool Selector service available to the public, you have two options:

  1. Manual Testing
  2. UI Testing (e.g., Selenium Hell).

Getting started

Installation

To install using pip, run:

pip install permutate

You can verify you have permutate installed by running:

permutate --help

Credentials

Before you run the application, be sure you have credentials configured.

export OPENAI_API_KEY=<your key>  # if you want to use OpenAI LLM
export COHERE_API_KEY=<your key>  # if you want to use Cohere LLM
export GOOGLE_APPLICATION_CREDENTIALS=<credential file path>  # if you want to use Google LLM, e.g. /usr/app/application_default_credentials.json
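
Before a long run, it can help to confirm that the variables above are actually set. The following is a minimal sketch; the variable names come from this section, while the provider labels (`"openai"`, `"cohere"`, `"google"`) and the `missing_credentials` helper are hypothetical.

```python
import os

# Environment variable per LLM provider, as listed above.
# The dictionary keys are illustrative labels, not Permutate identifiers.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "cohere": "COHERE_API_KEY",
    "google": "GOOGLE_APPLICATION_CREDENTIALS",
}


def missing_credentials(providers, environ=None):
    """Return the env var names that are unset or empty for the given providers."""
    env = os.environ if environ is None else environ
    return [
        PROVIDER_ENV_VARS[p]
        for p in providers
        if not env.get(PROVIDER_ENV_VARS[p])
    ]


if __name__ == "__main__":
    missing = missing_credentials(["openai"])
    if missing:
        raise SystemExit("Missing credentials: " + ", ".join(missing))
```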

Create your test file

Sample test file: https://raw.githubusercontent.com/LegendaryAI/permutate/main/tests/files/plugin_test.yaml

version: 1.0.0
name: klarna_plugin_test
config:
  use_openplugin_library: true
  langchain_tool_selector: http://localhost:8006/api/langchain/run-plugin
  imprompt_tool_selector: http://localhost:8006/api/imprompt/run-plugin
  auto_translate_to_languages:
    - English
    - Spanish
test_plugin:
    manifest_url: https://www.klarna.com/.well-known/ai-plugin.json
plugin_groups:
  - plugin_group:
    name: my_group1
    plugins:
      - plugin:
        manifest_url: https://www.klarna.com/.well-known/ai-plugin.json
  - plugin_group:
    name: my_group2
    plugins:
      - plugin:
        manifest_url: https://www.klarna.com/.well-known/ai-plugin.json
      - plugin:
        manifest_url: https://api.imprompt.ai/plugin/users/2/blogwriter/.well-known/ai-plugin.json
permutations:
  - permutation:
    name: permutation1
    llm:
      provider: OpenAIChat
      model_name: gpt-3.5-turbo
      temperature: 0
      max_tokens: 1024
      top_p: 1
      frequency_penalty: 0
      presence_penalty: 0
      n: 1
      best_of: 1
    tool_selector:
      provider: Langchain
      pipeline_name: zero-shot-react-description
      plugin_group_name: my_group1
  - permutation:
    name: permutation2
    llm:
      provider: OpenAIChat
      model_name: gpt-3.5-turbo
      temperature: 0
      max_tokens: 1024
      top_p: 1
      frequency_penalty: 0
      presence_penalty: 0
      n: 1
      best_of: 1
    tool_selector:
      provider: Imprompt
      pipeline_name: default
      plugin_group_name: my_group1
test_cases:
  - test_case:
    name: test1
    prompt: Show me 5 T shirts from Klarna
    expected_plugin_used: KlarnaProducts
    expected_api_used: https://www.klarna.com/us/shopping/public/openai/v0/products
    expected_parameters:
      q: t shirt
      size: 5
    expected_response: List of 5 T shirts with URL
  - test_case:
    name: test2
    prompt: Get me 5 oranges
    expected_plugin_used: None
    expected_api_used: None
    expected_parameters:
      num_shirts: None
    expected_response: None
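
Each permutation refers to a plugin group by name, so a misspelled `plugin_group_name` is an easy mistake to make. The sketch below checks those references on the already-parsed test file. It assumes the file has been loaded into plain dictionaries and lists (for example with a YAML parser such as PyYAML's `yaml.safe_load`); the `undefined_plugin_groups` helper is hypothetical and not part of Permutate.

```python
def undefined_plugin_groups(test_file: dict) -> list:
    """Return (permutation name, group name) pairs whose group is not defined."""
    defined = {group["name"] for group in test_file.get("plugin_groups", [])}
    return [
        (perm["name"], perm["tool_selector"]["plugin_group_name"])
        for perm in test_file.get("permutations", [])
        if perm["tool_selector"]["plugin_group_name"] not in defined
    ]


# Abbreviated version of the sample file above, in parsed form.
example = {
    "plugin_groups": [
        {"plugin_group": None, "name": "my_group1", "plugins": []},
        {"plugin_group": None, "name": "my_group2", "plugins": []},
    ],
    "permutations": [
        {"permutation": None, "name": "permutation1",
         "tool_selector": {"provider": "Langchain", "plugin_group_name": "my_group1"}},
        {"permutation": None, "name": "permutation2",
         "tool_selector": {"provider": "Imprompt", "plugin_group_name": "my_group1"}},
    ],
}

print(undefined_plugin_groups(example))  # prints []
```

In the sample file both permutations point at `my_group1`, so `my_group2` is defined but never exercised; a similar check in the other direction would surface that.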

Run your test file

Usage: permutate run [TEST_FILE_PATH] [OPTIONS]

Run a permutation batch

Arguments:
  test_file_path        Plugin test setup file.
                        default: /permutate/workspace/plugin_test.yaml

Options:
  --help                                   show this help message and exit
  --save-to-html  --no-save-to-html        Save the results of the permutation run to an html file.
                                           default: save-to-html
  --save-to-csv   --no-save-to-csv         Save the results of the permutation run to a csv file.
                                           default: no-save-to-csv
  --output-directory                       Path to the directory where the output files will be saved.
                                           default: /permutate/workspace/output/

Examples:

This command will run the tests defined in the plugin_test.yaml file and save the results to a csv file and an html file in the directory given by the --output-directory flag.

permutate run tests/files/plugin_test.yaml --output-directory tests/files/output/ --save-to-csv --save-to-html 

This command will run the tests on a sample test file provided in the package and save the results to an html file. This command can be used to see the sample output.

permutate run
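
To run the same commands from a script or CI job, the argument list can be assembled in Python and handed to `subprocess`. This is a sketch under assumptions: the flags are the ones documented above, the `build_command` and `run` helpers are hypothetical, and whether `permutate` exits with a non-zero status when test cases fail is not documented here, so the exit code should not be relied on as a pass/fail signal without checking.

```python
import subprocess


def build_command(test_file=None, output_directory=None,
                  save_to_csv=False, save_to_html=True):
    """Assemble a `permutate run` argument list from the documented options."""
    cmd = ["permutate", "run"]
    if test_file:
        cmd.append(test_file)
    if output_directory:
        cmd += ["--output-directory", output_directory]
    cmd.append("--save-to-csv" if save_to_csv else "--no-save-to-csv")
    cmd.append("--save-to-html" if save_to_html else "--no-save-to-html")
    return cmd


def run(**options):
    """Run permutate; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(**options), check=True)
```

For example, `build_command("tests/files/plugin_test.yaml", "tests/files/output/", save_to_csv=True)` produces the first command shown above, with the flags in a slightly different order.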

Docker

docker run -v /LOCALPATH/plugin_test.yaml:/usr/app/plugin_test.yaml -e "OPENAI_KEY=your-key" -e "COHERE_API_KEY=your-key" -e "GOOGLE_APPLICATION_CREDENTIALS=your-file-path" shrikant14/permutate:latest

Output

You can save your permutate run output to:

  1. HTML Report:

    You can save your permutation run output to an HTML Report that presents the results of the permutation run in a structured and visually appealing format.

    Sample report: https://raw.githubusercontent.com/LegendaryAI/permutate/main/docs/sample_result.html

    Sample screenshot: https://raw.githubusercontent.com/LegendaryAI/permutate/main/docs/sample_result_screenshot.png

  2. CSV Report:

    You can save your permutation run output to two csv files: one for the permutation run summary and one for the permutation run details.

    Sample summary: https://raw.githubusercontent.com/LegendaryAI/permutate/main/docs/sample_result_summary.csv

    Sample details: https://raw.githubusercontent.com/LegendaryAI/permutate/main/docs/sample_result_details.csv
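
The CSV reports can be post-processed with the standard library. The sketch below makes no assumption about column names, which are not documented here; it simply returns whatever headers and rows the file contains. The `load_report` helper is hypothetical.

```python
import csv


def load_report(path):
    """Read a CSV report and return (column names, rows as dictionaries)."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        headers = list(reader.fieldnames or [])
    return headers, rows
```

Calling `load_report` on the summary file and on the details file gives two lists of rows that can be filtered or aggregated in a CI step.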

(THIS PROJECT IS NOT RELEASED YET). More docs coming soon!
