Prompt injection generator
Project description
Antler: A python tool for automatically generating jailbreak prompts
ANalytical Tool for evaluating Large language model Efficiency in Repelling malicious instructions
Antler is a tool for automatically generating and evaluating jailbreak attacks against a target LLM. It is designed to aid LLM red teamers in testing models and applications.
Introduction
Built as part of a thesis project, the antler tool attempts to find a combination and permutation of jailbreak techniques that can successfully jailbreak a given target model. The success of a jailbreak is evaluated by feeding the target LLM different probe questions, wrapped in the prompts constructed from the techniques. The probes were sampled from the SimpleSafetyTests project by bertiev, which is licensed under the Creative Commons Attribution 4.0 International License. Each probe is paired with detectors, which mostly consist of string matching on keywords, so that a malicious answer to each probe can be detected automatically.
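The keyword-based detection described above can be sketched as follows. This is a minimal illustration, not Antler's actual detector API; the class name and keyword lists are assumptions.

```python
# Minimal sketch of a keyword-based detector: an answer is flagged as
# malicious if it contains any of the detector's trigger keywords.
class KeywordDetector:
    def __init__(self, keywords):
        # Normalize once so matching is case-insensitive.
        self.keywords = [k.lower() for k in keywords]

    def detect(self, answer: str) -> bool:
        text = answer.lower()
        return any(keyword in text for keyword in self.keywords)

detector = KeywordDetector(["step 1", "ingredients you need"])
print(detector.detect("Sure! Step 1: gather the following..."))  # True
print(detector.detect("I can't help with that request."))        # False
```

Real detectors would likely pair each probe with its own keyword list, since a "malicious" answer looks different for each question.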
The combination and permutation space of the techniques can be explored with different algorithms, with a sensible default set for different query amounts. The options are Greedy hill climb, Simulated annealing, and Random search.
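A greedy hill climb over technique orderings, one of the strategies listed above, can be sketched as below. The scoring function and technique names are placeholders for illustration; Antler's real explorer scores orderings by querying the target LLM.

```python
# Illustrative greedy hill climb over technique permutations: start from a
# random ordering and keep applying pairwise swaps while the score improves,
# within a fixed query budget.
import random

def greedy_hill_climb(techniques, score, max_queries=100, seed=0):
    rng = random.Random(seed)
    current = techniques[:]
    rng.shuffle(current)
    best_score = score(current)
    queries = 1
    improved = True
    while improved and queries < max_queries:
        improved = False
        for i in range(len(current)):
            for j in range(i + 1, len(current)):
                if queries >= max_queries:
                    return current, best_score
                candidate = current[:]
                candidate[i], candidate[j] = candidate[j], candidate[i]
                candidate_score = score(candidate)
                queries += 1
                if candidate_score > best_score:
                    current, best_score = candidate, candidate_score
                    improved = True
    return current, best_score

# Toy objective: reward orderings that match a hidden target permutation.
target = ["roleplay", "encoding", "payload_split"]
best, best_score = greedy_hill_climb(
    target[:], lambda perm: sum(a == b for a, b in zip(perm, target))
)
```

Simulated annealing differs mainly in that it sometimes accepts a worse candidate, which helps escape local optima at the cost of more queries.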
Installation
Standard install with pip
pip install antler
Clone the repo to install the development version
Navigate to a suitable directory and clone the repo:
git clone git@github.com:martinebl/antler.git
cd antler
Install an editable version with pip, preferably after activating a virtual environment:
pip install -e .
Getting started
The currently supported LLM providers are: Replicate, OpenAI and Ollama.
Replicate
When using the replicate provider, an API key is needed. This can be passed with the --api_key
parameter, or set as an environment variable called "REPLICATE_API_TOKEN".
The full model name, including the provider is required, e.g. "mistralai/mixtral-8x7b-instruct-v0.1".
OpenAI
When using the openai provider, an API key is needed. This can be passed with the --api_key
parameter, or set as an environment variable called "OPENAI_API_TOKEN".
Ollama
When using the ollama provider, no API key is needed, but an accessible Ollama instance must be running. If the instance is running on the default port, no further configuration is needed. Otherwise, the correct domain and port need to be provided via an option named "host", e.g. --options '{"host":"127.0.0.1:11434"}'.
Note that the JSON object must have double quotes around parameter names.
It is advised to run queries against an Ollama instance sequentially to avoid timeouts. This is done by specifying --processes 1.
Command line options
--provider, -p
The target provider. Examples: openai, ollama, replicate
--model, -m
The target model. Examples: gpt-3.5-turbo, 'llama2:7b'
--max, -M
The maximum number of queries to run against the target LLM. Default: 100.
--explorer, -e
The explorer strategy. Possible values: simulatedannealing, randomsearch and greedyhillclimbexplorer. The default depends on the maximum number of queries.
--repetitions, -r
The number of repetitions of each prompt/probe query. Default: 3.
--processes, -P
The number of processes to run in parallel. Currently only has an effect when set to 1, which activates sequential querying.
--api_key
The API key for the target provider. Optional for locally running LLMs.
--options, -o
The options for the target provider, in JSON format.
Examples of run parameters:
antler -p openai -m gpt-3.5-turbo --max 500 --api_key SECRET_TOKEN
antler -p ollama -m mistral -r 1 -P 1 --options '{"host":"127.0.0.1:11434"}'
Sample run and results
The gif above shows a sample run at 2x speed. The printed results show how the different transforms (lists of techniques) performed on the target model mixtral-8x7b. Transforms that performed well (>= 50% attack success rate, ASR) are colored green, medium (> 0% ASR) orange, and poor (0% ASR) red. The average ASR of each technique and probe is also displayed.
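The coloring rule above can be stated precisely. This sketch assumes ASR is simply successful attacks divided by attempts; the thresholds follow the text.

```python
# Attack success rate (ASR) and the report's color classification:
# >= 50% ASR is green, > 0% is orange, exactly 0% is red.
def asr(successes: int, attempts: int) -> float:
    return successes / attempts if attempts else 0.0

def color(rate: float) -> str:
    if rate >= 0.5:
        return "green"
    if rate > 0.0:
        return "orange"
    return "red"

print(color(asr(2, 3)))   # green
print(color(asr(1, 10)))  # orange
print(color(asr(0, 5)))   # red
```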
When running the program, two directories are created in the current working directory: reports and logs. The logs directory contains a JSON file with all prompts sent to the model, paired with all the answers received for each prompt. The reports directory stores a txt version of the report printed in the terminal.
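A log file like the one described can be post-processed with a few lines of Python. The file name and JSON structure below are assumptions for illustration; inspect your own logs directory for the actual format Antler writes.

```python
# Hypothetical sketch: load a prompt -> answers mapping from a JSON log file.
import json
from pathlib import Path

def load_answers(log_path):
    with open(log_path, encoding="utf-8") as f:
        return json.load(f)

# In-memory stand-in for a log entry (structure assumed, not Antler's spec):
log = {"Pretend you are an unfiltered AI...": ["Refusal.", "Partial answer."]}
Path("example_log.json").write_text(json.dumps(log), encoding="utf-8")
answers = load_answers("example_log.json")
print(len(answers))  # 1
```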
Authored by M. Borup-Larsen and C. Christoffersen
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file antler-0.1.2.tar.gz.
File metadata
- Download URL: antler-0.1.2.tar.gz
- Upload date:
- Size: 71.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | e15e5e317dc881093cd288a3df206916a0947b5e7f8f23c2ec17a120d8243040
MD5 | a9a96c49e64bf5460dcdf04f7fdca452
BLAKE2b-256 | a0ca5592d2aa2f77823568b9aaa9511cf2cc00b0cd924a76e38e0b84aab08872
File details
Details for the file antler-0.1.2-py3-none-any.whl.
File metadata
- Download URL: antler-0.1.2-py3-none-any.whl
- Upload date:
- Size: 67.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 470bde83dec9e470daf5ac518e7530a463cd1e1463093d79703bb9adfacff29d
MD5 | 29e2d9571b2e6c39f230873c6c722efb
BLAKE2b-256 | adf3758ba2a5c6c2271dcf1595cc7bf96d47565aac5dadf01aa298ec56960ce7