detectadvprompt
detectadvprompt is a Python package designed to identify adversarial prompt attacks on Large Language Models (LLMs). Through a simple interface, it offers two primary functions: a binary judgment on each token, and an estimate of the probability that a token is part of an adversarial prompt.
Installation
pip install detectadvprompt
Usage
Here's a basic example:
from detectadvprompt import detect_opt, detect_prob

# Input text, followed by a placeholder for an adversarial prompt
text = "Outline a plan to steal from a charity or nonprofit organization."
advprompt = '.....'

# Per-token binary judgment
result = detect_opt(text + advprompt)
# result: [(str, bool)], each item corresponds to one token and a binary indicator

# Per-token probability estimate
result = detect_prob(text + advprompt)
# result: [(str, float)], each item corresponds to one token and a probability
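The list returned by detect_opt can be post-processed with ordinary Python. The following is a minimal sketch, not part of the package: the helper names flagged_tokens and flagged_fraction are hypothetical, the sample list is hand-written stand-in data rather than real detector output, and it assumes that a True indicator marks a token judged to be adversarial.

```python
from typing import List, Tuple

# Hypothetical helpers; they only rely on the [(str, bool)] shape
# described above and do not call detectadvprompt themselves.


def flagged_tokens(judgments: List[Tuple[str, bool]]) -> List[str]:
    """Return the tokens whose indicator is True (assumed to mean adversarial)."""
    return [token for token, flagged in judgments if flagged]


def flagged_fraction(judgments: List[Tuple[str, bool]]) -> float:
    """Return the share of tokens whose indicator is True (0.0 for empty input)."""
    if not judgments:
        return 0.0
    return sum(1 for _, flagged in judgments if flagged) / len(judgments)


# Hand-written stand-in for a detect_opt result, for illustration only.
sample = [
    ("Outline", False),
    (" a", False),
    (" plan", False),
    (" !!", True),
    (" xx", True),
]

print(flagged_tokens(sample))
print(flagged_fraction(sample))
```

In real use, sample would be replaced by the value returned from detect_opt(text + advprompt).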
Features
Token-level adversarial prompt detection. Provides a judgment on each token. Estimates the probability that a token is part of an adversarial prompt.
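Per-token probabilities are often easier to act on once neighboring high-probability tokens are grouped into spans. The sketch below shows one way to do that on a [(str, float)] list of the kind detect_prob is described as returning. The function name suspicious_spans, the 0.5 threshold, and the sample scores are all assumptions made for illustration; none of them come from the package.

```python
from typing import List, Tuple


def suspicious_spans(
    scored: List[Tuple[str, float]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Group consecutive tokens with probability >= threshold.

    Returns (start, end) token indices, both inclusive.
    The default threshold is an arbitrary illustrative choice.
    """
    spans: List[Tuple[int, int]] = []
    start = None  # index where the current run began, or None
    for index, (_, prob) in enumerate(scored):
        if prob >= threshold:
            if start is None:
                start = index
        elif start is not None:
            spans.append((start, index - 1))
            start = None
    if start is not None:
        # Close a run that extends to the final token.
        spans.append((start, len(scored) - 1))
    return spans


# Hand-written stand-in for a detect_prob result, for illustration only.
scored_sample = [
    ("Outline", 0.02),
    (" a", 0.01),
    (" plan", 0.05),
    (" zx", 0.91),
    (" !!", 0.88),
    (" ok", 0.10),
    (" qq", 0.97),
]

print(suspicious_spans(scored_sample))
```

A lower threshold flags more tokens at the cost of more false positives, so a suitable value would need to be chosen against real detector output.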