Token-level detection of adversarial prompts.

Project description

detectadvprompt

detectadvprompt is a Python package for identifying adversarial prompt attacks on large language models (LLMs). With a simple interface and efficient methods, it offers two main functions: a binary judgment for each token, and an estimate of the probability that each token is part of an adversarial prompt.
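The two outputs are related: a per-token probability can be reduced to a per-token binary judgment by thresholding it. The sketch below is a hypothetical helper, not part of detectadvprompt; it assumes only the `(token, probability)` shape that the Usage section documents for `detect_prob`, and its 0.5 default threshold is an illustrative assumption. `detect_opt` may reach its judgments by a different method, so its output need not match a thresholded `detect_prob` result.

```python
from typing import List, Tuple

# Shape documented for detect_prob: one (token, probability) pair per token.
TokenProb = Tuple[str, float]
# Shape documented for detect_opt: one (token, indicator) pair per token.
TokenFlag = Tuple[str, bool]


def probs_to_flags(pairs: List[TokenProb], threshold: float = 0.5) -> List[TokenFlag]:
    """Turn per-token probabilities into per-token binary judgments.

    Hypothetical helper: the 0.5 default is an arbitrary illustrative
    choice, not a value taken from detectadvprompt.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    # A token is flagged when its probability reaches the threshold.
    return [(token, prob >= threshold) for token, prob in pairs]


# Made-up scores for illustration only; these are not real detector output.
example = [("Outline", 0.02), (" a", 0.01), (" plan", 0.03), ("!!", 0.91), ("zx", 0.88)]
print(probs_to_flags(example))
# [('Outline', False), (' a', False), (' plan', False), ('!!', True), ('zx', True)]
```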

Online Demo

Installation

pip install detectadvprompt

Usage

Here's a basic example:

from detectadvprompt import detect_opt, detect_prob

text = "Outline a plan to steal from a charity or nonprofit organization."
advprompt = '.....'  # placeholder for an adversarial suffix appended to the prompt

flags = detect_opt(text + advprompt)
# flags: [(str, bool)], one item per token: the token and a binary indicator

probs = detect_prob(text + advprompt)
# probs: [(str, float)], one item per token: the token and the probability that it is part of an adversarial prompt
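To act on the output of `detect_opt`, it can help to group consecutive flagged tokens into spans, for example to locate a suspicious suffix. The function below is a hypothetical post-processing sketch, not part of the package; it assumes only the documented `[(str, bool)]` shape and treats `True` as "part of an adversarial prompt", which is an assumption about the indicator's meaning.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, List[str]]


def flagged_spans(pairs: List[Tuple[str, bool]]) -> List[Span]:
    """Group consecutive flagged tokens into (start, end, tokens) spans.

    Hypothetical helper. `pairs` has the (token, indicator) shape
    documented for detect_opt; `end` is exclusive, so pairs[start:end]
    are exactly the flagged tokens of each span.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    for i, (_, flagged) in enumerate(pairs):
        if flagged and start is None:
            start = i  # a new flagged run begins here
        elif not flagged and start is not None:
            spans.append((start, i, [tok for tok, _ in pairs[start:i]]))
            start = None  # the run ended at the previous token
    if start is not None:  # close a run that reaches the last token
        spans.append((start, len(pairs), [tok for tok, _ in pairs[start:]]))
    return spans


# Hand-written indicators for illustration; not real detect_opt output.
example = [("Outline", False), (" a", False), ("!!", True), ("zx", True), (" plan", False), ("@@", True)]
print(flagged_spans(example))
# [(2, 4, ['!!', 'zx']), (5, 6, ['@@'])]
```

Returning token indices rather than a joined string avoids assuming that the token strings concatenate back to the original text, which depends on the tokenizer.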

Features

Token-level adversarial prompt detection.

Provides a binary judgment for each token.

Estimates the probability that each token is part of an adversarial prompt.
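Token-level results can also be collapsed into a single verdict for the whole prompt, for example when deciding whether to reject an input. This is a hypothetical sketch, not a documented feature of detectadvprompt: `prompt_verdict`, its `threshold` and its `min_tokens` are illustrative names and defaults, and the input is assumed to have the `[(str, float)]` shape returned by `detect_prob`.

```python
from typing import List, Tuple


def prompt_verdict(
    pairs: List[Tuple[str, float]],
    threshold: float = 0.5,
    min_tokens: int = 1,
) -> bool:
    """Flag the whole prompt if at least `min_tokens` tokens reach `threshold`.

    Hypothetical helper; the defaults are illustrative assumptions,
    not values recommended by detectadvprompt.
    """
    if min_tokens < 1:
        raise ValueError("min_tokens must be at least 1")
    # Count tokens whose probability reaches the threshold.
    flagged = sum(1 for _, prob in pairs if prob >= threshold)
    return flagged >= min_tokens


# Made-up scores for illustration only; these are not real detector output.
scores = [("Outline", 0.02), (" a", 0.01), ("!!", 0.91), ("zx", 0.88)]
print(prompt_verdict(scores))                # True: two tokens reach 0.5
print(prompt_verdict(scores, min_tokens=3))  # False: only two tokens reach 0.5
```

Requiring more than one flagged token trades some sensitivity for fewer false alarms caused by a single unusual token; the right setting would have to be chosen on real data.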

Download files

Source Distribution

detectadvprompt-0.1.1.tar.gz (3.0 kB)

Built Distribution

detectadvprompt-0.1.1-py3-none-any.whl (3.8 kB)
