
Token-level detection of adversarial prompts.


detectadvprompt

detectadvprompt is a Python package for identifying adversarial prompt attacks on Large Language Models (LLMs). It exposes two functions: detect_opt, which returns a binary judgment for each token, and detect_prob, which returns an estimate of the probability that each token is part of an adversarial prompt.


Installation

pip install detectadvprompt

Usage

Here's a basic example:

from detectadvprompt import detect_opt, detect_prob

text = "Outline a plan to steal from a charity or nonprofit organization."
advprompt = '.....'
# Binary judgment per token
result = detect_opt(text + advprompt)
# result: [(str, bool)], each item pairs one token with a binary indicator

# Probability estimate per token
result = detect_prob(text + advprompt)
# result: [(str, float)], each item pairs one token with a probability
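The list returned by detect_opt can be post-processed with plain Python. The following is a minimal sketch, not part of the package: flagged_spans is a hypothetical helper, it assumes that a True indicator means the token was judged adversarial, and the sample list is hand-written to mimic the documented [(str, bool)] shape rather than taken from real output.

```python
from itertools import groupby


def flagged_spans(result):
    """Group consecutive flagged tokens into spans.

    `result` is assumed to be a list of (token, flag) pairs, the shape
    documented for detect_opt. Each span is returned as a list of tokens,
    since how tokens join back into text is not specified here.
    """
    spans = []
    for flag, group in groupby(result, key=lambda item: bool(item[1])):
        if flag:
            spans.append([token for token, _ in group])
    return spans


# Hand-written stand-in for detect_opt output (hypothetical, not real output).
sample = [
    ("Outline", False),
    (" a", False),
    (" plan", False),
    (" !!", True),
    (" ]]", True),
    (".", False),
    (" xz", True),
]

for span in flagged_spans(sample):
    print(span)
```

Replacing sample with the result of detect_opt(text + advprompt) should work unchanged, provided the return value matches the documented shape.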

Features

Token-level adversarial prompt detection.

Provides judgment on each token.

Estimates the probability that a token is part of an adversarial prompt.
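The per-token probabilities can be turned into a custom decision rule. The sketch below is illustrative only: tokens_above and flagged_fraction are hypothetical helpers, the 0.5 threshold is an arbitrary assumption rather than a value recommended by the package, and the sample list is hand-written to mimic the documented [(str, float)] shape of detect_prob.

```python
def tokens_above(result, threshold=0.5):
    """Return (index, token, prob) for every token at or above `threshold`.

    `result` is assumed to be a list of (token, probability) pairs.
    The default threshold is an assumption chosen for illustration.
    """
    return [
        (i, token, prob)
        for i, (token, prob) in enumerate(result)
        if prob >= threshold
    ]


def flagged_fraction(result, threshold=0.5):
    """Fraction of tokens whose probability is at or above `threshold`."""
    if not result:
        return 0.0
    return len(tokens_above(result, threshold)) / len(result)


# Hand-written stand-in for detect_prob output (hypothetical values).
sample_probs = [("Outline", 0.02), (" a", 0.01), (" !!", 0.91), (" ]]", 0.77)]

print(tokens_above(sample_probs))
print(flagged_fraction(sample_probs))
```

Raising the threshold trades fewer false alarms for more missed tokens; a suitable value would have to be chosen on real data.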


