Curated prompt injection attack database for defensive AI security research

Maintainers

perfecxion

Prompt Injection Attack Database

A curated, searchable database of prompt injection attacks for defensive AI security research.

Built by Scott Thornton

What is this?

3,900+ prompt injection attacks from 20 source datasets, deduplicated via SHA256 content hashing, classified by technique and severity, and searchable via FTS5 full-text search. A quality scoring engine identifies and filters noise, leaving ~1,300 high-signal attack prompts.

Think of it as Exploit-DB for prompt injection — a structured, searchable, testable collection of real-world attack techniques.
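
The deduplication step can be pictured with a minimal sketch. It assumes prompts are hashed after trimming surrounding whitespace; the package's actual normalization is not documented here, and `content_hash` and `dedupe_prompts` are hypothetical helpers rather than part of the library.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return the SHA256 hex digest of a prompt (whitespace trimming is an assumption)."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def dedupe_prompts(prompts: list[str]) -> list[str]:
    """Keep the first occurrence of each distinct prompt, preserving input order."""
    seen: set[str] = set()
    unique: list[str] = []
    for prompt in prompts:
        digest = content_hash(prompt)
        if digest not in seen:
            seen.add(digest)
            unique.append(prompt)
    return unique
```

Hashing the content rather than comparing strings directly lets the digest double as a compact unique key when the same prompt arrives from several source datasets.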

Features

Full-text search via SQLite FTS5 with Porter stemming
SHA256 content deduplication — no duplicate prompts
OWASP LLM Top 10 (2025) mapping on all categories
MITRE ATLAS technique IDs for threat model interoperability
Quality scoring engine — 60+ regex patterns detect real attacks vs. noise
Data curation pipeline — audit and remove non-attack content
Test result tracking — record effectiveness against specific models
Export to JSON, JSONL, or CSV
pip-installable with prompt-db CLI
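
To illustrate what Porter stemming adds to full-text search, here is a small standalone sketch using Python's built-in `sqlite3` module. The table name `demo_fts` and its single `content` column are made up for the example and do not reflect the package's `prompts_fts` index; it also assumes the local SQLite build includes FTS5.

```python
import sqlite3


def build_demo_index(prompts: list[str]) -> sqlite3.Connection:
    """Create an in-memory FTS5 table that tokenizes with the Porter stemmer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE demo_fts USING fts5(content, tokenize='porter')"
    )
    conn.executemany(
        "INSERT INTO demo_fts (content) VALUES (?)", [(p,) for p in prompts]
    )
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[str]:
    """Return matching rows, best match first (FTS5's built-in rank)."""
    rows = conn.execute(
        "SELECT content FROM demo_fts WHERE demo_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

With stemming enabled, a query such as `ignore instruction` also matches text containing "Ignoring" and "instructions", because both sides are reduced to the same stems before comparison.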

Quick Start

# Install from a source checkout
pip install -e .

# Build the database from JSON sources
prompt-db build --data-dir . --output prompts.db

# Run quality curation (removes noise)
prompt-db --db prompts.db curate

# View statistics
prompt-db --db prompts.db stats

# Search for attacks
prompt-db --db prompts.db search "ignore previous instructions"
prompt-db --db prompts.db search "system prompt" --technique prompt_extraction

# Export high-quality attacks
prompt-db --db prompts.db export --min-score 8 --format jsonl -o attacks.jsonl

# View details of a specific prompt
prompt-db --db prompts.db info 147
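
Once an export such as `attacks.jsonl` exists, it can be loaded with the standard library alone. The sketch below assumes only that each non-empty line is one JSON object, which is what the JSONL format implies; the field names in the export are not documented here, so the helper just reports which keys appear. `load_jsonl` and `key_counts` are hypothetical names.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path: str) -> list[dict]:
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def key_counts(records: list[dict]) -> Counter:
    """Count how many records carry each top-level key."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts
```

Running `key_counts(load_jsonl("attacks.jsonl"))` is a quick way to discover the export's actual fields before wiring it into another tool.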

Data Sources

Source	Count	Avg Quality	Type
jailbreak-llms	~1,000	High	Jailbreak prompts from Discord/Reddit
elite_custom_prompts	120	High	Hand-crafted advanced attacks
benign-malicious-classification	~120	High	Labeled attack/benign pairs
lakera-gandalf	~40	Medium	Gandalf challenge prompts
prompt-injection-research	~17	Medium	Research-derived attacks
+ 15 other sources	—	Varies	Mixed quality, filtered by curation

After quality curation, ~1,300 prompts remain from an initial 3,900+.
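
The idea behind regex-based curation can be sketched as follows. The three patterns and the hit-count threshold are invented for illustration; they are not the package's 60+ patterns or its scoring formula.

```python
import re

# Hypothetical patterns for illustration only.
ATTACK_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"(reveal|print|show) (your|the) system prompt", re.IGNORECASE),
    re.compile(r"you are now (in )?\w+ mode", re.IGNORECASE),
]


def pattern_hits(text: str) -> int:
    """Count how many attack patterns match the text at least once."""
    return sum(1 for pattern in ATTACK_PATTERNS if pattern.search(text))


def filter_noise(prompts: list[str], min_hits: int = 1) -> list[str]:
    """Keep only prompts that match at least `min_hits` patterns."""
    return [p for p in prompts if pattern_hits(p) >= min_hits]
```

A prompt that matches none of the patterns, such as an ordinary factual question, is treated as noise and dropped in this simplified model.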

Attack Techniques

Technique	Description	OWASP
`prompt_injection`	Direct instruction manipulation	LLM01
`jailbreak`	Bypass safety guardrails	LLM01
`prompt_extraction`	Extract system prompts/instructions	LLM01, LLM06
`data_exfiltration`	Leak training data or PII	LLM06
`multi_turn_attack`	Multi-step conversation manipulation	LLM01
`obfuscation`	Encoding/obfuscation techniques	LLM01
`payload_splitting`	Split malicious payload across messages	LLM01
`adversarial_attack`	Adversarial perturbation attacks	LLM01
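
For threat-model reports it can be handy to invert this table and ask which techniques fall under a given OWASP ID. The mapping below is copied from the table above; the dictionary and the `techniques_for` helper are illustrative and not part of the package API.

```python
# Technique -> OWASP IDs, transcribed from the table above.
TECHNIQUE_TO_OWASP = {
    "prompt_injection": ["LLM01"],
    "jailbreak": ["LLM01"],
    "prompt_extraction": ["LLM01", "LLM06"],
    "data_exfiltration": ["LLM06"],
    "multi_turn_attack": ["LLM01"],
    "obfuscation": ["LLM01"],
    "payload_splitting": ["LLM01"],
    "adversarial_attack": ["LLM01"],
}


def techniques_for(owasp_id: str) -> list[str]:
    """Return the techniques mapped to an OWASP ID, sorted alphabetically."""
    return sorted(
        technique
        for technique, ids in TECHNIQUE_TO_OWASP.items()
        if owasp_id in ids
    )
```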

Python Library

from prompt_database import PromptDatabase

with PromptDatabase("prompts.db") as db:
    # Full-text search
    results = db.search("ignore previous instructions", limit=10)

    # Filter by technique and sophistication
    advanced = db.filter_prompts(
        technique="jailbreak",
        min_sophistication=8,
        complexity="advanced",
    )

    # Record test results
    db.add_test_result(
        prompt_id=147,
        target_model="claude-sonnet-4-5",
        actual_prompt="Ignore all previous instructions...",
        result="FAIL",  # Model refused — defense worked
        confidence_score=0.95,
        tool_used="manual",
    )

    # Export for external tools
    prompts = db.export_prompts(min_sophistication=7, verified_only=False)

    # Database statistics
    stats = db.stats()
    print(f"Total: {stats['total_prompts']}, Verified: {stats['verified']}")
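
Recorded results become more useful once aggregated per model. The sketch below works on plain dictionaries whose keys mirror the `add_test_result` arguments shown above; it follows the convention from that example that `"FAIL"` means the model refused. How the library returns stored results is not shown in this README, so `refusal_rate_by_model` is a hypothetical helper operating on data you have already fetched.

```python
from collections import defaultdict


def refusal_rate_by_model(results: list[dict]) -> dict[str, float]:
    """Return, per target model, the fraction of tests where the attack failed."""
    totals: dict[str, int] = defaultdict(int)
    refused: dict[str, int] = defaultdict(int)
    for record in results:
        model = record["target_model"]
        totals[model] += 1
        if record["result"] == "FAIL":  # attack failed, i.e. the defense held
            refused[model] += 1
    return {model: refused[model] / totals[model] for model in totals}
```

Any result label other than `"FAIL"` is counted as a non-refusal here, which is an assumption about the label set.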

CLI Reference

Command	Description
`prompt-db build`	Build database from JSON source files
`prompt-db stats`	Show database statistics
`prompt-db search <query>`	Full-text search with filters
`prompt-db info <id>`	Detailed view of a single prompt
`prompt-db export`	Export to JSON/JSONL/CSV
`prompt-db audit`	Data quality audit by source
`prompt-db curate`	Remove noise, flag high-quality prompts

Global options: --db <path> (or PROMPT_DB_PATH env var), --version
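
Scripts that wrap the CLI may want to resolve the database path the same way. The following is a sketch under two assumptions: an explicit value takes precedence over `PROMPT_DB_PATH`, and `prompts.db` is the fallback. Neither the precedence nor the default is confirmed by this README, and `resolve_db_path` is a hypothetical helper.

```python
import os
from typing import Optional


def resolve_db_path(cli_value: Optional[str] = None, default: str = "prompts.db") -> str:
    """Pick the database path: explicit value, then PROMPT_DB_PATH, then a default."""
    if cli_value:
        return cli_value
    return os.environ.get("PROMPT_DB_PATH", default)
```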

Schema

The SQLite database uses the following core tables:

prompts — Main prompt storage with content hash, technique, complexity, sophistication score
categories — OWASP LLM Top 10 categories with MITRE ATLAS IDs
tags — Flexible tagging (attack patterns, techniques)
test_results — Empirical test data (model, result, confidence, latency)
prompt_variations — Generated/manual attack variations
prompts_fts — FTS5 full-text search index
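
A quick way to confirm that a built database contains these tables is to query SQLite's catalog directly. The table names come from the list above; `missing_tables` is a hypothetical helper, and the sketch makes no assumptions about column layouts.

```python
import sqlite3

# Table names taken from the schema list above.
EXPECTED_TABLES = {
    "prompts",
    "categories",
    "tags",
    "test_results",
    "prompt_variations",
    "prompts_fts",
}


def missing_tables(db_path: str) -> set[str]:
    """Return the expected table names that are absent from the database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    return EXPECTED_TABLES - {name for (name,) in rows}
```

An empty set means every core table is present. Note that `sqlite3.connect` creates an empty file if the path does not exist, so check the path first when that matters.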

Project Structure

prompt-database/
├── src/prompt_database/
│   ├── __init__.py           # Package entry, exports PromptDatabase
│   ├── db.py                 # Core database class (search, CRUD, export)
│   ├── cli.py                # Click CLI (build, stats, search, export, audit, curate)
│   ├── ingest.py             # JSON ingestion pipeline with category/tag seeding
│   ├── quality.py            # Quality scoring engine (60+ attack patterns)
│   └── schema.sql            # SQLite schema (FTS5, content hashing, versioning)
├── tests/
│   ├── test_db.py            # 11 tests: schema, CRUD, search, dedup, stats
│   └── test_quality.py       # 8 tests: attack detection, noise filtering
├── curated_advanced_prompts_v2.json   # 3,863 curated prompts from 20 sources
├── elite_custom_prompts.json          # 120 hand-crafted advanced attacks
├── pyproject.toml                     # Package config (pip install -e .)
└── README.md

Development

# Install with dev dependencies
make dev

# Run tests
make test

# Lint & format
make lint
make format

# Build database, curate, and view stats
make curate
make stats

# Clean generated files
make clean

Or without make:

pip install -e ".[dev]"
pytest tests/ -v
ruff check src/ tests/

See examples/basic_usage.py for Python library usage.

Roadmap

~~Export plugins for Garak, ps-fuzz~~ (done)
~~GitHub Actions CI/CD~~ (done)
Automated testing against model APIs (record real success rates)
RAG-powered attack variant generation
Web UI for browsing and contributing
CI/CD quality gates on PR submissions
Model vulnerability leaderboard

Responsible Use

This database is for defensive security research only. See SECURITY.md for the full policy. By using this tool, you agree to use it only for authorized security testing, defense development, and academic research.

License

MIT — see LICENSE

Release history

0.1.0, released Mar 30, 2026

Download files

Source Distribution

prompt_database-0.1.0.tar.gz (6.0 MB), uploaded Mar 30, 2026

Built Distribution

prompt_database-0.1.0-py3-none-any.whl (31.6 kB, Python 3), uploaded Mar 30, 2026

File details

Details for the file prompt_database-0.1.0.tar.gz.

File metadata

Download URL: prompt_database-0.1.0.tar.gz
Upload date: Mar 30, 2026
Size: 6.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for prompt_database-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0b7b7b59416ca08424489ad1495d66f77b9bbe6d27d41b4e272267e340e4c21c`
MD5	`3845eecc8b0576d8a45dda438c16fb35`
BLAKE2b-256	`8b458a7d000ebb583d29e7373b996d531f7987759f31b7a66fde1013f58012f6`
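
A downloaded archive can be checked against the published SHA256 digest with the standard library. The expected value below is the SHA256 listed above for `prompt_database-0.1.0.tar.gz`; `sha256_of_file` and `verify_download` are hypothetical helper names.

```python
import hashlib

# SHA256 digest published above for prompt_database-0.1.0.tar.gz.
EXPECTED_SHA256 = "0b7b7b59416ca08424489ad1495d66f77b9bbe6d27d41b4e272267e340e4c21c"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True only if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected
```

Calling `verify_download("prompt_database-0.1.0.tar.gz")` should return `True` for an intact copy of the source distribution; pass the wheel's digest as `expected` to check the built distribution instead.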

Provenance

The following attestation bundles were made for prompt_database-0.1.0.tar.gz:

Publisher: publish.yml on scthornton/prompt-database

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: prompt_database-0.1.0.tar.gz
- Subject digest: 0b7b7b59416ca08424489ad1495d66f77b9bbe6d27d41b4e272267e340e4c21c
- Sigstore transparency entry: 1200405777
- Sigstore integration time: Mar 30, 2026
Source repository:
- Permalink: scthornton/prompt-database@aef41797fe5a014fb89c4057cf8bddc81336af10
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/scthornton
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@aef41797fe5a014fb89c4057cf8bddc81336af10
- Trigger Event: push

File details

Details for the file prompt_database-0.1.0-py3-none-any.whl.

File metadata

Download URL: prompt_database-0.1.0-py3-none-any.whl
Upload date: Mar 30, 2026
Size: 31.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for prompt_database-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`75cd6cf1d359deb370ae627d1d8a0f05186d9073b8d7f44094d01a0923387ff1`
MD5	`bd2c48fcab2e827345f83aff3a2a73d1`
BLAKE2b-256	`a142a07de5b518e81611aa45aa70f2fc3a8e28fc3953dba800e623961cb6a010`

Provenance

The following attestation bundles were made for prompt_database-0.1.0-py3-none-any.whl:

Publisher: publish.yml on scthornton/prompt-database

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: prompt_database-0.1.0-py3-none-any.whl
- Subject digest: 75cd6cf1d359deb370ae627d1d8a0f05186d9073b8d7f44094d01a0923387ff1
- Sigstore transparency entry: 1200405865
- Sigstore integration time: Mar 30, 2026
Source repository:
- Permalink: scthornton/prompt-database@aef41797fe5a014fb89c4057cf8bddc81336af10
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/scthornton
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@aef41797fe5a014fb89c4057cf8bddc81336af10
- Trigger Event: push
