Deconvolute is a defense-in-depth SDK designed to secure every stage of your Retrieval Augmented Generation (RAG) pipeline.

Deconvolute: The RAG Security SDK

⚠️ Alpha development version: usable but limited, and the API may change.

Introduction

Deconvolute is a security SDK for large language model systems that gives developers deterministic signals when a model produces outputs outside expected behavior. Large language models are non-deterministic, and even carefully designed prompts cannot fully specify all constraints needed to align the system with developer intent.

Instead of preventing attacks, Deconvolute detects specific failure modes, such as lost instructional priority or unexpected language switching, and surfaces them to the developer. This allows you to decide how to handle these events, for example by blocking, logging, discarding content, or triggering custom fallback logic.

Detectors are modular and composable. Each targets a concrete failure mode, and layering multiple detectors provides broader coverage and fine-grained control.

Note: Deconvolute is

  • Not a prevention system. It detects events and gives developers control over how to respond.
  • Not a magic shield. Prompt design and system-level logic are still required.
  • Modular. Detectors are independent, composable, and can be layered for broader coverage.

Quick Start

Install the core SDK:

pip install deconvolute

Deconvolute works out of the box with standard OpenAI clients (other clients coming soon). Here is a minimal usage example:

from openai import OpenAI
from deconvolute import guard, ThreatDetectedError

# Wrap your LLM client so detectors run on its outputs
client = guard(OpenAI(api_key="YOUR_KEY"))

try:
    # Use the client as usual
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Tell me a joke."}]
    )
    print(response.choices[0].message.content)

except ThreatDetectedError as e:
    # Handle security events
    print(f"Security Alert: {e}")

# Pre-ingestion scanning example:
# scan() is used to check text before it enters your RAG database or context
# from deconvolute import scan
# result = scan("Suspicious text from a document...")
# if result.threat_detected:
#     print(f"Threat detected: {result.component}")

This snippet shows the simplest way to get started:

  • guard() wraps your LLM client to detect, in real time, outputs that deviate from your intent.
  • scan() is optional and used before text enters your system to detect poisoned or unexpected content.

For full examples, advanced configuration, and integration patterns, see the Usage Guide & API Documentation.

How It Is Used

The SDK supports three primary usage patterns:

1. Wrap LLM clients

Apply detectors to the outputs of an API client (for example, OpenAI or other LLMs). This allows you to catch issues like lost system instructions or language violations in real time, before the output is returned to your application.
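One way to act on a detection event is to log it and return a safe fallback instead of the model output. The helper below is a minimal sketch of that pattern, not part of the Deconvolute API: `complete_with_fallback`, `call_model`, and the logger name are hypothetical names chosen for illustration, and the exception type is passed in so the sketch runs without the SDK installed.

```python
import logging
from typing import Callable, Type

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("rag.security")


def complete_with_fallback(
    call_model: Callable[[], str],
    threat_error: Type[Exception],
    fallback: str = "Sorry, I can't answer that right now.",
) -> str:
    """Run a guarded model call; on a detection event, log it and return a fallback."""
    try:
        return call_model()
    except threat_error as exc:
        # The detection event is surfaced to the developer, who decides what happens next.
        logger.warning("Detection event, returning fallback: %s", exc)
        return fallback
```

With the Quick Start client, `call_model` would be a small function that calls `client.chat.completions.create(...)` and returns the message content, and `threat_error` would be `ThreatDetectedError`.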

2. Scan untrusted text

Check any text string before it enters your pipeline, such as documents retrieved for a RAG system. This can catch poisoned content early, preventing malicious data from influencing downstream responses.
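As a rough sketch of this pattern, the helper below separates retrieved documents into clean and flagged sets before they reach the context window. It assumes only what the Quick Start shows: a scan function that returns an object with `threat_detected` and `component` attributes. The helper name `split_by_scan` is hypothetical, and the scan function is passed in as a parameter so the sketch stays independent of the exact SDK version.

```python
from typing import Any, Callable, Iterable, List, Tuple


def split_by_scan(
    documents: Iterable[str],
    scan_fn: Callable[[str], Any],
) -> Tuple[List[str], List[Tuple[str, Any]]]:
    """Split documents into (clean, flagged) using a scan-like function.

    scan_fn is assumed to return an object exposing `threat_detected`
    and `component`, as in the Quick Start comments.
    """
    clean: List[str] = []
    flagged: List[Tuple[str, Any]] = []
    for doc in documents:
        result = scan_fn(doc)
        if result.threat_detected:
            # Keep the reporting component so the event can be logged or reviewed.
            flagged.append((doc, result.component))
        else:
            clean.append(doc)
    return clean, flagged
```

In a RAG pipeline this would typically run on retrieved chunks, with `scan_fn=scan` imported from `deconvolute`; only the clean list would be passed on to the prompt.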

3. Layer detectors for defense in depth

Combine multiple detectors to monitor different failure modes simultaneously. Each detector targets a specific threat, and using them together gives broader coverage and richer control over the behavior of your models.
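The layering idea can be illustrated without the SDK. The sketch below is a generic composition pattern, not Deconvolute's detector interface: `Finding`, `forbidden_phrase`, `mostly_ascii`, and `run_detectors` are hypothetical names, and the two checks are deliberately crude stand-ins for real detectors. Each detector looks for one failure mode, and running them together collects every finding.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Finding:
    detector: str
    detail: str


# A detector takes text and returns a Finding, or None if nothing was detected.
Detector = Callable[[str], Optional[Finding]]


def forbidden_phrase(phrase: str) -> Detector:
    """Toy detector: flags text that contains a given phrase (case-insensitive)."""
    def check(text: str) -> Optional[Finding]:
        if phrase.lower() in text.lower():
            return Finding("forbidden_phrase", f"found {phrase!r}")
        return None
    return check


def mostly_ascii(min_ratio: float = 0.9) -> Detector:
    """Toy detector: flags text whose share of ASCII characters is below min_ratio."""
    def check(text: str) -> Optional[Finding]:
        if not text:
            return None
        ratio = sum(ch.isascii() for ch in text) / len(text)
        if ratio < min_ratio:
            return Finding("mostly_ascii", f"ASCII ratio {ratio:.2f} below {min_ratio}")
        return None
    return check


def run_detectors(text: str, detectors: List[Detector]) -> List[Finding]:
    """Run every detector and collect all findings, in detector order."""
    findings: List[Finding] = []
    for detect in detectors:
        finding = detect(text)
        if finding is not None:
            findings.append(finding)
    return findings
```

Because each detector is independent, a layer can be added or removed without touching the others, and the caller can respond differently depending on which detector produced a finding.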

For detailed examples, configuration options, and integration patterns, see the Usage Guide & API Documentation.

Development Status

Deconvolute is currently in alpha development. Some detectors are experimental and not yet red-teamed, while others are functionally complete and safe to try in controlled environments.

Detector | Domain | Status | Description
CanaryDetector | Integrity | Experimental | Active integrity checks using cryptographic tokens to detect jailbreaks.
LanguageDetector | Content | Experimental | Ensures output language matches expectations and prevents payload-splitting attacks.

Status guide:

  • Planned: On the roadmap, not yet implemented.
  • Experimental: Functionally complete and unit-tested, but not yet fully validated in production.
  • Validated: Empirically tested with benchmarked results.
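To make the canary idea from the CanaryDetector description more concrete, here is a generic sketch of a token-based integrity check. It is an illustration of the general technique under stated assumptions, not CanaryDetector's actual implementation: the function names, the prompt wording, and the bracketed marker format are all hypothetical. The idea is that a random token is placed in the system prompt, the model is told to echo it, and a missing token in the output suggests the system instructions lost priority.

```python
import secrets
from typing import Tuple


def make_canary() -> str:
    """Generate an unpredictable per-request token (32 hex characters)."""
    return secrets.token_hex(16)


def build_system_prompt(base_prompt: str, canary: str) -> str:
    """Append an instruction asking the model to end its reply with the marker."""
    return f"{base_prompt}\n\nAlways end your reply with this exact marker: [{canary}]"


def check_canary(output: str, canary: str) -> Tuple[bool, str]:
    """Return (intact, cleaned_output).

    intact is True when the marker is present; the marker is then stripped
    so it is not shown to end users.
    """
    marker = f"[{canary}]"
    if marker in output:
        return True, output.replace(marker, "").rstrip()
    return False, output
```

A missing marker is a signal, not proof of an attack: a model can also drop the marker for benign reasons, which is why the response is left to the developer.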

For reproducible experiments and detailed performance results of detectors and layered defenses, see the deconvolute-benchmark repo.

Links & Next Steps

Further Reading

Geng, Yilin, Haonan Li, Honglin Mu, et al. “Control Illusion: The Failure of Instruction Hierarchies in Large Language Models.” arXiv:2502.15851. Preprint, arXiv, December 4, 2025. https://doi.org/10.48550/arXiv.2502.15851.

Greshake, Kai, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. “Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.” Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, November 30, 2023, 79–90. https://doi.org/10.1145/3605764.3623985.

Liu, Yupei, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. “Formalizing and Benchmarking Prompt Injection Attacks and Defenses.” Version 5. Preprint, arXiv, 2023. https://doi.org/10.48550/ARXIV.2310.12815.

Wallace, Eric, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke, and Alex Beutel. “The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions.” arXiv:2404.13208. Preprint, arXiv, April 19, 2024. https://doi.org/10.48550/arXiv.2404.13208.

Zou, Wei, Runpeng Geng, Binghui Wang, and Jinyuan Jia. “PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models.” arXiv:2402.07867. Preprint, arXiv, August 13, 2024. https://doi.org/10.48550/arXiv.2402.07867.
