Skip to main content

LLM-powered web content extraction with prompt injection defense. Open. Capture. Close.

Project description

Shutter

Web Content Distillation Service

Open. Capture. Close.

PyPI version Python 3.11+ License: MIT


Overview

Shutter is a web content distillation layer that sits between LLM agents and raw web pages. It fetches URLs, uses a cheap/fast LLM to extract only the relevant content based on a query, and returns clean, focused results.

Two key benefits:

  1. Token efficiency — Agents get 200 tokens instead of 20,000
  2. Prompt injection defense — Raw page content never reaches the driver model; injections never make it past the aperture

Quick Start

Installation

# Install via UV (recommended)
uv add grove-shutter

# Or via pip
pip install grove-shutter

# Or run directly without installing
uvx grove-shutter --help

First Run Setup

# Interactive setup (creates ~/.shutter/config.toml)
shutter setup

# Or set environment variables
export OPENROUTER_API_KEY="sk-or-v1-..."
export TAVILY_API_KEY="tvly-..."  # optional, for enhanced fetching

CLI Usage

# Basic extraction
shutter "https://stripe.com/pricing" -q "What are the transaction fees?"

# Choose model tier
shutter "https://docs.python.org/3/library/asyncio.html" -q "How do I create a task?" --model code

# View offenders list
shutter offenders

# Clear offenders list
shutter clear-offenders

Programmatic Usage

from grove_shutter import shutter

result = await shutter(
    url="https://stripe.com/pricing",
    query="What are the transaction fees?",
    model="fast",
    max_tokens=500
)

print(result.extracted)
# Output: "2.9% + 30¢ per successful card charge. Additional 0.5% for..."

# Check for prompt injection
if result.prompt_injection:
    print(f"Injection detected: {result.prompt_injection.type}")
    print(f"Confidence: {result.prompt_injection.confidence}")

Configuration

Configuration File

Shutter stores configuration at ~/.shutter/config.toml:

[api]
openrouter_key = "sk-or-v1-..."
tavily_key = "tvly-..."  # optional

[defaults]
model = "fast"
max_tokens = 500
timeout = 30000

# Optional: Tune prompt injection detection
[canary]
block_threshold = 0.6  # 0.0-1.0, lower = more sensitive

[canary.weights]
# Override confidence weights for specific patterns
instruction_override = 0.95
role_hijack = 0.40  # Lower if you get false positives on "act as" content

Response Format

Clean Extraction

{
  "url": "https://stripe.com/pricing",
  "extracted": "2.9% + 30¢ per successful card charge. Additional 0.5% for manually entered cards. 1.5% for international cards.",
  "tokens_input": 24500,
  "tokens_output": 42,
  "model_used": "openai/gpt-oss-120b",
  "prompt_injection": null
}

Prompt Injection Detected

{
  "url": "https://malicious.example.com",
  "extracted": null,
  "tokens_input": 8200,
  "tokens_output": 0,
  "model_used": "",
  "prompt_injection": {
    "detected": true,
    "type": "instruction_override",
    "snippet": "...IGNORE ALL PREVIOUS INSTRUCTIONS...",
    "domain_flagged": true,
    "confidence": 0.95,
    "signals": ["instruction_override:0.95"]
  }
}

The prompt_injection object includes:

  • confidence: 0.0-1.0 score indicating detection certainty
  • signals: List of contributing detection signals for debugging
  • domain_flagged: Whether the domain was added to the offenders list

Model Tiers

Tier Use Case Model Speed
fast Quick extractions, simple queries openai/gpt-oss-120b (Cerebras) ~2000 tok/s
accurate Complex extraction, nuanced content deepseek/deepseek-v3.2 ~200 tok/s
research Web-optimized, longer analysis alibaba/tongyi-deepresearch-30b-a3b ~150 tok/s
code Technical docs, code extraction minimax/minimax-m2.1 ~300 tok/s

How It Works

Fetch Chain

Shutter uses a smart fetch chain for JavaScript-rendered content:

  1. Jina Reader (primary) — Free JS rendering via r.jina.ai/{url}
  2. Tavily (fallback) — SDK-based JS rendering (requires API key)
  3. Basic httpx (final) — Direct HTML fetch with trafilatura extraction

Prompt Injection Defense

Shutter uses a 2-phase Canary approach with confidence scoring:

Phase 1: Heuristic Checks (free)

  • 17 weighted regex patterns for injection attempts
  • Unicode hidden character detection
  • Base64 payload detection
  • Multi-pattern boost (2+ matches = higher confidence)

Phase 2: LLM Canary (only if heuristics inconclusive)

  • Minimal extraction (100 tokens) with output analysis
  • Detects instruction-following and topic deviation
  • Cost: ~$0.001

If confidence exceeds the threshold (default 0.6), extraction is blocked and the domain is flagged.

Offenders List

Shutter maintains a persistent SQLite database of flagged domains:

  • Location: ~/.shutter/offenders.db
  • Skip conditions:
    • 3+ detections on the domain
    • Single detection with confidence ≥ 0.90
    • 2+ detections with average confidence ≥ 0.80

This creates trial-and-error defense that improves over time.

Development

Setup

# Clone the repository
git clone https://github.com/AutumnsGrove/Shutter.git
cd Shutter

# Install dependencies with UV
uv sync --dev

# Run tests
uv run pytest

# Format code
uv run black src/ tests/
uv run ruff check src/ tests/

Test Coverage

# Run with coverage
uv run pytest --cov=grove_shutter --cov-report=term-missing

# Current: 120 tests passing

Roadmap

v1.0 — Python Production (Current)

  • Core fetch + extraction with OpenRouter
  • Jina/Tavily fetch chain for JS rendering
  • Canary-based prompt injection detection
  • Confidence scoring (0.0-1.0)
  • Config-based weight overrides
  • SQLite offenders list with smart thresholds
  • CLI with setup, offenders commands
  • PyPI release (grove-shutter)

v1.5 — Cloudflare Port

  • TypeScript Workers implementation
  • D1 shared offenders list
  • HTTP API with authentication
  • NPM package (@groveengine/shutter)

v2.0 — Search

  • Multi-URL search queries
  • Additional providers (Exa, Brave)
  • Result aggregation and deduplication

v3.0 — Caching & Intelligence

  • Content caching (R2)
  • Injection pattern learning
  • Vectorize integration

License

MIT

Links


A shutter controls what reaches the lens. Open it, and light floods in—everything, all at once, overwhelming. But a photographer doesn't want everything. They want the shot. The shutter opens precisely when needed, captures exactly what's in frame, and closes before the noise can follow.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

grove_shutter-1.0.0.tar.gz (264.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

grove_shutter-1.0.0-py3-none-any.whl (23.4 kB view details)

Uploaded Python 3

File details

Details for the file grove_shutter-1.0.0.tar.gz.

File metadata

  • Download URL: grove_shutter-1.0.0.tar.gz
  • Upload date:
  • Size: 264.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.14

File hashes

Hashes for grove_shutter-1.0.0.tar.gz
Algorithm Hash digest
SHA256 c90156d8516b5bd15388e4e1ca5d417e79fb06704df623bf522862aee1e17942
MD5 13ce8b0f28eb24284d593c1802b78a30
BLAKE2b-256 8409d1a43f20c31eee579094cb9ff89aa18e6ebd7f775a24d7a7db97758c8e06

See more details on using hashes here.

File details

Details for the file grove_shutter-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for grove_shutter-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7b6f06ff2452fd384e2bce1e0375b76c657394588f1c25092a33fcf12461e1e2
MD5 634f59ae8a32a8716edc0d5c14ad7ab5
BLAKE2b-256 f37c05713a7c8ade20e4261c8d55080d9afb413d79633130e909df485955b27c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page