LLM-powered web content extraction with prompt injection defense. Open. Capture. Close.
Project description
Shutter
Web Content Distillation Service
Open. Capture. Close.
Overview
Shutter is a web content distillation layer that sits between LLM agents and raw web pages. It fetches URLs, uses a cheap/fast LLM to extract only the relevant content based on a query, and returns clean, focused results.
Two key benefits:
- Token efficiency — Agents get 200 tokens instead of 20,000
- Prompt injection defense — Raw page content never reaches the driver model; injections never make it past the aperture
Quick Start
Installation
# Install via UV (recommended)
uv add grove-shutter
# Or via pip
pip install grove-shutter
# Or run directly without installing
uvx grove-shutter --help
First Run Setup
# Interactive setup (creates ~/.shutter/config.toml)
shutter setup
# Or set environment variables
export OPENROUTER_API_KEY="sk-or-v1-..."
export TAVILY_API_KEY="tvly-..." # optional, for enhanced fetching
CLI Usage
# Basic extraction
shutter "https://stripe.com/pricing" -q "What are the transaction fees?"
# Choose model tier
shutter "https://docs.python.org/3/library/asyncio.html" -q "How do I create a task?" --model code
# View offenders list
shutter offenders
# Clear offenders list
shutter clear-offenders
Programmatic Usage
from grove_shutter import shutter
result = await shutter(
url="https://stripe.com/pricing",
query="What are the transaction fees?",
model="fast",
max_tokens=500
)
print(result.extracted)
# Output: "2.9% + 30¢ per successful card charge. Additional 0.5% for..."
# Check for prompt injection
if result.prompt_injection:
print(f"Injection detected: {result.prompt_injection.type}")
print(f"Confidence: {result.prompt_injection.confidence}")
Configuration
Configuration File
Shutter stores configuration at ~/.shutter/config.toml:
[api]
openrouter_key = "sk-or-v1-..."
tavily_key = "tvly-..." # optional
[defaults]
model = "fast"
max_tokens = 500
timeout = 30000
# Optional: Tune prompt injection detection
[canary]
block_threshold = 0.6 # 0.0-1.0, lower = more sensitive
[canary.weights]
# Override confidence weights for specific patterns
instruction_override = 0.95
role_hijack = 0.40 # Lower if you get false positives on "act as" content
Response Format
Clean Extraction
{
"url": "https://stripe.com/pricing",
"extracted": "2.9% + 30¢ per successful card charge. Additional 0.5% for manually entered cards. 1.5% for international cards.",
"tokens_input": 24500,
"tokens_output": 42,
"model_used": "openai/gpt-oss-120b",
"prompt_injection": null
}
Prompt Injection Detected
{
"url": "https://malicious.example.com",
"extracted": null,
"tokens_input": 8200,
"tokens_output": 0,
"model_used": "",
"prompt_injection": {
"detected": true,
"type": "instruction_override",
"snippet": "...IGNORE ALL PREVIOUS INSTRUCTIONS...",
"domain_flagged": true,
"confidence": 0.95,
"signals": ["instruction_override:0.95"]
}
}
The prompt_injection object includes:
- confidence: 0.0-1.0 score indicating detection certainty
- signals: List of contributing detection signals for debugging
- domain_flagged: Whether the domain was added to the offenders list
Model Tiers
| Tier | Use Case | Model | Speed |
|---|---|---|---|
fast |
Quick extractions, simple queries | openai/gpt-oss-120b (Cerebras) |
~2000 tok/s |
accurate |
Complex extraction, nuanced content | deepseek/deepseek-v3.2 |
~200 tok/s |
research |
Web-optimized, longer analysis | alibaba/tongyi-deepresearch-30b-a3b |
~150 tok/s |
code |
Technical docs, code extraction | minimax/minimax-m2.1 |
~300 tok/s |
How It Works
Fetch Chain
Shutter uses a smart fetch chain for JavaScript-rendered content:
- Jina Reader (primary) — Free JS rendering via
r.jina.ai/{url} - Tavily (fallback) — SDK-based JS rendering (requires API key)
- Basic httpx (final) — Direct HTML fetch with trafilatura extraction
Prompt Injection Defense
Shutter uses a 2-phase Canary approach with confidence scoring:
Phase 1: Heuristic Checks (free)
- 17 weighted regex patterns for injection attempts
- Unicode hidden character detection
- Base64 payload detection
- Multi-pattern boost (2+ matches = higher confidence)
Phase 2: LLM Canary (only if heuristics inconclusive)
- Minimal extraction (100 tokens) with output analysis
- Detects instruction-following and topic deviation
- Cost: ~$0.001
If confidence exceeds the threshold (default 0.6), extraction is blocked and the domain is flagged.
Offenders List
Shutter maintains a persistent SQLite database of flagged domains:
- Location:
~/.shutter/offenders.db - Skip conditions:
- 3+ detections on the domain
- Single detection with confidence ≥ 0.90
- 2+ detections with average confidence ≥ 0.80
This creates trial-and-error defense that improves over time.
Development
Setup
# Clone the repository
git clone https://github.com/AutumnsGrove/Shutter.git
cd Shutter
# Install dependencies with UV
uv sync --dev
# Run tests
uv run pytest
# Format code
uv run black src/ tests/
uv run ruff check src/ tests/
Test Coverage
# Run with coverage
uv run pytest --cov=grove_shutter --cov-report=term-missing
# Current: 120 tests passing
Roadmap
v1.0 — Python Production (Current)
- Core fetch + extraction with OpenRouter
- Jina/Tavily fetch chain for JS rendering
- Canary-based prompt injection detection
- Confidence scoring (0.0-1.0)
- Config-based weight overrides
- SQLite offenders list with smart thresholds
- CLI with setup, offenders commands
- PyPI release (
grove-shutter)
v1.5 — Cloudflare Port
- TypeScript Workers implementation
- D1 shared offenders list
- HTTP API with authentication
- NPM package (
@groveengine/shutter)
v2.0 — Search
- Multi-URL search queries
- Additional providers (Exa, Brave)
- Result aggregation and deduplication
v3.0 — Caching & Intelligence
- Content caching (R2)
- Injection pattern learning
- Vectorize integration
License
MIT
Links
- Repository: github.com/AutumnsGrove/Shutter
- Documentation: docs/SPEC.md
- Issues: github.com/AutumnsGrove/Shutter/issues
A shutter controls what reaches the lens. Open it, and light floods in—everything, all at once, overwhelming. But a photographer doesn't want everything. They want the shot. The shutter opens precisely when needed, captures exactly what's in frame, and closes before the noise can follow.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file grove_shutter-1.0.0.tar.gz.
File metadata
- Download URL: grove_shutter-1.0.0.tar.gz
- Upload date:
- Size: 264.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c90156d8516b5bd15388e4e1ca5d417e79fb06704df623bf522862aee1e17942
|
|
| MD5 |
13ce8b0f28eb24284d593c1802b78a30
|
|
| BLAKE2b-256 |
8409d1a43f20c31eee579094cb9ff89aa18e6ebd7f775a24d7a7db97758c8e06
|
File details
Details for the file grove_shutter-1.0.0-py3-none-any.whl.
File metadata
- Download URL: grove_shutter-1.0.0-py3-none-any.whl
- Upload date:
- Size: 23.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7b6f06ff2452fd384e2bce1e0375b76c657394588f1c25092a33fcf12461e1e2
|
|
| MD5 |
634f59ae8a32a8716edc0d5c14ad7ab5
|
|
| BLAKE2b-256 |
f37c05713a7c8ade20e4261c8d55080d9afb413d79633130e909df485955b27c
|