Skip to main content

Local-first context sanitizer — cloak your secrets before they reach the cloud

Project description

Cloq

Cloq

Your secrets stay local. Your LLM gets the context.

A local-first context sanitizer that sits between your IDE and the cloud LLM. It detects API keys, PII, and internal IPs — replaces them with reversible tags — and restores them in the response. Nothing sensitive ever leaves your machine.

CI Python License Stars PRs Welcome

Quick Start · How It Works · Features · Providers · Configuration · Contributing


🎯 The Problem

Developers paste secrets into LLM prompts every day. AWS keys, database credentials, customer emails, internal IPs — most don't even realize it. Enterprise security teams know, and they block AI tools entirely because of it.

Cloq eliminates this risk without changing your workflow.

- "Fix the bug. DB is at 10.0.1.50:5432 and key is AKIAIOSFODNN7EXAMPLE"
+ "Fix the bug. DB is at [INTERNAL_IP_1] and key is [AWS_ACCESS_KEY_1]"
                      Cloud LLM only sees sanitized tags
                     Real values restored locally on response

⚡ The Breakthrough: De-Identified Prompt Caching (DPC)

Traditional LLM prompt caches miss as soon as a variable, key, port, IP, or file path changes.

Because Cloq sanitizes these variables into uniform tags first, it acts as a semantic normalization layer. Identical coding templates are matched locally, cutting development LLM costs by up to 80%!

How DPC Normalizes & Caches Your Prompts:

Developer A: "Fix the bug in 10.0.1.50:5432 with key AKIAIOSFODNN7EXAMPLE"
  ↳ Normalizes to: "Fix the bug in [INTERNAL_IP_1] with key [AWS_ACCESS_KEY_1]"
  ↳ CACHE MISS: Sent to Upstream Cloud LLM. Response cached locally as a template.

Developer B: "Fix the bug in 192.168.1.12:5432 with key AKIAI7YYYDNN7ANOTHER"
  ↳ Normalizes to: "Fix the bug in [INTERNAL_IP_1] with key [AWS_ACCESS_KEY_1]"
  ↳ CACHE HIT! Cloq instantly restores Developer B's variables locally in 4ms at $0 cost.

No cloud upstream call, zero token usage, completely private, and blazing fast.


🚀 Quick Start

# Install
pip install cloq

# Start the local proxy
cloq start

# Point your LLM client to Cloq
export OPENAI_BASE_URL=http://localhost:8989

# Launch the live Developer HUD & Savings Dashboard in another terminal!
cloq dashboard

# Done. Use your tools as normal.

No code changes. No new SDK. Just redirect the base URL.


⚙️ How It Works

Cloq runs as a transparent local proxy on your machine. Every LLM API call passes through it automatically.

┌──────────┐         ┌─────────────────────────┐         ┌──────────┐
│          │         │       Cloq Proxy         │         │          │
│  Your    │  ────►  │                           │  ────►  │  Cloud   │
│  IDE /   │         │  1. Intercept request     │         │  LLM     │
│  CLI /   │         │  2. Detect secrets + PII  │         │  (GPT,   │
│  App     │  ◄────  │  3. Replace → [TAG_1]     │  ◄────  │  Claude, │
│          │         │  4. Forward sanitized     │         │  Gemini) │
│          │         │  5. Restore real values   │         │          │
└──────────┘         └─────────────────────────┘         └──────────┘
                      ↑ Everything stays local

Step by step:

  1. Intercept — Cloq captures the outgoing API request
  2. Detect — A pipeline of detectors scans all text fields for sensitive data
  3. Tag — Each detected entity gets a semantic tag: [AWS_KEY_1], [EMAIL_ADDRESS_1], [INTERNAL_IP_1]
  4. Forward — The sanitized (safe) request goes to the cloud LLM
  5. Restore — When the LLM responds, tags are replaced back with real values
  6. Return — Your tool receives the complete, restored response

The same value always maps to the same tag within a session, so the LLM can reason about relationships ("use [DB_HOST_1] with [AWS_KEY_1]") without ever seeing the real data.


✨ Features

🔐 Detection Engine

Three pluggable detectors that run as a pipeline:

Detector What It Finds Examples
Secrets API keys, tokens, credentials AWS AKIA..., GitHub ghp_..., Stripe sk_live_..., Google AIza..., Slack xox..., JWTs, private keys (RSA/EC/PGP), connection strings (Postgres, MongoDB, Redis, JDBC)
PII Personal data Emails, phone numbers, credit cards (Visa/MC/Amex), SSNs, IBAN codes. Uses Microsoft Presidio when installed, falls back to regex
Network Infrastructure details Private IPs (RFC 1918: 10.x, 172.16-31.x, 192.168.x), IPv6 link-local, localhost, internal hostnames via configurable domain patterns

Plus entropy-based detection for generic high-entropy strings that don't match known patterns.

🔄 Reversible Tagging

Not just <REDACTED> — Cloq uses semantic, indexed tags that preserve meaning for the LLM:

[AWS_ACCESS_KEY_1]    →  The LLM knows this is a credential
[INTERNAL_IP_1]       →  The LLM knows this is a host address
[EMAIL_ADDRESS_2]     →  The LLM can distinguish between two emails

Same value always maps to the same tag (idempotent within a session).

📡 Streaming Support

Full SSE streaming support with intelligent buffering — handles tags that are split across chunk boundaries. Adds < 50ms latency.

🔌 Plugin Architecture

Add your own detectors for organization-specific patterns:

from cloq.detection.base import BaseDetector, DetectionResult

class MyDetector(BaseDetector):
    name = "my_detector"

    def detect(self, text: str) -> list[DetectionResult]:
        # Your custom detection logic
        ...

📋 Audit Logging

JSON Lines audit log that records what type of data was sanitized, but never the actual values:

{"action":"sanitized","entity_type":"AWS_ACCESS_KEY","detector":"secrets","tag":"[AWS_ACCESS_KEY_1]","timestamp":"2025-07-15T10:30:00Z"}

🔌 Supported Providers

Works with any LLM provider. Just set the base URL to http://localhost:8989:

Provider Format Status
OpenAI (GPT-4o, o1, o3) /v1/chat/completions
Anthropic (Claude 3.5/4) /v1/messages
Google Gemini :generateContent
Azure OpenAI /openai/deployments/*/chat/completions
Groq OpenAI-compatible
Together AI OpenAI-compatible
Ollama OpenAI-compatible
Any OpenAI-compatible API /v1/chat/completions

🛠️ CLI Commands

cloq start                  # Start the proxy server
cloq start --port 9090      # Custom port
cloq start --verbose        # Debug logging

cloq scan path/to/file.py   # Scan a file for secrets (standalone)
cloq status                 # Check if proxy is running + stats
cloq test                   # Run a self-test with sample data

cloq config init            # Generate a .cloq.yml template
cloq config show            # Show resolved configuration

Example scan output:

               Scan Results: credentials.env
┏━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┓
┃ #  ┃ Type           ┃ Value                ┃ Score ┃ Detector ┃
┡━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━┩
│ 1  │ INTERNAL_IP    │ 10.0●●●●●●5432       │   95% │ network  │
│ 2  │ AWS_ACCESS_KEY │ AKIA●●●●●●●●●●●●MPLE │   98% │ secrets  │
│ 3  │ EMAIL_ADDRESS  │ dev@●●●●●●●.com      │   80% │ pii      │
└────┴────────────────┴──────────────────────┴───────┴──────────┘

  3 sensitive item(s) detected
  Scanned in 0.7ms

⚙️ Configuration

cloq config init   # Creates .cloq.yml in your project root
# .cloq.yml
proxy:
  host: 127.0.0.1
  port: 8989

detection:
  secrets:
    enabled: true
    custom_patterns:
      - name: my_internal_token
        regex: "INT-[A-Z0-9]{32}"
        entity_type: INTERNAL_TOKEN
  pii:
    enabled: true
    confidence_threshold: 0.7
    entities: [EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD]
  network:
    enabled: true
    internal_domains:
      - "*.internal.mycompany.com"
      - "*.corp.mycompany.net"

allowlist:
  values: ["api.openai.com", "api.anthropic.com"]

logging:
  audit:
    enabled: true
    path: "~/.cloq/audit.log"

Config priority (highest wins): CLI flags → CLOQ_* env vars → .cloq.yml~/.config/cloq/config.yml → defaults


🐍 Python API

Use Cloq as a library without the proxy:

from cloq.detection.pipeline import DetectionPipeline
from cloq.detection.secrets import SecretsDetector
from cloq.detection.pii import PIIDetector
from cloq.detection.network import NetworkDetector
from cloq.sanitizer.engine import SanitizationSession, sanitize, restore

# Build a detection pipeline
pipeline = DetectionPipeline([
    SecretsDetector(),
    PIIDetector(),
    NetworkDetector(internal_domains=["*.internal.company.com"]),
])

# Detect + sanitize
text = "DB at 10.0.1.50:5432, key AKIAIOSFODNN7EXAMPLE, email dev@corp.com"
results, metrics = pipeline.run(text)

session = SanitizationSession(session_id="req-1")
sanitized = sanitize(text, results, session)
# → "DB at [INTERNAL_IP_1], key [AWS_ACCESS_KEY_1], email [EMAIL_ADDRESS_1]"

# After LLM responds, restore the real values
response = "[INTERNAL_IP_1] is healthy. Use [AWS_ACCESS_KEY_1] to connect."
restored = restore(response, session)
# → "10.0.1.50:5432 is healthy. Use AKIAIOSFODNN7EXAMPLE to connect."

🏢 Enterprise Value

Concern How Cloq Solves It
Data leakage Sensitive data is replaced before it leaves the machine
Compliance Audit log proves what was sanitized without storing secrets
Zero trust Nothing goes to the cloud unredacted — ever
No telemetry Cloq never phones home. Fully offline capable
Custom policies Add organization-specific patterns and domain rules
Developer experience Zero friction — one command, no code changes

🛡️ Security Model

  • Local-only processing — The proxy runs entirely on your machine
  • In-memory sessions — Tag ↔ original mappings are never written to disk
  • Audit without secrets — Logs record entity types and actions, never actual values
  • No telemetry — Zero external network calls from Cloq itself
  • Minimal dependencies — Small attack surface by design

See SECURITY.md for our vulnerability disclosure policy.


📦 Project Structure

src/cloq/
├── cli/           # Typer + Rich CLI (start, scan, status, test, config)
├── config/        # Pydantic v2 config schema + YAML/env loader
├── detection/     # Pluggable detector pipeline
│   ├── secrets.py # 15+ regex patterns + Shannon entropy
│   ├── pii.py     # Presidio integration + regex fallback
│   └── network.py # RFC 1918 IPs, internal hostnames
├── proxy/         # FastAPI + httpx async proxy server
│   ├── providers.py  # OpenAI, Anthropic, Google, Azure adapters
│   └── streaming.py  # SSE streaming with cross-boundary restoration
├── sanitizer/     # Reversible tag↔original engine + session store
└── logging/       # JSON Lines audit logger

🤝 Contributing

We welcome contributions of all kinds!

git clone https://github.com/CodeBase-X1/cloq.git
cd cloq
pip install -e ".[dev]"
make test     # Run 55 tests
make lint     # Ruff linting
make format   # Auto-format

See CONTRIBUTING.md for full guidelines.


🗺️ Roadmap

  • Core detection engine (secrets, PII, network)
  • Reversible sanitization with semantic tags
  • Multi-provider proxy (OpenAI, Anthropic, Google, Azure)
  • SSE streaming support
  • CLI with Rich terminal output
  • VS Code extension
  • JetBrains plugin
  • Local ML model integration (spaCy NER)
  • Web dashboard for monitoring
  • Docker image for team deployment
  • GDPR/HIPAA compliance report generation

Apache 2.0 · Built by the CodeBase-X1 community

If Cloq helped you, consider giving it a ⭐

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cloq-0.1.0.tar.gz (37.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cloq-0.1.0-py3-none-any.whl (46.8 kB view details)

Uploaded Python 3

File details

Details for the file cloq-0.1.0.tar.gz.

File metadata

  • Download URL: cloq-0.1.0.tar.gz
  • Upload date:
  • Size: 37.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for cloq-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2b77eeea2509411df7eca96c4f98c3234aee1a9ed455916181ee778c7888bed6
MD5 54fed58f3c1b85de1544d955710408ac
BLAKE2b-256 90ba46f29043a5f9c9f8544c76f3b690135d8fa2a6e09d06d09657f17cc3fb77

See more details on using hashes here.

File details

Details for the file cloq-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: cloq-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 46.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for cloq-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e9307d15922648894d7f39c7377b8f335854db90e32cb8fb430960f2a17290fb
MD5 0a784a8d8732eb99bc52d5fab3e9848c
BLAKE2b-256 9fbb0e6d97a8fb07df12140e3d04b9f136860fe90ce62614ec38bd7fdd5c0d92

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page