Skip to main content

Local-first context sanitizer — cloak your secrets before they reach the cloud

Project description

Cloq

Cloq

Your secrets stay local. Your LLM gets the context.

A local-first context sanitizer that sits between your IDE and the cloud LLM. It detects API keys, PII, and internal IPs — replaces them with reversible tags — and restores them in the response. Nothing sensitive ever leaves your machine.

CI Python License Stars PRs Welcome

Quick Start · How It Works · Features · Providers · Configuration · Contributing


🎯 The Problem

Developers paste secrets into LLM prompts every day. AWS keys, database credentials, customer emails, internal IPs — most don't even realize it. Enterprise security teams know, and they block AI tools entirely because of it.

Cloq eliminates this risk without changing your workflow.

- "Fix the bug. DB is at 10.0.1.50:5432 and key is AKIAIOSFODNN7EXAMPLE"
+ "Fix the bug. DB is at [INTERNAL_IP_1] and key is [AWS_ACCESS_KEY_1]"
                      Cloud LLM only sees sanitized tags
                     Real values restored locally on response

⚡ The Breakthrough: De-Identified Prompt Caching (DPC)

Traditional LLM prompt caches miss as soon as a variable, key, port, IP, or file path changes.

Because Cloq sanitizes these variables into uniform tags first, it acts as a semantic normalization layer. Identical coding templates are matched locally, cutting development LLM costs by up to 80%!

How DPC Normalizes & Caches Your Prompts:

Developer A: "Fix the bug in 10.0.1.50:5432 with key AKIAIOSFODNN7EXAMPLE"
  ↳ Normalizes to: "Fix the bug in [INTERNAL_IP_1] with key [AWS_ACCESS_KEY_1]"
  ↳ CACHE MISS: Sent to Upstream Cloud LLM. Response cached locally as a template.

Developer B: "Fix the bug in 192.168.1.12:5432 with key AKIAI7YYYDNN7ANOTHER"
  ↳ Normalizes to: "Fix the bug in [INTERNAL_IP_1] with key [AWS_ACCESS_KEY_1]"
  ↳ CACHE HIT! Cloq instantly restores Developer B's variables locally in 4ms at $0 cost.

No cloud upstream call, zero token usage, completely private, and blazing fast.


🚀 Quick Start

# Install
pip install cloq

# Start the local proxy
cloq start

# Point your LLM client to Cloq
export OPENAI_BASE_URL=http://localhost:8989

# Launch the live Developer HUD & Savings Dashboard in another terminal!
cloq dashboard

# Done. Use your tools as normal.

No code changes. No new SDK. Just redirect the base URL.


⚙️ How It Works

Cloq runs as a transparent local proxy on your machine. Every LLM API call passes through it automatically.

┌──────────┐         ┌─────────────────────────┐         ┌──────────┐
│          │         │       Cloq Proxy         │         │          │
│  Your    │  ────►  │                           │  ────►  │  Cloud   │
│  IDE /   │         │  1. Intercept request     │         │  LLM     │
│  CLI /   │         │  2. Detect secrets + PII  │         │  (GPT,   │
│  App     │  ◄────  │  3. Replace → [TAG_1]     │  ◄────  │  Claude, │
│          │         │  4. Forward sanitized     │         │  Gemini) │
│          │         │  5. Restore real values   │         │          │
└──────────┘         └─────────────────────────┘         └──────────┘
                      ↑ Everything stays local

Step by step:

  1. Intercept — Cloq captures the outgoing API request
  2. Detect — A pipeline of detectors scans all text fields for sensitive data
  3. Tag — Each detected entity gets a semantic tag: [AWS_KEY_1], [EMAIL_ADDRESS_1], [INTERNAL_IP_1]
  4. Forward — The sanitized (safe) request goes to the cloud LLM
  5. Restore — When the LLM responds, tags are replaced back with real values
  6. Return — Your tool receives the complete, restored response

The same value always maps to the same tag within a session, so the LLM can reason about relationships ("use [DB_HOST_1] with [AWS_KEY_1]") without ever seeing the real data.


✨ Features

🔐 Detection Engine

Three pluggable detectors that run as a pipeline:

Detector What It Finds Examples
Secrets API keys, tokens, credentials AWS AKIA..., GitHub ghp_..., Stripe sk_live_..., Google AIza..., Slack xox..., JWTs, private keys (RSA/EC/PGP), connection strings (Postgres, MongoDB, Redis, JDBC)
PII Personal data Emails, phone numbers, credit cards (Visa/MC/Amex), SSNs, IBAN codes. Uses Microsoft Presidio when installed, falls back to regex
Network Infrastructure details Private IPs (RFC 1918: 10.x, 172.16-31.x, 192.168.x), IPv6 link-local, localhost, internal hostnames via configurable domain patterns

Plus entropy-based detection for generic high-entropy strings that don't match known patterns.

🔄 Reversible Tagging

Not just <REDACTED> — Cloq uses semantic, indexed tags that preserve meaning for the LLM:

[AWS_ACCESS_KEY_1]    →  The LLM knows this is a credential
[INTERNAL_IP_1]       →  The LLM knows this is a host address
[EMAIL_ADDRESS_2]     →  The LLM can distinguish between two emails

Same value always maps to the same tag (idempotent within a session).

📡 Streaming Support

Full SSE streaming support with intelligent buffering — handles tags that are split across chunk boundaries. Adds < 50ms latency.

🔌 Plugin Architecture

Add your own detectors for organization-specific patterns:

from cloq.detection.base import BaseDetector, DetectionResult

class MyDetector(BaseDetector):
    name = "my_detector"

    def detect(self, text: str) -> list[DetectionResult]:
        # Your custom detection logic
        ...

📋 Audit Logging

JSON Lines audit log that records what type of data was sanitized, but never the actual values:

{"action":"sanitized","entity_type":"AWS_ACCESS_KEY","detector":"secrets","tag":"[AWS_ACCESS_KEY_1]","timestamp":"2025-07-15T10:30:00Z"}

🔌 Supported Providers

Works with any LLM provider. Just set the base URL to http://localhost:8989:

Provider Format Status
OpenAI (GPT-4o, o1, o3) /v1/chat/completions
Anthropic (Claude 3.5/4) /v1/messages
Google Gemini :generateContent
Azure OpenAI /openai/deployments/*/chat/completions
Groq OpenAI-compatible
Together AI OpenAI-compatible
Ollama OpenAI-compatible
Any OpenAI-compatible API /v1/chat/completions

🛠️ CLI Commands

cloq start                  # Start the proxy server
cloq start --port 9090      # Custom port
cloq start --verbose        # Debug logging

cloq scan path/to/file.py   # Scan a file for secrets (standalone)
cloq status                 # Check if proxy is running + stats
cloq test                   # Run a self-test with sample data

cloq config init            # Generate a .cloq.yml template
cloq config show            # Show resolved configuration

Example scan output:

               Scan Results: credentials.env
┏━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┓
┃ #  ┃ Type           ┃ Value                ┃ Score ┃ Detector ┃
┡━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━┩
│ 1  │ INTERNAL_IP    │ 10.0●●●●●●5432       │   95% │ network  │
│ 2  │ AWS_ACCESS_KEY │ AKIA●●●●●●●●●●●●MPLE │   98% │ secrets  │
│ 3  │ EMAIL_ADDRESS  │ dev@●●●●●●●.com      │   80% │ pii      │
└────┴────────────────┴──────────────────────┴───────┴──────────┘

  3 sensitive item(s) detected
  Scanned in 0.7ms

⚙️ Configuration

cloq config init   # Creates .cloq.yml in your project root
# .cloq.yml
proxy:
  host: 127.0.0.1
  port: 8989

detection:
  secrets:
    enabled: true
    custom_patterns:
      - name: my_internal_token
        regex: "INT-[A-Z0-9]{32}"
        entity_type: INTERNAL_TOKEN
  pii:
    enabled: true
    confidence_threshold: 0.7
    entities: [EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD]
  network:
    enabled: true
    internal_domains:
      - "*.internal.mycompany.com"
      - "*.corp.mycompany.net"

allowlist:
  values: ["api.openai.com", "api.anthropic.com"]

logging:
  audit:
    enabled: true
    path: "~/.cloq/audit.log"

Config priority (highest wins): CLI flags → CLOQ_* env vars → .cloq.yml~/.config/cloq/config.yml → defaults


🐍 Python API

Use Cloq as a library without the proxy:

from cloq.detection.pipeline import DetectionPipeline
from cloq.detection.secrets import SecretsDetector
from cloq.detection.pii import PIIDetector
from cloq.detection.network import NetworkDetector
from cloq.sanitizer.engine import SanitizationSession, sanitize, restore

# Build a detection pipeline
pipeline = DetectionPipeline([
    SecretsDetector(),
    PIIDetector(),
    NetworkDetector(internal_domains=["*.internal.company.com"]),
])

# Detect + sanitize
text = "DB at 10.0.1.50:5432, key AKIAIOSFODNN7EXAMPLE, email dev@corp.com"
results, metrics = pipeline.run(text)

session = SanitizationSession(session_id="req-1")
sanitized = sanitize(text, results, session)
# → "DB at [INTERNAL_IP_1], key [AWS_ACCESS_KEY_1], email [EMAIL_ADDRESS_1]"

# After LLM responds, restore the real values
response = "[INTERNAL_IP_1] is healthy. Use [AWS_ACCESS_KEY_1] to connect."
restored = restore(response, session)
# → "10.0.1.50:5432 is healthy. Use AKIAIOSFODNN7EXAMPLE to connect."

🏢 Enterprise Value

Concern How Cloq Solves It
Data leakage Sensitive data is replaced before it leaves the machine
Compliance Audit log proves what was sanitized without storing secrets
Zero trust Nothing goes to the cloud unredacted — ever
No telemetry Cloq never phones home. Fully offline capable
Custom policies Add organization-specific patterns and domain rules
Developer experience Zero friction — one command, no code changes

🛡️ Security Model

  • Local-only processing — The proxy runs entirely on your machine
  • In-memory sessions — Tag ↔ original mappings are never written to disk
  • Audit without secrets — Logs record entity types and actions, never actual values
  • No telemetry — Zero external network calls from Cloq itself
  • Minimal dependencies — Small attack surface by design

See SECURITY.md for our vulnerability disclosure policy.


📦 Project Structure

src/cloq/
├── cli/           # Typer + Rich CLI (start, scan, status, test, config)
├── config/        # Pydantic v2 config schema + YAML/env loader
├── detection/     # Pluggable detector pipeline
│   ├── secrets.py # 15+ regex patterns + Shannon entropy
│   ├── pii.py     # Presidio integration + regex fallback
│   └── network.py # RFC 1918 IPs, internal hostnames
├── proxy/         # FastAPI + httpx async proxy server
│   ├── providers.py  # OpenAI, Anthropic, Google, Azure adapters
│   └── streaming.py  # SSE streaming with cross-boundary restoration
├── sanitizer/     # Reversible tag↔original engine + session store
└── logging/       # JSON Lines audit logger

🤝 Contributing

We welcome contributions of all kinds!

git clone https://github.com/CodeBase-X1/cloq.git
cd cloq
pip install -e ".[dev]"
make test     # Run 55 tests
make lint     # Ruff linting
make format   # Auto-format

See CONTRIBUTING.md for full guidelines.


🗺️ Roadmap

  • Core detection engine (secrets, PII, network)
  • Reversible sanitization with semantic tags
  • Multi-provider proxy (OpenAI, Anthropic, Google, Azure)
  • SSE streaming support
  • CLI with Rich terminal output
  • VS Code extension
  • JetBrains plugin
  • Local ML model integration (spaCy NER)
  • Web dashboard for monitoring
  • Docker image for team deployment
  • GDPR/HIPAA compliance report generation

Apache 2.0 · Built by the CodeBase-X1 community

If Cloq helped you, consider giving it a ⭐

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cloq-0.1.1.tar.gz (38.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cloq-0.1.1-py3-none-any.whl (47.8 kB view details)

Uploaded Python 3

File details

Details for the file cloq-0.1.1.tar.gz.

File metadata

  • Download URL: cloq-0.1.1.tar.gz
  • Upload date:
  • Size: 38.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for cloq-0.1.1.tar.gz
Algorithm Hash digest
SHA256 41eb7167368e541116bc3d5474abfaf4c441f6b486ccfb02e658fa95c88647c3
MD5 99cf07e0fcc1dfa9ea4fee5466b02c5e
BLAKE2b-256 7d21fecab65c45535868a278cc1c48880d0af2a66b727ebf9b9bcd8e60f7c914

See more details on using hashes here.

File details

Details for the file cloq-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: cloq-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 47.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for cloq-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 95b60b4bc89c2bedd94912e35e75b25ebf7d4be806cd7cb93646348c61fddfbb
MD5 4e9c62c4ab801b1a3395ca68f7b36b35
BLAKE2b-256 352426bba8cb23dd04e1321b1eb8f74bf35279f4763ec79193a2a5f6bc01fefd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page