Local-first runtime governance layer for AI systems

Guardian Runtime

A Zero-Latency FinOps & Security Firewall for AI Applications.
Intercept every prompt and response locally. Stop data leaks and runaway token costs.

🌐 Website & Docs: https://ashp15205.github.io/guardian-runtime/
📦 Available on PyPI: https://pypi.org/project/guardian-runtime/


🛑 The Core Problem: Why You Need Guardian

As AI coding agents (Claude Code, Cursor, Aider) become standard developer tools, they introduce two large, often hidden risks and one regulatory headache:

💸 1. The FinOps Risk: Cost Runaways

Autonomous agents operate in loops. If an agent gets stuck retrying a bug fix, or accidentally loads a 1 GB log file into its context window, you can wake up to a $100 API bill. The Problem: You have no visibility into or control over session costs until the provider's bill arrives at the end of the month.
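
To make the runaway scenario concrete, here is a back-of-the-envelope sketch of how a retry loop adds up. The per-token prices and loop sizes are illustrative assumptions, not any provider's actual rates, and `loop_cost` is a hypothetical helper rather than part of Guardian.

```python
# Rough cost of an agent stuck in a retry loop.
# All prices and sizes are illustrative assumptions, not real provider rates.

def loop_cost(iterations: int, input_tokens: int, output_tokens: int,
              usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Total USD spent if every iteration resends the same context."""
    per_call = (input_tokens * usd_per_m_input
                + output_tokens * usd_per_m_output) / 1_000_000
    return iterations * per_call


# 400 retries, each resending 80k tokens of context and getting 1k tokens back,
# at a hypothetical $3 / $15 per million input / output tokens.
total = loop_cost(400, 80_000, 1_000, 3.0, 15.0)
print(f"${total:.2f}")  # prints $102.00
```

Input tokens dominate here because the full context is resent on every iteration, which is why a per-day ceiling catches the problem earlier than a monthly invoice.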

🔒 2. The Security Risk: Data Exfiltration

Coding agents need full access to your local codebase to be useful. If you leave an AWS_SECRET_KEY or a database password in a .env file, the agent can silently upload it to a third-party LLM provider (OpenAI, Anthropic) as context. The Problem: Observability tools such as Langfuse log the leak only after the credentials have already reached the cloud.

🏛 3. The Compliance Risk (Briefly)

Sending PII (like SSNs or emails in a test database) to foreign LLM APIs without authorization can violate GDPR and DPDP regulations.


🟢 The Solution: Guardian Runtime

Guardian Runtime is a local-first security middleware and FinOps firewall. It runs entirely on your local machine and intercepts LLM traffic before it leaves your infrastructure.

| The Problem | How Guardian Solves It |
| --- | --- |
| Cost Runaways | Hard FinOps Budgets & Optimization: Tracks every token you spend locally. You can set a strict "$5.00 per day" limit. Advanced Terse Mode compresses both input context and output responses. In benchmarks across real developer prompts, it reduces output tokens by 40–70% while maintaining full technical accuracy. |
| Data Exfiltration | Zero-Latency Secret Scanners: Scans every prompt for API keys, AWS credentials, and secrets locally. If it detects a secret, it drops the request before it reaches the internet. |
| Compliance | Local PII Blocking: Regex and ML scanners prevent PII from leaving your machine. |
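
As a rough illustration of the regex side of PII blocking, the sketch below flags email addresses and US-style SSNs. The patterns and the `find_pii` helper are assumptions made for this example; they are not Guardian's actual rules, and real PII detection needs far broader coverage.

```python
import re

# Illustrative patterns only; not Guardian's real scanner rules.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for every pattern hit."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        hits.extend((kind, m.group()) for m in pattern.finditer(text))
    return hits


sample = "Contact jane@example.com, SSN 123-45-6789."
print(find_pii(sample))
```

A regex pass like this is cheap enough to run on every prompt, which is presumably why it is paired with a heavier ML scanner for cases patterns cannot express.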

🏗 Architecture & The Security Pipeline

Guardian intercepts traffic at the network layer or via SDK, passing it through a strict verification pipeline before it ever reaches the cloud.

  Agent / Dev                 Guardian Runtime                   Cloud LLM
       │                             │                               │
       │  1. Prompt + Context        │                               │
       │ ──────────────────────────▶ │                               │
       │                             │                               │
       │                             │ [Security Firewall]           │
       │                             │ ├─ Scan AWS Keys / Secrets    │
       │                             │ └─ Block if Threat Detected ──┼─ (Drops Request)
       │                             │                               │
       │                             │ [Token Optimizer]             │
       │                             │ ├─ Compress Whitespace        │
       │                             │ └─ Terse Mode (Output Trim)   │
       │                             │                               │
       │                             │ [FinOps Budget]               │
       │                             │ ├─ Check Daily Spend Limit    │
       │                             │ └─ Block if $5 Limit Hit ─────┼─ (Drops Request)
       │                             │                               │
       │                             │  2. Sanitized Prompt          │
       │                             │ ────────────────────────────▶ │
       │                             │                               │
       │                             │  3. LLM Response              │
       │                             │ ◀──────────────────────────── │
       │                             │                               │
       │                             │ [Output Guard]                │
       │                             │  Audit for Leaked PII/Secrets │
       │                             │                               │
       │  4. Safe Response           │                               │
       │ ◀────────────────────────── │                               │
       │                             │                               │
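
The request half of the diagram can be sketched as three functions applied in order. Everything here is a hypothetical stand-in written for illustration: the function names, the `Blocked` exception, and the reason strings are assumptions, not Guardian's internals. The AWS pattern is the commonly used access key ID shape (an `AKIA` or `ASIA` prefix followed by 16 upper-case alphanumerics).

```python
import re

# Hypothetical stand-ins for the pipeline stages; not Guardian's real code.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


class Blocked(Exception):
    """Raised when a stage refuses to forward the request."""


def security_firewall(prompt: str) -> str:
    # Stage 1: drop the request if it contains something shaped like an AWS key ID.
    if AWS_KEY_ID.search(prompt):
        raise Blocked("secret_detected")
    return prompt


def token_optimizer(prompt: str) -> str:
    # Stage 2: collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", prompt).strip()


def finops_budget(prompt: str, spent_usd: float, limit_usd: float) -> str:
    # Stage 3: drop the request once the daily limit has been reached.
    if spent_usd >= limit_usd:
        raise Blocked("budget_exceeded")
    return prompt


def run_pipeline(prompt: str, spent_usd: float, limit_usd: float = 5.0) -> str:
    """Run the stages in the order shown in the diagram."""
    prompt = security_firewall(prompt)
    prompt = token_optimizer(prompt)
    return finops_budget(prompt, spent_usd, limit_usd)
```

Running the secret scan first means a leaking prompt is rejected before any other work is done on it, and a blocked request never reaches the forwarding step.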

🔌 Supported Integrations

Guardian Runtime runs as an HTTP proxy or as a native Python SDK, so it integrates with most modern AI tools without modifying their internal code.

  • Visual IDEs: Cursor, Windsurf, VS Code (via Cline/RooCode)
  • Terminal Agents: Claude Code, Aider, GitHub Copilot CLI
  • Frameworks: LangChain, AutoGen, LlamaIndex, CrewAI
  • LLM Providers: OpenAI, Anthropic, Google Gemini (via OpenAI compatibility layer)

🚀 Quickstart & Installation

# Core framework only
pip install guardian_runtime

# Or install with specific LLM providers:
pip install "guardian_runtime[openai]"
pip install "guardian_runtime[anthropic]"
pip install "guardian_runtime[gemini]"

# Or install everything (Providers, ML Scanner, Document Converter):
pip install "guardian_runtime[all]"

Done. No signup, no Guardian account or keys, and no configuration required. All monitoring data stays on your local machine in ~/.guardian_runtime/.


🎯 Comprehensive Use Cases (Where & How to Use)

Guardian is designed to be universal. Here are the exact ways to deploy it based on your workflow.

1. Terminal Coding Agents (Claude Code, Aider)

Why use it here? CLI agents operate autonomously. They can accidentally read a .env file containing your production AWS keys and send it to Anthropic/OpenAI as context. Guardian prevents this and ensures the agent doesn't blow your budget.

How to use:

  1. Start the proxy in a background terminal:
    guardian_runtime proxy --port 8080
    
  2. Tell your agent to route traffic through the proxy using environment variables:
    In PowerShell:
    $env:ANTHROPIC_BASE_URL="http://localhost:8080"
    claude
    
    In Mac/Linux/Git Bash:
    export ANTHROPIC_BASE_URL=http://localhost:8080
    claude
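
If you launch the agent from a script rather than a shell, the same environment variable can be set for the child process only. This is a minimal sketch: `proxied_env` and `launch_claude` are hypothetical helper names, and it assumes the `claude` executable is on your PATH and the proxy is already running on port 8080.

```python
import os
import subprocess


def proxied_env(base_url: str = "http://localhost:8080") -> dict[str, str]:
    """Copy the current environment and point Anthropic traffic at the proxy."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base_url
    return env


def launch_claude() -> int:
    # Assumes the `claude` executable is on PATH and the proxy is already running.
    return subprocess.run(["claude"], env=proxied_env()).returncode
```

Passing `env=` to `subprocess.run` leaves your own shell untouched, so other tools keep talking to the provider directly unless you opt them in.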
    

2. Visual IDEs (Cursor, Windsurf)

Why use it here? Modern GUI editors like Cursor have deep codebase access. While coding, you might highlight a file containing a secret and ask "explain this file". Guardian stops Cursor from sending that secret to the cloud.

How to use (Cursor Example):

  1. Start the proxy in your terminal: guardian_runtime proxy --port 8080
  2. Open Cursor Settings (Cmd/Ctrl + ,)
  3. Navigate to Models > Override Base URL
  4. Set the Base URL to: http://localhost:8080 (Now all of Cursor's traffic is protected and tracked locally!)

3. Production Python Applications (SDK)

Why use it here? If you are building a production chatbot or RAG pipeline, you must ensure your users cannot perform "jailbreak" prompt injections or trick the LLM into leaking internal system prompts.

How to use: Call Guardian in place of the OpenAI/Anthropic SDK.

import os
from guardian_runtime import GuardianRuntime, GuardianRuntimeBlockedError

os.environ["OPENAI_API_KEY"] = "sk-proj-..."
gr = GuardianRuntime() # Zero-config initialization

try:
    # Protects user input before sending to OpenAI
    response = gr.complete(
        messages=[{"role": "user", "content": "My AWS Key is AKIAIOSFODNN7EXAMPLE"}],
        raise_on_block=True
    )
    print(response.content)
except GuardianRuntimeBlockedError as e:
    # Fails cleanly in your app instead of leaking the secret!
    print(f"Blocked Locally: {e.response.violations[0].detail}")

4. Agentic Frameworks (LangChain, AutoGen)

Why use it here? Frameworks that spawn multiple communicating agents can rapidly consume tokens. Guardian acts as a central cost-tracking hub for all agent nodes.

How to use: Point your framework's base_url to the local proxy.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    base_url="http://localhost:8080", # Traffic routes through Guardian
    api_key="sk-proj-..."
)
response = llm.invoke("Hello, Guardian!")
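
For frameworks without a `base_url` option, the same routing can be done with a plain HTTP request. The sketch below only builds the request using the standard library. The `/v1/chat/completions` path is an assumption about which routes the proxy exposes (it mirrors the OpenAI API), and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8080",
                       api_key: str = "sk-proj-...") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat request aimed at the proxy."""
    body = {"model": "gpt-4o",
            "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        # The /v1/chat/completions path is an assumption about the proxy's routes.
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_chat_request("Hello, Guardian!")
print(req.full_url)
# Send with urllib.request.urlopen(req) once the proxy is running.
```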

5. Data Prep for Web UIs (Document Conversion)

Why use it here? If you use the standard ChatGPT or Claude Web UI, uploading large PDFs eats up your context window quickly because PDFs contain massive amounts of hidden formatting bloat.

How to use: Use the built-in CLI to strip out formatting bloat and compress documents into pure Markdown before manually uploading them.

guardian_runtime convert <path/to/input.pdf> --out <path/to/output.md>

You can now upload the converted Markdown file (the path you passed to --out) to ChatGPT, saving a large share of the context window and giving the model cleaner text to work with.
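
To check whether conversion pays off for a given document, you can compare rough token estimates before and after. The helpers below are hypothetical and use the common rule of thumb of about four characters per token for English text; real tokenizers vary by model and language, so treat the result as an estimate only.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; real tokenizers vary by model and language."""
    return round(len(text) / chars_per_token)


def savings(raw_text: str, converted_text: str) -> float:
    """Fraction of estimated tokens saved by the conversion (0.0 to 1.0)."""
    before = estimate_tokens(raw_text)
    after = estimate_tokens(converted_text)
    return 0.0 if before == 0 else 1 - after / before
```

For example, `savings(raw, converted)` returns 0.75 when the converted text is a quarter of the original length.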


💻 Exhaustive CLI Command Reference

Guardian ships with a suite of offline CLI tools. All data is stored locally in ~/.guardian_runtime/. The sections below cover each command, its flags, and when to use it.

guardian_runtime proxy (The Security Firewall)

Starts the local HTTP interception server. This is the core engine for protecting tools whose source code you cannot edit (like Cursor or Claude Code).

Flags & Options:

  • --port, -p <int>: Port to listen on (Default: 8080).
  • --host <str>: Host to bind to. Use 0.0.0.0 to listen on all interfaces, which exposes the proxy to other machines on your network (Default: 127.0.0.1).
  • --policy <path>: Path to a custom policy.yaml file. If omitted, uses the default Zero-Config policy ($10 budget).
  • --reload: Enables auto-reload if the policy file changes (useful for dev mode).

Example Usage:

$ guardian_runtime proxy --port 8080
  ⛨  GuardianRuntime Runtime Proxy
  ─────────────────────────────────────────
  Listening on : http://127.0.0.1:8080
  Policy       : Default (Zero-Config)
  Dashboard    : guardian_runtime dashboard (run in another terminal)

  Agent setup:
    Claude Code  →  ANTHROPIC_BASE_URL=http://localhost:8080 claude
    Aider        →  OPENAI_BASE_URL=http://localhost:8080 aider
    Cursor       →  Settings → API Base → http://localhost:8080
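
Before pointing an agent at the proxy, it can help to confirm that something is actually listening on the port. The sketch below is a generic TCP check using the standard library; `proxy_is_listening` is a hypothetical helper, and it only proves that some process accepts connections there, not that the process is Guardian.

```python
import socket


def proxy_is_listening(host: str = "127.0.0.1", port: int = 8080,
                       timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A check like this is useful in wrapper scripts: if it returns False, start the proxy (or fail loudly) instead of letting the agent report a confusing connection error.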

guardian_runtime convert <path> (Document Analysis)

Converts massive PDF, DOCX, and XLSX files into highly compressed, token-optimized Markdown.

Why use this? If you upload a raw PDF to a Web UI (like ChatGPT) or parse it in an agent, you waste thousands of tokens on hidden formatting bloat. This command strips the bloat before it hits the LLM context window.

Arguments & Flags:

  • <path>: The absolute or relative path to the document you want to compress.
  • --out, -o <path>: Output file path for the converted Markdown. If omitted, prints a preview to the terminal.

Example Usage:

$ guardian_runtime convert <path/to/input_file> --out <path/to/output_file.md>
⛨ GuardianRuntime Document Converter
Processing: input_file...

✓ Conversion Complete!
  • Original File:  input_file
  • Token Count:    14,205
  • Saved to:       output_file.md

guardian_runtime scan <text> (Manual Threat Verification)

Performs a local security scan on a specific text string using the ML InputGuard and Regex scanners.

Why use this? Use this to verify exactly what the firewall will catch before you send a massive codebase to an agent, or if you want to test how sensitive the PII/Secret detection is.

Example Usage:

$ guardian_runtime scan "My AWS key is AKIAIOSFODNN7EXAMPLE"
🛑 Scan failed! Threats detected:
  - [HIGH] secret_detected: AWS Access Key ID found.

guardian_runtime analytics (FinOps Tracking)

Prints a terminal summary of today's API costs, token usage, and intercepted threats, broken down by tool.

Flags:

  • --all: Shows all-time historical analytics instead of just today.

Example Usage:

$ guardian_runtime analytics
  ⛨  GuardianRuntime Session Analytics (Today)
  ──────────────────────────────────────────────

  Claude Code
  Cost:       $2.3100
  Requests:   54
  Blocked:    3 (3 secret_detected)
  Tokens:     82,000
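
A per-tool summary like the one above can be derived from a JSONL event log by grouping and summing. The sketch below shows the idea on inline sample data. The event fields (`tool`, `cost_usd`, `tokens`, `blocked`) are a hypothetical schema chosen for illustration; the real format of Guardian's event log may differ.

```python
import json
from collections import defaultdict

# Hypothetical event shape; the real events.jsonl schema may differ.
SAMPLE_EVENTS = """\
{"tool": "Claude Code", "cost_usd": 0.04, "tokens": 1500, "blocked": false}
{"tool": "Claude Code", "cost_usd": 0.0, "tokens": 0, "blocked": true}
{"tool": "Aider", "cost_usd": 0.01, "tokens": 400, "blocked": false}
"""


def summarize(jsonl: str) -> dict[str, dict[str, float]]:
    """Sum cost, requests, blocks and tokens per tool."""
    totals = defaultdict(lambda: {"cost_usd": 0.0, "requests": 0,
                                  "blocked": 0, "tokens": 0})
    for line in jsonl.splitlines():
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        row = totals[event["tool"]]
        row["cost_usd"] += event["cost_usd"]
        row["requests"] += 1
        row["blocked"] += int(event["blocked"])
        row["tokens"] += event["tokens"]
    return dict(totals)
```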

Additional Administration Commands

  • guardian_runtime --help: Prints the global help menu listing all available commands and flags.
  • guardian_runtime dashboard: Launches a React-based local Web UI on port 3000 that charts the same cost and threat analytics.
  • guardian_runtime logs: Tails the local JSONL event stream in real time (equivalent to tail -f ~/.guardian_runtime/logs/events.jsonl). Useful for debugging exactly why a specific prompt was blocked.
  • guardian_runtime init: Generates a boilerplate policy.yaml file in your current directory. Use this if you want to customize budgets, disable ML scanners, or enforce strict enterprise PII blocking.
  • guardian_runtime validate: Checks your policy.yaml for syntax errors before you restart the proxy.
  • guardian_runtime status: Shows the health of the local installation, ML models, and storage directory.
  • guardian_runtime clean: Deletes your entire ~/.guardian_runtime directory. Use this if you want to permanently delete all local analytics, logs, and custom policies.

⚙️ Advanced Configuration (Policy YAML)

Out of the box, Guardian Runtime uses a $10 daily budget and strict secret scanning. If you need custom rules, run guardian_runtime init to create a policy.yaml:

version: "1.0"
agents:
  default:
    llm:
      provider: openai
      default_model: gpt-4o

    input_guard:
      scanner_enabled: true
      jailbreak_detection: true
      scanner_action: block 
      
    cost:
      daily_budget: 5.00        # Instantly block if daily spend exceeds $5.00
      max_input_tokens: 20000   # Block massive context windows to save money
      
    optimizer:
      enabled: true
      terse_mode: true          # Slashes output tokens by forcing terse shorthand
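
To show how the two `cost` settings interact, here is a minimal sketch of the checks they imply. It mirrors the `cost` section as a plain dict rather than loading the YAML. The `check_cost_policy` helper and its reason strings are hypothetical, and blocking at exactly the limit (rather than just above it) is an assumption.

```python
from typing import Optional

# The `cost` section of the policy above, written as a plain dict.
COST_POLICY = {"daily_budget": 5.00, "max_input_tokens": 20000}


def check_cost_policy(input_tokens: int, spent_today_usd: float,
                      policy: dict = COST_POLICY) -> Optional[str]:
    """Return a block reason, or None if the request may proceed."""
    # Oversized prompts are rejected regardless of remaining budget.
    if input_tokens > policy["max_input_tokens"]:
        return "max_input_tokens_exceeded"
    # Assumption: reaching the limit exactly already counts as a block.
    if spent_today_usd >= policy["daily_budget"]:
        return "daily_budget_exceeded"
    return None
```

The token cap guards against a single huge request, while the daily budget guards against many small ones, so the two limits are complementary rather than redundant.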

🛑 What happens when Guardian blocks a request?

Where will I see the block?

  • If using the Proxy: You will see the block in the terminal running guardian_runtime proxy, AND inside the UI of the tool you are using (e.g., Claude Code or Aider).
  • If using the Python SDK: It surfaces instantly in your standard Python server logs or terminal.

How is it blocked?

  • Proxy Mode: Guardian returns a graceful error with a clear message. This ensures CLI agents display a clean error message in their chat interface instead of crashing or freezing your session.
  • SDK Mode: Guardian raises a GuardianRuntimeBlockedError exception that can be cleanly caught.

Example Block Message: BadRequestError: 🚨 [SECRET_DETECTED] AWS key AKIAIOS... found. Request blocked locally.
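
If a script needs to tell a Guardian block apart from an ordinary provider error, one option is to look for the bracketed tag in the message. This sketch assumes other block types follow the same `[TAG]` convention as the example above, which the source does not confirm; `block_reason` is a hypothetical helper.

```python
import re

# Matches a bracketed, upper-case tag such as [SECRET_DETECTED].
BLOCK_TAG = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def block_reason(message: str):
    """Return the bracketed tag from an error message, or None if absent."""
    match = BLOCK_TAG.search(message)
    return match.group(1) if match else None


msg = "BadRequestError: [SECRET_DETECTED] AWS key AKIAIOS... found. Request blocked locally."
print(block_reason(msg))  # prints SECRET_DETECTED
```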


📜 License

Released under the MIT License.
Zero tracking, zero cloud dependencies. Your code is yours.
