Privacy Protection Filter - Detect and protect sensitive information in text

GemFilter

GemFilter 0.2.2 · Local-first privacy · Agent integrations · Python 3.11+

A local privacy firewall for coding agents.
GemFilter detects, masks, tracks, and sanitizes sensitive developer data before it reaches LLMs, tools, logs, or agent context.

Quick Start · Agent Privacy Boundary · Masking Modes · Interfaces · Configuration

Overview

GemFilter started as a sensitive-information redactor. In v0.2, it is moving toward a more practical role: a local privacy boundary for AI coding agents.

Coding agents do not only receive user prompts. They inspect repositories, read files, execute shell commands, consume MCP tool results, write transcripts, and echo model responses. Private data can cross any of those boundaries:

Boundary	Example risk	GemFilter protection
User prompt	User pastes an API key into a request	Pre-send filtering
Tool output	Shell output prints `.env` values	Tool-output filtering
Repository context	Config files contain private endpoints	Recursive payload filtering
Model response	LLM echoes a surrogate or generates a new secret	Post-receive sanitization
CLI / HTTP output	The filter itself returns raw matches	Safe serialization by default

GemFilter is local, rule-based, and LLM-independent. It is designed to be understandable, auditable, and easy to integrate into agent workflows.

What It Protects

GemFilter includes built-in rules for common developer privacy risks:

Category	Examples
Credentials	API keys, passwords, bearer tokens, JWTs
Provider tokens	OpenAI, Anthropic, GitHub, npm, PyPI
Cloud secrets	AWS access keys, AWS secret access keys
Local config	`.env` secret assignments, database URLs
Contact data	Email addresses, Chinese and US phone numbers
Personal identifiers	Chinese ID cards, passports, credit cards
Network data	URLs, IPv4, IPv6, MAC addresses

The default output is safe: serialized detections omit the raw sensitive match unless you explicitly opt in with the unsafe debug flag (`--unsafe-include-matches` on the CLI).
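
To make the safe-by-default idea concrete, here is a minimal standalone sketch using only the standard library. It is not GemFilter's implementation: the `Detection` class, the `serialize` helper, the `include_match` parameter, and the simplified email regex are all hypothetical names chosen for illustration.

```python
import json
import re
from dataclasses import dataclass

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class Detection:
    rule: str
    start: int
    end: int
    match: str  # raw sensitive text; kept in memory only


def serialize(detection: Detection, include_match: bool = False) -> dict:
    """Return a JSON-safe dict; the raw match is added only on explicit opt-in."""
    data = {
        "rule": detection.rule,
        "start": detection.start,
        "end": detection.end,
        "match_length": detection.end - detection.start,
    }
    if include_match:
        data["match"] = detection.match
    return data


text = "Contact user@example.com"
found = EMAIL_RE.search(text)
detection = Detection("email", found.start(), found.end(), found.group())
print(json.dumps(serialize(detection)))
```

The design point is that the unsafe path has to be requested by the caller; forgetting a flag can only make the output safer, never leak the raw value.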

Agent Privacy Boundary

GemFilter protects three main runtime paths:

User prompt / context
        |
        v
  pre_send hook
        |
        v
Masked context -----------------------> LLM / agent
        |                                  |
        |                                  v
        |                          model response
        |                                  |
        v                                  v
Tool output / MCP result -----> post_receive sanitizer
        |
        v
filter_tool_output hook

The same local session is used across these paths, so a surrogate generated during pre-send can be reused later when the same value appears in tool output.
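
One way to picture this reuse is a per-session table from raw values to placeholders. The sketch below is a standalone illustration under that assumption, not GemFilter's session code; `SurrogateSession` and `surrogate_for` are hypothetical names.

```python
from collections import defaultdict


class SurrogateSession:
    """Maps each raw value to one stable placeholder for the session's lifetime."""

    def __init__(self) -> None:
        self._by_value: dict[tuple[str, str], str] = {}
        self._counters: defaultdict[str, int] = defaultdict(int)

    def surrogate_for(self, kind: str, value: str) -> str:
        key = (kind, value)
        if key not in self._by_value:
            # First sighting: allocate the next numbered placeholder for this kind.
            self._counters[kind] += 1
            self._by_value[key] = f"<{kind}_{self._counters[kind]}>"
        return self._by_value[key]


session = SurrogateSession()
first = session.surrogate_for("EMAIL", "john@example.com")  # seen in a prompt
other = session.surrogate_for("EMAIL", "jane@example.com")
again = session.surrogate_for("EMAIL", "john@example.com")  # seen later in tool output
print(first, other, again)
```

Because the second lookup of the same address returns the placeholder allocated earlier, the model sees one consistent name for one underlying value across prompt and tool output.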

Masking Modes

Different coding tasks need different privacy-utility tradeoffs. GemFilter provides three modes.

Mode	Example	Best for
`strict`	`john@example.com` -> `<EMAIL_1>`	Maximum privacy
`balanced`	`john@example.com` -> `<EMAIL_LOCAL_1>@<EMAIL_DOMAIN_1>`	Default coding-agent use
`utility`	`john@example.com` -> `user1@example.test`	Tests and examples that need plausible fake data

Secrets such as API keys, passwords, private keys, bearer tokens, and database URLs are always replaced with typed placeholders such as `<OPENAI_KEY_1>`, even in `utility` mode.

Example:

from gemfilter.skill import GemMasker

strict = GemMasker(masking_mode="strict")
balanced = GemMasker(masking_mode="balanced")
utility = GemMasker(masking_mode="utility")

text = "Contact john@example.com with OPENAI_API_KEY=sk-proj-abcdefghijklmnopqrstuvwxyz123456"

print(strict.mask(text)[0])
# Contact <EMAIL_1> with OPENAI_API_KEY=<OPENAI_KEY_1>

print(balanced.mask(text)[0])
# Contact <EMAIL_LOCAL_1>@<EMAIL_DOMAIN_1> with OPENAI_API_KEY=<OPENAI_KEY_1>

print(utility.mask(text)[0])
# Contact user1@example.test with OPENAI_API_KEY=<OPENAI_KEY_1>
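
For readers who want to see how the three email formats differ mechanically, the following standalone sketch reproduces the placeholder shapes from the table above with a plain regex substitution. It is an assumption-laden illustration, not the `GemMasker` implementation: `mask_emails` is a hypothetical helper, the regex is simplified, and the real numbering scheme may differ.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_emails(text: str, mode: str = "balanced") -> str:
    """Replace each distinct email with a placeholder shaped by the masking mode."""
    seen: dict[str, int] = {}

    def replace(match: re.Match) -> str:
        # Reuse the same index when the same address appears again.
        index = seen.setdefault(match.group(0), len(seen) + 1)
        if mode == "strict":
            return f"<EMAIL_{index}>"
        if mode == "balanced":
            return f"<EMAIL_LOCAL_{index}>@<EMAIL_DOMAIN_{index}>"
        if mode == "utility":
            return f"user{index}@example.test"
        raise ValueError(f"unknown masking mode: {mode}")

    return EMAIL_RE.sub(replace, text)


for mode in ("strict", "balanced", "utility"):
    print(mode, "->", mask_emails("Contact john@example.com", mode))
```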

Quick Start

Install

pip install gemfilter

For local development:

git clone https://github.com/liangzid/GemFilter.git
cd GemFilter
pip install -e .

CLI

gemfilter filter "Contact user@example.com and OPENAI_API_KEY=sk-proj-abcdefghijklmnopqrstuvwxyz123456"

Output:

Contact [EMAIL] and OPENAI_API_KEY=[OPENAI_API_KEY]

JSON output is safe by default:

gemfilter filter "Contact user@example.com" --json

{
  "text": "Contact [EMAIL]",
  "detections": [
    {
      "rule": "email",
      "start": 8,
      "end": 24,
      "sensitive_type": "contact",
      "replacement": "[EMAIL]",
      "match_length": 16
    }
  ],
  "summary": {
    "email": 1
  }
}

Raw matches require an explicit unsafe opt-in:

gemfilter filter "Contact user@example.com" --json --unsafe-include-matches

Python SDK

from gemfilter import SandFilter

sf = SandFilter()
result = sf.filter("My email is test@example.com, phone 13800138000")

print(result.text)
# My email is [EMAIL], phone [PHONE_CN]

print(result.summary)
# {'email': 1, 'phone_cn': 1}

Agent Hook API

from gemfilter.skill import HookManager

manager = HookManager()

pre = manager.pre_send(
    "Send the report to john@example.com. The token is sk-proj-abcdefghijklmnopqrstuvwxyz123456.",
    session_id="demo",
)

print(pre.payload)
# Send the report to <EMAIL_LOCAL_1>@<EMAIL_DOMAIN_1>. The token is <OPENAI_KEY_1>.

tool = manager.filter_tool_output(
    {
        "tool": "shell",
        "stdout": "DATABASE_URL=postgres://user:pass@db.internal:5432/app",
        "exit_code": 0,
    },
    session_id="demo",
)

print(tool.payload["stdout"])
# DATABASE_URL=<DATABASE_URL_1>

post = manager.post_receive(
    "I saw <EMAIL_LOCAL_1>@<EMAIL_DOMAIN_1> in the logs.",
    session_id="demo",
)

print(post.payload)
# I saw [FILTERED] in the logs.
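
The tool-output hook above receives a structured payload rather than a single string. A minimal sketch of recursive payload filtering, assuming that every string leaf is filtered and all other values pass through, could look like this. `filter_payload`, the untyped `<DATABASE_URL>` placeholder, and the simplified URL regex are hypothetical and stand in for GemFilter's own rules.

```python
import re
from typing import Any

# Simplified database-URL pattern, for illustration only.
DB_URL_RE = re.compile(r"\b(?:postgres(?:ql)?|mysql|mongodb|redis)://\S+")


def filter_payload(payload: Any) -> Any:
    """Return a copy of payload with every string leaf filtered."""
    if isinstance(payload, str):
        return DB_URL_RE.sub("<DATABASE_URL>", payload)
    if isinstance(payload, dict):
        return {key: filter_payload(value) for key, value in payload.items()}
    if isinstance(payload, (list, tuple)):
        return type(payload)(filter_payload(item) for item in payload)
    return payload  # numbers, booleans, and None pass through unchanged


result = filter_payload(
    {
        "tool": "shell",
        "stdout": "DATABASE_URL=postgres://user:pass@db.internal:5432/app",
        "exit_code": 0,
    }
)
print(result["stdout"])
```

Building a new structure instead of mutating the input keeps the original payload available to the caller, which matters if the unfiltered value is still needed locally.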

Interfaces

GemFilter can be used through several local interfaces.

Interface	Command / API	Use case
Python SDK	`SandFilter`	Library filtering
Skill API	`HookManager`	Agent pre-send, tool-output, post-receive hooks
CLI	`gemfilter filter`	Shell workflows and scripts
HTTP server	`gemfilter-server`	Local REST filtering
MCP / Codex schema	`gemfilter_filter_tool_output`	Tool-result filtering for agent contexts

HTTP Server

gemfilter-server --host localhost --port 8080

Endpoints:

Method	Endpoint	Description
`GET`	`/health`	Health check
`GET`	`/rules`	Enabled and disabled rules
`POST`	`/filter`	Filter one text field
`POST`	`/filter/batch`	Filter multiple text fields

Example:

curl -X POST http://localhost:8080/filter \
  -H "Content-Type: application/json" \
  -d '{"text": "Email user@example.com"}'
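
The same request can be built from Python with the standard library. This is a sketch mirroring the curl call above; `build_filter_request` and `filter_text` are hypothetical helper names, and the response is assumed to be a JSON object without relying on any particular field.

```python
import json
import urllib.request


def build_filter_request(
    text: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build the POST /filter request without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/filter",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def filter_text(text: str) -> dict:
    """Send the request; requires a running local gemfilter-server."""
    with urllib.request.urlopen(build_filter_request(text), timeout=5) as response:
        return json.load(response)


request = build_filter_request("Email user@example.com")
print(request.get_method(), request.full_url)
```

Calling `filter_text("Email user@example.com")` only works while the server from the previous step is listening on the same host and port.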

Agent Integrations

Install GemFilter into the coding-agent project where you want local privacy protection. The installer writes agent-specific hook configuration in the current working directory.

Copy-paste Agent Setup Prompt

If you are already using a coding agent, you can copy this prompt and send it to the agent from the root of your project:

Please install and configure GemFilter for this coding-agent project.

Goal:
- Protect my local privacy before prompts, tool outputs, file contents, shell outputs, or MCP results enter model context.
- Use GemFilter's local hooks where supported.
- Do not print or expose any real secrets while configuring or testing.

Steps:
1. Detect which agent environment this project uses:
   - Claude Code if .claude/ exists or settings should be written to .claude/settings.json.
   - OpenCode if .opencode/ exists or config should be written to .opencode/config.json.
   - Codex/MCP if .codex/ exists or MCP config should be written to .codex/mcp_config.json.
2. Install GemFilter if needed:
   pip install gemfilter
3. Configure the matching adapter:
   - Claude Code:
     python -m gemfilter.skill.install --agent claude_code
   - OpenCode:
     python -m gemfilter.skill.install --agent opencode
   - Codex/MCP:
     python -m gemfilter.skill.install --agent coodex
4. Verify installation:
   python -m gemfilter.skill.install --status
5. Run a safe local smoke test without using real secrets:
   python -m gemfilter.cli filter "Contact user@example.com and OPENAI_API_KEY=sk-proj-abcdefghijklmnopqrstuvwxyz123456"
6. Report exactly:
   - which adapter was installed,
   - which config file changed,
   - whether status checks passed,
   - whether the smoke test masked the email and fake API key.

If multiple agent environments are present, ask me which one to configure before making changes.

Agent	Integration surface	Hook coverage
Claude Code	`settings.json` hooks	pre-send, post-receive, tool-output
OpenCode	plugin hooks	pre-send, post-receive, tool-output
Codex	MCP-style tool/resource schema	filter, restore, tool-output filter

Claude Code

From the root of your coding project:

pip install gemfilter
python -m gemfilter.skill.install --agent claude_code
python -m gemfilter.skill.install --agent claude_code --status

This creates or updates:

.claude/settings.json

Registered hooks:

onBeforeSend   -> gemfilter.skill.hooks.pre_send_hook
onAfterReceive -> gemfilter.skill.hooks.post_receive_hook
onToolOutput   -> gemfilter.skill.hooks.tool_output_hook

Uninstall:

python -m gemfilter.skill.install --agent claude_code --uninstall

OpenCode

From the root of your coding project:

pip install gemfilter
python -m gemfilter.skill.install --agent opencode
python -m gemfilter.skill.install --agent opencode --status

This creates or updates:

.opencode/config.json

Registered hooks:

pre_send     -> gemfilter.skill.hooks.pre_send_hook
post_receive -> gemfilter.skill.hooks.post_receive_hook
tool_output  -> gemfilter.skill.hooks.tool_output_hook

Uninstall:

python -m gemfilter.skill.install --agent opencode --uninstall

Codex / MCP

From the root of your coding project:

pip install gemfilter
python -m gemfilter.skill.install --agent coodex
python -m gemfilter.skill.install --agent coodex --status

This creates or updates:

.codex/mcp_config.json

Registered resources and tools:

gemfilter://filter      -> pre-send filtering
gemfilter://restore     -> response sanitization
gemfilter://tool-output -> tool-output filtering
gemfilter_filter_tool_output

Uninstall:

python -m gemfilter.skill.install --agent coodex --uninstall

Check All Agents

python -m gemfilter.skill.install --status

Note: the Codex adapter is currently named `coodex` internally for backwards compatibility, which is why the commands above pass `--agent coodex`. The user-facing integration target is Codex/MCP.

Adapter behavior should still be validated against the exact live hook format of each host agent. The internal GemFilter APIs and tests are stable, but host-agent hook contracts can change.

Configuration

GemFilter looks for skill configuration in this order:

1. GEMFILTER_SKILL_CONFIG
2. ./config/skill.yaml
3. ./gemfilter/skill/config.yaml
4. ~/.gemfilter/skill.yaml

Example:

name: "gemfilter"
version: "1.0.0"
auto_activate: true

notification:
  style: "banner"
  show_types: true
  show_count: true

masking_mode: "balanced"  # strict | balanced | utility
preserve_format: true

filter:
  config_path: null
  auto_update: true
  enabled_types: []
  filter_tool_outputs: true

Tool-output filtering can reduce coding-agent utility when public or example strings must remain exact. To disable it globally:

filter:
  filter_tool_outputs: false

Or skip one structured payload:

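from gemfilter.skill import HookManager

manager = HookManager()

# "gemfilter_skip": True marks this one payload to be skipped by the filter.
manager.filter_tool_output({
    "gemfilter_skip": True,
    "stdout": "public example value that must stay exact",
})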

Full details: Configuration Guide

Custom Rules

Add project-specific rules with regex patterns.

from gemfilter import DetectionRule, SandFilter

sf = SandFilter()

sf.add_rule(
    DetectionRule(
        name="student_id",
        pattern=r"STU\d{8}",
        priority=1,
        sensitive_type="education",
        group="custom",
    )
)

print(sf.filter("Student ID: STU20240001").text)
# Student ID: [STUDENT_ID]

YAML configuration:

settings:
  default_processor: rule_name

rules:
  - name: student_id
    pattern: "STU\\d{8}"
    priority: 10
    sensitive_type: education
    group: custom
    processor: replace
    processor_config:
      replacement: "[STUDENT_ID]"
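
As a rough mental model of what a regex rule does, the standalone sketch below applies a named pattern and derives the placeholder from the rule name. It uses only the standard library and is not GemFilter's engine: `Rule`, `placeholder`, and `apply_rules` are hypothetical, and deriving `[STUDENT_ID]` from the name is an assumption about how a name-based default could behave.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str

    @property
    def placeholder(self) -> str:
        # Assumed convention: upper-cased rule name in square brackets.
        return f"[{self.name.upper()}]"


def apply_rules(text: str, rules: list[Rule]) -> str:
    """Apply each rule in list order, replacing matches with its placeholder."""
    for rule in rules:
        text = re.sub(rule.pattern, rule.placeholder, text)
    return text


student_id = Rule(name="student_id", pattern=r"STU\d{8}")
print(apply_rules("Student ID: STU20240001", [student_id]))
```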

Built-in Rules

Rule	Description
`email`	Email address
`phone_cn`, `phone_us`	Chinese and US phone numbers
`id_card_cn`, `passport`	Personal identifiers
`credit_card`, `bank_account_cn`	Financial identifiers
`password`, `dotenv_secret`	Passwords and `.env` secret assignments
`api_key`, `api_key_generic`	API key assignments and generic `sk-...` keys
`openai_api_key`, `anthropic_api_key`	Provider-specific LLM API keys
`github_token`, `npm_token`, `pypi_token`	Developer platform tokens
`bearer_token`, `jwt`	Bearer tokens and JWTs
`aws_access_key`, `aws_secret_key`	AWS credentials
`private_key`	Private key headers
`database_url`	PostgreSQL, MySQL, MongoDB, Redis URLs
`ipv4`, `ipv6`, `mac_address`, `url`	Network identifiers

Design Principles

Principle	Meaning
Local first	Sensitive text is processed before it leaves the machine.
Safe by default	CLI and HTTP outputs do not reveal raw matches by default.
Agent-aware	Prompt, tool-output, and response paths are treated separately.
Session-aware	Surrogates are reused across a local multi-turn session.
Utility-conscious	Strict, balanced, and utility modes make tradeoffs explicit.
LLM-independent	The core engine is deterministic and rule-based.

Development

Run tests:

python -m pytest -q

Current v0.2 hardening coverage includes:

- safe CLI/HTTP serialization
- session mapping correctness
- strict/balanced/utility surrogate modes
- stronger developer secret detection
- tool-output filtering
- CLI and HTTP smoke tests

License

MIT. See the repository license for details.

Project details

This version

0.2.2

May 24, 2026

Download files

Source Distribution

gemfilter-0.2.2.tar.gz (65.4 kB view details)

Uploaded May 24, 2026 Source

Built Distribution

gemfilter-0.2.2-py3-none-any.whl (77.5 kB view details)

Uploaded May 24, 2026 Python 3

File details

Details for the file gemfilter-0.2.2.tar.gz.

File metadata

Download URL: gemfilter-0.2.2.tar.gz
Upload date: May 24, 2026
Size: 65.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gemfilter-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`0a7638c1c0a726d3d622321633926a2bec599f62ee5377846fe35b6e7a352c76`
MD5	`96c509f391702942099836ae4645875f`
BLAKE2b-256	`fb03be7df16287af2eb3c7aa8762bb71514f82ef3f99c17740b9e22280b46dc2`

Provenance

The following attestation bundles were made for gemfilter-0.2.2.tar.gz:

Publisher: publish.yml on liangzid/GemFilter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gemfilter-0.2.2.tar.gz
- Subject digest: 0a7638c1c0a726d3d622321633926a2bec599f62ee5377846fe35b6e7a352c76
- Sigstore transparency entry: 1619436462
- Sigstore integration time: May 24, 2026
Source repository:
- Permalink: liangzid/GemFilter@1939bc8fdbf221f312fb93575a7c4f06591d9970
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/liangzid
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1939bc8fdbf221f312fb93575a7c4f06591d9970
- Trigger Event: release

File details

Details for the file gemfilter-0.2.2-py3-none-any.whl.

File metadata

Download URL: gemfilter-0.2.2-py3-none-any.whl
Upload date: May 24, 2026
Size: 77.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gemfilter-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a50b863aee6ad46962f14082ea06e29b77ba5a6b8d1bb2d24bfcb91367485d70`
MD5	`8ba63508590c33480f609acfe25196c5`
BLAKE2b-256	`dc9ffc0f4191e19f426b62813151592cb9a9770b57efef869f2a95076371d04e`

Provenance

The following attestation bundles were made for gemfilter-0.2.2-py3-none-any.whl:

Publisher: publish.yml on liangzid/GemFilter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gemfilter-0.2.2-py3-none-any.whl
- Subject digest: a50b863aee6ad46962f14082ea06e29b77ba5a6b8d1bb2d24bfcb91367485d70
- Sigstore transparency entry: 1619436613
- Sigstore integration time: May 24, 2026
Source repository:
- Permalink: liangzid/GemFilter@1939bc8fdbf221f312fb93575a7c4f06591d9970
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/liangzid
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1939bc8fdbf221f312fb93575a7c4f06591d9970
- Trigger Event: release
