Policy-driven content scanning and redaction for public publishing and agent output.

Content Guard

Policy-driven scanning and redaction for public content, publishing pipelines, and agent output.

Python 3.11+ | Apache-2.0 license | Zero required third-party dependencies | Optional OPF backend | Markdown aware

Content Guard keeps private infrastructure, secrets, and personal context out of public surfaces before they ship. It is built for Markdown docs, PR bodies, social drafts, generated agent output, and automation pipelines where one sloppy paste can leak more than intended.

It takes the practical parts of the local content scrubber and the useful model-backed idea behind Privacy Filter, then turns them into one maintainable system.

What It Checks

Deterministic rules for infrastructure, secrets, and high-confidence patterns
Optional OPF backend for model-based PII review and redaction
Custom policy files for private names, internal projects, unreleased plans, and environment-specific rules
Blocking, warning, redaction, and allow decisions from one report format
Markdown-aware scanning with frontmatter and allow-comment support

The core package has no required third-party dependencies. OPF is optional and runs through its CLI when available.

Quick Start

Install from a local clone:

python -m pip install -e .

Scan or redact a file:

content-guard scan examples/sample.md --policy policies/public-content.json
content-guard redact examples/sample.md --policy policies/public-content.json
content-guard scan examples/sample.md --json
content-guard scan examples/ --policy policies/public-content.json

Use OPF if it is installed locally:

content-guard redact examples/sample.md --opf

By default, --opf looks for ~/.opf-venv/bin/opf. Override it with:

CONTENT_GUARD_OPF_BIN=/path/to/opf content-guard scan file.md --opf
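The lookup order described above (environment override first, then the default path) can be sketched roughly as follows. This is an illustration only: `resolve_opf_bin` is a hypothetical helper, not a function the package is known to expose, and treating an empty variable as "unset" is an assumption.

```python
import os
from pathlib import Path

# Default location quoted in the text above.
DEFAULT_OPF_BIN = "~/.opf-venv/bin/opf"


def resolve_opf_bin(environ=None):
    """Hypothetical helper: prefer CONTENT_GUARD_OPF_BIN, else the default path."""
    env = os.environ if environ is None else environ
    override = env.get("CONTENT_GUARD_OPF_BIN")
    # Assumption: an empty override is treated the same as an unset one.
    chosen = override if override else DEFAULT_OPF_BIN
    return Path(os.path.expanduser(chosen))
```

Calling `resolve_opf_bin({"CONTENT_GUARD_OPF_BIN": "/path/to/opf"})` returns the override, while `resolve_opf_bin({})` falls back to the expanded default.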

OPF can also be enabled from a policy file:

{
  "backends": {
    "opf": {
      "enabled": true,
      "action": "warn",
      "device": "cpu"
    }
  }
}

Policies

Policies are JSON so the project stays dependency-free. A policy can set default actions by category, override individual rules, and add private custom regex rules.

{
  "name": "public-content",
  "defaults": {
    "infrastructure": "block",
    "secret": "block",
    "pii": "warn"
  },
  "rules": {
    "email": "warn"
  },
  "custom_rules": [
    {
      "id": "internal-hostname-example",
      "category": "infrastructure",
      "pattern": "\\binternal-host\\b",
      "replacement": "[redacted-host]"
    }
  ]
}
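To show how a custom rule's pattern and replacement might be applied, here is a minimal sketch. It assumes rules are plain regular expressions applied with Python's `re` module; `apply_custom_rules` is a hypothetical helper, not the package's actual engine. Note that a regex word boundary is written as `\\b` in JSON source so that it decodes to `\b`.

```python
import json
import re

# Policy fragment mirroring the example above. The raw string keeps the JSON
# escape "\\b" intact; json.loads decodes it to the regex token \b.
POLICY_JSON = r"""
{
  "custom_rules": [
    {
      "id": "internal-hostname-example",
      "category": "infrastructure",
      "pattern": "\\binternal-host\\b",
      "replacement": "[redacted-host]"
    }
  ]
}
"""


def apply_custom_rules(text, policy):
    """Hypothetical helper: replace every custom-rule match with its replacement."""
    for rule in policy.get("custom_rules", []):
        # A function replacement inserts the text literally, so backslashes
        # in the replacement are not treated as group references.
        text = re.sub(rule["pattern"], lambda _m, r=rule: r["replacement"], text)
    return text


policy = json.loads(POLICY_JSON)
redacted = apply_custom_rules("Deploy to internal-host, not internal-hosting.", policy)
print(redacted)  # Deploy to [redacted-host], not internal-hosting.
```

The word boundaries keep the rule from matching inside longer tokens such as `internal-hosting`.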

Actions:

block: fail the scan, usually for publish gates
redact: rewrite matching content
warn: report without failing
allow: ignore matching findings

Bundled Policies

Two bundled policies share the infrastructure category but treat it differently on purpose:

policies/public-repo.json: for technical docs repos. It keeps private-ipv4 (RFC 1918), secrets, PII, and Co-authored-by trailers as hard blocks, but downgrades loopback-ipv4 (127.x), localhost-port, localhost-bare, and port-reference to warnings. README and CONTRIBUTING files often need to discuss localhost, named ports, and 127.0.0.1 for setup instructions. See policies/public-repo.md for the long-form rationale.
policies/public-content.json: for blog posts and social drafts. It keeps the full infrastructure category at block because marketing surfaces have a higher leak risk and should not expose internal addresses or named ports.
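The difference between the two bundled policies comes down to per-rule overrides on top of category defaults. The sketch below illustrates that resolution order under stated assumptions: the policy dictionary is a hypothetical approximation of a public-repo style policy (not the contents of the bundled file), `resolve_action` is an illustrative name, and the `warn` fallback for unknown categories is an assumption.

```python
VALID_ACTIONS = {"block", "redact", "warn", "allow"}

# Hypothetical policy in the documented format: categories block by default,
# while selected localhost-related rule ids are downgraded to warnings.
PUBLIC_REPO_LIKE = {
    "defaults": {"infrastructure": "block", "secret": "block", "pii": "block"},
    "rules": {
        "loopback-ipv4": "warn",
        "localhost-port": "warn",
        "localhost-bare": "warn",
        "port-reference": "warn",
    },
}


def resolve_action(policy, rule_id, category, fallback="warn"):
    """A per-rule override wins; otherwise the category default applies."""
    action = policy.get("rules", {}).get(rule_id)
    if action is None:
        action = policy.get("defaults", {}).get(category, fallback)
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action {action!r} for rule {rule_id!r}")
    return action
```

With this policy, `private-ipv4` resolves to `block` through the infrastructure default, while `loopback-ipv4` resolves to `warn` through its override.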

Allow Comments

Place a local allow comment on the same line as the finding or on the line directly above it:

<!-- content-guard: allow localhost-bare -->
This tutorial uses localhost as an example.

Use content-guard: allow all sparingly for examples where every finding is intentional.
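One way to picture the scoping of allow comments is the sketch below. It assumes the HTML comment syntax shown in the example, and it assumes (without confirmation from the source) that several rule ids may be listed in one comment. The function names are hypothetical.

```python
import re

# Assumed syntax, based on the example above:
#   <!-- content-guard: allow RULE_ID [RULE_ID ...] -->  or  allow all
ALLOW_RE = re.compile(r"<!--\s*content-guard:\s*allow\s+([^>]*?)\s*-->")


def allowed_rules_by_line(lines):
    """Map each 0-based line index to the set of rule ids allowed on it.

    An allow comment covers its own line and the line directly below it.
    """
    allowed = {}
    for index, line in enumerate(lines):
        for match in ALLOW_RE.finditer(line):
            rule_ids = set(match.group(1).split())
            for target in (index, index + 1):
                if target < len(lines):
                    allowed.setdefault(target, set()).update(rule_ids)
    return allowed


def is_allowed(allowed, line_index, rule_id):
    """True when the rule id, or the wildcard 'all', is allowed on that line."""
    rules = allowed.get(line_index, set())
    return "all" in rules or rule_id in rules
```

A scanner built this way would drop a `localhost-bare` finding on the line under the comment but still report the same rule two lines further down.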

PR and Git Guards

PR bodies and public repository content are publishing boundaries too. Use stricter policies before copying generated summaries, dogfood notes, local test output, fixtures, or docs into public GitHub surfaces:

content-guard scan examples/pr-body.md --policy policies/pr-draft.json
content-guard diff examples/pr-body.md --policy policies/pr-draft.json
content-guard-pr examples/pr-body.md
content-guard-pr-prepare examples/pr-body.md --json
content-guard-publish-check --pr-body examples/pr-body.md --json
content-guard-n8n-advisory < payload.json
content-guard-n8n-validate --json
content-guard-git --policy policies/public-repo.json
content-guard-git --all-tracked --policy policies/public-repo.json
content-guard-commits --range origin/main..HEAD --policy policies/public-repo.json

See docs/PR_DRAFTS.md and docs/GIT_PUBLIC_REPO_GUARD.md.

Use content-guard-publish-check as the practical local pre-publish wrapper. It prepares a sanitized PR body when --pr-body is provided, scans staged files, scans commit messages, and can optionally scan all tracked files:

content-guard-publish-check --pr-body pr-body.md --json
content-guard-publish-check --pr-body pr-body.md --all-tracked

PR body findings are advisory by default because the wrapper writes a sanitized body and prints its path as publish_body_file. Blocking findings in staged files, commit messages, or the optional all-tracked scan fail the command unless --advisory-only is set.
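The pass/fail behavior described above can be summarized in a small sketch. The `Finding` type, the source labels, and the exit code values are all hypothetical and chosen for illustration; they are not the wrapper's actual report format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str  # hypothetical labels: "pr_body", "staged", "commit_message", "all_tracked"
    action: str  # "block", "redact", "warn", or "allow"


def exit_code(findings, advisory_only=False):
    """Return 1 when a blocking finding outside the PR body should fail the run."""
    if advisory_only:
        return 0
    for finding in findings:
        # PR body findings stay advisory because a sanitized body is written instead.
        if finding.action == "block" and finding.source != "pr_body":
            return 1
    return 0
```

Under these assumptions, a blocked secret in a staged file fails the run, while the same finding in the PR body does not.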

Use content-guard-pr-prepare when a later PR publishing step needs a stable sanitized body path:

content-guard-pr-prepare pr-body.md
gh pr create --body-file .content-guard/pr-drafts/pr-body.public.md

For local run-alongside testing against the legacy scrubber, see docs/DOGFOOD_TEST_REPO.md.

For n8n publish workflows, start with an advisory step that reports findings without mutating live publishes. See docs/N8N_ADVISORY.md and docs/N8N_WORKFLOW_RECIPE.md. Validate cloned workflow wiring with docs/N8N_VALIDATION_PACK.md.

OpenClaw Plugin

Content Guard can also run as an OpenClaw outbound message plugin. The plugin lives in openclaw-plugin/ and shells out to the same Python engine, so OpenClaw messages use the same policy model as publish gates.

See docs/OPENCLAW_PLUGIN.md.

Design Notes

Privacy Filter influenced the optional model-backed PII layer, especially the idea that some personal data detection benefits from context. Content Guard does not copy Privacy Filter code. OPF integration is a subprocess adapter so the deterministic engine remains portable and maintainable.
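The subprocess adapter idea can be illustrated with a generic sketch. It does not reproduce the real OPF invocation: the opf command-line arguments are not documented here, so a Python one-liner stands in for the backend. `run_backend` is a hypothetical name, and returning `None` on failure is an assumed fallback so the deterministic results can still be used.

```python
import subprocess
import sys


def run_backend(argv, text, timeout=30):
    """Send text to an external command on stdin and return its stdout.

    Returns None when the command is missing, exits non-zero, or times out,
    so the caller can continue with deterministic findings only.
    """
    try:
        completed = subprocess.run(
            argv,
            input=text,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return completed.stdout


# Stand-in backend for demonstration only: upper-cases whatever arrives on stdin.
STAND_IN = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
```

Keeping the model behind a process boundary like this means the core package never has to import the backend or its dependencies.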

The deterministic rules are intentionally conservative. Public publishing should fail loudly on infrastructure and secret leakage, while model findings are better treated as review signals until a local policy proves they are reliable enough to block.

Version 0.1.1, released Apr 28, 2026

File details

Details for the file content_guard-0.1.1.tar.gz.

File metadata

Download URL: content_guard-0.1.1.tar.gz
Upload date: Apr 28, 2026
Size: 27.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for content_guard-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`0980a38118b4a5553e74e00d789aa11c37d289f23c87ea9ec9c4023bdbd97427`
MD5	`29911758d5477c3e736cbbc31dd00c66`
BLAKE2b-256	`2128fa3058805cd117f234db6b33b6b965fca4a94fbf63d19a567ccb02818547`

File details

Details for the file content_guard-0.1.1-py3-none-any.whl.

File metadata

Download URL: content_guard-0.1.1-py3-none-any.whl
Upload date: Apr 28, 2026
Size: 28.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for content_guard-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2b67a2d05df1245b7ba08c3375ce79076f91ce220d890b2631a872ddd8c79bdd`
MD5	`b2d627596d5f28a28cf07b7429abca02`
BLAKE2b-256	`4aa03a733f22dfefdf029df50867fb9e46c75944aab9300f56247f8e5f24f6ca`
