CI Gate 01 — PII and API key scanner for Spanforge compliance pipelines

spanforge-secrets

CI Gate 01 for the Spanforge compliance pipeline. Scans prompt files, training data, and arbitrary text/JSON for 10 PII entity types and 5 exposed API key formats. Exits with code 1 if any violation is found, and outputs structured JSON listing each hit's entity type, path, match count, and sensitivity level.

This is a reference implementation built on top of the spanforge framework.

Quick install

pip install spanforge-secrets spanforge

Quick scan

# Scan files
spanforge-secrets scan prompts/ data/training.jsonl

# Scan from stdin
echo "contact ceo@corp.com" | spanforge-secrets scan --stdin

# Verify an HMAC audit-chain log
spanforge-secrets verify-chain audit.jsonl --secret "$HMAC_SECRET"

Exit codes: 0 clean · 1 violations found · 2 usage error · 3 I/O error

Documentation

Document	Description
docs/installation.md	Requirements, install commands, dev setup
docs/quickstart.md	Scan your first file in 2 minutes
docs/tutorial.md	Step-by-step walkthrough of every feature
docs/cli-reference.md	All flags, sub-commands, and exit codes
docs/api-reference.md	Python API — `scan_payload()`, `scan_text()`, `verify_chain_file()`
docs/entity-types.md	All 15 detectable entity types with examples
docs/ci-integration.md	GitHub Actions, GitLab CI, pre-commit hooks
docs/verify-chain.md	HMAC audit-chain verification guide
docs/ignore-patterns.md	`.spanforge-secretsignore` file format
docs/contributing.md	Development workflow and code standards
docs/changelog.md	Version history

Detected entity types

PII (10 types)

Entity type	Sensitivity	Validator
`email`	medium	regex
`phone`	medium	regex
`ssn`	high	regex + SSA validation
`credit_card`	high	regex + Luhn
`ip_address`	low	regex
`uk_national_insurance`	low	regex
`aadhaar`	high	regex + Verhoeff
`pan`	high	regex
`date_of_birth`	medium	regex + calendar check
`address`	medium	regex
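
To illustrate what the "regex + validator" entries in the table add over a bare regex, here is a minimal sketch of two well-known checks: the Luhn checksum used for card numbers, and the structural rules for US Social Security numbers (area never 000, 666, or 900-999; group never 00; serial never 0000). The function names are hypothetical and this is not the spanforge implementation, only a sketch of the general technique.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


SSN_RE = re.compile(r"^(\d{3})-(\d{2})-(\d{4})$")


def ssn_structurally_valid(ssn: str) -> bool:
    """Reject SSNs whose area, group, or serial is never issued."""
    m = SSN_RE.match(ssn)
    if not m:
        return False
    area, group, serial = (int(g) for g in m.groups())
    if area == 0 or area == 666 or area >= 900:
        return False
    return group != 0 and serial != 0


print(luhn_valid("4111 1111 1111 1111"))       # well-known test card number
print(luhn_valid("4111 1111 1111 1112"))       # last digit altered
print(ssn_structurally_valid("666-12-3456"))   # area 666 is never issued
```

A validator of this kind reduces false positives: a 16-digit order number that happens to match the card regex is unlikely to also pass the checksum.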

API keys (5 formats across 4 platforms)

Entity type	Sensitivity	Pattern
`openai_api_key`	high	`sk-...` / `sk-proj-...`
`anthropic_api_key`	high	`sk-ant-...`
`aws_access_key_id`	high	`AKIA...` / `ASIA...`
`aws_secret_access_key`	high	context-sensitive 40-char key
`gcp_service_account_key`	high	JSON private key marker
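
Prefix-based key formats overlap: anything starting with `sk-ant-` also starts with `sk-`. The sketch below shows one way to handle that, by trying the most specific pattern first and skipping overlapping matches. The regexes, minimum lengths, and the `classify_keys` helper are assumptions made for illustration; the patterns shipped in this package may differ.

```python
import re

# Hypothetical patterns, ordered most-specific first so that "sk-ant-"
# is not claimed by the broader "sk-" rule.
KEY_PATTERNS = [
    ("anthropic_api_key", re.compile(r"\bsk-ant-[A-Za-z0-9_-]{20,}")),
    ("openai_api_key", re.compile(r"\bsk-(?:proj-)?[A-Za-z0-9_-]{20,}")),
    ("aws_access_key_id", re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")),
]


def classify_keys(text: str) -> list[str]:
    """Return the entity type of each key-like match, without the values."""
    found: list[str] = []
    claimed: list[tuple[int, int]] = []  # spans already attributed to a type
    for name, pattern in KEY_PATTERNS:
        for m in pattern.finditer(text):
            # Skip a match that overlaps a span claimed by an earlier pattern.
            if any(m.start() < end and start < m.end() for start, end in claimed):
                continue
            claimed.append(m.span())
            found.append(name)
    return found


# Obviously fake keys, built at runtime so this file does not trip a scanner.
sample = (
    "a=" + "sk-ant-" + "x" * 24
    + " b=" + "sk-proj-" + "y" * 24
    + " c=AKIAIOSFODNN7EXAMPLE"  # example key ID from the AWS documentation
)
print(classify_keys(sample))
```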

CLI reference

# Scan one or more files (.txt, .json, .jsonl supported)
spanforge-secrets scan path/to/prompt.txt training_data.jsonl

# Scan a directory recursively
spanforge-secrets scan data/

# Read from stdin
echo "contact ceo@corp.com" | spanforge-secrets scan --stdin

# SARIF output for GitHub Advanced Security
spanforge-secrets scan data/ --format sarif > results.sarif

# Scan only staged git changes (pre-commit hook)
spanforge-secrets scan --diff

# Exclude files matching patterns
spanforge-secrets scan data/ --ignore-file ci/secrets-ignore.txt

# Verify an HMAC audit-chain log
spanforge-secrets verify-chain audit.jsonl --secret "$HMAC_SECRET"
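
For readers unfamiliar with HMAC audit chains, the idea behind `verify-chain` can be sketched as follows: each record's signature covers both its own payload and the previous record's signature, so editing or removing an earlier record invalidates every later one. The record layout below (`payload`, `prev`, `sig`), the all-zero genesis value, and the function names are assumptions made for illustration; the actual log format is described in docs/verify-chain.md.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical "previous signature" for the first record


def sign(secret: bytes, prev_sig: str, payload: dict) -> str:
    """HMAC-SHA256 over the previous signature plus a canonical payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret, (prev_sig + body).encode("utf-8"), hashlib.sha256).hexdigest()


def append_record(chain: list[dict], secret: bytes, payload: dict) -> None:
    """Append a record whose signature is linked to the previous record."""
    prev_sig = chain[-1]["sig"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev_sig, "sig": sign(secret, prev_sig, payload)})


def verify_chain_sketch(chain: list[dict], secret: bytes) -> bool:
    """Recompute every signature in order; any mismatch breaks the chain."""
    prev_sig = GENESIS
    for record in chain:
        expected = sign(secret, prev_sig, record["payload"])
        if record["prev"] != prev_sig or not hmac.compare_digest(expected, record["sig"]):
            return False
        prev_sig = record["sig"]
    return True


secret = b"demo-secret"
log: list[dict] = []
append_record(log, secret, {"event": "scan", "clean": True})
append_record(log, secret, {"event": "scan", "clean": False})
print(verify_chain_sketch(log, secret))   # intact chain

log[0]["payload"]["clean"] = False        # tamper with an earlier record
print(verify_chain_sketch(log, secret))   # tampering is detected
```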

Exit codes

Code	Meaning
`0`	All inputs clean
`1`	At least one violation detected
`2`	Usage / argument error
`3`	I/O or format error (unreadable file, bad JSON)
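
When the gate is driven from a Python build script rather than directly from CI, the exit codes in the table can be mapped to messages as in the sketch below. The `run_gate` helper is hypothetical; to keep the example runnable without the package installed, the demonstration call uses a stand-in child process that simply exits with code 1, where a real script would pass something like `["spanforge-secrets", "scan", "data/"]`.

```python
import subprocess
import sys

# Meanings taken from the exit-code table above.
EXIT_MEANINGS = {
    0: "all inputs clean",
    1: "at least one violation detected",
    2: "usage / argument error",
    3: "I/O or format error",
}


def run_gate(command: list[str]) -> tuple[int, str]:
    """Run a command and translate its exit code into a readable message."""
    completed = subprocess.run(command, capture_output=True, text=True)
    meaning = EXIT_MEANINGS.get(completed.returncode, "unexpected exit code")
    return completed.returncode, meaning


# Stand-in for the scanner: a child Python process that exits with code 1.
code, meaning = run_gate([sys.executable, "-c", "import sys; sys.exit(1)"])
print(code, meaning)
```

Distinguishing code 1 from codes 2 and 3 lets a pipeline report "secrets found" separately from "the scan itself could not run".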

JSON output format

{
  "gate": "CI-Gate-01",
  "clean": false,
  "total_violations": 2,
  "results": [
    {
      "source": "prompts/user_prompt.txt",
      "clean": false,
      "violation_count": 2,
      "scanned_strings": 5,
      "hits": [
        {
          "entity_type": "email",
          "path": "<text>",
          "match_count": 1,
          "sensitivity": "medium",
          "category": "pii"
        },
        {
          "entity_type": "openai_api_key",
          "path": "<text>",
          "match_count": 1,
          "sensitivity": "high",
          "category": "api_key"
        }
      ]
    }
  ]
}

Matched values are never included in the output: each hit reports only the entity type, path, match count, sensitivity level, and category.
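
Because the report is plain JSON, downstream tooling can filter it with the standard library alone. The sketch below parses the sample report shown above and lists the hits at a given sensitivity level; the `hits_at_level` helper is hypothetical and relies only on the fields that appear in that sample.

```python
import json

# The sample report from this section, with the hits condensed onto fewer lines.
REPORT_JSON = """
{
  "gate": "CI-Gate-01",
  "clean": false,
  "total_violations": 2,
  "results": [
    {
      "source": "prompts/user_prompt.txt",
      "clean": false,
      "violation_count": 2,
      "scanned_strings": 5,
      "hits": [
        {"entity_type": "email", "path": "<text>", "match_count": 1,
         "sensitivity": "medium", "category": "pii"},
        {"entity_type": "openai_api_key", "path": "<text>", "match_count": 1,
         "sensitivity": "high", "category": "api_key"}
      ]
    }
  ]
}
"""


def hits_at_level(report: dict, sensitivity: str) -> list[tuple[str, str]]:
    """Return (source, entity_type) pairs for hits at the given sensitivity."""
    return [
        (result["source"], hit["entity_type"])
        for result in report["results"]
        for hit in result["hits"]
        if hit["sensitivity"] == sensitivity
    ]


report = json.loads(REPORT_JSON)
for source, entity_type in hits_at_level(report, "high"):
    print(f"{source}: {entity_type}")
```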

Python API

from spanforge_secrets import scan_payload, scan_text

# Scan a dict payload (parsed training JSONL, config files, etc.)
result = scan_payload({"user": {"email": "alice@example.com"}})
if not result.clean:
    for hit in result.hits:
        print(hit.entity_type, hit.path, hit.sensitivity, hit.category)

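# Scan raw text (prompt files, arbitrary strings)
# A context manager closes the file handle once the text has been read.
with open("prompt.txt", encoding="utf-8") as f:
    result = scan_text(f.read(), source="prompt.txt")
print(result.clean, result.violation_count)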

# Add custom patterns
import re
result = scan_text(
    "Assigned to EMP-001234.",
    extra_patterns={"employee_id": re.compile(r"\bEMP-\d{6}\b")},
    extra_sensitivity={"employee_id": "medium"},
)

See docs/api-reference.md for the full API including verify_chain_file().

CI integration (GitHub Actions)

- name: Spanforge Secrets Gate
  run: |
    pip install spanforge-secrets spanforge
    spanforge-secrets scan prompts/ data/training.jsonl

The step fails automatically whenever the scan exits with a non-zero code: 1 for violations, 2 or 3 for usage and I/O errors.

With SARIF upload

- name: Run scan (SARIF)
  run: spanforge-secrets scan prompts/ data/ --format sarif > secrets.sarif
  continue-on-error: true

- name: Upload to GitHub Code Scanning
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: secrets.sarif

- name: Fail on violations
  run: spanforge-secrets scan prompts/ data/
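
The workflow above runs the scan twice, once to produce SARIF and once to fail the job. An alternative, sketched here, is to decide the outcome from the SARIF file already written. The sketch relies only on the standard SARIF 2.1.0 layout, where each entry in `runs` carries a `results` array; the sample log, its tool name, and its rule ID are made up, and the exact contents of this tool's SARIF output are not documented here.

```python
import json


def count_sarif_results(sarif: dict) -> int:
    """Count result entries across all runs of a SARIF 2.1.0 log."""
    return sum(len(run.get("results", [])) for run in sarif.get("runs", []))


# Hypothetical minimal log; tool name, rule ID, and message are invented.
SAMPLE = json.loads("""
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "example-scanner"}},
      "results": [
        {"ruleId": "email", "level": "warning",
         "message": {"text": "PII detected"}}
      ]
    }
  ]
}
""")

violations = count_sarif_results(SAMPLE)
print("fail" if violations else "pass", violations)
```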

See docs/ci-integration.md for GitLab CI and pre-commit hook setups.

Pre-commit hook

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: spanforge-secrets
        name: Spanforge Secrets Gate
        language: system
        entry: spanforge-secrets scan --diff
        pass_filenames: false
        stages: [pre-commit]
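
How `--diff` collects its input is not described in this README. As a rough illustration of what scanning "only staged changes" can mean, the sketch below extracts the added lines from unified diff text such as the output of `git diff --cached`. The `added_lines` helper is hypothetical and deliberately simplified: it does not track hunk headers, so an added line whose content itself begins with `++ ` would be misread as a file header.

```python
def added_lines(diff_text: str) -> list[tuple[str, str]]:
    """Return (file path, added line) pairs from unified diff text."""
    current_file = None
    added: list[tuple[str, str]] = []
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header for the new version, e.g. "+++ b/prompts/a.txt".
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("+") and current_file is not None:
            added.append((current_file, line[1:]))
    return added


SAMPLE_DIFF = """\
diff --git a/prompts/a.txt b/prompts/a.txt
--- a/prompts/a.txt
+++ b/prompts/a.txt
@@ -1,2 +1,3 @@
 Hello
-old line
+contact ceo@corp.com
+second new line
"""

for path, text in added_lines(SAMPLE_DIFF):
    print(path, "|", text)
```

Restricting the scan to added lines keeps the hook fast and avoids blocking a commit on violations that were already present in untouched parts of a file.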

Platform source

Built on top of:

spanforge.redact — all 10 PII patterns, validators (_is_valid_ssn, _is_valid_date), Luhn check, Verhoeff check
spanforge.signing — verify_chain() for audit-chain verification

Extensions unique to this package: 5 API key patterns, scan_text(), PIIScanHit.category, PIIScanResult.source, file I/O chain verification, and the complete CLI.

This package is a reference implementation of the spanforge framework. spanforge>=2.0.2 is a required runtime dependency.

Project details

This version: 1.0.0 (Apr 15, 2026)

Download files

Source Distribution

spanforge_secrets-1.0.0.tar.gz (26.7 kB)

Uploaded Apr 15, 2026 Source

Built Distribution

spanforge_secrets-1.0.0-py3-none-any.whl (20.1 kB)

Uploaded Apr 15, 2026 Python 3

File details

Details for the file spanforge_secrets-1.0.0.tar.gz.

File metadata

Download URL: spanforge_secrets-1.0.0.tar.gz
Upload date: Apr 15, 2026
Size: 26.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for spanforge_secrets-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`2b85a7632c99889b0c4c0b6cb7c3ae2294df2c4ca71a36e3ad8ee30079dc6232`
MD5	`d7dd5bfa17f39d90933c186406a3ced7`
BLAKE2b-256	`a745172208e900173cd5168ddf5ea3f1739854031da642b2a65ecb73c77789ed`

File details

Details for the file spanforge_secrets-1.0.0-py3-none-any.whl.

File metadata

Download URL: spanforge_secrets-1.0.0-py3-none-any.whl
Upload date: Apr 15, 2026
Size: 20.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for spanforge_secrets-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`63296646890a61d164e2c3e82085a67ef0e2159e5ea0a929bfb0e50d8da02f84`
MD5	`76b380ef62d7c0db23ac7d002074bb7e`
BLAKE2b-256	`213e7919f567c7da41c33c29a710f520a5cf4c6b22673d6de6f5de236b571f18`
