CI Gate 01 — PII and API key scanner for Spanforge compliance pipelines

spanforge-secrets

CI Gate 01 for the Spanforge compliance pipeline. Scans prompt files, training data, and arbitrary text or JSON for 10 PII entity types and 5 exposed API key formats. Exits with code 1 if any violation is found. Outputs structured JSON that lists, for each hit, the entity type, file path, and sensitivity level.

This is a reference implementation built on top of the spanforge framework.


Quick install

pip install spanforge-secrets spanforge

Quick scan

# Scan files
spanforge-secrets scan prompts/ data/training.jsonl

# Scan from stdin
echo "contact ceo@corp.com" | spanforge-secrets scan --stdin

# Verify an HMAC audit-chain log
spanforge-secrets verify-chain audit.jsonl --secret "$HMAC_SECRET"

Exit codes: 0 clean · 1 violations found · 2 usage error · 3 I/O error


Documentation

Document                 Description
docs/installation.md     Requirements, install commands, dev setup
docs/quickstart.md       Scan your first file in 2 minutes
docs/tutorial.md         Step-by-step walkthrough of every feature
docs/cli-reference.md    All flags, sub-commands, and exit codes
docs/api-reference.md    Python API: scan_payload(), scan_text(), verify_chain_file()
docs/entity-types.md     All 15 detectable entity types with examples
docs/ci-integration.md   GitHub Actions, GitLab CI, pre-commit hooks
docs/verify-chain.md     HMAC audit-chain verification guide
docs/ignore-patterns.md  .spanforge-secretsignore file format
docs/contributing.md     Development workflow and code standards
docs/changelog.md        Version history

Detected entity types

PII (10 types)

Entity type            Sensitivity  Validator
email                  medium       regex
phone                  medium       regex
ssn                    high         regex + SSA validation
credit_card            high         regex + Luhn
ip_address             low          regex
uk_national_insurance  low          regex
aadhaar                high         regex + Verhoeff
pan                    high         regex
date_of_birth          medium       regex + calendar check
address                medium       regex
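
The "regex + Luhn" validator for credit_card pairs a pattern match with a checksum so that random 16-digit numbers are not reported as cards. The sketch below is a generic Luhn implementation written for illustration; the function name luhn_valid is hypothetical, and the actual check in spanforge.redact may be structured differently.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore spaces and dashes that commonly separate card digit groups.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))
    print(luhn_valid("4111 1111 1111 1112"))
```

Running the checksum after the regex is a cheap way to cut false positives: a candidate that fails it can be dropped before it is counted as a hit.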

API Keys (5 platforms)

Entity type              Sensitivity  Pattern
openai_api_key           high         sk-... / sk-proj-...
anthropic_api_key        high         sk-ant-...
aws_access_key_id        high         AKIA... / ASIA...
aws_secret_access_key    high         context-sensitive 40-char key
gcp_service_account_key  high         JSON private key marker
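
To make the prefix-based patterns above concrete, here is a minimal sketch of how an AWS access key ID could be matched. It assumes the widely documented format of a 4-letter prefix (AKIA or ASIA) followed by 16 uppercase letters or digits; the regex and the helper name count_aws_key_ids are illustrative assumptions, not the pattern shipped in this package.

```python
import re

# Assumed format: "AKIA" or "ASIA" followed by 16 uppercase alphanumerics.
# The package's real pattern may be stricter or looser than this one.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def count_aws_key_ids(text: str) -> int:
    """Count substrings of `text` that look like AWS access key IDs."""
    return len(AWS_ACCESS_KEY_ID.findall(text))


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    print(count_aws_key_ids("aws_access_key_id = AKIAIOSFODNN7EXAMPLE"))
```

Only the count is returned, which mirrors the package's stated policy of never echoing matched values.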

CLI reference

# Scan one or more files (.txt, .json, .jsonl supported)
spanforge-secrets scan path/to/prompt.txt training_data.jsonl

# Scan a directory recursively
spanforge-secrets scan data/

# Read from stdin
echo "contact ceo@corp.com" | spanforge-secrets scan --stdin

# SARIF output for GitHub Advanced Security
spanforge-secrets scan data/ --format sarif > results.sarif

# Scan only staged git changes (pre-commit hook)
spanforge-secrets scan --diff

# Exclude files matching patterns
spanforge-secrets scan data/ --ignore-file ci/secrets-ignore.txt

# Verify an HMAC audit-chain log
spanforge-secrets verify-chain audit.jsonl --secret "$HMAC_SECRET"
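
For readers new to HMAC audit chains, the sketch below shows the general idea behind verify-chain: each entry's signature covers both its own payload and the previous entry's signature, so editing or removing any entry invalidates everything after it. The record layout (payload and sig fields), the empty starting signature, and the names sign_entry and chain_is_intact are all assumptions made for illustration. The real audit.jsonl format is described in docs/verify-chain.md and may differ.

```python
import hashlib
import hmac
import json


def sign_entry(secret: bytes, prev_sig: str, payload: dict) -> str:
    """Sign a payload together with the previous entry's signature."""
    message = prev_sig.encode() + json.dumps(payload, sort_keys=True).encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def chain_is_intact(secret: bytes, entries: list[dict]) -> bool:
    """Recompute every signature in order and compare in constant time."""
    prev_sig = ""  # hypothetical convention: the first entry has no predecessor
    for entry in entries:
        expected = sign_entry(secret, prev_sig, entry["payload"])
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev_sig = entry["sig"]
    return True


if __name__ == "__main__":
    secret = b"demo-secret"
    first = {"payload": {"event": "scan", "clean": True}}
    first["sig"] = sign_entry(secret, "", first["payload"])
    second = {"payload": {"event": "scan", "clean": False}}
    second["sig"] = sign_entry(secret, first["sig"], second["payload"])
    print(chain_is_intact(secret, [first, second]))
```
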

Exit codes

Code  Meaning
0     All inputs clean
1     At least one violation detected
2     Usage / argument error
3     I/O or format error (unreadable file, bad JSON)
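
When the gate is driven from a Python script rather than directly from a CI step, it helps to separate "violations found" from "the scan itself failed". The sketch below maps the exit codes in the table to an enum and wraps the documented scan sub-command with subprocess. The names GateExit, describe, and run_gate are hypothetical helpers, not part of the package.

```python
import subprocess
from enum import IntEnum


class GateExit(IntEnum):
    """Exit codes as listed in the table above."""
    CLEAN = 0
    VIOLATIONS = 1
    USAGE_ERROR = 2
    IO_ERROR = 3


_MESSAGES = {
    GateExit.CLEAN: "all inputs clean",
    GateExit.VIOLATIONS: "at least one violation detected",
    GateExit.USAGE_ERROR: "usage or argument error",
    GateExit.IO_ERROR: "I/O or format error",
}


def describe(code: int) -> str:
    """Translate a raw exit code into a short human-readable message."""
    try:
        return _MESSAGES[GateExit(code)]
    except ValueError:
        return f"unexpected exit code {code}"


def run_gate(paths: list[str]) -> int:
    """Run `spanforge-secrets scan` on the given paths and return its exit code."""
    completed = subprocess.run(["spanforge-secrets", "scan", *paths], check=False)
    return completed.returncode
```

A caller could then treat code 1 as a policy failure and codes 2 and 3 as pipeline errors that need a different owner.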

JSON output format

{
  "gate": "CI-Gate-01",
  "clean": false,
  "total_violations": 2,
  "results": [
    {
      "source": "prompts/user_prompt.txt",
      "clean": false,
      "violation_count": 2,
      "scanned_strings": 5,
      "hits": [
        {
          "entity_type": "email",
          "path": "<text>",
          "match_count": 1,
          "sensitivity": "medium",
          "category": "pii"
        },
        {
          "entity_type": "openai_api_key",
          "path": "<text>",
          "match_count": 1,
          "sensitivity": "high",
          "category": "api_key"
        }
      ]
    }
  ]
}

Matched values are never included in the output. Each hit reports only the entity type, path, match count, sensitivity level, and category.
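
Because the report is plain JSON, downstream tooling can filter it without importing the package. The sketch below loads a trimmed copy of the sample report shown above and lists the high-sensitivity hits per source file. The helper name high_sensitivity_hits is hypothetical, and only fields that appear in the sample are used.

```python
import json

# Trimmed copy of the sample report from this section.
SAMPLE_REPORT = """
{
  "gate": "CI-Gate-01",
  "clean": false,
  "total_violations": 2,
  "results": [
    {
      "source": "prompts/user_prompt.txt",
      "clean": false,
      "violation_count": 2,
      "hits": [
        {"entity_type": "email", "sensitivity": "medium", "category": "pii"},
        {"entity_type": "openai_api_key", "sensitivity": "high", "category": "api_key"}
      ]
    }
  ]
}
"""


def high_sensitivity_hits(report: dict) -> list[tuple[str, str]]:
    """Return (source, entity_type) pairs for every high-sensitivity hit."""
    found = []
    for result in report.get("results", []):
        for hit in result.get("hits", []):
            if hit.get("sensitivity") == "high":
                found.append((result["source"], hit["entity_type"]))
    return found


if __name__ == "__main__":
    print(high_sensitivity_hits(json.loads(SAMPLE_REPORT)))
```
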


Python API

import re
from pathlib import Path

from spanforge_secrets import scan_payload, scan_text

# Scan a dict payload (parsed training JSONL, config files, etc.)
result = scan_payload({"user": {"email": "alice@example.com"}})
if not result.clean:
    # Each hit describes what was found and where, never the matched value
    for hit in result.hits:
        print(hit.entity_type, hit.path, hit.sensitivity, hit.category)

# Scan raw text (prompt files, arbitrary strings)
text = Path("prompt.txt").read_text(encoding="utf-8")  # closes the file for us
result = scan_text(text, source="prompt.txt")
print(result.clean, result.violation_count)

# Add custom patterns: map an entity name to a compiled regex,
# and the same name to its sensitivity level
result = scan_text(
    "Assigned to EMP-001234.",
    extra_patterns={"employee_id": re.compile(r"\bEMP-\d{6}\b")},
    extra_sensitivity={"employee_id": "medium"},
)

See docs/api-reference.md for the full API including verify_chain_file().
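
A common use of scan_payload() is checking a training JSONL file record by record so that a violation can be traced to a line number. The sketch below shows one way to do that. To keep it runnable without the package installed, the scanner is passed in as a parameter; the helper name scan_jsonl_lines is hypothetical, and it relies only on the result.clean attribute shown in the examples above.

```python
import json
from typing import Any, Callable, Iterable, Iterator


def scan_jsonl_lines(
    lines: Iterable[str],
    scan: Callable[[dict], Any],
) -> Iterator[tuple[int, Any]]:
    """Yield (line_number, result) for every JSONL record that is not clean."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        result = scan(json.loads(line))
        if not result.clean:
            yield line_number, result


# Intended usage, assuming spanforge-secrets is installed:
#
#     from spanforge_secrets import scan_payload
#     with open("data/training.jsonl", encoding="utf-8") as handle:
#         for line_number, result in scan_jsonl_lines(handle, scan_payload):
#             print(line_number, result.violation_count)
```

Passing the scanner as an argument also makes the loop easy to unit test with a stub.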


CI integration (GitHub Actions)

- name: Spanforge Secrets Gate
  run: |
    pip install spanforge-secrets spanforge
    spanforge-secrets scan prompts/ data/training.jsonl

The step fails automatically on any non-zero exit code, so a scan that exits 1 (violations found) fails the job.

With SARIF upload

- name: Run scan (SARIF)
  run: spanforge-secrets scan prompts/ data/ --format sarif > secrets.sarif
  continue-on-error: true

- name: Upload to GitHub Code Scanning
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: secrets.sarif

- name: Fail on violations
  run: spanforge-secrets scan prompts/ data/

See docs/ci-integration.md for GitLab CI and pre-commit hook setups.


Pre-commit hook

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: spanforge-secrets
        name: Spanforge Secrets Gate
        language: system
        entry: spanforge-secrets scan --diff
        pass_filenames: false
        stages: [pre-commit]

Platform source

Built on top of:

  • spanforge.redact: all 10 PII patterns, validators (_is_valid_ssn, _is_valid_date), Luhn check, Verhoeff check
  • spanforge.signing: verify_chain() for audit-chain verification

Extensions unique to this package: 5 API key patterns, scan_text(), PIIScanHit.category, PIIScanResult.source, file I/O chain verification, and the complete CLI.

This package is a reference implementation of the spanforge framework. spanforge>=2.0.2 is a required runtime dependency.
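
The Verhoeff check credited to spanforge.redact above is less widely known than Luhn, so a short sketch may help. It uses the standard Verhoeff multiplication and permutation tables and validates a digit string whose last digit is the check digit. The function name verhoeff_valid is hypothetical, and this is a generic implementation of the algorithm rather than the framework's own code.

```python
# Multiplication table of the dihedral group D5.
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]

# Position-dependent permutation table (repeats every 8 positions).
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_valid(number: str) -> bool:
    """Return True if `number` (check digit last) passes the Verhoeff checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    checksum = 0
    # Process digits right to left, permuting each by its position.
    for position, ch in enumerate(reversed(number)):
        checksum = _D[checksum][_P[position % 8][int(ch)]]
    return checksum == 0


if __name__ == "__main__":
    print(verhoeff_valid("2363"))
    print(verhoeff_valid("2364"))
```

Unlike Luhn, Verhoeff detects all single-digit errors and all adjacent transpositions, which is one reason it is used for identifiers such as Aadhaar numbers.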
