CI Gate 01 — PII and API key scanner for Spanforge compliance pipelines
spanforge-secrets
CI Gate 01 for the Spanforge compliance pipeline.
Scans prompt files, training data, and arbitrary text/JSON for 10 PII entity types and
5 exposed API key formats. Exits 1 if any violation is found. Outputs structured JSON
with hit details, file path, and sensitivity level.
This is a reference implementation built on top of the spanforge framework.
Quick install
pip install spanforge-secrets spanforge
Quick scan
# Scan files
spanforge-secrets scan prompts/ data/training.jsonl
# Scan from stdin
echo "contact ceo@corp.com" | spanforge-secrets scan --stdin
# Verify an HMAC audit-chain log
spanforge-secrets verify-chain audit.jsonl --secret "$HMAC_SECRET"
Exit codes: 0 clean · 1 violations found · 2 usage error · 3 I/O error
Documentation
| Document | Description |
|---|---|
| docs/installation.md | Requirements, install commands, dev setup |
| docs/quickstart.md | Scan your first file in 2 minutes |
| docs/tutorial.md | Step-by-step walkthrough of every feature |
| docs/cli-reference.md | All flags, sub-commands, and exit codes |
| docs/api-reference.md | Python API — scan_payload(), scan_text(), verify_chain_file() |
| docs/entity-types.md | All 15 detectable entity types with examples |
| docs/ci-integration.md | GitHub Actions, GitLab CI, pre-commit hooks |
| docs/verify-chain.md | HMAC audit-chain verification guide |
| docs/ignore-patterns.md | .spanforge-secretsignore file format |
| docs/contributing.md | Development workflow and code standards |
| docs/changelog.md | Version history |
Detected entity types
PII (10 types)
| Entity type | Sensitivity | Validator |
|---|---|---|
| email | medium | regex |
| phone | medium | regex |
| ssn | high | regex + SSA validation |
| credit_card | high | regex + Luhn |
| ip_address | low | regex |
| uk_national_insurance | low | regex |
| aadhaar | high | regex + Verhoeff |
| pan | high | regex |
| date_of_birth | medium | regex + calendar check |
| address | medium | regex |
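The checksum validators named in the table (Luhn for `credit_card`, Verhoeff for `aadhaar`) are standard public algorithms. The following is a minimal standalone sketch of both, written for illustration only; the function names `luhn_valid` and `verhoeff_valid` are hypothetical and this is not the code that ships in spanforge.

```python
# Illustrative checksum validators. These are textbook implementations,
# not the validators used inside spanforge.redact.

# Verhoeff multiplication table (dihedral group D5).
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]

# Verhoeff position-dependent permutation table.
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def verhoeff_valid(number: str) -> bool:
    """Return True if `number` (ASCII digits only) passes the Verhoeff checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


print(luhn_valid("4111111111111111"))  # widely used test card number
print(verhoeff_valid("2363"))          # textbook Verhoeff example
```

Pairing a regex with a checksum like this is a common way to cut false positives: a random 16-digit string matches the pattern but usually fails the checksum.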
API Keys (5 platforms)
| Entity type | Sensitivity | Pattern |
|---|---|---|
| openai_api_key | high | sk-... / sk-proj-... |
| anthropic_api_key | high | sk-ant-... |
| aws_access_key_id | high | AKIA... / ASIA... |
| aws_secret_access_key | high | context-sensitive 40-char key |
| gcp_service_account_key | high | JSON private key marker |
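To illustrate how prefix-based key detection of this kind can work, here is a rough sketch using only the standard `re` module. The regexes, the minimum lengths, and the `find_key_types` helper are assumptions made for this example; the package's actual patterns may differ and are likely stricter.

```python
import re

# Hypothetical patterns for illustration only; not the package's own regexes.
# The minimum length of 8 trailing characters is an arbitrary assumption.
ILLUSTRATIVE_KEY_PATTERNS = [
    ("anthropic_api_key", re.compile(r"\bsk-ant-[A-Za-z0-9_-]{8,}")),
    # Exclude the sk-ant- prefix so Anthropic keys are not double-counted.
    ("openai_api_key", re.compile(r"\bsk-(?!ant-)[A-Za-z0-9_-]{8,}")),
    # AWS access key IDs: a 4-letter prefix followed by 16 uppercase alphanumerics.
    ("aws_access_key_id", re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")),
]


def find_key_types(text: str) -> list[str]:
    """Return the names of all illustrative patterns that match `text`."""
    return [name for name, pattern in ILLUSTRATIVE_KEY_PATTERNS if pattern.search(text)]


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
print(find_key_types("aws_key=AKIAIOSFODNN7EXAMPLE"))
```

The negative lookahead on the `sk-` pattern shows why ordering and exclusion matter when one vendor's prefix is a superset of another's.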
CLI reference
# Scan one or more files (.txt, .json, .jsonl supported)
spanforge-secrets scan path/to/prompt.txt training_data.jsonl
# Scan a directory recursively
spanforge-secrets scan data/
# Read from stdin
echo "contact ceo@corp.com" | spanforge-secrets scan --stdin
# SARIF output for GitHub Advanced Security
spanforge-secrets scan data/ --format sarif > results.sarif
# Scan only staged git changes (pre-commit hook)
spanforge-secrets scan --diff
# Exclude files matching patterns
spanforge-secrets scan data/ --ignore-file ci/secrets-ignore.txt
# Verify an HMAC audit-chain log
spanforge-secrets verify-chain audit.jsonl --secret "$HMAC_SECRET"
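For readers unfamiliar with HMAC audit chains, the sketch below shows the general idea: each entry's signature covers the previous signature plus the current payload, so altering any earlier record invalidates everything after it. The record layout (`payload`, `sig`), the empty initial signature, and all function names are assumptions made for this example. It does not describe the log format that `verify-chain` actually expects; see docs/verify-chain.md for that.

```python
import hashlib
import hmac
import json


def sign_entry(secret: bytes, prev_sig: str, payload: dict) -> str:
    """HMAC-SHA256 over the previous signature plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, prev_sig.encode() + body, hashlib.sha256).hexdigest()


def build_chain(secret: bytes, payloads: list[dict]) -> list[dict]:
    """Build a hypothetical chain where each entry links to the one before it."""
    chain, prev = [], ""
    for payload in payloads:
        sig = sign_entry(secret, prev, payload)
        chain.append({"payload": payload, "sig": sig})
        prev = sig
    return chain


def verify_chain_sketch(secret: bytes, chain: list[dict]) -> bool:
    """Recompute every signature in order; any mismatch breaks the chain."""
    prev = ""
    for entry in chain:
        expected = sign_entry(secret, prev, entry["payload"])
        # compare_digest avoids leaking timing information.
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev = entry["sig"]
    return True


secret = b"example-secret"
chain = build_chain(secret, [{"event": "scan", "n": 1}, {"event": "scan", "n": 2}])
print(verify_chain_sketch(secret, chain))
```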
Exit codes
| Code | Meaning |
|---|---|
| 0 | All inputs clean |
| 1 | At least one violation detected |
| 2 | Usage / argument error |
| 3 | I/O or format error (unreadable file, bad JSON) |
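When the CLI is driven from a script rather than directly from a CI step, the exit codes above can be mapped to messages. The sketch below is one possible wrapper; `describe_exit_code` and `run_gate` are hypothetical helper names, and only the `scan` sub-command and the four exit codes come from this README.

```python
import subprocess

# Meanings taken from the exit code table in this README.
EXIT_MEANINGS = {
    0: "clean",
    1: "violations found",
    2: "usage error",
    3: "I/O or format error",
}


def describe_exit_code(code: int) -> str:
    """Translate a spanforge-secrets exit code into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_gate(paths: list[str]) -> tuple[int, str]:
    """Run `spanforge-secrets scan` on the given paths (the CLI must be installed)."""
    proc = subprocess.run(
        ["spanforge-secrets", "scan", *paths],
        capture_output=True,
        text=True,
    )
    return proc.returncode, describe_exit_code(proc.returncode)


print(describe_exit_code(1))
```

Distinguishing code 1 from codes 2 and 3 lets a pipeline treat genuine findings differently from a misconfigured invocation.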
JSON output format
{
  "gate": "CI-Gate-01",
  "clean": false,
  "total_violations": 2,
  "results": [
    {
      "source": "prompts/user_prompt.txt",
      "clean": false,
      "violation_count": 2,
      "scanned_strings": 5,
      "hits": [
        {
          "entity_type": "email",
          "path": "<text>",
          "match_count": 1,
          "sensitivity": "medium",
          "category": "pii"
        },
        {
          "entity_type": "openai_api_key",
          "path": "<text>",
          "match_count": 1,
          "sensitivity": "high",
          "category": "api_key"
        }
      ]
    }
  ]
}
Matched values are never included — only type, path, count, and sensitivity level.
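Because the report is plain JSON, downstream tooling can filter it with the standard library alone. The sketch below parses a trimmed copy of the sample report above and lists the high-sensitivity hits; the helper name `high_sensitivity_hits` is hypothetical, and the field names are taken from the sample output.

```python
import json

# Trimmed copy of the sample report shown in this README.
SAMPLE_REPORT = """
{
  "gate": "CI-Gate-01",
  "clean": false,
  "total_violations": 2,
  "results": [
    {
      "source": "prompts/user_prompt.txt",
      "clean": false,
      "violation_count": 2,
      "hits": [
        {"entity_type": "email", "path": "<text>", "match_count": 1,
         "sensitivity": "medium", "category": "pii"},
        {"entity_type": "openai_api_key", "path": "<text>", "match_count": 1,
         "sensitivity": "high", "category": "api_key"}
      ]
    }
  ]
}
"""


def high_sensitivity_hits(report: dict) -> list[tuple[str, str]]:
    """Return (source, entity_type) pairs for every hit marked high sensitivity."""
    return [
        (result["source"], hit["entity_type"])
        for result in report["results"]
        for hit in result["hits"]
        if hit["sensitivity"] == "high"
    ]


report = json.loads(SAMPLE_REPORT)
print(high_sensitivity_hits(report))
```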
Python API
import re

from spanforge_secrets import scan_payload, scan_text

# Scan a dict payload (parsed training JSONL, config files, etc.)
result = scan_payload({"user": {"email": "alice@example.com"}})
if not result.clean:
    for hit in result.hits:
        print(hit.entity_type, hit.path, hit.sensitivity, hit.category)

# Scan raw text (prompt files, arbitrary strings)
with open("prompt.txt", encoding="utf-8") as f:
    result = scan_text(f.read(), source="prompt.txt")
print(result.clean, result.violation_count)

# Add custom patterns
result = scan_text(
    "Assigned to EMP-001234.",
    extra_patterns={"employee_id": re.compile(r"\bEMP-\d{6}\b")},
    extra_sensitivity={"employee_id": "medium"},
)
See docs/api-reference.md for the full API including verify_chain_file().
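Before passing a custom pattern to `extra_patterns`, it can help to check it in isolation with the standard `re` module. The sketch below exercises the `employee_id` regex from the example above against a few strings; the sample strings and the `matches_employee_id` helper are made up for illustration.

```python
import re

# Same regex as the employee_id example above.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")


def matches_employee_id(text: str) -> bool:
    """Return True if the text contains an EMP- prefix followed by exactly six digits."""
    return EMPLOYEE_ID.search(text) is not None


# Hypothetical sample strings covering a match and three near misses.
for sample in ["Assigned to EMP-001234.", "EMP-12345", "EMP-0012345", "XEMP-001234"]:
    print(sample, matches_employee_id(sample))
```

The word boundaries (`\b`) are what reject the seven-digit and `XEMP-` cases, which keeps the custom pattern from firing on longer identifiers that merely contain a matching substring.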
CI integration (GitHub Actions)
- name: Spanforge Secrets Gate
  run: |
    pip install spanforge-secrets spanforge
    spanforge-secrets scan prompts/ data/training.jsonl
The step fails automatically on any non-zero exit code, including 1 (violations found).
With SARIF upload
- name: Run scan (SARIF)
  run: spanforge-secrets scan prompts/ data/ --format sarif > secrets.sarif
  continue-on-error: true
- name: Upload to GitHub Code Scanning
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: secrets.sarif
- name: Fail on violations
  run: spanforge-secrets scan prompts/ data/
See docs/ci-integration.md for GitLab CI and pre-commit hook setups.
Pre-commit hook
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: spanforge-secrets
        name: Spanforge Secrets Gate
        language: system
        entry: spanforge-secrets scan --diff
        pass_filenames: false
        stages: [pre-commit]
Platform source
Built on top of:
- spanforge.redact: all 10 PII patterns, validators (_is_valid_ssn, _is_valid_date), Luhn check, Verhoeff check
- spanforge.signing: verify_chain() for audit-chain verification
Extensions unique to this package: 5 API key patterns, scan_text(),
PIIScanHit.category, PIIScanResult.source, file I/O chain verification,
and the complete CLI.
This package is a reference implementation of the spanforge framework.
spanforge>=2.0.2 is a required runtime dependency.
File details
Details for the file spanforge_secrets-1.0.0.tar.gz.
File metadata
- Download URL: spanforge_secrets-1.0.0.tar.gz
- Upload date:
- Size: 26.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2b85a7632c99889b0c4c0b6cb7c3ae2294df2c4ca71a36e3ad8ee30079dc6232 |
| MD5 | d7dd5bfa17f39d90933c186406a3ced7 |
| BLAKE2b-256 | a745172208e900173cd5168ddf5ea3f1739854031da642b2a65ecb73c77789ed |
File details
Details for the file spanforge_secrets-1.0.0-py3-none-any.whl.
File metadata
- Download URL: spanforge_secrets-1.0.0-py3-none-any.whl
- Upload date:
- Size: 20.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 63296646890a61d164e2c3e82085a67ef0e2159e5ea0a929bfb0e50d8da02f84 |
| MD5 | 76b380ef62d7c0db23ac7d002074bb7e |
| BLAKE2b-256 | 213e7919f567c7da41c33c29a710f520a5cf4c6b22673d6de6f5de236b571f18 |