High-speed liability scanner for attested vs unattested data using .f33 sidecars.

fors33-scanner

High-speed file integrity and baseline scanner. Walks one or more roots, measures data gravity (bytes), and classifies large files as attested or unattested based on sibling sidecar presence (.f33, .sig, .asc, .sha256, .sha512, .blake3, .md5, .pem). Emits checksum baselines (Hash Filename format), CSV, or JSON for use with fors33-verifier.
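The classification step described above can be sketched roughly as follows. This is a minimal illustration, not the scanner's code: the helper names `has_sibling_sidecar` and `classify` are hypothetical, and it assumes a sidecar is named by appending its suffix to the full file name (`data.csv` and `data.csv.f33`), which the scanner may or may not do.

```python
import os

# Suffixes this README lists as recognized attestation sidecars.
SIDECAR_SUFFIXES = (
    ".f33", ".sig", ".asc", ".sha256", ".sha512", ".blake3", ".md5", ".pem",
)


def has_sibling_sidecar(path: str) -> bool:
    """Return True if a regular file named <basename><suffix> sits next to path.

    Assumption: sidecars append their suffix to the whole file name.
    """
    directory = os.path.dirname(os.path.abspath(path))
    base = os.path.basename(path)
    wanted = {base + suffix for suffix in SIDECAR_SUFFIXES}
    with os.scandir(directory) as entries:
        for entry in entries:
            # Symlinks are not followed, so a link to a sidecar does not count.
            if entry.name in wanted and entry.is_file(follow_symlinks=False):
                return True
    return False


def classify(path: str, threshold_bytes: int) -> str:
    """Classify one file as 'attested', 'unattested', or 'below_threshold'."""
    size = os.stat(path, follow_symlinks=False).st_size
    if size < threshold_bytes:
        return "below_threshold"
    return "attested" if has_sibling_sidecar(path) else "unattested"
```

Note that this check only looks at file names; like the scanner's trust model, it says nothing about whether the sidecar's contents are valid.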

Trust model: The scanner is a discovery and liability-mapping tool. It classifies each file by sidecar presence only, an O(1) check that never reads the sidecar's contents. It does not validate Ed25519 signatures or cryptographic proof of baselines. For full cryptographic verification, use fors33-verifier.

For machine parsing, see LLM_CONTEXT.md.

Release notes & version history

0.8.1 (2026-05-10)

  • Backward-compatible stats (pre-0.8.0 accounting): --legacy-scanner-stats or FORS33_SCANNER_LEGACY_STATS=1 restores skipped_files counting for below-threshold single-file roots and stops treating .blake3 siblings as external attestation coverage. In the library, legacy_scanner_stats bundles both behaviors; below_threshold_single_file_counts_skipped and recognize_blake3_sidecar remain available separately on scan_roots / execute_scan.

0.8.0 (2026-05-10)

  • Single-file parity with the Docker extension: _scan_single_file uses stat(..., follow_symlinks=False) and os.scandir sibling discovery with is_file(follow_symlinks=False), and skips the root when its basename itself ends in a recognized sidecar suffix. This applies to scan_roots as well as execute_scan.
  • .blake3 is part of _ATT_EXTS so BLAKE3 companions classify as external attestation coverage.
  • Below-threshold single-file paths no longer bump skipped_files (silent skip, extension-style).

0.7.1 (2026-05-10)

  • has_sidecar on unverified samples: ScanStats.add_unverified_sample(..., *, has_sidecar=False) records "has_sidecar": "true" or "false" on each sampled unattested path (same shape as the L3dgr extension) for downstream re-seal UX.
  • Supply chain: Docker images from publish-fors33-scanner attach SBOM and SLSA provenance (sbom: true, provenance: mode=max). Pin by digest for regulated deployments.

0.7.0 (2026-05-01)

  • Single-file scanning roots, sidecar parity (.f33, .sig, .asc, checksum sidecars, etc.), baseline/JSON/JSONL on single-file paths, stricter --strict-audit and zero-byte threshold behavior.

0.6.0 (2026-04-16)

  • hash_core mmap ceilings, cgroup alignment, worker cap 64, default_dpk_worker_count() / FORS33_DPK_MAX_WORKERS.

0.5.0 and 0.4.0

  • --strict-audit, unverified_paths_sample, --tsa-url, JSONL multi-root metadata, --max-exposure, --emit-jsonl, --max-depth. Full text: CHANGELOG.md.

Release model

  • Docker publishing is manual, via the GitHub Actions workflow_dispatch trigger with an explicit version (no leading v, e.g. 0.8.1) and push_latest; it does not run automatically on git tags alone. PyPI releases are maintainer-driven (python -m build, twine upload) unless your organization automates them.

Install

pip install fors33-scanner

Usage

Scan the current directory (default root) with a 1 MB threshold:

fors33-scanner --threshold-mb 1.0

Scan multiple roots:

fors33-scanner --root /var/log --root /data/telemetry --threshold-mb 10

Emit JSON instead of human output (for CI, pipelines):

fors33-scanner --root /data --json

Fail CI/CD when exposure breaches policy threshold:

fors33-scanner --root /data --max-exposure 5.0 --json

Throttle hashing workers for shared runners:

fors33-scanner --root /data --workers 2

Stream SIEM-ready JSONL events (records + summary):

fors33-scanner --root /data --emit-jsonl -

Depth-limit traversal (0=root only, 1=root + direct children):

fors33-scanner --root /data --max-depth 1
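
One way to picture the depth limit is a pruned os.walk. The sketch below is illustrative only and is not the scanner's traversal code; `walk_limited` is a hypothetical name, and it assumes depth 0 means files directly in the root and depth 1 adds files in the root's immediate subdirectories.

```python
import os
from typing import Iterator


def walk_limited(root: str, max_depth: int) -> Iterator[str]:
    """Yield file paths under root, descending at most max_depth levels."""
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_depth:
            # Clearing dirnames in place stops os.walk from descending further.
            dirnames[:] = []
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Pruning in place avoids visiting deep subtrees at all, rather than walking them and filtering the results afterwards.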

Strict audit (fail on permission or file-lock errors instead of skipping):

fors33-scanner --root /data --strict-audit

Single-file scanning:

# Scan a single file
fors33-scanner --root /path/to/file.csv

# Scan a single file with baseline generation
fors33-scanner --root /path/to/file.csv --emit-checksums baseline.txt

# Scan a single file with JSON manifest
fors33-scanner --root /path/to/file.csv --emit-json manifest.json

Single-file mode accepts individual file paths in addition to directories, enabling direct scanning of specific files without directory traversal. By default it recognizes all attestation sidecar extensions (.f33, .sig, .asc, .sha256, .sha512, .blake3, .md5, .pem) for parity with directory scanning.

Pre-0.8.0 stats (below-threshold single-file roots bump skipped_files, and .blake3 siblings are not counted as external attestation):

fors33-scanner --root /data --legacy-scanner-stats

Equivalent: set FORS33_SCANNER_LEGACY_STATS=1.

Record TSA endpoint for tooling that reads FORS33_TSA_URL:

fors33-scanner --tsa-url https://tsa.example.com/rfc3161

Worker count: positive --workers wins; otherwise a positive FORS33_WORKERS; otherwise default_dpk_worker_count() (uses cpu_count and optional FORS33_DPK_MAX_WORKERS). Non-positive values mean auto. Hard cap 64.
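
The precedence above can be sketched as a small resolver. This is a hedged illustration: `resolve_workers` and `fallback_worker_count` are hypothetical names, and the fallback is only a stand-in for default_dpk_worker_count(), whose exact use of cpu_count and FORS33_DPK_MAX_WORKERS is assumed here.

```python
import os
from typing import Mapping, Optional

HARD_CAP = 64  # hard cap stated in this README


def fallback_worker_count(env: Mapping[str, str]) -> int:
    """Hypothetical stand-in for default_dpk_worker_count().

    Assumption: CPU count, optionally lowered by FORS33_DPK_MAX_WORKERS.
    """
    count = os.cpu_count() or 1
    raw = env.get("FORS33_DPK_MAX_WORKERS", "")
    if raw.isdigit() and int(raw) > 0:
        count = min(count, int(raw))
    return count


def resolve_workers(cli_workers: Optional[int], env: Mapping[str, str]) -> int:
    """Apply the order: positive --workers, positive FORS33_WORKERS, then auto."""
    if cli_workers is not None and cli_workers > 0:
        chosen = cli_workers
    else:
        try:
            env_workers = int(env.get("FORS33_WORKERS", ""))
        except ValueError:
            env_workers = 0  # unset or non-numeric is treated as auto
        chosen = env_workers if env_workers > 0 else fallback_worker_count(env)
    return max(1, min(chosen, HARD_CAP))
```

Passing the environment as a mapping (for example os.environ) keeps the function easy to test without mutating process state.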

Large-file hashing uses FORS33_MMAP_MIN_MB / FORS33_MMAP_MAX_MB (defaults 500 / 4000), clamped to cgroup/RAM ceiling on Linux; optional FORS33_MMAP_PSI_SOME_AVG10_MAX disables mmap under memory pressure.

For production Docker or CI, pin a semver image tag or immutable digest instead of relying on :latest alone.

Generate checksum baseline (sha256, sha512, or blake3 per --algo):

fors33-scanner --root /data --emit-checksums fors33_baseline.sha256
fors33-scanner --root /data --algo sha512 --emit-checksums fors33_baseline.sha512

Emit CSV or JSON baseline (compatible with fors33-verifier):

fors33-scanner --root /data --emit-csv fors33_baseline.csv
fors33-scanner --root /data --emit-json fors33_baseline.json

Add compliance exposure text to human output (by default the output reports counts and byte totals only):

fors33-scanner --root /data --compliance-report

Exit codes

  • 0: successful scan / threshold not breached
  • 1: exposure threshold breach (--max-exposure)
  • 2: invocation/parameter misuse, or --strict-audit I/O access failure
  • 130: user interrupted scan (Ctrl+C)
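
A pipeline wrapper might branch on these codes as in the sketch below. The helper names `describe_exit` and `run_scan` are hypothetical; the mapping simply restates the list above, and `run_scan` assumes fors33-scanner is installed and on PATH.

```python
import subprocess
from typing import List

# Meanings copied from the exit-code list in this README.
EXIT_MEANINGS = {
    0: "successful scan / threshold not breached",
    1: "exposure threshold breach (--max-exposure)",
    2: "invocation/parameter misuse, or --strict-audit I/O access failure",
    130: "user interrupted scan (Ctrl+C)",
}


def describe_exit(code: int) -> str:
    """Translate a scanner exit code into a readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_scan(args: List[str]) -> str:
    """Run the CLI and describe how it ended (requires the CLI on PATH)."""
    completed = subprocess.run(["fors33-scanner", *args], check=False)
    return describe_exit(completed.returncode)
```

For example, `run_scan(["--root", "/data", "--max-exposure", "5.0", "--json"])` would return the breach message when the scanner exits with 1.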

Output

Default human output (mathematical only):

[FILE COUNT]    : 14,205
[TOTAL BYTES]   : 2.1 TB
[ATTESTED]      : 48 files, 4.1 GB
[UNATTESTED]    : 264 files, 2.1 TB
[ELAPSED]       : 4.20s

Safety and scope

  • Read-only: does not modify files or sidecars.
  • Scan-only: O(1) discovery; baseline generation uses streaming chunked hashing.
  • Excludes common directories (.git, node_modules, venv, etc.). Respects .f33ignore and --ignore-pattern / --exclude-dir.
  • Legal notice prints to stderr on startup so data/JSON streams on stdout remain parse-safe.
  • See DISCLAIMER.md for enterprise legal/regulatory boundaries.
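
Streaming chunked hashing, as mentioned in the list above, can be sketched with the standard library. This is an illustration rather than the scanner's hash_core: `hash_file` and `baseline_line` are hypothetical names, the 1 MiB chunk size is arbitrary, and the two-space "hash  filename" layout is assumed to match the sha256sum convention. BLAKE3 is omitted because it needs the optional blake3 package.

```python
import hashlib


def hash_file(path: str, algo: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.new(algo)
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def baseline_line(path: str, algo: str = "sha256") -> str:
    """Format one baseline entry (assumed sha256sum-style: hash, two spaces, name)."""
    return f"{hash_file(path, algo)}  {path}"
```

Because the file is opened read-only and consumed chunk by chunk, the sketch stays consistent with the read-only guarantee stated above.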

JSONL contract

  • --emit-jsonl PATH emits one flat JSON object per line.
  • Multi-root scans include both root_index and root_path in each scan_record.
  • timestamp represents hash completion time.
  • Final line is scan_summary with aggregate stats and scan parameters.
  • If --emit-jsonl - and --json are both requested, JSONL takes precedence on stdout.
  • unverified_paths_sample entries (JSON/JSONL consumers): each row includes path, status, and has_sidecar ("true" / "false") since 0.7.1 for integration with seal/re-seal workflows.
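
A consumer of this contract might split the stream as sketched below. The function names are hypothetical, and the code relies only on what the list above states: one JSON object per line, the final line being the summary, and has_sidecar carried as the strings "true" / "false".

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def split_jsonl(lines: Iterable[str]) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]:
    """Return (records, summary), treating the last non-blank line as the summary."""
    objects = [json.loads(line) for line in lines if line.strip()]
    if not objects:
        raise ValueError("empty JSONL stream")
    return objects[:-1], objects[-1]


def sidecar_flag(row: Dict[str, Any]) -> bool:
    """Convert the string-valued has_sidecar field into a real boolean."""
    return row.get("has_sidecar") == "true"
```

Reading the whole stream before splitting is the simplest approach; a SIEM forwarder would more likely process records as they arrive and hold back only the most recent line.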

Requirements

Python 3.9+. The optional blake3 package enables BLAKE3 hashing. Supported on Linux, macOS, and Windows.

License

MIT License. See LICENSE.
