Source-control automation -- audit, secret-scan, and remediate Git workspaces

Project description

source-control-automation v3 (Python rewrite)

Badges: v3 | tests | coverage 58% | release | safe to run

Safe to run. Bare sca is read-only: it walks your code root, writes JSON and an HTML report to its own output dir, and opens the HTML in your browser. The pipeline produces a plan of what needs fixing but never executes it on its own. After the report opens, you get a y/N prompt to apply the plan; type n (or just press Enter) to skip and review first. Destructive operations always require an explicit --apply flag.

Cross-platform Python rewrite of v1's PowerShell framework, with a smaller surface and a few capabilities v1 was missing.

Quick start

cd ~/code        # or wherever your repos live
sca              # produces a report, opens it, asks before fixing anything

That's the whole onboarding. The same sca command works on Linux and Windows, and should work on macOS (untested). On Windows specifically, the auto-open uses explorer.exe <path> so the browser launches in your interactive user session even when sca was started from an elevated PowerShell.
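The auto-open behaviour described above can be sketched roughly as follows. This is an illustration, not the code in sca: the function names open_command and open_report are hypothetical, and the non-Windows fallback to Python's webbrowser module is an assumption.

```python
import subprocess
import sys
import webbrowser
from pathlib import Path
from typing import List, Optional


def open_command(report: Path, platform: str = sys.platform) -> Optional[List[str]]:
    """Return the external command used to open the report, or None to use webbrowser."""
    if platform.startswith("win"):
        # explorer.exe hands the file to the interactive user's default handler.
        return ["explorer.exe", str(report)]
    return None


def open_report(report: Path) -> None:
    """Open the HTML report with the platform-appropriate mechanism."""
    cmd = open_command(report)
    if cmd is not None:
        # The exit code is deliberately not checked: explorer.exe's exit status
        # is not a reliable success signal.
        subprocess.run(cmd, check=False)
    else:
        # Assumed fallback for Linux/macOS: let the stdlib pick a browser.
        webbrowser.open(report.resolve().as_uri())
```

Keeping the command selection in a separate pure function makes the platform branch testable without launching anything.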

Why v3 exists

v1 (the rest of this repo) is a thorough PowerShell framework -- five numbered orchestration scripts, a Pester suite, ~25 specialized fix scripts at the root, and a gitignore template library. Real value, real coverage. But:

  • Windows / PowerShell only -- won't run on a Linux dev box or a CI runner that defaults to bash.
  • Surface area -- 128 files for what's logically a 5-stage workflow. The Fix-RemainingIssues.ps1-style scripts are one-off remediations that crept into the tree.
  • Doesn't notice "wrapper repo wrapping nested repos" -- the pattern where a parent .git tracks paths that have their own .git directories. We hit this on the original C:\code\.git and had to extract via git subtree split by hand.
  • No secret scanning during audit -- v1 looks for "sensitive file extensions" (.pfx, .env) but not for content patterns such as an Obfuscated* config plus a decode function in the same repo, leaked PATs in committed scripts, or base64+XOR secrets. We found three leaked PATs and a leaked Azure cert by hand during one audit pass -- the tool should detect those.
  • HEAD-only branch view -- only inspects the current branch; can't tell you that you have 8 unpushed feature branches or that main is 63 commits behind your active branch.
  • The tool has its own leaked PAT -- Fix-RemoteUrls.ps1 line 3 hardcodes a github_pat_*. A tool that standardizes source control should not be the place this happens.

v3 is the smaller, opinionated rewrite that keeps v1's good ideas and adds the missing pieces.

Architecture

v3/
+-- README.md              (this file)
+-- pyproject.toml         (sca package metadata)
+-- sca/
|   +-- __init__.py
|   +-- cli.py             (entry: `sca audit`, `sca scan`, `sca branches`, `sca render`)
|   +-- audit.py           (walk tree -> classify repos/orphans/loose files -> JSON)
|   +-- branches.py        (per-repo branch audit: unpushed, main-behind, diverged)
|   +-- secrets.py         (NEW: pattern-scan for leaked PATs, XOR-obfuscated configs, etc.)
|   +-- render.py          (JSON -> single-file HTML report)
|   +-- classify.py        (port v1's 5-state model; repo strategy decision)
|   +-- remediate.py       (port v1's state-based fixes; backup-before-modify)
|   +-- extract.py         (NEW: detect + fix wrapper-repo-wrapping-nested-repos pattern)
|   +-- runner_broker.py   (NEW: JIT self-hosted runner provisioning -- see RUNNER_BROKER.md)
+-- templates/
|   +-- gitignore/         (port v1's library: dotnet, python, node, powershell, etc.)
+-- tests/
    +-- ...                (pytest port of v1's Pester suite)

What carries over from v1

| v1 idea | v3 home |
| --- | --- |
| 5-state classification (NoSC / LocalGitOnly / Incomplete / PartialSync / Compliant) | sca/classify.py |
| Dedicated-vs-consolidated repo strategy decision (file count, .sln/.csproj presence) | sca/classify.py |
| .gitignore template library by project type | templates/gitignore/ |
| Backup-before-modify (zip the project before destructive ops) | sca/remediate.py |
| Per-state remediation workflows (init / push / commit / sync) | sca/remediate.py |
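The backup-before-modify idea (zip the project before destructive ops) could look something like the sketch below. The function name backup_project, the timestamped archive name, and the choice of a separate backup directory are all assumptions for illustration; sca/remediate.py may do this differently.

```python
import shutil
import time
from pathlib import Path


def backup_project(project: Path, backup_dir: Path) -> Path:
    """Zip `project` into `backup_dir` and return the path of the archive."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = backup_dir / f"{project.name}-{stamp}"
    # root_dir/base_dir keep the project folder name as the top-level entry
    # inside the archive, so restoring recreates the folder itself.
    archive = shutil.make_archive(
        str(base), "zip", root_dir=project.parent, base_dir=project.name
    )
    return Path(archive)
```

In this sketch the backup directory should live outside the project being archived; otherwise the archive would be written into the tree it is zipping.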

What carries over from v2 (the audit scripts written 2026-04-30)

| v2 script | v3 home |
| --- | --- |
| audit-code.py (walk + classify into JSON) | sca/audit.py |
| branch-audit.py (per-branch flags) | sca/branches.py |
| render-audit.py (JSON -> HTML) | sca/render.py |

Plus: cross-platform support (--root / CODE_ROOT env var) so the same code runs on Linux and Windows.

What's NEW in v3 (not in v1 or v2)

  1. Secret scanning during audit -- sca/secrets.py scans every committed file for:
    • Hardcoded github_pat_..., ghp_..., sk-..., AWS access keys
    • Obfuscated* field names paired with a Get-DecryptedValue (or similar) decode function in the same repo (XOR/base64 antipattern)
    • Embedded PFX/PEM blobs
    • SharePoint URLs, tenant/client/list UUIDs that look like real values
    • Any password\s*[:=] patterns inside config files
  2. Wrapper-repo detection + extraction -- sca/extract.py detects the "parent .git wraps child repos that have their own .git" pattern and offers a clean extraction (the git subtree split workflow we did manually).
  3. Visibility-before-push gate -- every git push is preceded by a quick check: is the remote repo public? If yes, run secrets.py before the push. Stops the 68-minute-public-exposure problem we hit with ResetIntuneEnrollment.
  4. Branch-level audit -- already had this in v2; v1 was HEAD-only. Surfaces unpushed branches and main-behind situations as first-class report items.
  5. JIT runner broker -- sca runner-broker polls every repo for queued Linux CI jobs and spawns ephemeral self-hosted runners on demand, so a billing-blocked free tier still gets CI on runs-on: [self-hosted, Linux]. Deployed as a systemd service on dev1. See RUNNER_BROKER.md -- it's the single point of failure for CI across the workspace.
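As a rough illustration of the pattern scan in item 1, a line-by-line regex pass might look like this. The rule names and the exact expressions are assumptions chosen to approximate the token shapes listed above; they are not the rules shipped in sca/secrets.py, and the Obfuscated*/decode-function pairing and GUID checks are left out.

```python
import re
from typing import Iterator, NamedTuple


class Finding(NamedTuple):
    rule: str
    line_no: int
    excerpt: str


# Approximate shapes only (assumed, not sca's real rule set).
RULES = {
    "github-fine-grained-pat": re.compile(r"github_pat_[A-Za-z0-9_]{22,}"),
    "github-classic-pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "pem-private-key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password-assignment": re.compile(r"password\s*[:=]\s*\S+", re.IGNORECASE),
}


def scan_text(text: str) -> Iterator[Finding]:
    """Yield one Finding per (line, rule) match, with the match truncated."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            match = pattern.search(line)
            if match:
                # Keep only a short prefix so the report does not re-leak the secret.
                yield Finding(rule, line_no, match.group(0)[:12] + "...")
```

Truncating the excerpt is a deliberate choice in this sketch: a report that echoes full tokens would itself become a leak.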

Out of scope for v3

  • Continuous monitoring / dashboard -- v1's Mode 4. The audit is one-shot. If you want recurring runs, schedule via cron / Task Scheduler.
  • Cross-platform support beyond Windows + Linux. macOS should work but isn't tested.
  • Non-GitHub forges (GitLab, Bitbucket, Gitea). v1 had aspirations here; v3 is GitHub-only by design.

Status / progress (this branch)

| Module | Status | Notes |
| --- | --- | --- |
| sca/audit.py | [DONE] ported from v2 | walk + classify dirs -> JSON; cross-platform (--root) |
| sca/branches.py | [DONE] ported from v2 | per-repo branch state; unpushed / behind / diverged |
| sca/render.py | [DONE] ported from v2 | JSON -> single-file HTML report |
| sca/secrets.py | [DONE] NEW | live token regex, XOR-obfuscation pair detector, real-looking GUIDs, SharePoint URLs |
| sca/classify.py | [DONE] NEW | 5-state model + 3 extra states v1 didn't have (Empty, LooseFile, WrapperRepo) + dedicated/consolidated decision |
| sca/extract.py | [DONE] NEW | wrapper-repo detection + repair (subtree-split / archive-and-delete) |
| sca/templates.py + templates/gitignore/ | [DONE] NEW | stack detection (python/node/dotnet/powershell) -> curated gitignore |
| sca/remediate.py | [DONE] NEW | per-state plan + executor; backup-before-modify; visibility-before-push gate |
| sca/cli.py | [DONE] NEW | sca audit, branches, scan, classify, extract, gitignore, remediate, render, runner-broker |
| sca/runner_broker.py | [DONE] NEW | JIT ephemeral runner provisioning; systemd service on dev1; see RUNNER_BROKER.md |
| tests/ | [DONE] NEW | 35 pytest tests, all green |
| pyproject.toml | [DONE] | pip install -e v3 |
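The unpushed / behind / diverged states listed for sca/branches.py can be derived from git's upstream tracking information. The sketch below is one possible approach, not the module's actual code: it relies on git for-each-ref with the %(upstream:track) field, and the state names and function names are assumptions. It compares each local branch with its own upstream only; a "main is N commits behind your active branch" check would need a separate comparison.

```python
import re
import subprocess
from typing import Dict

# Tab-separated: branch name, upstream name, tracking summary such as "[ahead 2, behind 1]".
FORMAT = "%(refname:short)%09%(upstream:short)%09%(upstream:track)"
_TRACK = re.compile(r"(ahead|behind) (\d+)")


def classify_branch(upstream: str, track: str) -> str:
    """Map one branch's upstream info to a coarse state label."""
    if not upstream:
        return "no-upstream"
    if track == "[gone]":
        return "upstream-gone"
    counts = {key: int(value) for key, value in _TRACK.findall(track)}
    ahead, behind = counts.get("ahead", 0), counts.get("behind", 0)
    if ahead and behind:
        return "diverged"
    if ahead:
        return "unpushed"
    if behind:
        return "behind"
    return "in-sync"


def parse_branches(output: str) -> Dict[str, str]:
    """Parse for-each-ref output into {branch: state}."""
    states = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Pad so lines with missing trailing fields still unpack cleanly.
        name, upstream, track = (line.split("\t") + ["", ""])[:3]
        states[name] = classify_branch(upstream, track)
    return states


def audit_branches(repo: str) -> Dict[str, str]:
    """Run git in `repo` and classify every local branch."""
    result = subprocess.run(
        ["git", "-C", repo, "for-each-ref", f"--format={FORMAT}", "refs/heads"],
        capture_output=True, text=True, check=True,
    )
    return parse_branches(result.stdout)
```

The tracking summary reflects the last fetch, so a real audit would probably want a git fetch first; that step is omitted here.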

Smoke runs on C:\code (real workspace):

  • audit + classify: 24 entries -> 20 FullyCompliant, 2 WrapperRepo, 1 IncompleteSourceControl, 1 LooseFile
  • secrets scan on this very repo: caught 16 hardcoded GitHub PATs in v1's root remediation scripts (subsequently rotated and redacted)
  • extract: identified UnicodeReplacementTool/ as the one wrapper-repo situation in the workspace
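The wrapper-repo pattern that extract looks for (a parent .git with child directories that carry their own .git) can be approximated with a filesystem walk. This is a minimal sketch under assumptions: find_nested_repos is a hypothetical name, and it checks only the directory layout, not whether the parent repo actually tracks the nested paths.

```python
import os
from pathlib import Path
from typing import List


def find_nested_repos(parent: Path) -> List[Path]:
    """Return directories below `parent` that carry their own .git entry.

    Returns an empty list when `parent` is not itself a repo, because the
    wrapper pattern needs a parent .git to exist.
    """
    if not (parent / ".git").exists():
        return []
    nested = []
    for dirpath, dirnames, filenames in os.walk(parent):
        current = Path(dirpath)
        if current == parent:
            # Skip the parent's own .git directory.
            dirnames[:] = [d for d in dirnames if d != ".git"]
            continue
        # .git may be a directory (normal clone) or a file (worktree/submodule).
        if ".git" in dirnames or ".git" in filenames:
            nested.append(current)
            dirnames[:] = []  # do not descend into the nested repo
    return sorted(nested)
```

Submodules also show up as a .git file inside the parent, so a fuller check would need to distinguish them from accidental nesting, for example by consulting .gitmodules.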

Open work

  • The visibility-before-push gate in remediate.py calls gh api to determine repo visibility -- that's a network dependency. A --offline mode that errs on the side of "treat as public" would be safer for CI use.
  • sca extract --archive-and-delete works but the subtree-split path needs an integration test against a real wrapper repo (currently only unit-tested via extract.detect).
  • sca render still writes to C:/code/temp/audit-report.html by default -- that path needs to be parameterized or written next to the input JSON.
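The proposed --offline behaviour for the visibility gate (err on the side of "treat as public") could be sketched as below. Nothing here exists in sca today: must_scan_before_push, gh_visibility, and the offline parameter are hypothetical, and reading the "visibility" field from gh api repos/OWNER/REPO is an assumption about how the lookup would be done.

```python
import json
import subprocess
from typing import Callable


def gh_visibility(slug: str) -> str:
    """Ask the GitHub CLI for a repo's visibility, e.g. 'public' or 'private'."""
    result = subprocess.run(
        ["gh", "api", f"repos/{slug}"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["visibility"]


def must_scan_before_push(
    slug: str,
    offline: bool = False,
    lookup: Callable[[str], str] = gh_visibility,
) -> bool:
    """Return True when a secret scan should run before pushing to `slug`."""
    if offline:
        # No network allowed: assume the worst and treat the repo as public.
        return True
    try:
        # Conservative assumption: anything not confirmed private gets scanned.
        return lookup(slug) != "private"
    except (OSError, subprocess.CalledProcessError, KeyError, ValueError):
        # gh missing, API error, or unexpected payload: fail closed.
        return True
```

Passing the lookup as a parameter keeps the gate testable without network access, which matters if it is going to run in CI.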

How to use it

pip install -e v3                       # editable install puts `sca` on PATH

sca audit --root ~/code                 # walks tree, prints JSON
sca audit --root ~/code | sca classify --summary
sca scan ~/code/some-repo               # secret scan one repo
sca extract --root ~/code               # find wrapper-repo situations
sca gitignore ~/code/some-dir --write   # write a stack-aware .gitignore
sca audit --root ~/code | sca remediate --plan       # dry-run plan
sca audit --root ~/code | sca remediate --apply      # actually run it

sca ci --root ~/code --summary          # audit which baseline gates each repo has
sca ci --root ~/code/some-repo --install # install missing gates (idempotent, content-aware)
sca ci --install --baseline semgrep      # add the opt-in SAST gate to cwd's repo

sca runner-broker --once                # one poll: provision runners for queued CI, then exit
sca runner-broker --poll 120            # daemon mode (how it's deployed on dev1)

Set CODE_ROOT to skip --root everywhere.
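One plausible way to resolve the root is shown below. The precedence (--root, then CODE_ROOT, then the current directory) is an assumption based on the quick start, where bare sca is run from inside the code root; resolve_root is a hypothetical helper, not sca's actual CLI code.

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def resolve_root(
    argv: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the code root: --root flag, else CODE_ROOT, else the current directory."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="sca-root-demo")
    parser.add_argument("--root", default=None)
    # parse_known_args ignores any other flags a real CLI would define.
    args, _unknown = parser.parse_known_args(argv)
    raw = args.root or env.get("CODE_ROOT") or os.getcwd()
    return Path(raw).expanduser()
```

For example, resolve_root(["--root", "~/code"]) expands the tilde, while resolve_root([]) with CODE_ROOT set returns that value.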

For the JIT CI runner broker (sca runner-broker) -- what it is, how it's deployed as a systemd service on dev1, and how to troubleshoot stuck CI jobs -- see RUNNER_BROKER.md.

Download files

Source Distribution

aollivierre_sca-0.2.8.tar.gz (155.0 kB view details)

Uploaded Source

Built Distribution

aollivierre_sca-0.2.8-py3-none-any.whl (129.6 kB view details)

Uploaded Python 3

File details

Details for the file aollivierre_sca-0.2.8.tar.gz.

File metadata

  • Download URL: aollivierre_sca-0.2.8.tar.gz
  • Upload date:
  • Size: 155.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for aollivierre_sca-0.2.8.tar.gz
Algorithm Hash digest
SHA256 afba1a571695c341a93b5ef57744944c205f60294eb5d17ee097b0317efe07e4
MD5 3e17ef936d522042b54118f3e7148a75
BLAKE2b-256 eea6ea7c2e13a7f445d8af6443d0cec3d212eb8e482b33fcf576273ecb62f2b6

Provenance

The following attestation bundles were made for aollivierre_sca-0.2.8.tar.gz:

Publisher: release.yml on aollivierre/source-control-automation

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file aollivierre_sca-0.2.8-py3-none-any.whl.

File metadata

  • Download URL: aollivierre_sca-0.2.8-py3-none-any.whl
  • Upload date:
  • Size: 129.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for aollivierre_sca-0.2.8-py3-none-any.whl
Algorithm Hash digest
SHA256 be7de4eada922df61cd3aedcd241eb25b1d9d170117993366ac94aa132cfa3d4
MD5 89a875a27962317792c5de6b202c0cc6
BLAKE2b-256 3264a07f3f6e786c941c9322527afa6154c39c3e7db97b5a56b47bd787897f18

Provenance

The following attestation bundles were made for aollivierre_sca-0.2.8-py3-none-any.whl:

Publisher: release.yml on aollivierre/source-control-automation

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
