
Source-control automation -- audit, secret-scan, and remediate Git workspaces


source-control-automation v3 (Python rewrite)

Badges: v3, tests, coverage 58%, release, safe to run

Safe to run. Bare sca is read-only: it walks your code root, writes JSON and an HTML report to its own output directory, and opens the HTML in your browser -- that's it. The pipeline produces a plan of what would need fixing but never executes it on its own. After the report opens, you'll get a y/N prompt to apply the plan; type n (or just press Enter) to skip and review first. Destructive operations always require an explicit --apply flag.
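The default-no prompt described above can be sketched as follows. This is a minimal illustration, assuming the prompt reads a single line and treats only y/yes as consent; `confirm_apply` and `maybe_apply` are hypothetical names, not sca's actual API.

```python
def confirm_apply(answer: str) -> bool:
    """Return True only for an explicit yes; Enter (empty) or anything else means no."""
    return answer.strip().lower() in {"y", "yes"}


def maybe_apply(plan: list[str], read_answer=input) -> list[str]:
    """Ask once and return the actions that would run (empty list if declined).

    `read_answer` is injectable so the gate can be exercised without a terminal.
    """
    if not plan:
        return []  # nothing to fix, so don't prompt at all
    answer = read_answer(f"Apply {len(plan)} planned action(s)? [y/N] ")
    return list(plan) if confirm_apply(answer) else []
```

Defaulting to "no" on an empty answer means a stray Enter keeps the run read-only.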

Cross-platform Python rewrite of v1's PowerShell framework, with a smaller surface and a few capabilities v1 was missing.

Quick start

cd ~/code        # or wherever your repos live
sca              # produces a report, opens it, asks before fixing anything

That's the whole onboarding. The same sca command works on Linux, macOS, and Windows. On Windows specifically, the auto-open uses explorer.exe <path> so the browser launches in your interactive user session even when sca was started from an elevated PowerShell.
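A sketch of how a per-platform opener could be chosen. Only the Windows `explorer.exe <path>` behaviour comes from this README; the macOS `open` and Linux `xdg-open` fallbacks and the `open_command` / `launch` names are assumptions for illustration.

```python
import subprocess
import sys


def open_command(path: str, platform: str = sys.platform) -> list[str]:
    """Return the argv used to open `path` in the user's default handler."""
    if platform == "win32":
        # explorer.exe hands the file to the interactive shell session,
        # so the browser opens as the logged-in user even from an elevated prompt
        return ["explorer.exe", path]
    if platform == "darwin":
        return ["open", path]  # assumption: macOS default opener
    return ["xdg-open", path]  # assumption: freedesktop opener on Linux


def launch(path: str) -> None:
    # Fire and forget: explorer.exe often exits non-zero even on success,
    # so the exit code is deliberately not checked
    subprocess.Popen(open_command(path))
```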

Why v3 exists

v1 (the rest of this repo) is a thorough PowerShell framework -- five numbered orchestration scripts, a Pester suite, ~25 specialized fix scripts at the root, and a gitignore template library. Real value, real coverage. But:

  • Windows / PowerShell only -- won't run on a Linux dev box or a CI runner that defaults to bash.
  • Surface area -- 128 files for what's logically a 5-stage workflow. The Fix-RemainingIssues.ps1-style scripts are one-off remediations that crept into the tree.
  • Doesn't notice "wrapper repo wrapping nested repos" -- the pattern where a parent .git tracks paths that have their own .git directories. We hit this on the original C:\code\.git and had to extract via git subtree split by hand.
  • No secret scanning during audit -- v1 looks for "sensitive file extensions" (.pfx, .env) but not for content patterns like an Obfuscated* config plus a decode function in the same repo, leaked PATs in committed scripts, base64+XOR secrets, etc. We found three leaked PATs and a leaked Azure cert by hand during one audit pass -- the tool should have caught them.
  • HEAD-only branch view -- only inspects the current branch; can't tell you that you have 8 unpushed feature branches or that main is 63 commits behind your active branch.
  • The tool has its own leaked PAT -- Fix-RemoteUrls.ps1 line 3 hardcodes a github_pat_*. A tool that standardizes source control should not be the place this happens.
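The wrapper-repo pattern from the list above can be spotted from the filesystem alone. This is a minimal sketch, not sca's extract.py: it flags directories under a repo that carry their own `.git` (directory or file), and it skips the further check of whether the parent actually tracks those paths (for example via `git ls-files`). `find_nested_repos` is a hypothetical name.

```python
import os
from pathlib import Path


def find_nested_repos(parent: Path) -> list[Path]:
    """Return directories below `parent` that have their own .git, if `parent` is a repo."""
    parent = Path(parent)
    if not (parent / ".git").exists():
        return []  # not a repo, so it cannot be a wrapper
    nested = []
    for dirpath, dirnames, _filenames in os.walk(parent):
        here = Path(dirpath)
        # Never descend into git metadata directories
        if ".git" in dirnames:
            dirnames.remove(".git")
        # .git may be a directory or a file (worktrees, submodules)
        if here != parent and (here / ".git").exists():
            nested.append(here)
            dirnames.clear()  # report the outermost nested repo only
    return sorted(nested)
```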

v3 is the smaller, opinionated rewrite that keeps v1's good ideas and adds the missing pieces.

Architecture

v3/
+-- README.md              (this file)
+-- pyproject.toml         (sca package metadata)
+-- sca/
|   +-- __init__.py
|   +-- cli.py             (entry: `sca audit`, `sca scan`, `sca branches`, `sca render`)
|   +-- audit.py           (walk tree -> classify repos/orphans/loose files -> JSON)
|   +-- branches.py        (per-repo branch audit: unpushed, main-behind, diverged)
|   +-- secrets.py         (NEW: pattern-scan for leaked PATs, XOR-obfuscated configs, etc.)
|   +-- render.py          (JSON -> single-file HTML report)
|   +-- classify.py        (port v1's 5-state model; repo strategy decision)
|   +-- remediate.py       (port v1's state-based fixes; backup-before-modify)
|   +-- extract.py         (NEW: detect + fix wrapper-repo-wrapping-nested-repos pattern)
+-- templates/
|   +-- gitignore/         (port v1's library: dotnet, python, node, powershell, etc.)
+-- tests/
    +-- ...                (pytest port of v1's Pester suite)
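The templates/gitignore/ library pairs with stack detection. A minimal sketch of marker-file detection, assuming top-level marker files decide the stack; the `STACK_MARKERS` table and `detect_stacks` name are hypothetical and sca's real heuristics may differ.

```python
from pathlib import Path

# Hypothetical marker rules: a stack is detected if any pattern matches at the top level
STACK_MARKERS = {
    "python": ("pyproject.toml", "setup.py", "requirements.txt"),
    "node": ("package.json",),
    "dotnet": ("*.sln", "*.csproj"),
    "powershell": ("*.ps1", "*.psm1", "*.psd1"),
}


def detect_stacks(project: Path) -> list[str]:
    """Return detected stacks in STACK_MARKERS order (used to pick gitignore templates)."""
    project = Path(project)
    found = []
    for stack, patterns in STACK_MARKERS.items():
        if any(next(project.glob(p), None) is not None for p in patterns):
            found.append(stack)
    return found
```

A mixed project (for example a Node app with PowerShell build scripts) yields several stacks, whose templates would then be concatenated.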

What carries over from v1

| v1 idea | v3 home |
|---|---|
| 5-state classification (NoSC / LocalGitOnly / Incomplete / PartialSync / Compliant) | sca/classify.py |
| Dedicated-vs-consolidated repo strategy decision (file count, .sln/.csproj presence) | sca/classify.py |
| .gitignore template library by project type | templates/gitignore/ |
| Backup-before-modify (zip the project before destructive ops) | sca/remediate.py |
| Per-state remediation workflows (init / push / commit / sync) | sca/remediate.py |
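To make the 5-state model concrete, here is one plausible mapping from observed repo facts to a state. The state names come from the table above; the `RepoFacts` fields and the precedence of the checks are assumptions, not v1's or sca's documented rules.

```python
from dataclasses import dataclass


@dataclass
class RepoFacts:
    has_git: bool        # directory contains a .git
    has_remote: bool     # at least one remote configured
    has_commits: bool    # HEAD resolves to a commit
    dirty: bool          # uncommitted changes in the working tree
    ahead: int = 0       # local commits not on the remote
    behind: int = 0      # remote commits not pulled


def classify(f: RepoFacts) -> str:
    """Hypothetical precedence: missing git > no remote > incomplete > out of sync."""
    if not f.has_git:
        return "NoSC"
    if not f.has_remote:
        return "LocalGitOnly"
    if not f.has_commits or f.dirty:
        return "Incomplete"
    if f.ahead or f.behind:
        return "PartialSync"
    return "Compliant"
```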

What carries over from v2 (the audit scripts written 2026-04-30)

| v2 script | v3 home |
|---|---|
| audit-code.py (walk + classify into JSON) | sca/audit.py |
| branch-audit.py (per-branch flags) | sca/branches.py |
| render-audit.py (JSON -> HTML) | sca/render.py |
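Per-branch flags can be derived from ahead/behind counts, which `git rev-list --left-right --count <branch>...<upstream>` prints as "left right" (commits only on the branch, commits only on the upstream). A sketch under that approach; the flag vocabulary mirrors the README, but the function names and exact rules are hypothetical.

```python
import subprocess


def parse_left_right(output: str) -> tuple[int, int]:
    """Parse rev-list's '<left>\t<right>' output into (ahead, behind)."""
    left, right = output.split()
    return int(left), int(right)


def ahead_behind(repo: str, branch: str, upstream: str) -> tuple[int, int]:
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--left-right", "--count", f"{branch}...{upstream}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_left_right(out)


def branch_flags(ahead: int, behind: int, has_upstream: bool) -> list[str]:
    if not has_upstream:
        return ["unpushed"]  # branch has never been pushed
    if ahead and behind:
        return ["diverged"]
    if ahead:
        return ["unpushed"]
    if behind:
        return ["behind"]
    return []
```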

Plus: cross-platform support (--root / CODE_ROOT env var) so the same code runs on Linux and Windows.

What's NEW in v3 (not in v1 or v2)

  1. Secret scanning during audit -- sca/secrets.py scans every committed file for:
    • Hardcoded github_pat_..., ghp_..., sk-..., AWS access keys
    • Obfuscated* field names paired with a Get-DecryptedValue (or similar) decode function in the same repo (XOR/base64 antipattern)
    • Embedded PFX/PEM blobs
    • SharePoint URLs, tenant/client/list UUIDs that look like real values
    • Any password\s*[:=] patterns inside config files
  2. Wrapper-repo detection + extraction -- sca/extract.py detects the "parent .git wraps child repos that have their own .git" pattern and offers a clean extraction (the git subtree split workflow we did manually).
  3. Visibility-before-push gate -- every git push is preceded by a quick check: is the remote repo public? If yes, run secrets.py before the push. This prevents the kind of 68-minute public exposure we hit with ResetIntuneEnrollment.
  4. Branch-level audit -- carried over from v2 (v1 was HEAD-only) rather than strictly new, but now a first-class part of the report: unpushed branches and main-behind situations surface as report items.
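The token checks in item 1 boil down to line-by-line regex matching, plus a repo-wide pairing check for the XOR antipattern. A minimal sketch, not secrets.py itself: the regexes are approximations of the public GitHub, `sk-` and AWS key formats, the password rule is applied to any text rather than only config files, and all names are hypothetical.

```python
import re

# Approximate token shapes; real scanners tune these to reduce false positives
TOKEN_PATTERNS = {
    "github_pat": re.compile(r"github_pat_[A-Za-z0-9_]{20,}"),
    "github_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "sk_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "aws_access_key": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "password_assignment": re.compile(r"(?i)password\s*[:=]"),
}
OBFUSCATED_FIELD = re.compile(r"\bObfuscated\w*")
DECODE_FUNC = re.compile(r"\bGet-DecryptedValue\b")


def scan_text(text: str) -> list[tuple[str, int]]:
    """Return (pattern_name, line_number) for every pattern hit, 1-based lines."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, rx in TOKEN_PATTERNS.items():
            if rx.search(line):
                hits.append((name, lineno))
    return hits


def xor_pair(files: dict[str, str]) -> bool:
    """True when some file has an Obfuscated* field and some file has the decoder."""
    has_field = any(OBFUSCATED_FIELD.search(t) for t in files.values())
    has_decoder = any(DECODE_FUNC.search(t) for t in files.values())
    return has_field and has_decoder
```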

Out of scope for v3

  • Continuous monitoring / dashboard -- v1's Mode 4. The audit is one-shot. If you want recurring runs, schedule via cron / Task Scheduler.
  • Tested support beyond Windows + Linux. macOS should work but isn't tested.
  • Non-GitHub forges (GitLab, Bitbucket, Gitea). v1 had aspirations here; v3 is GitHub-only by design.

Status / progress (this branch)

| Module | Status | Notes |
|---|---|---|
| sca/audit.py | Done (ported from v2) | walk + classify dirs -> JSON; cross-platform (--root) |
| sca/branches.py | Done (ported from v2) | per-repo branch state; unpushed / behind / diverged |
| sca/render.py | Done (ported from v2) | JSON -> single-file HTML report |
| sca/secrets.py | Done (new) | live token regex, XOR-obfuscation pair detector, real-looking GUIDs, SharePoint URLs |
| sca/classify.py | Done (new) | 5-state model + 3 extra states v1 didn't have (Empty, LooseFile, WrapperRepo) + dedicated/consolidated decision |
| sca/extract.py | Done (new) | wrapper-repo detection + repair (subtree-split / archive-and-delete) |
| sca/templates.py + templates/gitignore/ | Done (new) | stack detection (python/node/dotnet/powershell) -> curated gitignore |
| sca/remediate.py | Done (new) | per-state plan + executor; backup-before-modify; visibility-before-push gate |
| sca/cli.py | Done (new) | subcommands: audit, branches, scan, classify, extract, gitignore, remediate, render |
| tests/ | Done (new) | 35 pytest tests, all green |
| pyproject.toml | Done | pip install -e v3 |
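Backup-before-modify can be expressed as "zip first, then act". A minimal sketch using `shutil.make_archive`, assuming a timestamped zip in a separate backup directory (keep it outside the project, or the archive would include earlier backups); `backup_project` and `run_with_backup` are hypothetical names, not remediate.py's API.

```python
import shutil
import time
from pathlib import Path


def backup_project(project: Path, backup_dir: Path) -> Path:
    """Zip `project` into `backup_dir` and return the archive path."""
    project = Path(project).resolve()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = backup_dir / f"{project.name}-{stamp}"
    # make_archive appends ".zip" and returns the archive filename
    return Path(shutil.make_archive(str(base), "zip", root_dir=project))


def run_with_backup(project: Path, backup_dir: Path, action) -> Path:
    archive = backup_project(project, backup_dir)  # raises on failure, so action never runs
    action(Path(project))
    return archive
```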

Smoke runs on C:\code (real workspace):

  • audit + classify: 24 entries -> 20 FullyCompliant, 2 WrapperRepo, 1 IncompleteSourceControl, 1 LooseFile
  • secrets scan on this very repo: caught 16 hardcoded GitHub PATs in v1's root remediation scripts (subsequently rotated and redacted)
  • extract: identified UnicodeReplacementTool/ as the one wrapper-repo situation in the workspace
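A summary line like the first smoke-run result above can be produced by counting states. This sketch assumes the audit JSON is a list of entries each carrying a `"state"` key; that shape and the `summarize` / `summarize_json` names are assumptions, not sca's documented output format.

```python
import json
from collections import Counter


def summarize(entries: list[dict]) -> str:
    """Render 'N State, ...' ordered by count (ties keep first-seen order)."""
    counts = Counter(e["state"] for e in entries)
    return ", ".join(f"{n} {state}" for state, n in counts.most_common())


def summarize_json(text: str) -> str:
    # e.g. fed from stdin when piping one command's JSON into another
    return summarize(json.loads(text))
```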

Open work

  • The visibility-before-push gate in remediate.py calls gh api to determine repo visibility -- that's a network dependency. A --offline mode that errs on the side of "treat as public" would be safer for CI use.
  • sca extract --archive-and-delete works but the subtree-split path needs an integration test against a real wrapper repo (currently only unit-tested via extract.detect).
  • sca render still writes to C:/code/temp/audit-report.html by default -- that path needs to be parameterized or written next to the input JSON.
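The proposed --offline behaviour for the visibility gate can be sketched as a fail-safe: anything other than a confirmed private repo triggers the secret scan. The lookup shells out to `gh api repos/<owner>/<repo>` and reads the REST response's `private` boolean; the `--offline` flag itself is still open work, and the function names here are hypothetical.

```python
import json
import subprocess


def repo_visibility(owner_repo: str) -> str:
    """Ask GitHub via the gh CLI; needs network access and an authenticated gh."""
    out = subprocess.run(
        ["gh", "api", f"repos/{owner_repo}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return "private" if json.loads(out)["private"] else "public"


def must_scan_before_push(owner_repo: str, offline: bool = False, lookup=repo_visibility) -> bool:
    """True if secrets must be scanned before pushing; unknown visibility counts as public."""
    if offline:
        return True  # no network: treat as public
    try:
        return lookup(owner_repo) == "public"
    except (OSError, subprocess.CalledProcessError, ValueError, KeyError):
        return True  # gh missing, API error, or bad JSON: also treat as public
```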

How to use it

pip install -e v3                       # editable install puts `sca` on PATH

sca audit --root ~/code                 # walks tree, prints JSON
sca audit --root ~/code | sca classify --summary
sca scan ~/code/some-repo               # secret scan one repo
sca extract --root ~/code               # find wrapper-repo situations
sca gitignore ~/code/some-dir --write   # write a stack-aware .gitignore
sca audit --root ~/code | sca remediate --plan       # dry-run plan
sca audit --root ~/code | sca remediate --apply      # actually run it

Set CODE_ROOT to skip --root everywhere.

Download files


Source Distribution

aollivierre_sca-0.2.2.tar.gz (115.5 kB)


Built Distribution


aollivierre_sca-0.2.2-py3-none-any.whl (102.5 kB)


File details

Details for the file aollivierre_sca-0.2.2.tar.gz.

File metadata

  • Download URL: aollivierre_sca-0.2.2.tar.gz
  • Size: 115.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for aollivierre_sca-0.2.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9e799ff73810f8625481dc59969b8af64de7bdbe9ca2049e616ef0e87608450c |
| MD5 | 43d2c12d3aaefafe2c521d25eddb3d8c |
| BLAKE2b-256 | 0cf30ff833d7333ef37bc52f0e75ee4267e3da72d4c92027fbb547d3d390adc8 |
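The SHA256 digests listed for each file can be checked locally before installing. A minimal sketch with `hashlib` that streams the file in chunks; `sha256_of` and `verify` are illustrative names. pip can also enforce this itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    # Compare case-insensitively; published digests are usually lowercase
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify("aollivierre_sca-0.2.2.tar.gz", "9e799ff7...")` with the full digest from the table should return True for an untampered download.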


Provenance

The following attestation bundles were made for aollivierre_sca-0.2.2.tar.gz:

Publisher: release.yml on aollivierre/source-control-automation


File details

Details for the file aollivierre_sca-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: aollivierre_sca-0.2.2-py3-none-any.whl
  • Size: 102.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for aollivierre_sca-0.2.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86d728b9857c46c82fcc2991a1e3c94e91b73359b751b12f22d42fa036b964d4 |
| MD5 | 17a8a300fd8cdfc2325f68f7323347c3 |
| BLAKE2b-256 | d3b2155ef49ce7b2465abf5b06599d28c086305e4b22b67ad2a625e9ebe47a20 |


Provenance

The following attestation bundles were made for aollivierre_sca-0.2.2-py3-none-any.whl:

Publisher: release.yml on aollivierre/source-control-automation

