

couplingguard — Detect file coupling risk in PRs from git co-change history


Status: v0.1.0 — first release. The Meru143/couplingguard@v1 tag and the couplingguard PyPI package both ship after the first tagged release lands. Until then, pin to a commit SHA or install from source.

Two files that always break together still ship in the same PR with no one looking at both sides. Your git log has known about this pairing for months. couplingguard surfaces it as a comment on every PR — before the bug, not after the post-mortem.

A free GitHub Action (and GitLab CI integration, and Python CLI) that walks 90 days of git history, builds a normalized co-change matrix, filters to pairs touching your PR's changed files, and posts a collapsible markdown comment with risk badges. Optionally fails CI above a configurable coupling threshold. Suggests reviewers from CODEOWNERS for the coupled files. Edits itself in place on re-push with a 🟡 0.45 → 🔴 0.82 delta line.

Animated walkthrough: git log → co-change matrix → rendered PR comment

📺 Want an MP4 or GIF of the demo? A Remotion project lives at demo/remotion/; running npm install && npm run build there produces a real video. The animated SVG above is the equivalent for inline rendering.

Install in 5 lines

name: Coupling Guard
on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write

jobs:
  coupling:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0     # required: couplingguard needs the full git log
      - uses: Meru143/couplingguard@v1
        with:
          github_token: ${{ github.token }}

What the PR comment looks like

Real output from running couplingguard against its own repository (synthetic PR over 10 commits of real history, captured from tests/e2e/test_dogfood.py):

🔍 couplingguard — 6 pairs detected, highest risk: 🔴 1.00

| File in PR | Coupled With | Score | Risk | Co-changes |
| --- | --- | --- | --- | --- |
| pyproject.toml | skills-lock.json | 1.00 | 🔴 High | 2/2 commits |
| tests/integration/test_github_poster.py | tests/integration/test_gitlab_poster.py | 1.00 | 🔴 High | 2/2 commits |
| .gitignore | pyproject.toml | 0.67 | 🟡 Medium | 2/3 commits |
| .gitignore | skills-lock.json | 0.67 | 🟡 Medium | 2/3 commits |

Note the paired integration test files at 1.00 — test_github_poster.py and test_gitlab_poster.py always land in the same commit because they cover mirror-image functionality. A reviewer looking only at the GitHub test file would benefit from knowing the GitLab one almost certainly changed too.

Illustrative example showing the score-delta line on re-push and CODEOWNERS-based reviewer suggestions (the names are placeholders — the real action only suggests usernames that actually appear in your CODEOWNERS file):

🔍 couplingguard — 2 pairs detected, highest risk: 🔴 0.82

⚠️ Score changed since last push: 🟡 0.45 → 🔴 0.82 ↑

| File in PR | Coupled With | Score | Risk | Co-changes |
| --- | --- | --- | --- | --- |
| src/payment.py | src/billing.py | 0.82 | 🔴 High | 41/50 commits |
| src/payment.py | tests/test_billing.py | 0.64 | 🟡 Medium | 32/50 commits |

Suggested reviewers for coupled files: @alice, @team-payments

The comment is collapsible (<details>-wrapped) and edits itself on every push to the PR with a "score changed" line showing the delta.
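As a rough illustration of how an edit-in-place comment can remember its previous score, the sketch below stores a small JSON payload in a hidden HTML comment and reads it back on the next run. The marker text `couplingguard:` and the helpers `embed_state`, `previous_score`, and `delta_line` are assumptions made for this example; the marker format couplingguard actually writes is not documented here and may differ.

```python
import json
import re
from typing import Optional

# Hypothetical marker format, chosen for this sketch only.
MARKER_RE = re.compile(r"<!-- couplingguard:(\{.*?\}) -->")


def embed_state(body: str, top_score: float) -> str:
    """Append a hidden HTML comment carrying the top score as JSON."""
    state = json.dumps({"top_score": round(top_score, 2)})
    return f"{body}\n<!-- couplingguard:{state} -->"


def previous_score(comment_body: str) -> Optional[float]:
    """Recover the score stored by an earlier run, or None if no marker exists."""
    match = MARKER_RE.search(comment_body)
    if match is None:
        return None
    return json.loads(match.group(1))["top_score"]


def delta_line(old: Optional[float], new: float) -> str:
    """Render a 'score changed' line; empty when there is nothing to report."""
    if old is None or old == new:
        return ""
    arrow = "↑" if new > old else "↓"
    return f"Score changed since last push: {old:.2f} → {new:.2f} {arrow}"
```

Because the state travels inside the comment body, a run needs no storage beyond the PR itself: it finds the comment containing the marker, reads the old score, and rewrites the comment with the new one.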

Inputs

| Input | Type | Default | Description |
| --- | --- | --- | --- |
| github_token | string | ${{ github.token }} | Token for PR comment + check |
| gitlab_token | string | "" | Personal access token for GitLab CI |
| lookback_days | number | 90 | Days of history to analyze |
| min_occurrences | number | 3 | Minimum co-change count to include a pair |
| max_pairs | number | 10 | Maximum pairs shown in the comment |
| low_threshold | number | 0.3 | Score boundary 🟢 → 🟡 |
| high_threshold | number | 0.7 | Score boundary 🟡 → 🔴 |
| fail_threshold | string | "" | low/medium/high to fail CI; empty disables |
| exclude | string | "" | Newline-separated glob patterns |
| publish_dashboard | boolean | false | Generate static dashboard + history + badge artifact |
| dry_run | boolean | false | Print comment to stdout; don't post |
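To make the three threshold inputs concrete, here is a minimal sketch of how a score could map to a risk level and to a CI result. The boundaries follow the defaults above, with a score equal to a threshold falling into the higher level. The names `classify`, `should_fail`, and `LEVELS` are hypothetical, and the rule "fail when any pair reaches the configured level or above" is an assumption about how fail_threshold behaves, not something this README confirms.

```python
LEVELS = ["low", "medium", "high"]


def classify(score: float, low_threshold: float = 0.3, high_threshold: float = 0.7) -> str:
    """Map a 0-1 coupling score to a risk level (thresholds are inclusive lower bounds)."""
    if score >= high_threshold:
        return "high"
    if score >= low_threshold:
        return "medium"
    return "low"


def should_fail(scores, fail_threshold: str = "") -> bool:
    """Assumed semantics: fail if any pair is at or above the configured level."""
    if not fail_threshold:
        return False  # an empty string disables the check
    limit = LEVELS.index(fail_threshold)
    return any(LEVELS.index(classify(s)) >= limit for s in scores)
```

With the defaults, the 0.67 pairs in the sample comment classify as medium and the 1.00 pairs as high, matching the badges shown there.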

How it works

flowchart TD
    A[git log<br/>lookback_days, no-merges] --> B[co-change matrix<br/>file pairs × commit count]
    B --> C[normalize<br/>score = co_count / max&#40;count_a, count_b&#41;]
    C --> D{filter by<br/>min_occurrences}
    D --> E[PR analyzer<br/>keep pairs touching PR files]
    E --> F[classify risk<br/>🟢 &lt; 0.3 ≤ 🟡 &lt; 0.7 ≤ 🔴]
    F --> G[CODEOWNERS lookup<br/>suggest reviewers]
    G --> H[render markdown<br/>+ hidden JSON marker]
    H --> I[find existing<br/>PR comment by marker]
    I -->|exists| J[edit in place<br/>with delta line]
    I -->|new| K[create issue comment]
    J --> L[fail_threshold check<br/>exit 0 / 1]
    K --> L
    L --> M{publish_dashboard?}
    M -->|yes| N[append history JSON<br/>+ Chart.js HTML<br/>+ shields.io badge]
    M -->|no| O[done]
    N --> O

    style A fill:#fef3c7,stroke:#f59e0b,color:#000
    style C fill:#dbeafe,stroke:#3b82f6,color:#000
    style F fill:#fce7f3,stroke:#ec4899,color:#000
    style L fill:#dcfce7,stroke:#16a34a,color:#000

The key insight is normalization: raw co-change counts inflate for old / large files, while co_count / max(count_a, count_b) produces a 0–1 ratio that's comparable across repos of any size and age.
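The normalization step can be sketched in a few lines. This is an illustrative reimplementation of the formula above, not couplingguard's own code: `coupling_scores` is a hypothetical name, and the input is assumed to be one set of changed file paths per non-merge commit, already extracted from git log.

```python
from collections import Counter
from itertools import combinations


def coupling_scores(commits, min_occurrences=3):
    """Return {(file_a, file_b): score} using co_count / max(count_a, count_b)."""
    file_counts = Counter()  # commits touching each file
    pair_counts = Counter()  # commits touching both files of a pair
    for files in commits:
        unique = sorted(set(files))  # sort so each pair has one canonical order
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), co_count in pair_counts.items():
        if co_count < min_occurrences:
            continue  # too few co-changes to be meaningful
        scores[(a, b)] = co_count / max(file_counts[a], file_counts[b])
    return scores
```

Dividing by the larger of the two per-file counts means a frequently edited file cannot look coupled to everything it occasionally ships with: a pair scores 1.00 only when neither file ever changes without the other.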

Local CLI

After v0.1.0 ships on PyPI:

pip install couplingguard
couplingguard --repo . --dry-run --lookback-days 90

Pre-release (install from source):

pip install git+https://github.com/Meru143/couplingguard.git@main
couplingguard --repo . --dry-run --lookback-days 90

The CLI uses the same code as the Action; --dry-run prints the rendered comment to stdout without trying to reach GitHub.

GitLab CI

coupling:
  image: python:3.11
  variables:
    GIT_DEPTH: "0"                     # required: GitLab clones shallow by default
    GITLAB_TOKEN: ${GITLAB_TOKEN}
  script:
    - pip install couplingguard
    - couplingguard --repo .
  only:
    - merge_requests

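CI_SERVER_URL and CI_PROJECT_ID are predefined for every GitLab CI job; CI_MERGE_REQUEST_IID is set only in merge request pipelines, which is what the only: merge_requests rule above selects. GITLAB_TOKEN should be a project access token with the api scope, stored as a masked CI/CD variable.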
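For orientation, the sketch below shows how those three variables are enough to address a merge request through GitLab's REST API, using the documented merge request notes endpoint. Whether couplingguard calls this exact endpoint is an assumption, and `merge_request_notes_url` is a hypothetical helper written for this example.

```python
from urllib.parse import quote


def merge_request_notes_url(env) -> str:
    """Build the notes (comments) endpoint for the current merge request.

    `env` is any mapping holding the GitLab CI variables, e.g. os.environ.
    """
    server = env["CI_SERVER_URL"].rstrip("/")
    project = quote(str(env["CI_PROJECT_ID"]), safe="")
    mr_iid = env["CI_MERGE_REQUEST_IID"]
    return f"{server}/api/v4/projects/{project}/merge_requests/{mr_iid}/notes"
```

In a job you would pass `os.environ` and send the token in the `PRIVATE-TOKEN` request header. Since the host comes from CI_SERVER_URL, the same construction applies to a self-managed instance.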

Permissions

For GitHub Actions, couplingguard needs:

  • contents: read to read the git history.
  • pull-requests: write to post / edit the comment.

For GitLab CI, the GITLAB_TOKEN needs api scope on the project.

When publish_dashboard: true, the action writes coupling-history.json, coupling-dashboard.html, and coupling-score.json to the workspace and uploads them as a GitHub Actions artifact. Nothing is committed back to your repo unless you add an explicit git commit && git push step yourself.

FAQ

Why fetch-depth: 0? Default actions/checkout@v4 does a shallow clone (depth=1). couplingguard needs the full log to count co-changes. If you forget, the action exits 1 with an actionable error rather than producing wrong results.
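If you want to check for a shallow clone yourself before running the tool, one simple heuristic is to look for the `shallow` file that git keeps inside the repository's git directory. This is a sketch under assumptions: `looks_shallow` is a hypothetical helper, it does not show how couplingguard performs its own check, and it only handles the common layout where `.git` is a directory.

```python
from pathlib import Path


def looks_shallow(repo) -> bool:
    """Return True if the repository at `repo` has a .git/shallow file.

    Git records the boundary commits of a shallow clone in that file.
    Worktrees and submodules, where .git is a file, are not handled here.
    """
    return (Path(repo) / ".git" / "shallow").is_file()
```

On the command line, `git rev-parse --is-shallow-repository` answers the same question and also covers the layouts this heuristic skips.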

What is normalization? A pair where a.py was touched 100 times, b.py 5 times, and both together 5 times is not the same as a pair where both were touched 5 times each. Raw count = 5 in both cases. Normalized: 5/100 = 0.05 vs 5/5 = 1.00. The second pair is genuinely coupled; the first is noise.

Does this work on monorepos? Yes. Use exclude to drop noisy paths (docs, migrations) and bump min_occurrences to filter rare pairs. The matrix is built once per run and scales linearly with lookback_days × avg_files_per_commit.
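To show what a newline-separated exclude value might do to the file list, here is a small sketch built on Python's standard `fnmatch` module. The helper names `parse_exclude` and `keep_path` are hypothetical, and the exact glob dialect couplingguard uses is not specified in this README, so treat the matching rules below (where `*` also matches `/`) as an assumption.

```python
from fnmatch import fnmatchcase


def parse_exclude(raw: str):
    """Split the newline-separated input into patterns, dropping blank lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def keep_path(path: str, patterns) -> bool:
    """Return False when the path matches any exclude pattern."""
    return not any(fnmatchcase(path, pattern) for pattern in patterns)
```

For example, the value `docs/*` followed by `*/migrations/*` on the next line would drop `docs/guide/intro.md` and `app/migrations/0001.py` while keeping `src/payment.py`.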

What if my repo has fewer than min_occurrences commits? The action posts an informational comment and exits 0 — no false failures on new repos.

Differentiators

  • vs CodeScene — Free and open source; runs entirely in your CI with no external service. CodeScene is a commercial product with per-seat pricing.
  • vs code-maat — code-maat is a Clojure CLI for post-hoc analysis: you run it against a checked-out repo and read CSV. couplingguard runs at PR time, produces normalized scores, and posts directly to the PR.
  • vs Danger.js — Danger is a framework where you write the analysis rules yourself. couplingguard is a zero-config drop-in.
  • vs CODEOWNERS — Static ownership vs dynamic co-change. Complementary: couplingguard uses CODEOWNERS to suggest better reviewers for the files historically coupled to your PR's files.

Limitations

Known constraints in v0.1.0:

  • Shallow clones are rejected. Detected and surfaced as error E001 with an actionable message. Add fetch-depth: 0 (GitHub) or GIT_DEPTH: "0" (GitLab).
  • PR file cap at 200. PRs touching more than 200 files are truncated with a warning. The pairs analysis is O(200 × matrix_size), so this is a deliberate ceiling.
  • No auto-commit of dashboard files. publish_dashboard: true produces an artifact; pushing the score JSON back to main for badge updates is on the v0.2 roadmap.
  • GitLab self-managed not officially tested. Should work via CI_SERVER_URL but only tested against gitlab.com.
  • Bitbucket / Azure DevOps — not supported in v0.1.0.

Coupling cheatsheet

A one-page reference covering the score formula, risk thresholds, common couplings to look for, and recommended tuning per repo type (solo, small team, monorepo, mature OSS library).

Demo assets

  • Static SVGs (hero banner + animated walkthrough) live in assets/ and are embedded at the top of this README.
  • For MP4 / GIF renders, a Remotion project lives in demo/remotion/; running npm install && npm run build there produces a 1080p video.

Contributing

See CONTRIBUTING.md. Bugs → Issues. Security → SECURITY.md.

Publishing to the GitHub Marketplace

GitHub Marketplace categories and featured tags are configured in the GitHub web UI, not in action.yml. After tagging a release:

  1. Open the new release on the Releases page.
  2. Click Publish this Action to the GitHub Marketplace.
  3. Accept the Marketplace terms.
  4. Choose two categories from the dropdown — recommended: Code quality and Continuous integration.
  5. Add featured tags: code-quality, pull-request, git, coupling, static-analysis.

The branding.icon (git-branch) and branding.color (orange) from action.yml are picked up automatically as the listing badge.

License

MIT. See LICENSE.
