Semantic linting CLI that detects codebase redundancy created by AI coding agents

Echo-Guard

Semantic linting CLI for AI-generated code redundancy

What is Echo-Guard?

Echo-Guard is a semantic linting CLI designed to catch the subtle, functional duplication that AI coding agents often introduce.

Unlike traditional linters that focus on syntax errors or style, Echo-Guard analyzes the logic and intent of your code. It identifies "echoes"—blocks of code that perform the same task but might look slightly different—across your entire project, regardless of the file or service they live in.
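To make the idea of an "echo" concrete, here is a small illustrative pair. Both functions are hypothetical (they are not taken from Echo-Guard or its test data); they perform the same task with different code, which is the kind of duplication that string matching would not catch.

```python
def dedupe_keep_order(items):
    """Remove duplicates while preserving first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def unique_values(values):
    """Same task, different shape: dict keys keep insertion order."""
    return list(dict.fromkeys(values))
```

The two bodies share almost no text, yet they return the same result for any list of hashable items.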

Why Echo-Guard?

AI-assisted development (Cursor, Claude Code, Copilot) is incredibly fast, but it has a "memory" problem. Agents often generate fresh code for a task that has already been solved elsewhere in your codebase.

Use Echo-Guard to:

  • Kill Hidden Redundancy: Catch duplicate business logic that "grep" or simple string matching would miss.
  • Prevent "AI Rot": Stop your codebase from bloating with slightly different versions of the same utility functions.
  • Keep Your Data Local: Built for privacy-conscious teams. Echo-Guard runs entirely on your machine: no code is ever uploaded to the cloud for analysis, and anonymized metadata for improving the model is shared only if you opt in.
  • Scale Across Languages: Maintain a DRY (Don't Repeat Yourself) architecture even in polyglot repositories.

Install

Prerequisites

If you don't have pipx installed:

# macOS
brew install pipx && pipx ensurepath

# Linux / WSL
python3 -m pip install --user pipx && pipx ensurepath

# Windows (PowerShell)
pip install pipx
pipx ensurepath

Install Echo Guard

pipx install "echo-guard[languages,mcp]"

Upgrade

pipx upgrade echo-guard

To upgrade to a specific version:

pipx install "echo-guard[languages,mcp]" --force --pip-args="echo-guard==0.3.0"

Getting Started

echo-guard setup

The setup wizard handles everything:

  1. Directory selection — choose which directories to scan (interactive arrow-key selector)
  2. Language detection — auto-detects languages in your selected directories
  3. MCP registration — detects supported agents (Claude Code, Codex) and registers the MCP server automatically
  4. GitHub Action — optionally generates .github/workflows/echo-guard-ci.yml for PR checks
  5. Initial index + scan — indexes your codebase and runs the first scan

One command, fully configured. The wizard generates echo-guard.yml with all settings.

Manual workflow

If you prefer to skip the wizard:

echo-guard index        # Index your codebase
echo-guard scan         # Scan for duplicates
echo-guard review       # Walk through findings interactively
echo-guard add-mcp      # Register MCP server with Claude Code or Codex
echo-guard add-action   # Generate GitHub Action for PR checks

Example Output

Echo Guard — Scan Results

  18 HIGH · 28 MEDIUM · 11 LOW  (892 raw pairs)
  11 LOW findings hidden — use --verbose to show

  Top refactoring targets:
    fetchJson()  —  13 copies
    timeAgo()  —  4 copies
    schemaTypes()  —  4 copies

  ━━━ EXTRACT NOW (18) ━━━
  3+ copies — real DRY violations

  ● #1  T1/T2 Exact — fetchJson() x13
       components/UserList.tsx:10  fetchJson()
       components/TeamList.tsx:8  fetchJson()
       lib/api.ts:15  fetchJson()
       ...
       → Extract to shared module under lib/

  ━━━ WORTH NOTING (28) ━━━
  2 exact copies — fix if complex, defer per Rule of Three

  ● #1  T1/T2 Exact — validate_email()  (100%)
       services/auth/utils.py:12  →  import from services/user/validators.py:8

How It Works

Echo Guard uses a three-tier detection pipeline:

Tier 1 — AST Hash Matching (Type-1/Type-2)

Tree-sitter parses functions, normalizes identifiers, and computes structural hashes. Two functions with the same hash are exact (Type-1) or renamed (Type-2) clones. Matching is O(n) in the number of functions, with 100% recall and zero false positives for these two clone types.
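The idea behind this tier can be sketched in a few lines. This is a minimal illustration, not Echo Guard's implementation: it swaps Tree-sitter for Python's standard `ast` module, handles Python source only, and the names `_Normalizer` and `structural_hash` are made up for this example.

```python
import ast
import hashlib


class _Normalizer(ast.NodeTransformer):
    """Rename the function and its local names to positional placeholders."""

    def __init__(self, local_names):
        self.local_names = local_names
        self.mapping = {}

    def _placeholder(self, name):
        # Only names bound inside the code are renamed; callees such as
        # len() or sum() keep their real names so they still affect the hash.
        if name not in self.local_names:
            return name
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_FunctionDef(self, node):
        node.name = "f"
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._placeholder(node.id)
        return node


def structural_hash(source):
    """Hash the normalized AST so renamed clones collide on purpose."""
    tree = ast.parse(source)
    # Collect parameters and assignment targets in a first pass.
    local = {n.arg for n in ast.walk(tree) if isinstance(n, ast.arg)}
    local |= {
        n.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
    }
    normalized = _Normalizer(local).visit(tree)
    dumped = ast.dump(normalized, annotate_fields=False)
    return hashlib.sha256(dumped.encode("utf-8")).hexdigest()
```

Two functions that differ only in the names of the function, its parameters, and its local variables produce the same digest, so grouping functions by digest finds such clones in a single pass over the index.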

Tier 2 — Code Embeddings (Type-3/Type-4)

UniXcoder encodes each function into a 768-dimensional embedding vector. Cosine similarity search finds modified clones (Type-3: same structure, different statements) and semantic clones (Type-4: same intent, completely different implementation). Encoding takes ~15 ms per function, and a search takes ~2 ms at 100K indexed functions.
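The search step can be illustrated with a brute-force version. This sketch assumes the embeddings already exist and uses tiny hand-written vectors instead of real 768-dimensional UniXcoder output; `cosine_similarity` and `nearest_neighbors` are illustrative names, and a linear scan like this would not reach the latency quoted above at 100K functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy index: function name -> embedding (3 dimensions for readability).
toy_index = {
    "fetch_json": [1.0, 0.0, 0.0],
    "time_ago": [0.0, 1.0, 0.0],
    "load_json": [1.0, 1.0, 0.0],
}
candidates = nearest_neighbors([1.0, 0.0, 0.0], toy_index)
```

Here `candidates` contains `fetch_json` and `load_json` in that order, while `time_ago` falls below the threshold and is dropped.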

Tier 3 — Feature Classifier

A trained classifier combines 14 code features — AST edit distance, embedding score, name/body identifier overlap, call patterns, control flow similarity, parameter signatures, return shape, and context flags — to make the final accept/reject decision on each candidate pair. The model is optimized for high precision on real-world codebases, suppressing structural false positives (CRUD boilerplate, UI wrapper patterns, observer callbacks) while preserving genuine duplicates.
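As a rough picture of how such a classifier turns features into an accept/reject decision, here is a logistic-regression sketch. Everything numeric in it is an assumption: the four feature names are a simplified subset, and the weights, bias, and cutoff are invented for illustration rather than taken from the trained model.

```python
import math

# Hypothetical weights for a handful of features. The real model uses 14
# features with learned weights that are not listed in this README.
WEIGHTS = {
    "embedding_score": 4.0,
    "ast_edit_similarity": 3.0,
    "identifier_overlap": 2.0,
    "is_crud_boilerplate": -3.0,  # context flag that pushes toward "reject"
}
BIAS = -4.5


def clone_probability(features):
    """Logistic regression: sigmoid of the weighted feature sum."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def accept(features, cutoff=0.5):
    """Final decision on one candidate pair."""
    return clone_probability(features) >= cutoff


genuine = {"embedding_score": 0.95, "ast_edit_similarity": 0.9, "identifier_overlap": 0.8}
boilerplate = {
    "embedding_score": 0.9,
    "ast_edit_similarity": 0.9,
    "identifier_overlap": 0.3,
    "is_crud_boilerplate": 1.0,
}
```

With these made-up numbers, `accept(genuine)` is true and `accept(boilerplate)` is false even though both pairs have high embedding scores, which mirrors the goal of suppressing structural false positives.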

Severity Model (DRY-based)

Severity is based on actionability, not just clone confidence:

| Severity | Meaning | CI Behavior |
|----------|---------|-------------|
| HIGH | 3+ copies of the same function; extract to shared module | Fails fail_on: high |
| MEDIUM | 2 exact copies; worth noting, defer per Rule of Three | Fails fail_on: medium |
| LOW | Lower-confidence semantic match; hidden by default | Never fails CI |

Report sections are grouped by action type: Extract Now (HIGH), Worth Noting (MEDIUM), Cross-Service, and Cross-Language.
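The severity rules above can be expressed as a small decision function. This is a sketch under assumptions: the function names are hypothetical, any finding with 3+ copies is treated as HIGH regardless of match type, and `fail_on: medium` is taken to fail on HIGH findings as well, since `fail_on` is described as a minimum severity.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def severity(copies, exact):
    """Map a finding to a severity based on copy count and match type."""
    if copies >= 3:
        return "high"
    if copies == 2 and exact:
        return "medium"
    return "low"


def fails_ci(finding_severity, fail_on):
    """Return True if a finding of this severity should fail the check."""
    # LOW findings never fail CI, and fail_on: none is advisory only.
    if fail_on == "none" or finding_severity == "low":
        return False
    return SEVERITY_RANK[finding_severity] >= SEVERITY_RANK[fail_on]
```

For example, the 13 copies of fetchJson() in the sample output would be HIGH and fail a check configured with `fail_on: high`, while a pair of exact copies would only fail with `fail_on: medium`.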

MCP Integration

Echo Guard includes a built-in MCP server so AI agents can check for duplicates before generating new functions. Supported agents:

  • Claude Code — auto-detected and registered via claude mcp add
  • Codex — auto-detected and registered via codex mcp add

The MCP server is registered automatically during echo-guard setup, or manually via echo-guard add-mcp. It provides:

| Tool | Description |
|------|-------------|
| check_for_duplicates | Check code for duplicates (before/after writing) |
| resolve_finding | Record verdict: fixed, acknowledged, or false_positive |
| respond_to_probe | Evaluate a low-confidence match for training data |
| get_finding_resolutions | View resolution history and stats |
| search_functions | Search index by function name, keyword, or language |
| suggest_refactor | Get consolidation suggestions |
| get_index_stats | View index statistics |
| get_codebase_clusters | Understand code grouping |

Manual MCP registration

# Claude Code
claude mcp add echo-guard -- "$(pipx environment --value PIPX_LOCAL_VENVS)/echo-guard/bin/python" -m echo_guard.mcp_server

# Codex
codex mcp add echo-guard -- "$(pipx environment --value PIPX_LOCAL_VENVS)/echo-guard/bin/python" -m echo_guard.mcp_server

Supported Languages

Python, JavaScript, TypeScript, Go, Rust, Java, Ruby, C, C++

Cross-language matching is supported.

CLI Reference

| Command | Description |
|---------|-------------|
| echo-guard setup | Interactive setup wizard |
| echo-guard scan | Scan for redundant code |
| echo-guard scan -v | Show detailed match table |
| echo-guard review | Interactive review of all findings |
| echo-guard add-mcp | Register MCP server (Claude/Codex) |
| echo-guard add-action | Generate GitHub Action workflow |
| echo-guard index | Index codebase |
| echo-guard check FILES | Check specific files |
| echo-guard watch | Watch files in real time |
| echo-guard health | Compute codebase health |
| echo-guard acknowledge | Acknowledge a single finding by ID |
| echo-guard training-data | View/export collected training data |
| echo-guard clear-index | Clear index |

Configuration

Everything lives in echo-guard.yml, generated by echo-guard setup:

# Detection settings
threshold: 0.50 # General similarity floor (after scope penalties)
min_function_lines: 3 # Skip functions shorter than this
max_function_lines: 500 # Skip functions longer than this

# Languages to scan
languages:
  - python
  - javascript
  - typescript

# CI behavior (used by GitHub Action)
fail_on: high # high, medium, or none

# Directories to exclude from scanning
ignore:
  - docs/
  - tests/
  - benchmarks/

# Service boundaries for monorepo-aware suggestions
# service_boundaries:
#   - services/worker
#   - services/dashboard

# Acknowledged findings — suppressed in CI and future scans
# Run `echo-guard review` to add entries interactively
acknowledged:
  - echo_guard/cli.py:scan||echo_guard/cli.py:check

What each setting does

| Setting | Default | Description |
|---------|---------|-------------|
| threshold | 0.50 | Minimum similarity score after scope penalties. Functions with private/internal visibility get penalized; this floor determines if penalized matches are still shown. |
| min_function_lines | 3 | Functions shorter than this are skipped (getters, one-liners). |
| max_function_lines | 500 | Functions longer than this are skipped (generated code, data dumps). |
| languages | all 9 | Which languages to scan. Restricting this speeds up indexing. |
| fail_on | high | Minimum severity that fails the CI check. none = advisory only. |
| ignore | [] | Directories/patterns to exclude from scanning (gitignore-style). |
| acknowledged | [] | Finding IDs that have been reviewed and accepted. These are suppressed in CI and in echo-guard review. |
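The three numeric settings act as simple gates, which the following sketch illustrates using the documented defaults. The helper names are hypothetical, and the boundary handling (a function of exactly 3 or 500 lines is kept, a score of exactly 0.50 is shown) is an assumption read from the wording "shorter than", "longer than", and "minimum".

```python
DEFAULTS = {
    "threshold": 0.50,
    "min_function_lines": 3,
    "max_function_lines": 500,
}


def is_indexable(line_count, config=DEFAULTS):
    """Keep functions whose length falls inside the configured bounds."""
    return config["min_function_lines"] <= line_count <= config["max_function_lines"]


def is_reported(score, config=DEFAULTS):
    """Keep matches whose (already penalized) score reaches the floor."""
    return score >= config["threshold"]
```

Raising `min_function_lines` trades recall on small helpers for fewer trivial matches, while raising `threshold` hides more of the penalized, lower-scoring pairs.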

Local artifacts are stored in .echo-guard/ (gitignored):

.echo-guard/
├── index.duckdb        # Function metadata and training data
├── embeddings.npy      # Code embedding vectors
├── embedding_meta.json # Embedding store metadata
├── scan-results.txt    # Latest scan report
└── model_cache/        # Cached UniXcoder ONNX model (~500MB, downloaded on first use)

CI Integration

GitHub Action

Generated automatically by echo-guard setup, or add manually to .github/workflows/echo-guard-ci.yml:

name: Echo Guard
on: [pull_request]
permissions:
  contents: read
  pull-requests: write
jobs:
  echo-guard:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: jwizenfeld04/Echo-Guard@v0.3.0 # Pin to your installed version
        with:
          threshold: "0.50"
          fail-on: "high" # Only 3+ copy DRY violations fail the check
          comment: "true"

Tip: Pin the action version to match your installed echo-guard version. Run echo-guard --version to check.

Acknowledging Findings

When Echo Guard flags intentional duplication that blocks your PR:

echo-guard review

This walks through each finding with code previews:

  • a = acknowledge (intentional duplication, suppress in CI)
  • f = false positive (not a real clone, suppress and record as training data)
  • s = skip (leave unresolved)

Acknowledged findings are saved to the acknowledged list in echo-guard.yml. Commit the file to suppress them in future CI runs.
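Judging by the sample entry in the configuration (echo_guard/cli.py:scan||echo_guard/cli.py:check), a finding ID appears to join two path:function locations with "||". The sketch below parses that shape and checks a pair against the acknowledged list. The format is inferred from that single example, the helper names are hypothetical, and treating the pair as order-insensitive is an assumption.

```python
def parse_finding_id(finding_id):
    """Split 'path:function||path:function' into two (path, function) pairs."""
    left, right = finding_id.split("||")
    # rsplit keeps any earlier colons (for example in a path) on the path side.
    return tuple(tuple(side.rsplit(":", 1)) for side in (left, right))


def is_acknowledged(pair, acknowledged_ids):
    """Check a pair of locations against the list, ignoring pair order."""
    wanted = frozenset(pair)
    return any(frozenset(parse_finding_id(fid)) == wanted for fid in acknowledged_ids)


acknowledged = ["echo_guard/cli.py:scan||echo_guard/cli.py:check"]
pair = (("echo_guard/cli.py", "check"), ("echo_guard/cli.py", "scan"))
suppressed = is_acknowledged(pair, acknowledged)
```

In this example `suppressed` is true even though the pair is listed in the opposite order from the entry in echo-guard.yml.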

Privacy

  • No telemetry, no uploads — everything runs locally on your machine
  • Training data — when you resolve findings or respond to probes, code pairs are stored locally in .echo-guard/index.duckdb for future model improvement. This data never leaves your machine. See FINE-TUNING.md for details.
  • No cloud dependencies — the embedding model runs locally via ONNX Runtime (CPU only)

Roadmap

  • GitHub Action — PR annotations, summary comments, severity-based gating
  • Semantic detection — UniXcoder embeddings for Type-3/Type-4 clone detection
  • Feature classifier — 14-feature logistic regression with AST edit distance, DRY-based severity
  • Intra-function detection — Block-level clone detection within function bodies
  • AI-powered fixes — Automated refactoring patches via LLM
  • Finding history — Track finding lifecycle, stale detection, trend dashboard
  • VS Code extension — Real-time inline diagnostics via MCP

See ROADMAP.md for the full plan with details and rationale.

License

MIT
