Semantic linting CLI that detects codebase redundancy created by AI coding agents
Echo-Guard
Semantic linting CLI for AI-generated code redundancy
What is Echo-Guard?
Echo-Guard is a semantic linting CLI designed to catch the subtle, functional duplication that AI coding agents often introduce.
Unlike traditional linters that focus on syntax errors or style, Echo-Guard analyzes the logic and intent of your code. It identifies "echoes"—blocks of code that perform the same task but might look slightly different—across your entire project, regardless of the file or service they live in.
Why Echo-Guard?
AI-assisted development (Cursor, Claude Code, Copilot) is incredibly fast, but it has a "memory" problem. Agents often generate fresh code for a task that has already been solved elsewhere in your codebase.
Use Echo-Guard to:
- Kill Hidden Redundancy: Catch duplicate business logic that "grep" or simple string matching would miss.
- Prevent "AI Rot": Stop your codebase from bloating with slightly different versions of the same utility functions.
- Keep Your Data Local: Built for privacy-conscious teams. Echo-Guard runs entirely on your machine: no code is uploaded to the cloud for analysis, and anonymized metadata for improving the model is shared only if you opt in.
- Scale Across Languages: Maintain a DRY (Don't Repeat Yourself) architecture even in polyglot repositories.
Install
Prerequisites
If you don't have pipx installed:
# macOS
brew install pipx && pipx ensurepath
# Linux / WSL
python3 -m pip install --user pipx && pipx ensurepath
# Windows (PowerShell)
pip install pipx
pipx ensurepath
Install Echo Guard
pipx install "echo-guard[languages,mcp]"
Upgrade
pipx upgrade echo-guard
To upgrade to a specific version:
pipx install "echo-guard[languages,mcp]" --force --pip-args="echo-guard==0.3.0"
Getting Started
echo-guard setup
The setup wizard handles everything:
- Directory selection: choose which directories to scan (interactive arrow-key selector)
- Language detection: auto-detects languages in your selected directories
- MCP registration: detects Claude Code and registers the MCP server automatically
- GitHub Action: optionally generates .github/workflows/echo-guard-ci.yml for PR checks
- Initial index + scan: indexes your codebase and runs the first scan
One command, fully configured. The wizard generates echo-guard.yml with all settings.
Manual workflow
If you prefer to skip the wizard:
echo-guard index # Index your codebase
echo-guard scan # Scan for duplicates
echo-guard review # Walk through findings interactively
echo-guard add-mcp # Register MCP server with Claude Code
echo-guard add-action # Generate GitHub Action for PR checks
Example Output
Echo Guard — Scan Results
18 HIGH · 28 MEDIUM · 11 LOW (892 raw pairs)
11 LOW findings hidden — use --verbose to show
Top refactoring targets:
fetchJson() — 13 copies
timeAgo() — 4 copies
schemaTypes() — 4 copies
━━━ EXTRACT NOW (18) ━━━
3+ copies — real DRY violations
● #1 T1/T2 Exact — fetchJson() x13
components/UserList.tsx:10 fetchJson()
components/TeamList.tsx:8 fetchJson()
lib/api.ts:15 fetchJson()
...
→ Extract to shared module under lib/
━━━ WORTH NOTING (28) ━━━
2 exact copies — fix if complex, defer per Rule of Three
● #1 T1/T2 Exact — validate_email() (100%)
services/auth/utils.py:12 → import from services/user/validators.py:8
How It Works
Echo Guard uses a three-tier detection pipeline:
Tier 1 — AST Hash Matching (Type-1/Type-2)
Tree-sitter parses functions, normalizes identifiers, and computes structural hashes. Two functions with the same hash are exact (Type-1) or renamed (Type-2) clones. Hashing is O(n) in the number of functions and, for these two clone types, gives 100% recall with zero false positives.
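As a rough illustration of the idea behind Tier 1, the sketch below hashes a normalized syntax tree. It is not Echo Guard's implementation: it uses Python's standard `ast` module instead of Tree-sitter, handles Python source only, and the names `structural_hash` and `find_exact_clones` are hypothetical.

```python
import ast
import hashlib
from collections import defaultdict


class _Normalizer(ast.NodeTransformer):
    """Rename identifiers to positional placeholders (v0, v1, ...)."""

    def __init__(self):
        self.names = {}

    def _placeholder(self, name):
        # The same original name always maps to the same placeholder.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = "f"  # the function's own name should not affect the hash
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._placeholder(node.id)
        return node


def structural_hash(source: str) -> str:
    """Hash the normalized AST of a single function's source."""
    tree = _Normalizer().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()


def find_exact_clones(functions: dict) -> list:
    """Group function names by hash in one pass; keep groups of 2+."""
    buckets = defaultdict(list)
    for name, source in functions.items():
        buckets[structural_hash(source)].append(name)
    return [names for names in buckets.values() if len(names) > 1]
```

Because each function is hashed once and grouped through a dictionary, the pass is linear in the number of functions, which matches the O(n) behavior described above.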
Tier 2 — Code Embeddings (Type-3/Type-4)
UniXcoder encodes each function into a 768-dim embedding vector. Cosine similarity search finds modified clones (same structure, different statements) and semantic clones (same intent, completely different implementation). ~15ms per function, ~2ms search at 100K functions.
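The similarity search in Tier 2 can be pictured with a small, dependency-free sketch. The vectors below are toy 3-dimensional stand-ins for the 768-dim UniXcoder embeddings, the function names are hypothetical, and using 0.50 as the cutoff is an assumption borrowed from the default `threshold` setting rather than a documented internal value.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_candidates(query, index, threshold=0.50):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy index: real embeddings would come from the encoder, not be hand-written.
index = {
    "fetch_json": [1.0, 0.0, 0.0],
    "time_ago": [0.0, 1.0, 0.0],
    "get_json": [0.9, 0.1, 0.0],
}
candidates = nearest_candidates([1.0, 0.0, 0.0], index)
```

A brute-force scan like this is only meant to show the scoring; it says nothing about how the real index achieves the quoted search latency.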
Tier 3 — Feature Classifier
A trained classifier combines 14 code features — AST edit distance, embedding score, name/body identifier overlap, call patterns, control flow similarity, parameter signatures, return shape, and context flags — to make the final accept/reject decision on each candidate pair. The model is optimized for high precision on real-world codebases, suppressing structural false positives (CRUD boilerplate, UI wrapper patterns, observer callbacks) while preserving genuine duplicates.
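The roadmap describes this classifier as a 14-feature logistic regression. A minimal sketch of that kind of accept/reject decision follows; the four feature names, the weights, the bias, and the 0.5 cutoff are all made up for illustration and do not reflect the trained model.

```python
import math

# Hypothetical weights for illustration only. The real model uses 14 features
# with learned coefficients that are not documented in this README.
WEIGHTS = {
    "embedding_score": 3.0,
    "ast_edit_similarity": 2.5,
    "name_overlap": 1.0,
    "is_crud_boilerplate": -2.0,  # context flag that pushes toward "reject"
}
BIAS = -3.0


def clone_probability(features: dict) -> float:
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = BIAS + sum(weight * features.get(name, 0.0) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def accept(features: dict, cutoff: float = 0.5) -> bool:
    """Keep a candidate pair only if its probability reaches the cutoff."""
    return clone_probability(features) >= cutoff


strong_pair = {"embedding_score": 0.95, "ast_edit_similarity": 0.9, "name_overlap": 0.8}
boilerplate_pair = {
    "embedding_score": 0.7,
    "ast_edit_similarity": 0.6,
    "name_overlap": 0.1,
    "is_crud_boilerplate": 1.0,
}
```

With these toy numbers the first pair is accepted and the second is rejected, which mirrors the stated goal of suppressing structurally similar boilerplate while keeping genuine duplicates.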
Severity Model (DRY-based)
Severity is based on actionability, not just clone confidence:
| Severity | Meaning | CI Behavior |
|---|---|---|
| HIGH | 3+ copies of the same function; extract to a shared module | Fails when fail_on is high or medium |
| MEDIUM | 2 exact copies; worth noting, defer per Rule of Three | Fails only when fail_on is medium |
| LOW | Lower-confidence semantic match; hidden by default | Never fails CI |
Report sections are grouped by action type: Extract Now (HIGH), Worth Noting (MEDIUM), Cross-Service, and Cross-Language.
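The severity rules and their effect on CI can be summarized in a few lines of Python. This is a sketch under stated assumptions, not the tool's code: `classify` and `fails_ci` are hypothetical names, and treating every non-exact match as LOW is a simplification of the table above.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def classify(copies: int, exact: bool) -> Severity:
    """Map a clone group to a severity based on how actionable it is."""
    if exact and copies >= 3:
        return Severity.HIGH  # real DRY violation: extract now
    if exact and copies == 2:
        return Severity.MEDIUM  # Rule of Three: worth noting, may defer
    return Severity.LOW  # assumption: any non-exact match is low confidence


def fails_ci(severity: Severity, fail_on: str) -> bool:
    """fail_on is the minimum severity that fails the check: high, medium, or none."""
    if fail_on == "none" or severity is Severity.LOW:
        return False  # advisory mode, or a LOW finding, never fails CI
    return severity >= Severity[fail_on.upper()]
```

For example, `fails_ci(classify(13, True), "high")` is true for the 13-copy `fetchJson()` group in the sample report, while a 2-copy finding fails only when `fail_on` is set to `medium`.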
MCP Integration
Echo Guard includes a built-in MCP server so AI agents can check for duplicates before generating new functions. Supported agents:
- Claude Code: auto-detected and registered via claude mcp add
- Codex: auto-detected and registered via codex mcp add
The MCP server is registered automatically during echo-guard setup, or manually via echo-guard add-mcp. It provides:
| Tool | Description |
|---|---|
| check_for_duplicates | Check code for duplicates (before/after writing) |
| resolve_finding | Record verdict: fixed, acknowledged, or false_positive |
| respond_to_probe | Evaluate a low-confidence match for training data |
| get_finding_resolutions | View resolution history and stats |
| search_functions | Search index by function name, keyword, or language |
| suggest_refactor | Get consolidation suggestions |
| get_index_stats | View index statistics |
| get_codebase_clusters | Understand code grouping |
Manual MCP registration
# Claude Code
claude mcp add echo-guard -- "$(pipx environment --value PIPX_LOCAL_VENVS)/echo-guard/bin/python" -m echo_guard.mcp_server
# Codex
codex mcp add echo-guard -- "$(pipx environment --value PIPX_LOCAL_VENVS)/echo-guard/bin/python" -m echo_guard.mcp_server
Supported Languages
Python, JavaScript, TypeScript, Go, Rust, Java, Ruby, C, C++
Cross-language matching is supported.
CLI Reference
| Command | Description |
|---|---|
| echo-guard setup | Interactive setup wizard |
| echo-guard scan | Scan for redundant code |
| echo-guard scan -v | Show detailed match table |
| echo-guard review | Interactive review of all findings |
| echo-guard add-mcp | Register MCP server (Claude/Codex) |
| echo-guard add-action | Generate GitHub Action workflow |
| echo-guard index | Index codebase |
| echo-guard check FILES | Check specific files |
| echo-guard watch | Watch files in real time |
| echo-guard health | Compute codebase health |
| echo-guard acknowledge | Acknowledge a single finding by ID |
| echo-guard training-data | View/export collected training data |
| echo-guard clear-index | Clear index |
Configuration
Everything lives in echo-guard.yml, generated by echo-guard setup:
# Detection settings
threshold: 0.50 # General similarity floor (after scope penalties)
min_function_lines: 3 # Skip functions shorter than this
max_function_lines: 500 # Skip functions longer than this
# Languages to scan
languages:
- python
- javascript
- typescript
# CI behavior (used by GitHub Action)
fail_on: high # high, medium, or none
# Directories to exclude from scanning
ignore:
- docs/
- tests/
- benchmarks/
# Service boundaries for monorepo-aware suggestions
# service_boundaries:
# - services/worker
# - services/dashboard
# Acknowledged findings — suppressed in CI and future scans
# Run `echo-guard review` to add entries interactively
acknowledged:
- echo_guard/cli.py:scan||echo_guard/cli.py:check
What each setting does
| Setting | Default | Description |
|---|---|---|
| threshold | 0.50 | Minimum similarity score after scope penalties. Functions with private/internal visibility get penalized; this floor determines whether penalized matches are still shown. |
| min_function_lines | 3 | Functions shorter than this are skipped (getters, one-liners). |
| max_function_lines | 500 | Functions longer than this are skipped (generated code, data dumps). |
| languages | all 9 | Which languages to scan. Restricting this speeds up indexing. |
| fail_on | high | Minimum severity that fails the CI check. none = advisory only. |
| ignore | [] | Directories/patterns to exclude from scanning (gitignore-style). |
| acknowledged | [] | Finding IDs that have been reviewed and accepted. These are suppressed in CI and in echo-guard review. |
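To make the size and ignore settings concrete, here is a small sketch of how they might gate which functions get indexed. The `Settings` dataclass and `should_index` helper are hypothetical, and plain prefix matching is a deliberate simplification of the gitignore-style patterns the tool accepts.

```python
from dataclasses import dataclass, field


@dataclass
class Settings:
    """Defaults copied from the settings table above."""
    threshold: float = 0.50
    min_function_lines: int = 3
    max_function_lines: int = 500
    fail_on: str = "high"
    ignore: list = field(default_factory=list)


def should_index(settings: Settings, path: str, line_count: int) -> bool:
    """Decide whether a function at `path` with `line_count` lines is scanned."""
    # Simplification: treat each ignore entry as a path prefix, not a full
    # gitignore pattern.
    if any(path.startswith(prefix) for prefix in settings.ignore):
        return False
    # Functions shorter than the minimum or longer than the maximum are skipped.
    return settings.min_function_lines <= line_count <= settings.max_function_lines


settings = Settings(ignore=["docs/", "tests/"])
```

With these settings, a 10-line function in `lib/api.py` is indexed, while anything under `tests/` or any 2-line getter is skipped.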
Local artifacts are stored in .echo-guard/ (gitignored):
.echo-guard/
├── index.duckdb # Function metadata and training data
├── embeddings.npy # Code embedding vectors
├── embedding_meta.json # Embedding store metadata
├── scan-results.txt # Latest scan report
└── model_cache/ # Cached UniXcoder ONNX model (~500MB, downloaded on first use)
CI Integration
GitHub Action
Generated automatically by echo-guard setup, or add manually to .github/workflows/echo-guard-ci.yml:
name: Echo Guard
on: [pull_request]
permissions:
contents: read
pull-requests: write
jobs:
echo-guard:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-python@v5
with:
python-version: "3.12"
- uses: jwizenfeld04/Echo-Guard@v0.3.0 # Pin to your installed version
with:
threshold: "0.50"
fail-on: "high" # Only 3+ copy DRY violations fail the check
comment: "true"
Tip: Pin the action version to match your installed echo-guard version. Run echo-guard --version to check.
Acknowledging Findings
When Echo Guard flags intentional duplication that blocks your PR:
echo-guard review
This walks through each finding with code previews:
- a = acknowledge (intentional duplication, suppress in CI)
- f = false positive (not a real clone, suppress and record as training data)
- s = skip (leave unresolved)
Acknowledged findings are saved to the acknowledged list in echo-guard.yml. Commit the file to suppress them in future CI runs.
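The sample configuration shows an acknowledged entry of the form `echo_guard/cli.py:scan||echo_guard/cli.py:check`. Assuming that every finding ID follows this `path:function||path:function` shape (an inference from that single example, not a documented format), suppression could be sketched as below. Both helper names are hypothetical.

```python
def parse_finding_id(finding_id: str) -> tuple:
    """Split 'path:func||path:func' into ((path, func), (path, func))."""
    left, right = finding_id.split("||")
    # rsplit on the last colon so paths that contain ':' stay intact.
    return tuple(tuple(side.rsplit(":", 1)) for side in (left, right))


def unacknowledged(finding_ids: list, acknowledged: list) -> list:
    """Drop findings whose ID appears in the acknowledged list, keeping order."""
    seen = set(acknowledged)
    return [fid for fid in finding_ids if fid not in seen]


acknowledged = ["echo_guard/cli.py:scan||echo_guard/cli.py:check"]
findings = [
    "echo_guard/cli.py:scan||echo_guard/cli.py:check",
    "services/auth/utils.py:validate_email||services/user/validators.py:validate_email",
]
remaining = unacknowledged(findings, acknowledged)
```

Only the second finding remains after filtering, which is the behavior you would expect once the updated echo-guard.yml is committed.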
Privacy
- No telemetry, no uploads: everything runs locally on your machine
- Training data: when you resolve findings or respond to probes, code pairs are stored locally in .echo-guard/index.duckdb for future model improvement. This data never leaves your machine. See FINE-TUNING.md for details.
- No cloud dependencies: the embedding model runs locally via ONNX Runtime (CPU only)
Roadmap
- GitHub Action — PR annotations, summary comments, severity-based gating
- Semantic detection — UniXcoder embeddings for Type-3/Type-4 clone detection
- Feature classifier — 14-feature logistic regression with AST edit distance, DRY-based severity
- Intra-function detection — Block-level clone detection within function bodies
- AI-powered fixes — Automated refactoring patches via LLM
- Finding history — Track finding lifecycle, stale detection, trend dashboard
- VS Code extension — Real-time inline diagnostics via MCP
See ROADMAP.md for the full plan with details and rationale.
Documentation
- Architecture — Three-tier detection pipeline, clone types, storage, scaling
- Fine-Tuning Roadmap — Improving semantic detection through contrastive learning
- Roadmap — Development phases and planned features
- Changelog
License
MIT