
reDUP

Code duplication analyzer and refactoring planner for LLMs.


AI Cost Tracking


  • 🤖 LLM usage: $7.5000 (63 commits)
  • 👤 Human dev: ~$1568 (15.7h @ $100/h, 30min dedup)

Generated on 2026-04-16 using openrouter/qwen/qwen3-coder-next


reDUP scans codebases for duplicated functions, blocks, and structural patterns — then builds a prioritized refactoring map that LLMs can consume to eliminate redundancy systematically.

Features

  • Exact duplicate detection via SHA-256 block hashing
  • Structural clone detection — same AST shape, different variable names
  • LSH near-duplicate detection for large code blocks (>50 lines)
  • Multi-language support — 35+ languages via tree-sitter (Python, JavaScript, TypeScript, Go, Rust, Java, C/C++, C#, Ruby, PHP, Bash, SQL, HTML, CSS, Lua, Scala, Kotlin, Swift, Objective-C, JSON, YAML, TOML, XML, Markdown, GraphQL, Dockerfile, Makefile, Nginx, Vim, Svelte, Vue, and more)
  • Parallel scanning for large projects (2x+ performance improvement)
  • Fuzzy near-duplicate matching via SequenceMatcher / rapidfuzz
  • Function-level analysis using Python AST and tree-sitter extraction
  • Impact scoring — prioritizes duplicates by saved_lines × similarity
  • Refactoring planner — generates concrete extract/inline suggestions
  • Multiple output formats: JSON, YAML, TOON, Markdown
  • Configuration system — TOML files and environment variables
  • CLI commands: scan, compare, diff, check, config, info
  • Cross-project comparison — detect shared code between projects with merge/extract recommendations
  • CI integration with configurable quality gates
  • Clean output — no syntax warnings from external libraries
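
The exact-duplicate detection above can be pictured in a few lines: hash each whitespace-normalized block with SHA-256 and bucket equal digests. This is a simplified illustration of the idea, not redup's actual implementation (which also normalizes structurally):

```python
import hashlib
from collections import defaultdict

def block_hash(lines):
    # Trivial normalization: strip surrounding whitespace, drop blank lines.
    normalized = "\n".join(line.strip() for line in lines if line.strip())
    return hashlib.sha256(normalized.encode()).hexdigest()

def find_exact_duplicates(blocks):
    """blocks: {(file, start_line): [source lines]} -> groups of 2+ clones."""
    buckets = defaultdict(list)
    for location, lines in blocks.items():
        buckets[block_hash(lines)].append(location)
    return [locs for locs in buckets.values() if len(locs) > 1]

# Same logic, different indentation -- still an exact match after normalization.
blocks = {
    ("billing.py", 1): ["def tax(x):", "    return x * 0.2"],
    ("shipping.py", 9): ["def tax(x):", "        return x * 0.2"],
    ("main.py", 3): ["print('hello')"],
}
print(find_exact_duplicates(blocks))  # one group: billing.py + shipping.py
```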

New Features (v0.4.20)

🤖 MCP Server

Full MCP (Model Context Protocol) server for AI assistant integration:

# Start MCP server
redup-mcp

# Or HTTP mode
redup-mcp --transport http --port 8000

Available Tools:

  • analyze_project — Full duplication analysis
  • find_duplicates — Quick duplicate detection
  • check_project — Quality gate check
  • compare_projects — Cross-project comparison
  • suggest_refactoring — AI-powered refactoring suggestions
  • project_info — Project metadata
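
As a sketch of what a client sends over the wire: MCP tool calls are JSON-RPC 2.0 requests using the standard `tools/call` method. The tool name below comes from the list above; the `"path"` argument key is an assumption about this server's input schema, not documented here:

```python
import json

# Hypothetical MCP tool-call request for the analyze_project tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",               # standard MCP method for tool invocation
    "params": {
        "name": "analyze_project",        # one of the tools listed above
        "arguments": {"path": "./my_project"},  # assumed argument key
    },
}
payload = json.dumps(request)
print(payload)
```

In stdio mode the client writes this payload (newline-delimited) to the server's stdin; in HTTP mode it would be POSTed to the server's endpoint.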

๐ŸŒ Universal Fuzzy Similarity Detection

Cross-language duplicate detection across all 35+ supported languages:

# Detect similar code across languages
redup scan . --fuzzy --fuzzy-threshold 0.65

Cross-Language Matching:

  • JavaScript ↔ Python functions: ~65% similarity
  • Docker ↔ YAML configs: ~40% similarity
  • Auth patterns across languages: ~70% similarity

Supported Patterns:

  • Functions, classes, API endpoints
  • Database queries, web components
  • Auth/validation, error handling, logging
  • Configuration, infrastructure code

🌳 Modular Tree-Sitter Extractor

Refactored tree-sitter extraction with clean, modular architecture:

ts_extractor/
├── extractors/          # Modular per-language extractors
│   ├── c_family.py      # C, C++, C#, Objective-C
│   ├── go.py            # Go
│   ├── java.py          # Java, Scala, Kotlin
│   ├── markup.py        # HTML, XML, Svelte, Vue
│   ├── web.py           # JavaScript, TypeScript
│   └── ...
├── dispatcher.py        # Smart language routing
├── config.py            # Language registry
└── main.py              # Unified API

Benefits:

  • Easier to add new languages
  • Better testability
  • Cleaner separation of concerns
  • 35+ languages supported

New Features (v0.5.0+)

๐ŸŒ Universal Fuzzy Similarity Detection

Cross-language fuzzy matching for detecting similar code patterns across all 35+ supported languages:

# Detect similar patterns across different languages
redup scan . --fuzzy --ext .py,.js,.ts

# Cross-project comparison with fuzzy matching
redup compare ./project-a ./project-b --fuzzy --threshold 0.65

Features:

  • Detects similar functions, API endpoints, validation logic across languages (e.g., JS ↔ Python)
  • Pattern recognition: authentication, error handling, database queries, web components
  • Language-agnostic signature generation with identifier normalization
  • Complexity scoring (0.0-1.0) for each detected pattern
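
One way to picture the language-agnostic signature step: collapse identifiers and literals into placeholder tokens, then compare the normalized strings with a fuzzy ratio. A toy sketch under the assumption that regex-level normalization suffices for illustration (redup's real normalization is richer):

```python
import re
from difflib import SequenceMatcher

def normalize_signature(code):
    # Collapse identifiers/keywords, then numeric literals, then strings.
    code = re.sub(r"\b[A-Za-z_][A-Za-z0-9_]*\b", "ID", code)
    code = re.sub(r"\b\d+(?:\.\d+)?\b", "NUM", code)
    code = re.sub(r'"[^"]*"|\'[^\']*\'', "STR", code)
    return re.sub(r"\s+", " ", code).strip()

js = "function add(x, y) { return x + y; }"
py = "def add(a, b): return a + b"

sim = SequenceMatcher(None, normalize_signature(js), normalize_signature(py)).ratio()
print(round(sim, 2))  # well above a 0.65 threshold despite different languages
```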

Example patterns detected:

  • Express.js route handler ↔ Flask endpoint (70% similarity)
  • Docker Compose service ↔ Kubernetes deployment (40% similarity)
  • Auth middleware patterns across frameworks

🧩 Modular ts_extractor Architecture

The tree-sitter multi-language extractor has been refactored from a 782-line monolith into a clean package:

redup/core/ts_extractor/
├── extractors/
│   ├── web.py        # JavaScript/TypeScript
│   ├── c_family.py   # C/C++
│   ├── dotnet.py     # C#
│   ├── ruby.py       # Ruby
│   ├── php.py        # PHP
│   └── ...           # 10+ language-specific modules

Benefits:

  • Better maintainability (avg 100 lines per module vs 782)
  • Easier to add new language extractors
  • Shared base utilities for common operations
  • Full backward compatibility maintained

🎯 Enriched TOON Reporter

The TOON format now includes actionable sections for practical refactoring:

  • HOTSPOTS — Top 7 files with most duplicated lines (where to focus effort)
  • QUICK_WINS — Low-risk, high-savings suggestions (do first)
  • DEPENDENCY_RISK — Duplicates spanning multiple packages (cross-module risk)
  • EFFORT_ESTIMATE — Time estimates per task with difficulty (easy/medium/hard)

🤖 LLM-Powered Refactoring Plans

Generate AI-assisted refactoring TODO lists from cross-project comparisons:

redup compare ./project-a ./project-b --refactor-plan --env .env --output report.json

  • Uses litellm for flexible LLM provider support
  • Compact metadata-only prompts for efficiency
  • Structured JSON output with prioritized tasks
  • Token usage tracking

📊 Simplified Compare Reports

Cross-project comparison reports are now more compact and human-readable:

  • Relative file paths instead of absolute
  • Matches deduplicated by function pair
  • Communities with compact member dicts
  • Filtered trivial entries to reduce noise
  • ~60% smaller JSON size

Installation

pip install redup

With optional dependencies:

pip install redup[all]       # Everything
pip install redup[fuzzy]     # rapidfuzz for better similarity matching
pip install redup[ast]       # tree-sitter for multi-language AST
pip install redup[lsh]       # datasketch for LSH near-duplicate detection
pip install redup[compare]   # networkx for cross-project community detection
pip install redup[llm]       # litellm for LLM-powered refactoring plans

Quick Start

CLI

# Scan current directory, output TOON to stdout
redup scan .

# Scan with JSON output saved to file
redup scan ./src --format json --output ./reports/

# Parallel scanning for large projects
redup scan . --parallel --max-workers 4

# Multi-language scanning with 35+ supported languages
redup scan . --ext ".py,.js,.ts,.go,.rs,.java,.rb,.php,.html,.css,.sql,.lua,.scala,.kt,.swift,.m,.json,.yaml,.toml,.xml,.md,.graphql,.dockerfile,.svelte,.vue"

# CI gate with thresholds
redup check . --max-groups 10 --max-lines 100

# Compare two scans
redup diff before.json after.json

# Cross-project comparison (merge vs extract decision)
redup compare ./project-a ./project-b --threshold 0.75

# With LLM-powered refactoring plan (requires litellm + .env with API keys)
redup compare ./project-a ./project-b --refactor-plan --env .env --output comparison.json

# Specify custom LLM model
redup compare ./project-a ./project-b --refactor-plan --llm-model openrouter/anthropic/claude-3.5-sonnet

# Initialize configuration
redup config --init
# Scan with all formats
redup scan . --format all --output ./redup_output/

# Only function-level duplicates (faster)
redup scan . --functions-only

# Custom thresholds
redup scan . --min-lines 5 --min-sim 0.9

# Show installed optional dependencies
redup info

Configuration

Create a redup.toml file:

[scan]
extensions = ".py,.js,.ts,.go,.rs,.java,.rb,.php,.html,.css,.sql,.lua,.scala,.kt,.swift,.m,.json,.yaml,.toml,.xml,.md,.graphql,.dockerfile,.svelte,.vue"
min_lines = 3
min_similarity = 0.85
include_tests = false

[lsh]
enabled = true
min_lines = 50
threshold = 0.8

[check]
max_groups = 10
max_lines = 100

[output]
format = "toon"
output = "redup_output"

[reporting]
include_snippets = true
generate_suggestions = true

Or use [tool.redup] in pyproject.toml. Environment variables with REDUP_ prefix override file settings.
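
The override precedence can be pictured like this. Only the REDUP_ prefix is documented above; the exact variable names and the SECTION_FIELD-to-[section][field] mapping are assumed conventions for illustration:

```python
import os

def env_overrides(prefix="REDUP_"):
    """Collect prefixed environment variables as config overrides.
    The SECTION_FIELD -> [section][field] mapping mirrors the TOML
    sections shown above, but is an assumption about redup's scheme."""
    overrides = {}
    for key, value in os.environ.items():
        if key.startswith(prefix):
            section, _, field = key[len(prefix):].lower().partition("_")
            overrides.setdefault(section, {})[field] = value
    return overrides

os.environ["REDUP_SCAN_MIN_LINES"] = "5"    # would override [scan] min_lines
os.environ["REDUP_OUTPUT_FORMAT"] = "json"  # would override [output] format
print(env_overrides())
```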

Python API

from pathlib import Path
from redup import ScanConfig, analyze
from redup.reporters.toon_reporter import to_toon
from redup.reporters.json_reporter import to_json

config = ScanConfig(
    root=Path("./my_project"),
    extensions=[".py", ".js", ".ts", ".go", ".rs", ".java", ".rb", ".php", ".html", ".css"],
    min_block_lines=3,
    min_similarity=0.85,
)

result = analyze(config=config, function_level_only=True)

print(f"Found {result.total_groups} duplicate groups")
print(f"Lines recoverable: {result.total_saved_lines}")

# For LLM consumption
print(to_toon(result))

# For tooling / CI
Path("duplication.json").write_text(to_json(result))

Output Formats

TOON (LLM-optimized)

# redup/duplication | 15 groups | 86f 10453L | 2026-04-16

SUMMARY:
  files_scanned: 86
  total_lines:   10453
  dup_groups:    15
  dup_fragments: 36
  saved_lines:   217
  scan_ms:       3620

HOTSPOTS[7] (files with most duplication):
  src/redup/core/ts_extractor.py  dup=74L  groups=4  frags=11  (0.7%)
  src/redup/core/scanner_utils.py  dup=70L  groups=3  frags=3  (0.7%)
  src/redup/core/scanner_loader.py  dup=52L  groups=1  frags=1  (0.5%)

DUPLICATES[15] (ranked by impact):
  [E0001] ! EXAC  _preload_files  L=52 N=2 saved=52 sim=1.00
      src/redup/core/scanner_loader.py:9-60  (_preload_files)
      src/redup/core/scanner_utils.py:53-104  (_preload_files)

REFACTOR[15] (ranked by priority):
  [1] ● extract_module     → src/redup/core/utils/_preload_files.py
      WHY: 2 occurrences of 52-line block across 2 files — saves 52 lines
      FILES: src/redup/core/scanner_loader.py, src/redup/core/scanner_utils.py

QUICK_WINS[8] (low risk, high savings — do first):
  [3] extract_function   saved=26L  → src/redup/core/utils/find_exact_duplicates_lazy.py
      FILES: lazy_grouper.py
  [4] extract_function   saved=21L  → src/redup/core/utils/_extract_functions_go.py
      FILES: ts_extractor.py

DEPENDENCY_RISK[3] (duplicates spanning multiple packages):
  validate_input  packages=2  files=2
      api/routes/users.py
      services/auth/validate.py

EFFORT_ESTIMATE (total ≈ 8.7h):
  hard   _preload_files                      saved=52L  ~156min
  hard   __init__                            saved=36L  ~108min
  medium find_exact_duplicates_lazy          saved=26L  ~52min
  easy   _is_test_file                       saved=12L  ~24min

METRICS-TARGET:
  dup_groups:  15 → 0
  saved_lines: 217 lines recoverable

JSON (machine-readable)

{
  "summary": {
    "total_groups": 3,
    "total_saved_lines": 84
  },
  "groups": [
    {
      "id": "E0001",
      "type": "exact",
      "normalized_name": "calculate_tax",
      "fragments": [
        {"file": "billing.py", "line_start": 1, "line_end": 8},
        {"file": "shipping.py", "line_start": 1, "line_end": 8}
      ],
      "saved_lines_potential": 16
    }
  ],
  "refactor_suggestions": [
    {
      "priority": 1,
      "action": "extract_function",
      "new_module": "utils/calculate_tax.py",
      "risk_level": "low"
    }
  ]
}

Cross-Project Comparison

The redup compare command analyzes two separate projects to detect shared code and recommends a refactoring strategy:

  • Merge projects — if >60% code overlap
  • Extract shared library — if 5-60% overlap with well-defined clusters
  • Keep separate — if <5% overlap
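
The decision rule above is a simple threshold ladder; a sketch (boundary handling at exactly 5% and 60% is assumed):

```python
def recommend(overlap_pct):
    # Thresholds from the documented decision rule.
    if overlap_pct > 0.60:
        return "merge_projects"
    if overlap_pct >= 0.05:
        return "extract_shared_lib"
    return "keep_separate"

print(recommend(0.1523))  # extract_shared_lib
```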

CLI Usage

# Basic comparison
redup compare ./project-a ./project-b --threshold 0.75

# With semantic similarity (slower, more accurate)
redup compare ./project-a ./project-b --semantic --threshold 0.70

# Multi-language projects
redup compare ./backend ./frontend --ext ".py,.js,.ts" --threshold 0.80

# Skip community detection (faster, no networkx required)
redup compare ./a ./b --no-community

# Generate LLM-powered refactoring plan (requires redup[llm])
redup compare ./a ./b --refactor-plan --env .env --output plan.json

Sample Output

Comparing project-a ↔ project-b (threshold=0.75)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Cross-Project Comparison                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Metric                  │ Value                     │
├─────────────────────────┼───────────────────────────┤
│ Project A files         │ 42                        │
│ Project B files         │ 38                        │
│ Project A lines         │ 8500                      │
│ Project B lines         │ 7200                      │
│ Cross matches           │ 15                        │
│ Shared LOC (potential)  │ 1200                      │
└─────────────────────────┴───────────────────────────┘

Recommendation: extract_shared_lib
15% overlap (1200 shared lines, 5 clusters). Extract to shared library.
Confidence: 80%

Top Communities (shared code candidates):
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━┳━━━━━━━━━━┓
┃ ID ┃ Name                 ┃ Similarity ┃ LOC ┃ Members  ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━╇━━━━━━━━━━┩
│  0 │ validate_input       │ 0.89       │ 180 │ 5        │
│  1 │ parse_config         │ 0.82       │ 140 │ 4        │
│  2 │ format_response      │ 0.76       │ 100 │ 3        │
└────┴──────────────────────┴────────────┴─────┴──────────┘

Report JSON Structure

{
  "project_a": "./project-a",
  "project_b": "./project-b",
  "stats": {
    "a": {"files": 42, "lines": 8500},
    "b": {"files": 38, "lines": 7200}
  },
  "total_matches": 15,
  "shared_loc_potential": 1200,
  "recommendation": {
    "decision": "extract_shared_lib",
    "rationale": "15% overlap (1200 shared lines, 5 clusters). Extract to shared library.",
    "overlap_pct": 0.1523,
    "shared_loc": 1200,
    "confidence": 0.8
  },
  "communities": [
    {
      "name": "validate_input",
      "similarity": 0.89,
      "loc": 180,
      "members": [
        {"project": "A", "file": "api/validators.py", "function": "validate_input"},
        {"project": "B", "file": "utils/validation.py", "function": "validate_input"}
      ]
    }
  ],
  "matches": [...]
}

Algorithm Overview

The comparison uses a 3-tier similarity detection:

  1. Structural hash โ€” exact AST matches (fast, O(n+m))
  2. LSH (Locality Sensitive Hashing) โ€” near-duplicates via MinHash
  3. Semantic similarity โ€” CodeBERT embeddings (optional, slowest)
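
Tier 2's MinHash idea in miniature: each block's token set is compressed into a signature of per-seed minimum hashes, and the fraction of matching signature slots estimates Jaccard similarity. A didactic pure-Python sketch (redup's optional LSH path uses the datasketch library instead):

```python
import hashlib

def minhash(tokens, num_perm=64):
    # One "permutation" per seed: keep the minimum hash over the token set.
    return [
        min(int(hashlib.sha1(f"{seed}:{t}".encode()).hexdigest(), 16)
            for t in tokens)
        for seed in range(num_perm)
    ]

def estimated_jaccard(sig_a, sig_b):
    # Fraction of agreeing slots approximates the true Jaccard similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash({"def", "load", "path", "open", "read", "close"})
b = minhash({"def", "load", "path", "open", "read", "flush"})
print(estimated_jaccard(a, b))  # near the true Jaccard of 5/7 ≈ 0.71
```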

Matches are deduplicated by (function_a, function_b, file_a, file_b) with the highest similarity score retained.
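
That dedup step keeps one record per key with the best score; a minimal sketch (dict keys chosen to mirror the tuple described above, other fields illustrative):

```python
def dedup_matches(matches):
    best = {}
    for m in matches:
        key = (m["function_a"], m["function_b"], m["file_a"], m["file_b"])
        # Retain only the highest-similarity match for each location pair.
        if key not in best or m["similarity"] > best[key]["similarity"]:
            best[key] = m
    return list(best.values())

matches = [
    {"function_a": "load", "function_b": "load", "file_a": "a.py",
     "file_b": "b.py", "similarity": 0.81, "tier": "lsh"},
    {"function_a": "load", "function_b": "load", "file_a": "a.py",
     "file_b": "b.py", "similarity": 0.97, "tier": "structural"},
]
print(dedup_matches(matches))  # one match left, similarity 0.97
```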

Community Detection

Requires networkx (pip install redup[compare]).

Uses greedy modularity communities on a similarity graph where:

  • Nodes = functions from both projects
  • Edges = similarity score (filtered by --threshold)
  • Communities = clusters of mutually similar functions

Each community gets a generated name based on longest common prefix of its member functions (e.g., validate_* โ†’ validate_input).
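
The naming rule can be sketched with os.path.commonprefix, which works on any list of strings, not just paths. The fallback when no useful prefix exists is an assumption for illustration:

```python
from os.path import commonprefix

def community_name(member_functions):
    # Longest common prefix of member names, e.g. validate_* for validators.
    prefix = commonprefix(member_functions).rstrip("_")
    if prefix and prefix != member_functions[0]:
        return prefix + "_*"
    return member_functions[0]  # assumed fallback: first member's name

print(community_name(["validate_input", "validate_email", "validate_token"]))
# validate_*
```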

Architecture

src/redup/
├── __init__.py            # Public API
├── __main__.py            # python -m redup
├── mcp_server.py          # MCP server entry point (re-exports from mcp package)
├── mcp/                   # MCP server package
│   ├── __init__.py        # Public MCP API
│   ├── handlers.py        # Tool handlers
│   ├── schemas.py         # JSON-RPC schemas
│   ├── server.py          # JSON-RPC server core
│   └── utils.py           # Shared utilities
├── core/
│   ├── models.py          # Pydantic data models
│   ├── scanner.py         # File discovery + block extraction
│   ├── scanner/           # Scanner package
│   │   ├── __init__.py    # Public scanner API
│   │   ├── cache.py       # Memory cache
│   │   ├── filters.py     # File filtering
│   │   ├── loader.py      # File preloading
│   │   └── types.py       # Scanner types
│   ├── hasher.py          # SHA-256 / structural fingerprinting
│   ├── matcher.py         # Fuzzy similarity comparison
│   ├── planner.py         # Refactoring suggestion generator
│   ├── pipeline.py        # Legacy: re-exports from pipeline package
│   ├── pipeline/          # Pipeline package (new)
│   │   ├── __init__.py    # analyze(), analyze_optimized(), analyze_parallel()
│   │   ├── phases.py      # scan_phase(), process_blocks()
│   │   ├── duplicate_finder.py  # Duplicate finding phases
│   │   └── groups.py      # Group creation, deduplication
│   └── ts_extractor/      # Tree-sitter extraction (35+ languages)
│       ├── __init__.py    # Public API
│       ├── main.py        # Core extraction API
│       ├── dispatcher.py  # Language routing
│       ├── config.py      # Language registry
│       └── extractors/    # Per-language extractors
├── reporters/
│   ├── json_reporter.py   # JSON output
│   ├── yaml_reporter.py   # YAML output
│   └── toon_reporter.py   # TOON output (LLM-optimized)
└── cli_app/
    └── main.py            # Typer CLI

Analysis Pipeline

1. SCAN      Walk project, read files, extract function-level + sliding-window blocks
2. HASH      Generate exact (SHA-256) and structural (normalized AST) fingerprints
3. GROUP     Bucket by hash, keep only groups with 2+ blocks from different locations
4. MATCH     Verify candidates with fuzzy similarity (SequenceMatcher / rapidfuzz)
5. DEDUP     Remove overlapping groups (keep highest-impact)
6. PLAN      Generate prioritized refactoring suggestions with risk assessment
7. REPORT    Export to JSON / YAML / TOON
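
In the DEDUP and PLAN steps, groups are ranked by impact; per the feature list, impact is saved_lines × similarity. A sketch of that ranking (the group fields are illustrative, not redup's model names):

```python
def impact(group):
    # Impact score as documented: saved_lines x similarity.
    return group["saved_lines"] * group["similarity"]

groups = [
    {"id": "S0002", "saved_lines": 36, "similarity": 0.90},  # impact 32.4
    {"id": "E0001", "saved_lines": 52, "similarity": 1.00},  # impact 52.0
]
ranked = sorted(groups, key=impact, reverse=True)
print([g["id"] for g in ranked])  # ['E0001', 'S0002']
```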

Recent Improvements (v0.5.0)

๐Ÿ—๏ธ Modular Architecture Refactoring

Major internal restructuring for better maintainability and extensibility:

MCP Server Package

The MCP server has been split from a 675-line monolith into a clean package:

redup/mcp/
├── __init__.py      # Public API
├── handlers.py      # 8 tool handlers
├── schemas.py       # JSON-RPC schemas
├── server.py        # Server core
└── utils.py         # Utilities

  • 82% code reduction in main file
  • Backward compatible: mcp_server.py re-exports all APIs
  • Better testability: isolated handlers can be tested independently

Pipeline Package

The analysis pipeline (714 lines) now lives in a modular package:

redup/core/pipeline/
├── __init__.py          # analyze(), analyze_optimized(), analyze_parallel()
├── phases.py            # scan_phase(), process_blocks()
├── duplicate_finder.py  # find_exact_groups(), find_structural_groups(), etc.
└── groups.py            # deduplicate_groups(), blocks_to_group(), etc.

  • 66% reduction in main orchestrator file
  • Phases can be used independently for custom workflows
  • Cleaner separation of concerns

Scanner Improvements

The scanner has been refactored with extracted helpers:

  • _init_strategy() - Strategy initialization
  • _process_single_file() - Per-file processing
  • _extract_blocks_for_file() - Block extraction
  • Reduced CC and fan-out in main scan_project() function

🎯 Sprint 1 Refactoring Complete

  • Reduced mean cyclomatic complexity from CC̄=4.2 to CC̄=3.5
  • Eliminated all critical functions (CC > 10): 2 → 0
  • Achieved HEALTHY status with no structural issues
  • Dispatch pattern implementation for AST node processing
  • Modular TOON reporter split into 5 focused functions
  • CLI refactoring with helper functions for better maintainability

🚀 Technical Achievements

  • _process_ast_node: CC=14 → CC=6 (dispatch dict pattern)
  • to_toon: CC=12 → CC=8 (5 helper functions)
  • CLI scan(): fan-out=18 → ≤10 (4 helper functions)
  • Code quality: 0 high-complexity functions
  • Test coverage: 64/64 tests passing (100%)

📊 Quality Metrics

  • Health status: ✅ HEALTHY (no critical issues)
  • Cyclomatic complexity: CC̄=3.5 (down from 4.2; target ≤ 3.0)
  • Maximum CC: 9 (target ≤ 10 achieved)
  • Code maintainability: Significantly improved
  • Duplication: Minimal (2 groups, 6 lines - acceptable patterns)

🔧 Code Architecture

  • Dispatch tables for extensible AST processing
  • Single responsibility functions throughout codebase
  • Clean separation of concerns in CLI pipeline
  • Type safety improvements with proper annotations
  • Error handling enhanced for edge cases

Integration with wronai Toolchain

reDUP is part of the wronai developer toolchain:

  • code2llm — static analysis engine (health diagnostics, complexity)
  • reDUP — deep duplication analysis and refactoring planning
  • code2docs — automatic documentation generation
  • vallm — validation of LLM-generated code proposals

📈 Typical workflow:

  1. code2llm analyzes the project → .toon diagnostics
  2. redup finds duplicates → duplication.toon.yaml
  3. Feed both to an LLM for targeted refactoring
  4. vallm validates the LLM's proposals before merging

🎯 Why reDUP?

  • LLM-ready: TOON format optimized for LLM consumption
  • Actionable: Generates concrete refactoring suggestions
  • Prioritized: Ranks duplicates by impact and risk
  • Integrated: Works seamlessly with wronai toolchain
  • Fast: Scans 1000+ lines in < 1 second
  • Clean: No syntax warnings, professional output

Development

git clone https://github.com/semcod/redup.git
cd redup
pip install -e ".[dev]"
pytest

License

Licensed under Apache-2.0.

Author

Tom Sapletta
