
reDUP

Code duplication analyzer and refactoring planner for LLMs.


reDUP scans codebases for duplicated functions, blocks, and structural patterns — then builds a prioritized refactoring map that LLMs can consume to eliminate redundancy systematically.

Features

  • Exact duplicate detection via SHA-256 block hashing
  • Structural clone detection — same AST shape, different variable names
  • Fuzzy near-duplicate matching via SequenceMatcher / rapidfuzz
  • Function-level analysis using Python AST extraction
  • Impact scoring — prioritizes duplicates by saved_lines × similarity
  • Refactoring planner — generates concrete extract/inline suggestions
  • Three output formats: JSON (tooling), YAML (humans), TOON (LLMs)
  • CLI with typer + rich for interactive use
  • Clean output — syntax warnings from third-party libraries are suppressed
  • Maintainable internals — reduced cyclomatic complexity, no high-complexity functions
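The exact-duplicate strategy above boils down to hashing normalized blocks and bucketing by fingerprint. A minimal sketch of the idea — `hash_block` and the sample blocks are illustrative, not reDUP's actual API:

```python
import hashlib
from collections import defaultdict

def hash_block(lines):
    """Fingerprint a code block: strip indentation/trailing space, then SHA-256."""
    normalized = "\n".join(line.strip() for line in lines)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Bucket blocks by fingerprint; buckets with 2+ members are exact duplicates.
blocks = {
    "billing.py:1-8": ["def calculate_tax(x):", "    return x * 0.23"],
    "shipping.py:1-8": ["def calculate_tax(x):", "  return x * 0.23"],  # different indent
    "returns.py:1-3": ["def refund(x):", "    return -x"],
}
groups = defaultdict(list)
for location, lines in blocks.items():
    groups[hash_block(lines)].append(location)

duplicates = [locs for locs in groups.values() if len(locs) >= 2]
print(duplicates)  # [['billing.py:1-8', 'shipping.py:1-8']]
```

Normalizing whitespace before hashing is what lets the two differently indented copies of `calculate_tax` collapse into one group.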

Installation

pip install redup

With optional dependencies:

pip install redup[all]       # Everything
pip install redup[fuzzy]     # rapidfuzz for better similarity matching
pip install redup[ast]       # tree-sitter for multi-language AST
pip install redup[lsh]       # datasketch for LSH near-duplicate detection

Quick Start

CLI

# Scan current directory, output TOON to stdout
redup scan .

# Scan with JSON output saved to file
redup scan ./src --format json --output ./reports/

# Scan with all formats
redup scan . --format all --output ./redup_output/

# Only function-level duplicates (faster)
redup scan . --functions-only

# Custom thresholds
redup scan . --min-lines 5 --min-sim 0.9

# Show installed optional dependencies
redup info

Python API

from pathlib import Path
from redup import ScanConfig, analyze
from redup.reporters.toon_reporter import to_toon
from redup.reporters.json_reporter import to_json

config = ScanConfig(
    root=Path("./my_project"),
    extensions=[".py"],
    min_block_lines=3,
    min_similarity=0.85,
)

result = analyze(config=config, function_level_only=True)

print(f"Found {result.total_groups} duplicate groups")
print(f"Lines recoverable: {result.total_saved_lines}")

# For LLM consumption
print(to_toon(result))

# For tooling / CI
Path("duplication.json").write_text(to_json(result))

Output Formats

TOON (LLM-optimized)

# redup/duplication | 3 groups | 12f 4200L | 2026-03-22

SUMMARY:
  files_scanned: 12
  total_lines:   4200
  dup_groups:    3
  saved_lines:   84

DUPLICATES[3] (ranked by impact):
  [E0001] !! EXAC  calculate_tax  L=8 N=3 saved=16 sim=1.00
      billing.py:1-8  (calculate_tax)
      shipping.py:1-8  (calculate_tax)
      returns.py:1-8  (calculate_tax)

REFACTOR[1] (ranked by priority):
  [1] ○ extract_function   → utils/calculate_tax.py
      WHY: 3 occurrences of 8-line block across 3 files — saves 16 lines
      FILES: billing.py, shipping.py, returns.py
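Because the TOON summary uses a flat `key: value` layout, it is easy to pull fields back out in tooling. A minimal sketch, assuming only the `SUMMARY` layout shown above (this is not reDUP's own parser):

```python
import re

TOON = """\
SUMMARY:
  files_scanned: 12
  total_lines:   4200
  dup_groups:    3
  saved_lines:   84
"""

# Collect every indented "key: value" line into a dict of ints.
summary = {
    m.group(1): int(m.group(2))
    for m in re.finditer(r"^\s+(\w+):\s+(\d+)$", TOON, re.MULTILINE)
}
print(summary["saved_lines"])  # 84
```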

JSON (machine-readable)

{
  "summary": {
    "total_groups": 3,
    "total_saved_lines": 84
  },
  "groups": [
    {
      "id": "E0001",
      "type": "exact",
      "normalized_name": "calculate_tax",
      "fragments": [
        {"file": "billing.py", "line_start": 1, "line_end": 8},
        {"file": "shipping.py", "line_start": 1, "line_end": 8}
      ],
      "saved_lines_potential": 16
    }
  ],
  "refactor_suggestions": [
    {
      "priority": 1,
      "action": "extract_function",
      "new_module": "utils/calculate_tax.py",
      "risk_level": "low"
    }
  ]
}
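The JSON schema above lends itself to a CI gate. A sketch that assumes only the `summary.total_saved_lines` field shown; the `check_budget` helper and the budget value are our own, not part of reDUP:

```python
import json

def check_budget(report_json: str, budget: int = 100) -> bool:
    """Return True when recoverable duplication stays within the line budget."""
    report = json.loads(report_json)
    return report["summary"]["total_saved_lines"] <= budget

# Sample report matching the schema shown above.
sample = '{"summary": {"total_groups": 3, "total_saved_lines": 84}}'
print(check_budget(sample))             # True
print(check_budget(sample, budget=50))  # False
```

In a pipeline, you would read the file written by `redup scan --format json` and fail the build (e.g. `sys.exit(1)`) when the check returns `False`.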

Architecture

src/redup/
├── __init__.py            # Public API
├── __main__.py            # python -m redup
├── core/
│   ├── models.py          # Pydantic data models
│   ├── scanner.py         # File discovery + block extraction
│   ├── hasher.py          # SHA-256 / structural fingerprinting
│   ├── matcher.py         # Fuzzy similarity comparison
│   ├── planner.py         # Refactoring suggestion generator
│   └── pipeline.py        # Orchestrator: scan → hash → match → plan
├── reporters/
│   ├── json_reporter.py   # JSON output
│   ├── yaml_reporter.py   # YAML output
│   └── toon_reporter.py   # TOON output (LLM-optimized)
└── cli_app/
    └── main.py            # Typer CLI

Analysis Pipeline

1. SCAN      Walk project, read files, extract function-level + sliding-window blocks
2. HASH      Generate exact (SHA-256) and structural (normalized AST) fingerprints
3. GROUP     Bucket by hash, keep only groups with 2+ blocks from different locations
4. MATCH     Verify candidates with fuzzy similarity (SequenceMatcher / rapidfuzz)
5. DEDUP     Remove overlapping groups (keep highest-impact)
6. PLAN      Generate prioritized refactoring suggestions with risk assessment
7. REPORT    Export to JSON / YAML / TOON
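Step 2's structural fingerprint can be approximated by masking identifiers in the AST before hashing, so that two functions with the same shape but different names collide. `structural_hash` and `_NameMasker` are a sketch of the idea, not reDUP's actual normalizer:

```python
import ast
import hashlib

class _NameMasker(ast.NodeTransformer):
    """Replace function, argument, and variable names so only AST shape remains."""
    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)
        return node
    def visit_Name(self, node):
        node.id = "_"
        return node
    def visit_arg(self, node):
        node.arg = "_"
        return node

def structural_hash(source: str) -> str:
    tree = _NameMasker().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()

# Same shape, different names -> same structural fingerprint.
a = "def f(x):\n    return x + 1"
b = "def g(y):\n    return y + 1"
print(structural_hash(a) == structural_hash(b))  # True
```

A different operator or constant changes the dumped AST and therefore the hash, so only true structural clones group together.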

Recent Improvements (v0.1.8)

🎯 Complexity Reduction

  • Reduced cyclomatic complexity from CC̄=4.8 to CC̄=4.4
  • Eliminated high-complexity functions (CC > 15)
  • Modularized analyze() function into 7 focused helpers
  • Refactored _ast_to_normalized_string() into 3 specialized functions
  • Improved code maintainability and testability

🚀 Performance & UX

  • Clean output — syntax warnings from third-party libraries are suppressed
  • Optimized imports and code organization
  • Enhanced error handling for edge cases
  • More precise type hints, e.g. Callable[[str], str]
  • Streamlined path handling via os.path.commonpath
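The `os.path.commonpath` change refers to computing a shared root and reporting paths relative to it. A minimal illustration with made-up file paths:

```python
import os

files = [
    "/repo/src/billing.py",
    "/repo/src/utils/tax.py",
    "/repo/tests/test_tax.py",
]
# Longest shared ancestor of all scanned files.
root = os.path.commonpath(files)
print(root)  # /repo
# Report locations relative to that root instead of as absolute paths.
print([os.path.relpath(f, root) for f in files])
```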

📊 Quality Metrics

  • Health status: ✅ HEALTHY (no critical issues)
  • Tests: 64/64 passing
  • Code quality: 0 high-complexity functions
  • Duplication: Minimal (2 groups, 6 lines)

Integration with wronai Toolchain

reDUP is part of the wronai developer toolchain:

  • code2llm — static analysis engine (health diagnostics, complexity)
  • reDUP — deep duplication analysis and refactoring planning
  • code2docs — automatic documentation generation
  • vallm — validation of LLM-generated code proposals

📈 Typical workflow:

  1. code2llm analyzes the project → .toon diagnostics
  2. redup finds duplicates → duplication.toon
  3. Feed both to an LLM for targeted refactoring
  4. vallm validates the LLM's proposals before merging

🎯 Why reDUP?

  • LLM-ready: TOON format optimized for LLM consumption
  • Actionable: Generates concrete refactoring suggestions
  • Prioritized: Ranks duplicates by impact and risk
  • Integrated: Works seamlessly with wronai toolchain
  • Fast: Scans 1000+ lines in < 1 second
  • Clean: No syntax warnings, professional output

Development

git clone https://github.com/semcod/redup.git
cd redup
pip install -e ".[dev]"
pytest

License

Apache License 2.0 - see LICENSE for details.

Author

Created by Tom Sapletta - tom@sapletta.com

