Automated code duplicate detection and refactoring library
Recator
Recator - Automated code duplicate detection and refactoring library for Python
Overview
Recator is a Python library that detects and refactors duplicated code across multiple programming languages. It relies on simple heuristics rather than LLMs, runs on CPU only, and supports Python, JavaScript, Java, C/C++, and more.
Features
- Multi-language Support: Python, JavaScript, Java, C/C++, C#, PHP, Ruby, Go, Rust, Kotlin, Swift
- Multiple Detection Algorithms:
  - Exact duplicate detection (hash-based)
  - Token-based similarity detection
  - Fuzzy matching using sequence comparison
  - Structural similarity detection (same structure, different names)
- Automated Refactoring Strategies:
  - Extract Method - for duplicate code blocks
  - Extract Class - for structural duplicates
  - Extract Module - for file-level duplicates
  - Parameterize - for similar code with differences
- Safe Mode: Creates .refactored versions without modifying originals
- CPU Efficient: Uses simple heuristics, no GPU or LLM required
- Configurable: Adjustable thresholds and parameters
Installation
# Basic installation
pip install recator
# Or install from source
git clone https://github.com/pyfunc/recator.git
cd recator
pip install -e .
# Install with development dependencies
pip install -e ".[dev]"
# Install with advanced features
pip install -e ".[advanced]"
Usage
Command Line Interface
# Basic analysis
recator /path/to/project
# Verbose analysis with custom parameters
recator /path/to/project -v --min-lines 6 --threshold 0.9
# Preview refactoring suggestions
recator /path/to/project --refactor
# Apply refactoring (creates .refactored files)
recator /path/to/project --refactor --apply
# Analyze specific languages only
recator /path/to/project --languages python javascript
# Exclude patterns
recator /path/to/project --exclude "*.test.js" "build/*"
# Save results to JSON
recator /path/to/project --output results.json
Python API
from recator import Recator
# Initialize with project path
recator = Recator('/path/to/project')
# Analyze for duplicates
results = recator.analyze()
print(f"Found {results['duplicates_found']} duplicates")
# Get detailed duplicate information
for duplicate in results['duplicates']:
    print(f"Type: {duplicate['type']}")
    print(f"Files: {duplicate.get('files', [])}")
    print(f"Confidence: {duplicate.get('confidence', 0)}")
# Preview refactoring
preview = recator.refactor_duplicates(dry_run=True)
print(f"Estimated LOC reduction: {preview['estimated_loc_reduction']}")
# Apply refactoring
refactoring_results = recator.refactor_duplicates(dry_run=False)
print(f"Modified {len(refactoring_results['modified_files'])} files")
Custom Configuration
from recator import Recator
config = {
    'min_lines': 5,                    # Minimum lines for duplicate
    'min_tokens': 40,                  # Minimum tokens for duplicate
    'similarity_threshold': 0.90,      # Similarity threshold (0-1)
    'languages': ['python', 'java'],   # Languages to analyze
    'exclude_patterns': ['*.min.js'],  # Patterns to exclude
    'safe_mode': True,                 # Don't modify originals
}
recator = Recator('/path/to/project', config)
results = recator.analyze()
Detection Algorithms
1. Exact Duplicate Detection
Finds identical code blocks using hash comparison.
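As a rough illustration of hash-based detection (not Recator's actual implementation), the sketch below slides a fixed-size window over each file's non-blank lines, hashes every window, and reports windows that share a digest. The function name `find_exact_blocks`, its parameters, and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def find_exact_blocks(files, min_lines=4):
    """Group identical runs of `min_lines` consecutive lines by hash.

    `files` maps a file name to its source text (hypothetical input shape).
    Returns a list of groups; each group lists (file_name, start_line)
    locations, 1-indexed, that contain the same block.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Strip surrounding whitespace and skip blank lines so that
        # indentation and spacing differences do not affect the hash.
        lines = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), 1)
            if line.strip()
        ]
        for start in range(len(lines) - min_lines + 1):
            window = lines[start:start + min_lines]
            payload = "\n".join(content for _, content in window)
            digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            seen[digest].append((name, window[0][0]))
    # Only digests seen more than once are duplicates.
    return [locations for locations in seen.values() if len(locations) > 1]
```

Overlapping windows inside one long duplicate are reported separately here; a fuller version would probably merge adjacent hits into a single block.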
2. Token-based Detection
Compares token sequences to find duplicates that may have different formatting.
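One way to make a comparison insensitive to formatting is to split code into tokens and compare overlapping token n-grams. The sketch below is an assumption about how such a check could look, not the library's code: the regex tokenizer, the names `tokenize` and `token_similarity`, and the use of Jaccard similarity are all illustrative choices.

```python
import re

# Identifiers/keywords, numbers, or any single non-space character.
# String literals are split into pieces, which is acceptable for a sketch.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def tokenize(code):
    """Split source text into a flat list of tokens, ignoring whitespace."""
    return TOKEN_RE.findall(code)


def token_similarity(code_a, code_b, n=3):
    """Jaccard similarity between the sets of token n-grams of two snippets."""
    def shingles(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    set_a = shingles(tokenize(code_a))
    set_b = shingles(tokenize(code_b))
    if not set_a or not set_b:
        return 0.0  # too few tokens to compare
    return len(set_a & set_b) / len(set_a | set_b)
```

Two snippets that differ only in spacing or line breaks produce the same token list, so their similarity is 1.0.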
3. Fuzzy Matching
Uses sequence matching algorithms to find similar (but not identical) code.
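Since the project states that it uses only the standard library, `difflib.SequenceMatcher` is a plausible building block for this step, though that is an assumption. The sketch below compares two blocks line by line and applies a threshold; `fuzzy_ratio`, `is_fuzzy_duplicate`, and the default of 0.85 are hypothetical names and values for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_ratio(block_a, block_b):
    """Return a 0..1 similarity score between two code blocks.

    Lines are stripped and blank lines dropped before comparison, so the
    score reflects content rather than layout.
    """
    lines_a = [line.strip() for line in block_a.splitlines() if line.strip()]
    lines_b = [line.strip() for line in block_b.splitlines() if line.strip()]
    # autojunk=False disables difflib's popularity heuristic, which can
    # distort scores on long inputs with many repeated lines.
    return SequenceMatcher(None, lines_a, lines_b, autojunk=False).ratio()


def is_fuzzy_duplicate(block_a, block_b, threshold=0.85):
    """True when the two blocks are at least `threshold` similar."""
    return fuzzy_ratio(block_a, block_b) >= threshold
```

With line-level comparison, two four-line blocks that differ in one line score 0.75, which would fall below a 0.85 threshold.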
4. Structural Detection
Identifies code with the same structure but different variable/function names.
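A simple way to ignore naming differences is to replace every identifier and number with a placeholder and compare what remains. The following sketch assumes Python input (it uses `keyword.iskeyword` to keep keywords intact) and is not taken from Recator's source; `structure_signature` is a hypothetical helper.

```python
import keyword
import re

# Identifiers/keywords, numbers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def structure_signature(code):
    """Reduce code to its shape: keywords and punctuation stay, names go."""
    signature = []
    for token in TOKEN_RE.findall(code):
        if keyword.iskeyword(token):
            signature.append(token)   # keep control-flow keywords
        elif token[0].isalpha() or token[0] == "_":
            signature.append("ID")    # any variable or function name
        elif token[0].isdigit():
            signature.append("NUM")   # any numeric literal
        else:
            signature.append(token)   # operators and punctuation
    return tuple(signature)
```

Two snippets with equal signatures are structural duplicates under this definition. Because every name maps to the same placeholder, `a * b` and `a * a` look identical, so a stricter variant would number identifiers by first appearance.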
Refactoring Strategies
Extract Method
# Before: Duplicate blocks in multiple places
def process_user(user):
    # validation block (duplicate)
    if not user.email:
        raise ValueError("Email required")
    if "@" not in user.email:
        raise ValueError("Invalid email")
    # ... processing

def update_user(user):
    # validation block (duplicate)
    if not user.email:
        raise ValueError("Email required")
    if "@" not in user.email:
        raise ValueError("Invalid email")
    # ... updating

# After: Extracted method
def validate_user_email(user):
    if not user.email:
        raise ValueError("Email required")
    if "@" not in user.email:
        raise ValueError("Invalid email")

def process_user(user):
    validate_user_email(user)
    # ... processing

def update_user(user):
    validate_user_email(user)
    # ... updating
Extract Module
Creates shared modules for file-level duplicates.
Parameterize
Converts similar code with differences into parameterized functions.
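To make the Parameterize strategy concrete, here is a small before/after sketch. The functions are invented for illustration and do not come from Recator's output; the point is only that the one differing value becomes an argument.

```python
# Before: two functions that differ only in the separator they use
def slugify_title(title):
    return "-".join(title.lower().split())


def slugify_tag(tag):
    return "_".join(tag.lower().split())


# After: the differing value becomes a parameter with a default
def slugify(text, separator="-"):
    return separator.join(text.lower().split())
```

Existing call sites can then be rewritten as `slugify(title)` and `slugify(tag, "_")`.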
Example Output
╔═══════════════════════════════════════════╗
║      RECATOR - Code Refactoring Bot       ║
║    Eliminate Code Duplicates with Ease    ║
╚═══════════════════════════════════════════╝

Initializing Recator for: /home/user/project
Analyzing project for duplicates...

Analysis Results:
  • Total files scanned: 45
  • Files parsed: 42
  • Duplicates found: 8

Duplicate Details:
  [1] Type: exact_block
      Files: utils.py, helpers.py, validation.py
      Confidence: 100%
      Lines: 12
  [2] Type: fuzzy
      Files: api_client.py, http_handler.py
      Confidence: 87%
      Lines: 25

Refactoring Preview:
  • Total actions: 8
  • Estimated LOC reduction: 147
  • Affected files: 12

Done!
Configuration File
Create a recator.json configuration file:
{
  "min_lines": 4,
  "min_tokens": 30,
  "similarity_threshold": 0.85,
  "languages": ["python", "javascript", "java"],
  "exclude_patterns": [
    "*.min.js",
    "*.min.css",
    "node_modules/*",
    ".git/*",
    "build/*",
    "dist/*"
  ],
  "safe_mode": true
}
Use with: recator /path/to/project --config recator.json
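The same file can also be loaded from Python and passed to the `Recator` constructor shown in the Custom Configuration section. The helper below is a sketch, not part of the library: `load_config` is a hypothetical name, and the fallback values are copied from the sample file above, so treating them as the library's real defaults is an assumption.

```python
import json
from pathlib import Path

# Fallback values copied from the sample recator.json; the library's
# actual defaults may differ (assumption).
FALLBACKS = {
    "min_lines": 4,
    "min_tokens": 30,
    "similarity_threshold": 0.85,
    "languages": ["python", "javascript", "java"],
    "exclude_patterns": ["*.min.js", "node_modules/*", ".git/*"],
    "safe_mode": True,
}


def load_config(path):
    """Read a JSON config file and overlay it on the fallback values."""
    config = dict(FALLBACKS)
    config.update(json.loads(Path(path).read_text(encoding="utf-8")))
    # The threshold is documented as a value between 0 and 1.
    if not 0.0 <= config["similarity_threshold"] <= 1.0:
        raise ValueError("similarity_threshold must be between 0 and 1")
    return config


# Usage (requires recator to be installed):
#   from recator import Recator
#   recator = Recator('/path/to/project', load_config('recator.json'))
```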
Architecture
recator/
├── __init__.py      # Main Recator class
├── scanner.py       # File scanning and reading
├── analyzer.py      # Code parsing and tokenization
├── detector.py      # Duplicate detection algorithms
├── refactor.py      # Refactoring strategies
└── cli.py           # Command-line interface
Supported Languages
- Python (.py)
- JavaScript/TypeScript (.js, .jsx, .ts, .tsx)
- Java (.java)
- C/C++ (.c, .cpp, .cc, .cxx, .h, .hpp)
- C# (.cs)
- PHP (.php)
- Ruby (.rb)
- Go (.go)
- Rust (.rs)
- Kotlin (.kt)
- Swift (.swift)
How It Works
- Scanning: Traverses project directory to find source files
- Parsing: Tokenizes and parses code into analyzable structures
- Detection: Applies multiple algorithms to find duplicates
- Analysis: Groups and ranks duplicates by confidence
- Refactoring: Suggests or applies appropriate refactoring strategies
- Output: Generates modified files or preview reports
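The scanning step at the start of this pipeline can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the contents of `scanner.py`: the function name `scan_project`, the abbreviated extension table, the language labels for anything beyond `python`, `javascript`, and `java`, and the use of `fnmatch` for exclude patterns are all guesses.

```python
import fnmatch
import os

# Abbreviated extension table based on the Supported Languages list;
# the label strings are assumptions.
EXTENSIONS = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "javascript",
    ".tsx": "javascript",
    ".java": "java",
    ".go": "go",
    ".rs": "rust",
}


def scan_project(root, languages=None, exclude_patterns=()):
    """Walk `root` and return sorted (relative_path, language) pairs."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            # Use forward slashes so patterns like "build/*" work everywhere.
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            if any(fnmatch.fnmatch(relative, p) for p in exclude_patterns):
                continue
            extension = os.path.splitext(filename)[1].lower()
            language = EXTENSIONS.get(extension)
            if language and (languages is None or language in languages):
                found.append((relative, language))
    return sorted(found)
```

Note that `fnmatch` lets `*` match across directory separators, so `*.min.js` also excludes minified files in subdirectories.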
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the Apache License 2.0.
Acknowledgments
Built using only the Python standard library for maximum compatibility and efficiency.
Support
For issues and questions, please open an issue on GitHub.
Made with ❤️ for cleaner, more maintainable code