A Rust implementation of Python's difflib.unified_diff with PyO3 bindings
Project description
difflib-rs
A high-performance Rust implementation of Python's difflib.unified_diff function with PyO3 bindings.
Overview
This package provides a Rust-based implementation of the unified diff algorithm, offering significant performance improvements over Python's built-in difflib module while maintaining API compatibility.
Features
- 🚀 3-5x Faster: Consistently outperforms Python's difflib across all file sizes and change patterns (see Performance section for detailed benchmarks)
- 100% Compatible: Drop-in replacement for
difflib.unified_diffwith identical output - Thoroughly Tested: Comprehensive test suite ensuring byte-for-byte compatibility with Python's implementation
- Easy to use: Simple Python API with PyO3 bindings
Installation
From pip
pip install difflib-rs
Build from source
# Clone the repository
git clone https://github.com/sweepai/difflib-rs.git
cd difflib-rs
# Set up virtual environment
python -m venv venv
source venv/bin/activate
# Install build dependencies
pip install maturin pytest
# Build and install
maturin develop --release
Usage
This is a drop-in replacement for Python's difflib.unified_diff. Simply replace your import:
- from difflib import unified_diff
+ from difflib_rs import unified_diff
# Compare two sequences of lines
a = ['line1', 'line2', 'line3']
b = ['line1', 'modified', 'line3']
diff = unified_diff(
a, b,
fromfile='original.txt',
tofile='modified.txt',
fromfiledate='2023-01-01',
tofiledate='2023-01-02'
)
for line in diff:
print(line, end='')
Note: Currently, only unified_diff is supported. Other difflib functions are not implemented, but pull requests are welcome!
Most agents (including Sweep) can add support for any other methods if needed. A copy of the Python implementation is provided in src/difflib.py for reference.
Extra: String-based API
For additional convenience, use unified_diff_str directly with (unsplit) strings:
from difflib_rs import unified_diff_str
# Compare two strings directly - no need to split first!
text_a = """line1
line2
line3"""
text_b = """line1
modified
line3"""
# The function handles splitting internally (more efficient)
diff = unified_diff_str(
text_a, text_b,
fromfile='original.txt',
tofile='modified.txt',
keepends=False # Whether to keep line endings in the diff
)
for line in diff:
print(line, end='')
The unified_diff_str function:
- Takes strings directly instead of lists
- Handles line splitting internally in Rust (faster than Python's
splitlines()) - Supports
\n,\r\n, and\rline endings - Has a
keependsparameter to preserve line endings in the output
Performance
The Rust implementation consistently outperforms Python's built-in difflib module while producing identical output:
Benchmark Results (Baseline - HashMap Implementation)
Small to Medium Files (10% changes)
| File Size | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|
| 100 lines | 86.0μs | 38.3μs | 2.24x | 71 |
| 500 lines | 450.6μs | 130.3μs | 3.46x | 300 |
| 1,000 lines | 910.2μs | 220.8μs | 4.12x | 587 |
| 2,000 lines | 2203.1μs | 482.3μs | 4.57x | 1,222 |
Files with Heavy Changes (50% changes)
| File Size | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|
| 100 lines | 167.9μs | 49.3μs | 3.41x | 131 |
| 500 lines | 1028.5μs | 252.0μs | 4.08x | 655 |
| 1,000 lines | 1925.0μs | 414.3μs | 4.65x | 1,285 |
Large Files with Few Changes
| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 5 | 2842.0μs | 859.7μs | 3.31x | 47 |
| 10,000 lines | 5 | 5003.2μs | 1471.3μs | 3.40x | 47 |
| 20,000 lines | 5 | 8470.5μs | 2821.6μs | 3.00x | 47 |
Large Files with Medium Changes (5% changed)
| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 250 | 7985.5μs | 1579.4μs | 5.06x | 1,869 |
| 10,000 lines | 500 | 14692.5μs | 2833.8μs | 5.18x | 3,793 |
| 20,000 lines | 1,000 | 34949.0μs | 6461.2μs | 5.41x | 7,569 |
Special Cases
| Test Case | Python Time | Rust Time | Speedup |
|---|---|---|---|
| Identical sequences (5,000 lines) | 1773.1μs | 406.1μs | 4.37x |
| Completely different (1,000 lines) | 284.5μs | 219.8μs | 1.29x |
String Splitting Performance
Performance comparison of unified_diff_str vs unified_diff with Python splitlines():
| File Size | Python split + Rust diff | All Rust (unified_diff_str) |
Speedup |
|---|---|---|---|
| 100 lines | 54.8μs | 21.1μs | 2.59x |
| 500 lines | 169.9μs | 118.3μs | 1.44x |
| 1000 lines | 316.1μs | 248.3μs | 1.27x |
| 2000 lines | 654.8μs | 550.4μs | 1.19x |
API
def unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n'):
"""
Compare two sequences of lines; generate the unified diff.
Unified diffs are a compact way of showing line changes and a few
lines of context. The number of context lines is set by n which
defaults to three.
Parameters:
a: Sequence of lines to compare (the 'from' file)
b: Sequence of lines to compare (the 'to' file)
fromfile: Label to use for the 'from' file in the diff header
tofile: Label to use for the 'to' file in the diff header
fromfiledate: Modification date of the 'from' file
tofiledate: Modification date of the 'to' file
n: Number of context lines (default: 3)
lineterm: Line terminator to use (default: '\n')
Returns:
Generator yielding unified diff format strings
Note: This is a high-performance Rust implementation that provides
3-5x speedup over Python's difflib while maintaining 100% compatibility.
"""
pass
Development
# Activate virtual environment
source venv/bin/activate
# Run tests
python -m pytest tests/ -v
# Run benchmarks
python -m pytest tests/test_benchmark.py -s
# Build the package with optimizations
maturin develop --release
Contributing
If you want a feature or have an idea, just create a pull request! Contributions are welcome.
Author
Everything in this project was written by Sweep AI, an AI agent for Jetbrains IDEs.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file difflib_rs-0.1.2-cp310-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: difflib_rs-0.1.2-cp310-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 250.1 kB
- Tags: CPython 3.10+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
70812efd99b7f4fc53b0207985111a534834011b95e7c1d2b97e410eeba8e8c9
|
|
| MD5 |
7a27fda20a68bce84a3f48f257b797db
|
|
| BLAKE2b-256 |
cc3a5407b632de4eda7d373dabf42dd6b0b949503135a92e4b2cd353061fd08c
|