A Rust implementation of Python's difflib.unified_diff with PyO3 bindings
difflib-rs
A high-performance Rust implementation of Python's difflib.unified_diff function with PyO3 bindings.
Overview
This package provides a Rust-based implementation of the unified diff algorithm, offering significant performance improvements over Python's built-in difflib module while maintaining API compatibility.
Features
- 🚀 3-5x Faster: Roughly 3x to 5x faster than Python's difflib in most of the benchmarks below; the smallest measured speedup is 1.5x, for completely different inputs
- 100% Compatible: Drop-in replacement for difflib.unified_diff with identical output
- Thoroughly Tested: Comprehensive test suite ensuring byte-for-byte compatibility with Python's implementation
- Easy to use: Simple Python API with PyO3 bindings
Performance
The Rust implementation consistently outperforms Python's built-in difflib module while producing identical output:
Benchmark Results (Baseline - HashMap Implementation)
Small to Medium Files (10% changes)
| File Size | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|
| 100 lines | 0.0001s | 0.00003s | 2.96x | 71 |
| 500 lines | 0.0005s | 0.00013s | 3.89x | 300 |
| 1,000 lines | 0.0011s | 0.00024s | 4.65x | 587 |
| 2,000 lines | 0.0023s | 0.00056s | 4.14x | 1,222 |
Files with Heavy Changes (50% changes)
| File Size | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|
| 100 lines | 0.0001s | 0.00002s | 5.72x | 131 |
| 500 lines | 0.0009s | 0.00019s | 4.62x | 655 |
| 1,000 lines | 0.0017s | 0.00036s | 4.76x | 1,285 |
Large Files with Few Changes
| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 5 | 0.0025s | 0.00073s | 3.43x | 47 |
| 10,000 lines | 5 | 0.0051s | 0.00148s | 3.44x | 47 |
| 20,000 lines | 5 | 0.0086s | 0.00295s | 2.92x | 47 |
Large Files with Medium Changes (5% changed)
| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 250 | 0.0068s | 0.00138s | 4.91x | 1,869 |
| 10,000 lines | 500 | 0.0146s | 0.00285s | 5.12x | 3,793 |
| 20,000 lines | 1,000 | 0.0357s | 0.00680s | 5.25x | 7,569 |
Special Cases
| Test Case | Python Time | Rust Time | Speedup |
|---|---|---|---|
| Identical sequences (5,000 lines) | 0.00145s | 0.00033s | 4.43x |
| Completely different (1,000 lines) | 0.00030s | 0.00020s | 1.50x |
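The tables above come from the project's own benchmark suite. As a rough way to take comparable measurements yourself, the sketch below times any `unified_diff`-style callable on synthetic input. The helper names `make_inputs` and `time_diff` are hypothetical and are not part of difflib-rs; the input generator is only an assumption about what "10% changes" might look like, so absolute numbers will differ from the tables.

```python
import difflib
import random
import timeit


def make_inputs(num_lines, change_fraction, seed=0):
    """Build two line lists where roughly `change_fraction` of the lines differ."""
    rng = random.Random(seed)
    a = [f"line {i}\n" for i in range(num_lines)]
    b = list(a)
    # Replace a random sample of lines in the second sequence.
    for idx in rng.sample(range(num_lines), int(num_lines * change_fraction)):
        b[idx] = f"changed {idx}\n"
    return a, b


def time_diff(diff_fn, a, b, repeat=5):
    """Return the best-of-`repeat` wall-clock time (seconds) for one full diff."""
    # list(...) forces the diff to be fully produced even if diff_fn is lazy.
    return min(timeit.repeat(lambda: list(diff_fn(a, b)), number=1, repeat=repeat))


a, b = make_inputs(1000, 0.10)
python_time = time_diff(difflib.unified_diff, a, b)
print(f"difflib.unified_diff: {python_time:.5f}s")
```

With the package installed, passing `difflib_rs.unified_diff` as `diff_fn` should give the Rust-side timing for the same input.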
Key Optimizations
The performance improvements come from:
- FxHashMap (Firefox's fast hash) instead of Python's dict for sparse representation
- Efficient HashMap swapping to avoid allocations (using std::mem::swap)
- Queue-based matching algorithm for better cache locality
- Optimized string operations leveraging Rust's zero-cost abstractions
- Popularity heuristic to skip overly common elements (matches Python's algorithm)
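To make two of these points concrete, here is a minimal Python sketch of the sparse longest-match loop and the popularity heuristic, modeled on CPython's `difflib.SequenceMatcher` rather than on the Rust source (which is not shown here). The function names `build_index` and `longest_match` are hypothetical. In the Rust version, the two per-row dictionaries would presumably be the maps that get swapped with `std::mem::swap` instead of being reallocated.

```python
from collections import defaultdict


def build_index(b, autojunk=True):
    """Map each element of b to the positions where it occurs.

    Mirrors CPython's popularity heuristic: for sequences of 200 or more
    items, elements occurring more than n // 100 + 1 times are dropped.
    """
    b2j = defaultdict(list)
    for j, elt in enumerate(b):
        b2j[elt].append(j)
    n = len(b)
    if autojunk and n >= 200:
        ntest = n // 100 + 1
        popular = [elt for elt, idxs in b2j.items() if len(idxs) > ntest]
        for elt in popular:
            del b2j[elt]
    return dict(b2j)


def longest_match(a, b2j):
    """Return (i, j, size) of the longest block of a that matches in b."""
    best_i, best_j, best_size = 0, 0, 0
    # j2len[j] = length of the match ending at a[i-1] and b[j]; kept sparse.
    j2len = {}
    for i, elt in enumerate(a):
        newj2len = {}
        for j in b2j.get(elt, ()):
            k = newj2len[j] = j2len.get(j - 1, 0) + 1
            if k > best_size:
                best_i, best_j, best_size = i - k + 1, j - k + 1, k
        # Only the previous row is needed, so the two dicts trade places.
        j2len = newj2len
    return best_i, best_j, best_size


a = list("abxcd")
b = list("abcd")
print(longest_match(a, build_index(b)))  # (0, 0, 2)
```

This sketch leaves out the step where CPython extends a match over junk and popular elements, so it only agrees with `difflib` when no elements were dropped from the index.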
Installation
From TestPyPI (current release)
pip install -i https://test.pypi.org/simple/ difflib-rs
Build from source
# Clone the repository
git clone https://github.com/sweepai/difflib-rs.git
cd difflib-rs
# Set up virtual environment
python -m venv venv
source venv/bin/activate
# Install build dependencies
pip install maturin pytest
# Build and install
maturin develop --release
Usage
from difflib_rs import unified_diff

# Compare two sequences of lines (each line keeps its trailing newline,
# so that print(line, end='') does not run the diff body together)
a = ['line1\n', 'line2\n', 'line3\n']
b = ['line1\n', 'modified\n', 'line3\n']

diff = unified_diff(
    a, b,
    fromfile='original.txt',
    tofile='modified.txt',
    fromfiledate='2023-01-01',
    tofiledate='2023-01-02',
)

for line in diff:
    print(line, end='')
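Since the package is described as a drop-in replacement, one way to check that claim on your own data is to compare its output against the standard library line by line. The helper below is a sketch; `same_as_stdlib` is a hypothetical name, not part of difflib-rs, and it assumes the candidate function accepts the same keyword arguments as `difflib.unified_diff`.

```python
import difflib


def same_as_stdlib(candidate, a, b, **kwargs):
    """Return True if candidate(a, b, **kwargs) yields exactly difflib's lines."""
    expected = list(difflib.unified_diff(a, b, **kwargs))
    actual = list(candidate(a, b, **kwargs))
    return actual == expected


a = ['line1\n', 'line2\n', 'line3\n']
b = ['line1\n', 'modified\n', 'line3\n']

# Sanity check of the helper itself, using the stdlib as the candidate.
print(same_as_stdlib(difflib.unified_diff, a, b, fromfile='a.txt', tofile='b.txt'))

# With the package installed, the intended call would be:
#   from difflib_rs import unified_diff
#   same_as_stdlib(unified_diff, a, b, fromfile='a.txt', tofile='b.txt')
```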
API
The unified_diff function accepts the same parameters as Python's difflib.unified_diff:
- a, b: Sequences of lines to compare
- fromfile, tofile: Filenames for the diff header
- fromfiledate, tofiledate: File modification dates
- n: Number of context lines (default: 3)
- lineterm: Line terminator (default: '\n')
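The effect of `n` and `lineterm` is easiest to see on a small input. The sketch below uses the standard library's `difflib.unified_diff` so that it runs without the package installed; given the stated API compatibility, swapping the import for `from difflib_rs import unified_diff` is expected, though not verified here, to produce the same lines.

```python
from difflib import unified_diff

# Lines without trailing newlines, paired with lineterm='' so the header
# lines do not carry a newline either.
a = ['line1', 'line2', 'line3']
b = ['line1', 'modified', 'line3']

diff = list(unified_diff(
    a, b,
    fromfile='original.txt',
    tofile='modified.txt',
    n=0,          # no context lines around each change
    lineterm='',  # do not append a terminator to the header lines
))

for line in diff:
    print(line)
# --- original.txt
# +++ modified.txt
# @@ -2 +2 @@
# -line2
# +modified
```

With the default `n=3`, the same input would also include the unchanged `line1` and `line3` as context lines.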
Development
# Activate virtual environment
source venv/bin/activate
# Run tests
python -m pytest tests/ -v
# Run benchmarks
python -m pytest tests/test_benchmark.py -s
# Build the package with optimizations
maturin develop --release
Author
Everything in this project was written by Sweep AI, an AI agent for JetBrains IDEs.