difflib-rs

A high-performance Rust implementation of Python's difflib.unified_diff function with PyO3 bindings.

Overview

This package provides a Rust-based implementation of the unified diff algorithm, offering significant performance improvements over Python's built-in difflib module while maintaining API compatibility.

Features

  • 🚀 3-5x Faster: Roughly 3-5x faster than Python's difflib in most benchmarked cases; measured speedups range from 1.29x (completely different inputs) to 5.41x (see the Performance section for detailed benchmarks)
  • 100% Compatible: Drop-in replacement for difflib.unified_diff with identical output
  • Thoroughly Tested: Comprehensive test suite ensuring byte-for-byte compatibility with Python's implementation
  • Easy to use: Simple Python API with PyO3 bindings

Installation

From pip

pip install difflib-rs

Build from source

# Clone the repository
git clone https://github.com/sweepai/difflib-rs.git
cd difflib-rs

# Set up virtual environment
python -m venv venv
source venv/bin/activate

# Install build dependencies
pip install maturin pytest

# Build and install
maturin develop --release

Usage

This is a drop-in replacement for Python's difflib.unified_diff. Simply replace your import:

- from difflib import unified_diff
+ from difflib_rs import unified_diff

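# Compare two sequences of lines. Each line keeps its trailing newline so
# the content lines match the default lineterm='\n' when printed with end=''.
a = ['line1\n', 'line2\n', 'line3\n']
b = ['line1\n', 'modified\n', 'line3\n']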

diff = unified_diff(
    a, b,
    fromfile='original.txt',
    tofile='modified.txt',
    fromfiledate='2023-01-01',
    tofiledate='2023-01-02'
)

for line in diff:
    print(line, end='')
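
Because the import is the only line that changes, one way to adopt the package gradually is a guarded import that falls back to the standard library. The sketch below is an illustration, not something the project prescribes; it assumes only that both modules expose unified_diff with the signature shown in the API section.

```python
# Prefer the Rust implementation when it is installed; otherwise fall back
# to the standard library, which exposes the same unified_diff signature.
try:
    from difflib_rs import unified_diff
except ImportError:
    from difflib import unified_diff

# Lines keep their trailing newline so they match the default lineterm='\n'.
a = ["line1\n", "line2\n", "line3\n"]
b = ["line1\n", "modified\n", "line3\n"]

# list() works whether the implementation returns a generator or a list.
diff_lines = list(
    unified_diff(a, b, fromfile="original.txt", tofile="modified.txt")
)

print("".join(diff_lines), end="")
```

With either implementation the printed diff should start with the two header lines, followed by a single hunk that replaces line2 with modified.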

Note: Currently, only unified_diff is supported. Other difflib functions are not implemented, but pull requests are welcome!

Most coding agents (including Sweep) can add support for other difflib functions if needed. A copy of the Python implementation is provided in src/difflib.py for reference.

Extra: String-based API

For additional convenience, use unified_diff_str directly with (unsplit) strings:

from difflib_rs import unified_diff_str

# Compare two strings directly - no need to split first!
text_a = """line1
line2
line3"""

text_b = """line1
modified
line3"""

# The function handles splitting internally (more efficient)
diff = unified_diff_str(
    text_a, text_b,
    fromfile='original.txt',
    tofile='modified.txt',
    keepends=False  # Whether to keep line endings in the diff
)

for line in diff:
    print(line, end='')

The unified_diff_str function:

  • Takes strings directly instead of lists
  • Handles line splitting internally in Rust (faster than Python's splitlines())
  • Supports \n, \r\n, and \r line endings
  • Has a keepends parameter to preserve line endings in the output
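
To make the behaviour described above concrete, here is a pure-Python stand-in built from str.splitlines() and the standard library's unified_diff. It is a sketch under assumptions, not the Rust implementation: the name unified_diff_str_sketch is hypothetical, and the choice of lineterm when keepends is False is a guess made so that every output line is uniformly unterminated. Note also that str.splitlines() recognises more separators than the \n, \r\n, and \r listed above (for example \v and \f).

```python
from difflib import unified_diff


def unified_diff_str_sketch(text_a, text_b, fromfile="", tofile="", keepends=False):
    """Hypothetical stand-in: split both strings, then diff the line lists."""
    a = text_a.splitlines(keepends=keepends)
    b = text_b.splitlines(keepends=keepends)
    # Assumption: when line endings are dropped, header and hunk lines are
    # emitted without a terminator as well, so all output lines look alike.
    lineterm = "\n" if keepends else ""
    return list(
        unified_diff(a, b, fromfile=fromfile, tofile=tofile, lineterm=lineterm)
    )


# Mixed line endings: \r\n, \r, and \n all separate lines.
text_a = "line1\r\nline2\rline3\n"
text_b = "line1\nmodified\nline3\n"

result = unified_diff_str_sketch(text_a, text_b, "original.txt", "modified.txt")
for line in result:
    print(line)
```

Splitting in Python first and then calling unified_diff is the "Python split + Rust diff" path measured in the String Splitting Performance table.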

Performance

The Rust implementation consistently outperforms Python's built-in difflib module while producing identical output:

Benchmark Results (Baseline - HashMap Implementation)

Small to Medium Files (10% changes)

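File Size   | Python Time | Rust Time | Speedup | Output Lines
------------|-------------|-----------|---------|-------------
100 lines   | 86.0μs      | 38.3μs    | 2.24x   | 71
500 lines   | 450.6μs     | 130.3μs   | 3.46x   | 300
1,000 lines | 910.2μs     | 220.8μs   | 4.12x   | 587
2,000 lines | 2203.1μs    | 482.3μs   | 4.57x   | 1,222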

Files with Heavy Changes (50% changes)

File Size   | Python Time | Rust Time | Speedup | Output Lines
------------|-------------|-----------|---------|-------------
100 lines   | 167.9μs     | 49.3μs    | 3.41x   | 131
500 lines   | 1028.5μs    | 252.0μs   | 4.08x   | 655
1,000 lines | 1925.0μs    | 414.3μs   | 4.65x   | 1,285

Large Files with Few Changes

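File Size    | Changes | Python Time | Rust Time | Speedup | Output Lines
-------------|---------|-------------|-----------|---------|-------------
5,000 lines  | 5       | 2842.0μs    | 859.7μs   | 3.31x   | 47
10,000 lines | 5       | 5003.2μs    | 1471.3μs  | 3.40x   | 47
20,000 lines | 5       | 8470.5μs    | 2821.6μs  | 3.00x   | 47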

Large Files with Medium Changes (5% changed)

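File Size    | Changes | Python Time | Rust Time | Speedup | Output Lines
-------------|---------|-------------|-----------|---------|-------------
5,000 lines  | 250     | 7985.5μs    | 1579.4μs  | 5.06x   | 1,869
10,000 lines | 500     | 14692.5μs   | 2833.8μs  | 5.18x   | 3,793
20,000 lines | 1,000   | 34949.0μs   | 6461.2μs  | 5.41x   | 7,569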

Special Cases

Test Case                          | Python Time | Rust Time | Speedup
-----------------------------------|-------------|-----------|--------
Identical sequences (5,000 lines)  | 1773.1μs    | 406.1μs   | 4.37x
Completely different (1,000 lines) | 284.5μs     | 219.8μs   | 1.29x

String Splitting Performance

Performance comparison of unified_diff_str vs unified_diff with Python splitlines():

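File Size   | Python split + Rust diff | All Rust (unified_diff_str) | Speedup
------------|--------------------------|-----------------------------|--------
100 lines   | 54.8μs                   | 21.1μs                      | 2.59x
500 lines   | 169.9μs                  | 118.3μs                     | 1.44x
1,000 lines | 316.1μs                  | 248.3μs                     | 1.27x
2,000 lines | 654.8μs                  | 550.4μs                     | 1.19x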
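
For readers who want to take comparable measurements on their own machine, the following is a minimal timing sketch. It is not the project's tests/test_benchmark.py; the helpers make_inputs and best_time_us and the way inputs are generated are assumptions made for illustration, so absolute numbers will differ from the tables above.

```python
import random
import timeit
from difflib import unified_diff as py_unified_diff

try:
    from difflib_rs import unified_diff as rs_unified_diff
except ImportError:
    rs_unified_diff = None  # Rust timing is skipped when not installed


def make_inputs(n_lines, change_fraction, seed=0):
    """Build two line lists where a fixed fraction of lines differ."""
    rng = random.Random(seed)
    a = [f"line {i}\n" for i in range(n_lines)]
    b = list(a)
    for i in rng.sample(range(n_lines), int(n_lines * change_fraction)):
        b[i] = f"changed {i}\n"
    return a, b


def best_time_us(func, a, b, repeat=5, number=10):
    """Best per-call time in microseconds, consuming the whole diff."""
    timer = timeit.Timer(lambda: list(func(a, b)))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


a, b = make_inputs(1000, 0.10)
py_us = best_time_us(py_unified_diff, a, b)
print(f"difflib:    {py_us:.1f} us")

if rs_unified_diff is not None:
    rs_us = best_time_us(rs_unified_diff, a, b)
    print(f"difflib_rs: {rs_us:.1f} us ({py_us / rs_us:.2f}x)")
```

Wrapping each call in list() matters for the standard library, whose unified_diff is a generator and does no work until it is consumed.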

API

def unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n'):
    """
    Compare two sequences of lines; generate the unified diff.
    
    Unified diffs are a compact way of showing line changes and a few
    lines of context. The number of context lines is set by n which
    defaults to three.
    
    Parameters:
        a: Sequence of lines to compare (the 'from' file)
        b: Sequence of lines to compare (the 'to' file)
        fromfile: Label to use for the 'from' file in the diff header
        tofile: Label to use for the 'to' file in the diff header
        fromfiledate: Modification date of the 'from' file
        tofiledate: Modification date of the 'to' file
        n: Number of context lines (default: 3)
        lineterm: Line terminator to use (default: '\n')
    
    Returns:
        Generator yielding unified diff format strings
    
    Note: This is a Rust implementation. In the benchmarks in the
    Performance section it is roughly 3-5x faster than Python's difflib
    in most cases, while producing identical output.
    """
    pass
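
The n and lineterm parameters are easiest to understand by example. The sketch below uses the guarded import so that it also runs without the Rust package installed; the input data is made up for illustration.

```python
try:
    from difflib_rs import unified_diff
except ImportError:
    from difflib import unified_diff

# Ten lines without trailing newlines; only the fifth line differs.
a = [f"row {i}" for i in range(1, 11)]
b = list(a)
b[4] = "row five"

# lineterm='' keeps header and hunk lines unterminated, matching the inputs.
default_ctx = list(
    unified_diff(a, b, fromfile="a.txt", tofile="b.txt", lineterm="")
)
one_ctx = list(
    unified_diff(a, b, fromfile="a.txt", tofile="b.txt", n=1, lineterm="")
)

print("\n".join(default_ctx))
print()
print("\n".join(one_ctx))
```

With the default n=3 the hunk spans lines 2 to 8 (header @@ -2,7 +2,7 @@); with n=1 it shrinks to lines 4 to 6 (header @@ -4,3 +4,3 @@).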

Development

# Activate virtual environment
source venv/bin/activate

# Run tests
python -m pytest tests/ -v

# Run benchmarks
python -m pytest tests/test_benchmark.py -s

# Build the package with optimizations
maturin develop --release
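
When adding features, a quick compatibility check against the standard library can catch regressions early. The sketch below is not taken from the repository's tests/ directory; the helper matches_stdlib and the deliberately wrong broken_diff are hypothetical names used only to show the idea of comparing outputs element by element.

```python
import difflib


def matches_stdlib(candidate, a, b, **kwargs):
    """Return True if candidate yields exactly what difflib.unified_diff does."""
    expected = list(difflib.unified_diff(a, b, **kwargs))
    actual = list(candidate(a, b, **kwargs))
    return actual == expected


def broken_diff(a, b, **kwargs):
    """Deliberately wrong implementation: drops the last output line."""
    return list(difflib.unified_diff(a, b, **kwargs))[:-1]


try:
    from difflib_rs import unified_diff as candidate
except ImportError:
    # Fallback keeps the sketch runnable; it then compares difflib to itself.
    candidate = difflib.unified_diff

a = ["alpha\n", "beta\n", "gamma\n", "delta\n"]
b = ["alpha\n", "BETA\n", "gamma\n", "epsilon\n"]

# A few parameter combinations from the documented signature.
cases = [
    {},
    {"n": 0},
    {"lineterm": ""},
    {"fromfile": "a.txt", "tofile": "b.txt",
     "fromfiledate": "2023-01-01", "tofiledate": "2023-01-02"},
]

results = [matches_stdlib(candidate, a, b, **kw) for kw in cases]
print(results)
print(matches_stdlib(broken_diff, a, b))
```

The same helper can be dropped into a pytest test function by asserting on its return value.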

Contributing

If you want a feature or have an idea, just create a pull request! Contributions are welcome.

Author

Everything in this project was written by Sweep AI, an AI agent for JetBrains IDEs.
