difflib-rs

A high-performance Rust implementation of Python's difflib.unified_diff function with PyO3 bindings.

Overview

This package provides a Rust-based implementation of the unified diff algorithm, offering significant performance improvements over Python's built-in difflib module while maintaining API compatibility.

Features

  • 🚀 3-5x Faster: Outperforms Python's difflib in every benchmark below, typically by 3-5x (1.5x in the worst case, completely different inputs)
  • 100% Compatible: Drop-in replacement for difflib.unified_diff with identical output
  • Thoroughly Tested: Comprehensive test suite ensuring byte-for-byte compatibility with Python's implementation
  • Easy to use: Simple Python API with PyO3 bindings

Performance

The Rust implementation consistently outperforms Python's built-in difflib module while producing identical output:

Benchmark Results (Baseline - HashMap Implementation)

Small to Medium Files (10% changes)

| File Size | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|
| 100 lines | 0.0001s | 0.00003s | 2.96x | 71 |
| 500 lines | 0.0005s | 0.00013s | 3.89x | 300 |
| 1,000 lines | 0.0011s | 0.00024s | 4.65x | 587 |
| 2,000 lines | 0.0023s | 0.00056s | 4.14x | 1,222 |

Files with Heavy Changes (50% changes)

| File Size | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|
| 100 lines | 0.0001s | 0.00002s | 5.72x | 131 |
| 500 lines | 0.0009s | 0.00019s | 4.62x | 655 |
| 1,000 lines | 0.0017s | 0.00036s | 4.76x | 1,285 |

Large Files with Few Changes

| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 5 | 0.0025s | 0.00073s | 3.43x | 47 |
| 10,000 lines | 5 | 0.0051s | 0.00148s | 3.44x | 47 |
| 20,000 lines | 5 | 0.0086s | 0.00295s | 2.92x | 47 |

Large Files with Medium Changes (5% changed)

| File Size | Changes | Python Time | Rust Time | Speedup | Output Lines |
|---|---|---|---|---|---|
| 5,000 lines | 250 | 0.0068s | 0.00138s | 4.91x | 1,869 |
| 10,000 lines | 500 | 0.0146s | 0.00285s | 5.12x | 3,793 |
| 20,000 lines | 1,000 | 0.0357s | 0.00680s | 5.25x | 7,569 |

Special Cases

| Test Case | Python Time | Rust Time | Speedup |
|---|---|---|---|
| Identical sequences (5,000 lines) | 0.00145s | 0.00033s | 4.43x |
| Completely different (1,000 lines) | 0.00030s | 0.00020s | 1.50x |
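
Timings like the ones above can be reproduced with a small harness. The sketch below is not the project's benchmark (that lives in tests/test_benchmark.py and is not shown here); the helper names make_inputs and best_time are hypothetical, and the input generator is an assumption about what "10% changes" means. It times the standard library by default, and any callable with the same signature can be passed in its place.

```python
import difflib
import random
import timeit


def make_inputs(n_lines, change_ratio, seed=0):
    """Build two line lists where `change_ratio` of the lines differ (hypothetical helper)."""
    rng = random.Random(seed)
    a = [f"line {i}\n" for i in range(n_lines)]
    b = list(a)
    # Rewrite a random but reproducible subset of lines in the second sequence.
    for idx in rng.sample(range(n_lines), int(n_lines * change_ratio)):
        b[idx] = f"changed {idx}\n"
    return a, b


def best_time(diff_fn, a, b, repeat=5):
    """Return the fastest wall-clock time in seconds over `repeat` single runs."""
    # list(...) forces lazy results such as difflib.unified_diff's generator to run fully.
    return min(timeit.repeat(lambda: list(diff_fn(a, b)), number=1, repeat=repeat))


if __name__ == "__main__":
    a, b = make_inputs(1000, 0.10)
    print(f"difflib: {best_time(difflib.unified_diff, a, b):.5f}s")
    # To time the Rust version, pass difflib_rs.unified_diff as diff_fn instead.
```

Taking the minimum of several runs reduces noise from the operating system; absolute numbers will vary by machine.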

Key Optimizations

The performance improvements come from:

  • FxHashMap (Firefox's fast hash) instead of Python's dict for sparse representation
  • Efficient HashMap swapping to avoid allocations (using std::mem::swap)
  • Queue-based matching algorithm for better cache locality
  • Optimized string operations leveraging Rust's zero-cost abstractions
  • Popularity heuristic to skip overly common elements (matches Python's algorithm)
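
To make the sparse representation, the map swapping, and the popularity heuristic concrete, here is a minimal Python sketch of the corresponding steps as they appear in CPython's difflib. It is not the Rust source: the function names build_index and longest_match are hypothetical, and the assumption is that the Rust code follows the same structure with FxHashMap in place of dict.

```python
from collections import defaultdict


def build_index(b, autojunk=True):
    """Map each element of b to the list of positions where it occurs."""
    b2j = defaultdict(list)
    for j, elt in enumerate(b):
        b2j[elt].append(j)
    n = len(b)
    if autojunk and n >= 200:
        # Popularity heuristic (as in CPython's difflib): in sequences of 200+
        # elements, drop any element occurring more than n // 100 + 1 times.
        ntest = n // 100 + 1
        popular = {elt for elt, idxs in b2j.items() if len(idxs) > ntest}
        for elt in popular:
            del b2j[elt]
    return dict(b2j)


def longest_match(a, b2j):
    """Return (start_in_a, start_in_b, size) of the longest common block."""
    best_i = best_j = best_size = 0
    # j2len[j] = length of the longest match ending at a[i-1] and b[j].
    # Only non-zero entries are stored, hence a sparse map instead of a full row.
    j2len = {}
    for i, elt in enumerate(a):
        new_j2len = {}
        for j in b2j.get(elt, ()):
            k = new_j2len[j] = j2len.get(j - 1, 0) + 1
            if k > best_size:
                best_i, best_j, best_size = i - k + 1, j - k + 1, k
        # Python rebinds the name here; a Rust version could instead clear one
        # map and exchange the two with std::mem::swap to reuse their storage.
        j2len = new_j2len
    return best_i, best_j, best_size


if __name__ == "__main__":
    index = build_index(list("abcd"))
    print(longest_match(list("abxcd"), index))
```

Only two rows of the match table are alive at any time, which is why swapping two maps is enough to avoid a fresh allocation per element of a.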

Installation

From TestPyPI (current release)

pip install -i https://test.pypi.org/simple/ difflib-rs

Build from source

# Clone the repository
git clone https://github.com/sweepai/difflib-rs.git
cd difflib-rs

# Set up virtual environment
python -m venv venv
source venv/bin/activate

# Install build dependencies
pip install maturin pytest

# Build and install
maturin develop --release

Usage

from difflib_rs import unified_diff

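# Compare two sequences of lines. Each line keeps its trailing newline so the
# diff can be printed with end='' (the default lineterm is '\n').
a = ['line1\n', 'line2\n', 'line3\n']
b = ['line1\n', 'modified\n', 'line3\n']

diff = unified_diff(
    a, b,
    fromfile='original.txt',
    tofile='modified.txt',
    fromfiledate='2023-01-01',
    tofiledate='2023-01-02'
)

# Header, hunk, and content lines already end in '\n'
for line in diff:
    print(line, end='')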

API

The unified_diff function accepts the same parameters as Python's difflib.unified_diff:

  • a, b: Sequences of lines to compare
  • fromfile, tofile: Filenames for the diff header
  • fromfiledate, tofiledate: File modification dates
  • n: Number of context lines (default: 3)
  • lineterm: Terminator appended to the header and hunk lines (default: '\n'); set it to '' when the input lines have no trailing newlines
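
The effect of n and lineterm is easiest to see side by side. Since the parameters are documented as matching the standard library, the sketch below calls difflib.unified_diff itself so that it runs without the extension installed; assuming the compatibility claim holds, importing unified_diff from difflib_rs instead should yield the same lines. The helper name diff_lines is hypothetical.

```python
import difflib

a = ["line1", "line2", "line3"]
b = ["line1", "modified", "line3"]


def diff_lines(n):
    """Diff a against b with n context lines and no added line terminator."""
    return list(
        difflib.unified_diff(
            a, b,
            fromfile="original.txt",
            tofile="modified.txt",
            n=n,          # number of unchanged lines shown around each change
            lineterm="",  # inputs have no trailing newlines, so add none to headers
        )
    )


default_context = diff_lines(3)  # hunk header covers all three lines
no_context = diff_lines(0)       # hunk header covers only the changed line

if __name__ == "__main__":
    print("\n".join(no_context))
```

With lineterm set to '' every returned string is newline-free, so the result can be joined with "\n" or printed line by line.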

Development

# Activate virtual environment
source venv/bin/activate

# Run tests
python -m pytest tests/ -v

# Run benchmarks
python -m pytest tests/test_benchmark.py -s

# Build the package with optimizations
maturin develop --release

Author

Everything in this project was written by Sweep AI, an AI agent for JetBrains IDEs.

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

  • difflib_rs-0.1.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (254.9 kB): CPython 3.10+, manylinux (glibc 2.17+), x86-64
  • difflib_rs-0.1.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (251.6 kB): CPython 3.10+, manylinux (glibc 2.17+), ARM64
  • difflib_rs-0.1.0-cp310-abi3-macosx_11_0_arm64.whl (241.3 kB): CPython 3.10+, macOS 11.0+, ARM64

