
C implementation of parts of difflib

Project description

cdifflib

Python difflib sequence matcher reimplemented in C.

Only some parts of difflib are reimplemented: the package provides a CSequenceMatcher type that inherits most of its methods from difflib.SequenceMatcher.
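CSequenceMatcher is meant to stand in for difflib.SequenceMatcher, so one common pattern is to prefer it when it is installed and fall back to the standard library class otherwise. This is a minimal sketch. The names `SequenceMatcher`, `HAVE_CDIFFLIB` and `similarity` are local choices for illustration and are not part of cdifflib:

```python
try:
    # Use the C-accelerated matcher when cdifflib is installed
    from cdifflib import CSequenceMatcher as SequenceMatcher
    HAVE_CDIFFLIB = True
except ImportError:
    # Fall back to the pure Python implementation from the standard library
    from difflib import SequenceMatcher
    HAVE_CDIFFLIB = False


def similarity(a, b):
    # Hypothetical helper: pass lists so the C code path needs no conversion
    return SequenceMatcher(None, list(a), list(b)).ratio()


if __name__ == "__main__":
    print("cdifflib available:", HAVE_CDIFFLIB)
    print(similarity("abcd", "abce"))  # 0.75 ("abc" matches: 2 * 3 / 8)
```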

cdifflib is about 4x as fast as the pure Python difflib when diffing large streams.
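The 4x figure applies to large streams. To see what you get on your own data, you can time both matchers with the standard timeit module. This is a minimal sketch. `make_lines` is a hypothetical generator of synthetic input, and any numbers it prints are your own measurements, not results from this project:

```python
import difflib
import random
import timeit

try:
    from cdifflib import CSequenceMatcher
except ImportError:
    CSequenceMatcher = None  # cdifflib not installed; only difflib is timed


def make_lines(n, seed):
    # Hypothetical synthetic input: n short lines drawn from a small vocabulary
    rng = random.Random(seed)
    return ["line %d\n" % rng.randrange(n // 2 + 1) for _ in range(n)]


def best_time(matcher_cls, a, b, repeat=3):
    # Best-of-`repeat` wall time (seconds) to build a matcher and compute ratio()
    return min(timeit.repeat(lambda: matcher_cls(None, a, b).ratio(),
                             number=1, repeat=repeat))


def compare(n=2000):
    a, b = make_lines(n, seed=1), make_lines(n, seed=2)
    results = {"difflib": best_time(difflib.SequenceMatcher, a, b)}
    if CSequenceMatcher is not None:
        results["cdifflib"] = best_time(CSequenceMatcher, a, b)
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print("%-8s %.4f s" % (name, seconds))
```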

Limitations

The C part of the code only works on lists rather than generic iterables, so anything that isn't a list is converted to a list in the CSequenceMatcher constructor. This may cause undesirable behavior if you're not expecting it.
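One way to avoid surprises is to convert inputs yourself before constructing the matcher, so you keep a reference to exactly the list that was diffed. This matters most for one-shot iterators such as generators or file objects, which are exhausted once converted. This is a minimal sketch. `match_iterables` is a hypothetical helper, and it falls back to difflib when cdifflib isn't installed:

```python
try:
    from cdifflib import CSequenceMatcher as SequenceMatcher
except ImportError:
    from difflib import SequenceMatcher


def match_iterables(a_iter, b_iter):
    # Hypothetical helper: materialise each input exactly once. A generator
    # cannot be re-read later, so the lists are returned with the matcher.
    a, b = list(a_iter), list(b_iter)
    return SequenceMatcher(None, a, b), a, b


if __name__ == "__main__":
    gen = (ch for ch in "abcd")
    matcher, a, b = match_iterables(gen, "abce")
    print(matcher.ratio())  # 0.75
    print(a)                # ['a', 'b', 'c', 'd']
    print(list(gen))        # [] because the generator was already consumed
```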

Works with Python 2.7 and 3.6 (should work on all 3.3+).

Usage

It can be used just like difflib.SequenceMatcher; inputs that aren't lists (such as the strings below) are converted to lists first. These examples are taken from the difflib docs:

>>> from cdifflib import CSequenceMatcher
>>> s = CSequenceMatcher(None, ' abcd', 'abcd abcd')
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=0, b=4, size=5)
>>> s = CSequenceMatcher(lambda x: x == " ",
...                      "private Thread currentThread;",
...                      "private volatile Thread currentThread;")
>>> print(round(s.ratio(), 3))
0.866
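Because the C code works on lists, diffing text line by line is a natural fit. You split each text into a list of lines and ask for the opcodes that turn one into the other. This is a minimal sketch. `line_opcodes` is a hypothetical helper, and it falls back to difflib if cdifflib isn't installed:

```python
try:
    from cdifflib import CSequenceMatcher as SequenceMatcher
except ImportError:
    from difflib import SequenceMatcher


def line_opcodes(old_text, new_text):
    # keepends=True keeps each "\n", so a missing final newline shows up as a change
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return SequenceMatcher(None, old_lines, new_lines).get_opcodes()


if __name__ == "__main__":
    for tag, i1, i2, j1, j2 in line_opcodes("a\nb\nc\n", "a\nx\nc\n"):
        print(tag, i1, i2, j1, j2)
```

For the two texts above, the middle line is reported as a `replace` between two `equal` runs.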

It's a drop-in replacement (subject to the list conversion described above), so you can swap it in for the difflib version at startup, and other libraries will then use CSequenceMatcher too, e.g.:

from cdifflib import CSequenceMatcher
import difflib
difflib.SequenceMatcher = CSequenceMatcher
import library_that_uses_difflib

# Now the library will transparently use the C SequenceMatcher; everything
# else stays the same
library_that_uses_difflib.do_some_diffing()
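Replacing difflib.SequenceMatcher globally affects every user of difflib in the process. If you only want the swap around a particular piece of code, unittest.mock.patch.object can make it temporary. This works for difflib's own helpers, such as get_close_matches, because they look up the name in the difflib module each time they are called. Code that already ran `from difflib import SequenceMatcher` keeps the original class either way. This is a sketch. `close_matches_using` is a hypothetical helper, and the fallback subclass exists only so the example runs without cdifflib:

```python
import difflib
from unittest import mock

try:
    from cdifflib import CSequenceMatcher
except ImportError:
    # Stand-in so the sketch runs without cdifflib installed (not the real class)
    class CSequenceMatcher(difflib.SequenceMatcher):
        pass


def close_matches_using(matcher_cls, word, possibilities):
    # Swap the module-level name only for the duration of the call;
    # patch.object restores the original class on exit, even if an error occurs.
    with mock.patch.object(difflib, "SequenceMatcher", matcher_cls):
        return difflib.get_close_matches(word, possibilities)


if __name__ == "__main__":
    print(close_matches_using(CSequenceMatcher, "appel",
                              ["ape", "apple", "peach", "puppy"]))
```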

Making

Set up a venv:

python -m venv .venv
source .venv/bin/activate

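To build the sdist and wheel (this needs the build package, e.g. pip install build):

python -m build

To install into the venv:

pip install .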

To test:

python -m pytest tests/cdifflib_tests.py
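CSequenceMatcher is meant to agree with difflib, so a quick way to check it on your own data, beyond the project's test suite, is a differential test. Run both matchers on the same random inputs and compare their output. This is a minimal sketch. `opcodes_agree` and the random alphabet are illustrative choices, not part of cdifflib's tests:

```python
import difflib
import random

try:
    from cdifflib import CSequenceMatcher
except ImportError:
    # Without cdifflib this compares difflib against itself
    CSequenceMatcher = difflib.SequenceMatcher


def opcodes_agree(matcher_cls, trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        # Small alphabet so the sequences share plenty of matching runs
        a = [rng.choice("abcde") for _ in range(rng.randrange(50))]
        b = [rng.choice("abcde") for _ in range(rng.randrange(50))]
        expected = difflib.SequenceMatcher(None, a, b).get_opcodes()
        if matcher_cls(None, a, b).get_opcodes() != expected:
            return False, a, b  # first counterexample found
    return True, None, None


if __name__ == "__main__":
    ok, a, b = opcodes_agree(CSequenceMatcher)
    print("agree" if ok else "mismatch on %r vs %r" % (a, b))
```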

License etc

This code lives at https://github.com/mduggan. See LICENSE for the license.

Changelog

  • 1.2.9 - Repackage again, no code change (#13)
  • 1.2.8 - Bump to fix version number in py file, no code change
  • 1.2.7 - Update for newer pythons (#12)
  • 1.2.6 - Clear state correctly when replacing seq1 (#10)
  • 1.2.5 - Fix some memory leaks (#7)
  • 1.2.4 - Repackage yet again using twine for pypi upload (no binary changes)
  • 1.2.3 - Repackage again with changelog update and corrected src package (no binary changes)
  • 1.2.2 - Repackage to add README.md in a way pypi supports (no binary changes)
  • 1.2.1 - Fix bug for longer sequences with "autojunk"
  • 1.2.0 - Python 3 support for other versions
  • 1.1.0 - Added Python 3.6 support (thanks Bclavie)
  • 1.0.4 - Changes to make it compile on MSVC++ compiler, no change for other platforms
  • 1.0.2 - Bugfix - also replace set_seq1 implementation so difflib.compare works with a CSequenceMatcher
  • 1.0.1 - Implement more bits in C to squeeze a bit more speed out
  • 1.0.0 - First release

Download files

Download the file for your platform.

Source Distribution

cdifflib-1.2.9.tar.gz (12.3 kB)

Uploaded Source

Built Distributions

cdifflib-1.2.9-cp313-cp313-macosx_15_0_arm64.whl (11.0 kB)

Uploaded CPython 3.13 macOS 15.0+ ARM64

cdifflib-1.2.9-cp312-cp312-macosx_15_0_arm64.whl (11.0 kB)

Uploaded CPython 3.12 macOS 15.0+ ARM64

cdifflib-1.2.9-cp311-cp311-macosx_15_0_arm64.whl (11.0 kB)

Uploaded CPython 3.11 macOS 15.0+ ARM64

cdifflib-1.2.9-cp310-cp310-macosx_15_0_arm64.whl (11.0 kB)

Uploaded CPython 3.10 macOS 15.0+ ARM64

cdifflib-1.2.9-cp39-cp39-macosx_15_0_arm64.whl (11.0 kB)

Uploaded CPython 3.9 macOS 15.0+ ARM64

File details

Details for the file cdifflib-1.2.9.tar.gz.

File metadata

  • Download URL: cdifflib-1.2.9.tar.gz
  • Size: 12.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for cdifflib-1.2.9.tar.gz
Algorithm Hash digest
SHA256 6286da08f72b7ddb5b40145dcb8f214ad913a86d72b1f62cc8d6cf7a92029590
MD5 6d3fa92cd4582b447d4b4653ea534841
BLAKE2b-256 52aadaefb1236e47561ca53f469f4832f625b38ad6db4e5c68e589dd72928d61

File details

Details for the file cdifflib-1.2.9-cp313-cp313-macosx_15_0_arm64.whl.

File hashes

Hashes for cdifflib-1.2.9-cp313-cp313-macosx_15_0_arm64.whl
Algorithm Hash digest
SHA256 c7113c018e1d8190ce6c00318ae5afe7e99c8e4e0b4b631ee79acde74949c2e8
MD5 fde1e78aa837073206ced6559b6cd07b
BLAKE2b-256 7c055071e0757237e7aa79a6256c1ddddebebab500e1807a1603739f777f37b1

File details

Details for the file cdifflib-1.2.9-cp312-cp312-macosx_15_0_arm64.whl.

File hashes

Hashes for cdifflib-1.2.9-cp312-cp312-macosx_15_0_arm64.whl
Algorithm Hash digest
SHA256 75a81d8a5e2b0ca055d3f7850fd0a29b488b086c91445841fa3113ac328410e0
MD5 21fc19ef850d38bdffc3086b3a51fa13
BLAKE2b-256 cd94caf01d3efe4aa31086217d20538ad9a8ef8a925b49771d34d4dab0295de8

File details

Details for the file cdifflib-1.2.9-cp311-cp311-macosx_15_0_arm64.whl.

File hashes

Hashes for cdifflib-1.2.9-cp311-cp311-macosx_15_0_arm64.whl
Algorithm Hash digest
SHA256 32c56f7895253b0734f42ba023a9c181b52d72f3d51afacc29d5ea8ee72e4643
MD5 165a0155db103341b2e1732f44adff61
BLAKE2b-256 ba35161f137709a77ae861dfdebb478a7dad323b3a7fd3c24b97799ccf48a2b7

File details

Details for the file cdifflib-1.2.9-cp310-cp310-macosx_15_0_arm64.whl.

File hashes

Hashes for cdifflib-1.2.9-cp310-cp310-macosx_15_0_arm64.whl
Algorithm Hash digest
SHA256 24219193d1d298ead211d4b628ad2124ffa1c0676890cea8fbacdeaf66a2369b
MD5 f51a6e8b40e18a989b067b60d3e86185
BLAKE2b-256 0a2a12cc95269e666ac40a20662c865677e24363fdca4d5c72566ad36b14d24a

File details

Details for the file cdifflib-1.2.9-cp39-cp39-macosx_15_0_arm64.whl.

File hashes

Hashes for cdifflib-1.2.9-cp39-cp39-macosx_15_0_arm64.whl
Algorithm Hash digest
SHA256 5a17ca0fc0a38c799b60243d74eb878e2599e4b60327d36c5cd33055d561eae1
MD5 360b44cab6892fb20f8fc3085ff761c5
BLAKE2b-256 5941a42aaf42ad648f6bb3a7c03471e3a15a98918321744d38fe45aa9ac8b8a3
