C implementation of parts of difflib

cdifflib

Python difflib sequence matcher reimplemented in C.

Only parts of the matcher are reimplemented. The package provides a CSequenceMatcher type that inherits most of its methods from difflib.SequenceMatcher.

cdifflib is about 4x as fast as the pure-Python difflib when diffing large streams.
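The speed claim can be checked on your own data with a small timing sketch like the one below. The helper names (make_lines, time_matcher) and the input sizes are illustrative assumptions, not part of cdifflib, and the fallback import means that without cdifflib installed the script simply times difflib against itself.

```python
import difflib
import random
import timeit

try:
    from cdifflib import CSequenceMatcher
except ImportError:
    # Assumption: fall back to the pure-Python class so the sketch still runs
    # when cdifflib is not installed (both timings then measure difflib).
    CSequenceMatcher = difflib.SequenceMatcher


def make_lines(n, seed):
    """Build a reproducible list of n short pseudo-random lines (hypothetical helper)."""
    rng = random.Random(seed)
    return ["line %d" % rng.randrange(n // 2 + 1) for _ in range(n)]


def time_matcher(matcher_cls, a, b, repeat=3):
    """Return the best wall-clock time in seconds for one full ratio() call."""
    return min(timeit.repeat(lambda: matcher_cls(None, a, b).ratio(),
                             number=1, repeat=repeat))


a = make_lines(2000, seed=1)
b = make_lines(2000, seed=2)
pure = time_matcher(difflib.SequenceMatcher, a, b)
fast = time_matcher(CSequenceMatcher, a, b)
print("difflib: %.4fs  cdifflib: %.4fs" % (pure, fast))
```

The measured ratio will depend on the input size, how repetitive the data is, and the machine, so treat the 4x figure as a rough guide.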

Limitations

The C part of the code only works on lists rather than generic iterables, so anything that isn't a list is converted to a list in the CSequenceMatcher constructor. This may cause undesirable behavior if you're not expecting it.
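One way to avoid surprises from the implicit conversion is to build the lists yourself before constructing the matcher. The sketch below assumes a hypothetical words() generator as the input source, and it falls back to difflib.SequenceMatcher when cdifflib is not installed.

```python
import difflib

try:
    from cdifflib import CSequenceMatcher
except ImportError:
    # Assumption: fall back to difflib so the sketch runs without cdifflib.
    CSequenceMatcher = difflib.SequenceMatcher


def words(text):
    """Yield whitespace-separated tokens (hypothetical helper for this example)."""
    for token in text.split():
        yield token


# Materialise the generators explicitly. A generator can only be consumed
# once, so keeping the lists also lets us index into them afterwards.
a = list(words("the quick brown fox"))
b = list(words("the quick red fox"))

s = CSequenceMatcher(None, a, b)
for block in s.get_matching_blocks():
    print(block, a[block.a:block.a + block.size])
```

Doing the conversion at the call site makes the memory cost visible and keeps the original sequences available for slicing with the indices the matcher returns.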

Works with Python 2.7 and 3.6 (should work on all 3.3+).

Usage

CSequenceMatcher can be used just like difflib.SequenceMatcher as long as you pass lists. These examples are taken straight from the difflib docs:

>>> from cdifflib import CSequenceMatcher
>>> s = CSequenceMatcher(None, ' abcd', 'abcd abcd')
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=0, b=4, size=5)
>>> s = CSequenceMatcher(lambda x: x == " ",
...                      "private Thread currentThread;",
...                      "private volatile Thread currentThread;")
>>> print(round(s.ratio(), 3))
0.866
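The examples above pass strings. As a complementary sketch, the following passes lists of lines and walks the opcodes, which is the typical shape of a line-based diff. The sample data is made up for illustration, and the import falls back to difflib.SequenceMatcher if cdifflib is not installed.

```python
import difflib

try:
    from cdifflib import CSequenceMatcher
except ImportError:
    # Assumption: fall back to difflib so the sketch runs without cdifflib.
    CSequenceMatcher = difflib.SequenceMatcher

old = ["alpha", "beta", "gamma"]
new = ["alpha", "gamma", "delta"]

s = CSequenceMatcher(None, old, new)
opcodes = s.get_opcodes()

# Each opcode describes how to turn old[i1:i2] into new[j1:j2].
for tag, i1, i2, j1, j2 in opcodes:
    print(tag, old[i1:i2], new[j1:j2])
```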

It's completely compatible, so you can replace the difflib version on startup, and other libraries will then use CSequenceMatcher too, e.g.:

from cdifflib import CSequenceMatcher
import difflib
difflib.SequenceMatcher = CSequenceMatcher
import library_that_uses_difflib

# Now the library will transparently be using the C SequenceMatcher - other
# things remain the same
library_that_uses_difflib.do_some_diffing()
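If a permanent global replacement is more than you want, the same swap can be scoped to a block. The sketch below is one possible approach, not something cdifflib provides: c_sequence_matcher is a hypothetical helper, and when cdifflib is missing it leaves difflib untouched.

```python
import contextlib
import difflib

try:
    from cdifflib import CSequenceMatcher
except ImportError:
    # Assumption: without cdifflib the helper below becomes a no-op.
    CSequenceMatcher = None


@contextlib.contextmanager
def c_sequence_matcher():
    """Temporarily replace difflib.SequenceMatcher (hypothetical helper)."""
    original = difflib.SequenceMatcher
    if CSequenceMatcher is not None:
        difflib.SequenceMatcher = CSequenceMatcher
    try:
        yield
    finally:
        # Always restore the original class, even if the body raised.
        difflib.SequenceMatcher = original


with c_sequence_matcher():
    # unified_diff is a generator, so consume it inside the block
    # while the replacement is still in place.
    diff = list(difflib.unified_diff(["a\n", "b\n"], ["a\n", "c\n"],
                                     fromfile="old", tofile="new"))

print("".join(diff))
```

Either form of the swap only affects code that looks up difflib.SequenceMatcher at call time. A library that ran `from difflib import SequenceMatcher` before the swap keeps its own reference to the pure-Python class.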

Making

To install:

python setup.py install

To test:

python setup.py test

License etc

This code lives at https://github.com/mduggan. See LICENSE for the license.

Changelog

  • 1.2.6 - Clear state correctly when replacing seq1 (#10)
  • 1.2.5 - Fix some memory leaks (#7)
  • 1.2.4 - Repackage yet again using twine for pypi upload (no binary changes)
  • 1.2.3 - Repackage again with changelog update and corrected src package (no binary changes)
  • 1.2.2 - Repackage to add README.md in a way pypi supports (no binary changes)
  • 1.2.1 - Fix bug for longer sequences with "autojunk"
  • 1.2.0 - Python 3 support for other versions
  • 1.1.0 - Added Python 3.6 support (thanks Bclavie)
  • 1.0.4 - Changes to make it compile on MSVC++ compiler, no change for other platforms
  • 1.0.2 - Bugfix - also replace set_seq1 implementation so difflib.compare works with a CSequenceMatcher
  • 1.0.1 - Implement more bits in C to squeeze a bit more speed out
  • 1.0.0 - First release
