C implementation of parts of difflib
cdifflib
Python difflib sequence matcher reimplemented in C.
It only reimplements some parts: it creates a CSequenceMatcher type which inherits most methods from difflib.SequenceMatcher.
cdifflib is about 4x the speed of the pure Python difflib when diffing large streams.
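The speedup depends on the input, so it is worth measuring on data that looks like yours. Below is a rough timeit harness, offered as a sketch: it assumes cdifflib is importable and only times difflib otherwise, and the randomly generated lines (and the helper names `make_lines`, `time_ratio`, `compare`) are hypothetical stand-ins for real input.

```python
import difflib
import random
import string
import timeit

try:
    # Assumes cdifflib is installed; only difflib is timed otherwise.
    from cdifflib import CSequenceMatcher
except ImportError:
    CSequenceMatcher = None


def make_lines(n, seed=0):
    """Build n pseudo-random 20-character lines (hypothetical test data)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(string.ascii_lowercase) for _ in range(20))
            for _ in range(n)]


def time_ratio(matcher_cls, a, b, repeat=3):
    """Best-of-`repeat` wall time, in seconds, for construct + ratio()."""
    return min(timeit.repeat(lambda: matcher_cls(None, a, b).ratio(),
                             number=1, repeat=repeat))


def compare(n=2000, seed=0):
    """Time both implementations on the same pair of line lists."""
    a = make_lines(n, seed)
    # Reverse roughly 10% of the lines so the two inputs differ.
    rng = random.Random(seed + 1)
    b = [line if rng.random() > 0.1 else line[::-1] for line in a]
    timings = {"difflib": time_ratio(difflib.SequenceMatcher, a, b)}
    if CSequenceMatcher is not None:
        timings["cdifflib"] = time_ratio(CSequenceMatcher, a, b)
    return timings


if __name__ == "__main__":
    print(compare())
```

Taking the minimum of several runs reduces noise from other processes; the absolute numbers will vary by machine.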
Limitations
The C part of the code can only work on lists rather than generic iterables, so anything that isn't a list will be converted to a list in the CSequenceMatcher constructor. This may cause undesirable behavior if you're not expecting it, such as an extra copy of a large sequence or a one-shot iterator being consumed.
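To illustrate what that conversion means in practice, here is a small sketch. It assumes the constructor does something equivalent to `list(seq)` for non-list inputs; `to_list` is a hypothetical helper for illustration, not part of cdifflib.

```python
def to_list(seq):
    """Hypothetical helper mimicking the conversion described above.

    Assumption: non-lists are passed through list(); in this sketch,
    lists are returned unchanged (same object, no copy).
    """
    return seq if isinstance(seq, list) else list(seq)


# A string becomes a list of its characters, so matching semantics are
# unchanged; the cost is one extra copy of the input.
assert to_list("abc") == ["a", "b", "c"]

# A generator is consumed by the conversion: anything that tries to
# iterate it again afterwards sees nothing.
gen = (line for line in ["x\n", "y\n"])
copied = to_list(gen)
assert copied == ["x\n", "y\n"]
assert list(gen) == []

# In this sketch a list is passed through as the very same object.
lines = ["x\n", "y\n"]
assert to_list(lines) is lines
```

If you need to reuse an iterator's contents elsewhere, materialize it into a list yourself first and pass that list in.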
Works with Python 2.7 and 3.6 (should work on all 3.3+).
Usage
It can be used just like difflib.SequenceMatcher as long as you pass lists (other sequences are converted, as described above). These examples are right out of the difflib docs:
>>> from cdifflib import CSequenceMatcher
>>> s = CSequenceMatcher(None, ' abcd', 'abcd abcd')
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=0, b=4, size=5)
>>> s = CSequenceMatcher(lambda x: x == " ",
... "private Thread currentThread;",
... "private volatile Thread currentThread;")
>>> print(round(s.ratio(), 3))
0.866
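If cdifflib may not be installed everywhere your code runs, one common pattern is to import it under the difflib name and fall back to the pure-Python class. This is a sketch of that idiom rather than something the package provides; it relies on CSequenceMatcher inheriting from difflib.SequenceMatcher, as stated above.

```python
import difflib

try:
    # Use the C implementation when it is installed...
    from cdifflib import CSequenceMatcher as SequenceMatcher
except ImportError:
    # ...and fall back to the pure-Python class otherwise.
    from difflib import SequenceMatcher

s = SequenceMatcher(lambda x: x == " ",
                    "private Thread currentThread;",
                    "private volatile Thread currentThread;")
print(round(s.ratio(), 3))

# Either way the object is a difflib.SequenceMatcher (directly or via
# subclassing), so isinstance checks in other code keep working.
assert isinstance(s, difflib.SequenceMatcher)
```

The rest of the code then only refers to `SequenceMatcher` and does not need to know which implementation it got.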
It's completely compatible, so you can replace the difflib version on startup (before importing any library that uses difflib) and then other libraries will use CSequenceMatcher too, e.g.:
from cdifflib import CSequenceMatcher
import difflib
difflib.SequenceMatcher = CSequenceMatcher
import library_that_uses_difflib
# Now the library will transparently be using the C SequenceMatcher - other
# things remain the same
library_that_uses_difflib.do_some_diffing()
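Why patch on startup? Replacing `difflib.SequenceMatcher` only affects code that looks the name up after the patch runs. difflib's own helpers such as `get_close_matches` do that lookup at call time, but a module that already ran `from difflib import SequenceMatcher` keeps the original class. The sketch below shows both cases; `CountingMatcher` is a hypothetical stand-in for CSequenceMatcher (so it runs without cdifflib), and `patched_matcher` is an illustrative helper that restores the original class afterwards.

```python
import difflib
from contextlib import contextmanager

# Simulates a library that bound the name at import time, before patching.
from difflib import SequenceMatcher as EarlyBound


class CountingMatcher(difflib.SequenceMatcher):
    """Hypothetical stand-in for CSequenceMatcher that counts instances."""
    created = 0

    def __init__(self, *args, **kwargs):
        CountingMatcher.created += 1
        super().__init__(*args, **kwargs)


@contextmanager
def patched_matcher(cls):
    """Temporarily replace difflib.SequenceMatcher, restoring it on exit."""
    original = difflib.SequenceMatcher
    difflib.SequenceMatcher = cls
    try:
        yield
    finally:
        difflib.SequenceMatcher = original


with patched_matcher(CountingMatcher):
    # get_close_matches looks up SequenceMatcher at call time, so it
    # builds a CountingMatcher here.
    matches = difflib.get_close_matches("appel", ["ape", "apple", "peach"])
    # A name imported before the patch still refers to the original class.
    early = EarlyBound(None, "abc", "abd")

print(matches, CountingMatcher.created, type(early).__name__)
```

The context manager is only there to keep the example tidy; a one-line assignment at program startup, as in the snippet above, is enough in practice.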
Making
To install:
python setup.py install
To test:
python setup.py test
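Beyond the bundled tests, a quick differential check can give extra confidence that an installed build behaves like difflib on your platform. This is a sketch, not part of the package: it compares `get_opcodes()` from a candidate class against difflib.SequenceMatcher on random inputs, and `random_lines` and `same_opcodes` are hypothetical names. Lengths go up to 300 so that some trials pass the 200-element threshold where difflib's autojunk heuristic kicks in.

```python
import difflib
import random


def random_lines(rng, n, alphabet="abcde"):
    """Hypothetical test data: short lines over a small alphabet, so
    repeated lines (and therefore non-trivial matches) are common."""
    return ["".join(rng.choice(alphabet) for _ in range(3)) for _ in range(n)]


def same_opcodes(candidate_cls, trials=100, seed=0):
    """Return True if candidate_cls yields the same opcodes as
    difflib.SequenceMatcher on every randomly generated pair."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = random_lines(rng, rng.randint(0, 300))
        b = random_lines(rng, rng.randint(0, 300))
        expected = difflib.SequenceMatcher(None, a, b).get_opcodes()
        actual = candidate_cls(None, a, b).get_opcodes()
        if actual != expected:
            return False
    return True


if __name__ == "__main__":
    try:
        from cdifflib import CSequenceMatcher  # assumes cdifflib is installed
    except ImportError:
        print("cdifflib not installed; nothing to compare")
    else:
        print(same_opcodes(CSequenceMatcher))
```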
License etc
This code lives at https://github.com/mduggan. See LICENSE for the license.
Changelog
- 1.2.6 - Clear state correctly when replacing seq1 (#10)
- 1.2.5 - Fix some memory leaks (#7)
- 1.2.4 - Repackage yet again using twine for pypi upload (no binary changes)
- 1.2.3 - Repackage again with changelog update and corrected src package (no binary changes)
- 1.2.2 - Repackage to add README.md in a way pypi supports (no binary changes)
- 1.2.1 - Fix bug for longer sequences with "autojunk"
- 1.2.0 - Python 3 support for other versions
- 1.1.0 - Added Python 3.6 support (thanks Bclavie)
- 1.0.4 - Changes to make it compile on MSVC++ compiler, no change for other platforms
- 1.0.2 - Bugfix - also replace set_seq1 implementation so difflib.compare works with a CSequenceMatcher
- 1.0.1 - Implement more bits in C to squeeze a bit more speed out
- 1.0.0 - First release
Download files
Source Distribution
Built Distributions
File details
Details for the file cdifflib-1.2.6.tar.gz.
File metadata
- Download URL: cdifflib-1.2.6.tar.gz
- Upload date:
- Size: 11.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 57517c390392a71d59e9d7e799e9b685eaf9e07812fc8f234540ff19c4b03e66
MD5 | 354eccbc43a5164147d029edb384fa8e
BLAKE2b-256 | ae4ce4ef44dbf1c0fce22b2e0e081d97efdea97de56fda4c09210ca60b9a6924
File details
Details for the file cdifflib-1.2.6-cp310-cp310-macosx_12_0_x86_64.whl.
File metadata
- Download URL: cdifflib-1.2.6-cp310-cp310-macosx_12_0_x86_64.whl
- Upload date:
- Size: 10.4 kB
- Tags: CPython 3.10, macOS 12.0+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | bd236fc9e166e911f8ad87d89c1d1ade4e33df6f67e8c34fbe5f2bd89f0225f1
MD5 | 7803bb6b237edc5aa895538490263cc1
BLAKE2b-256 | 4347732da59152d969e303fa34c3e3476979a685b437d7139e72fcf2bb239cad
File details
Details for the file cdifflib-1.2.6-cp39-cp39-macosx_12_0_x86_64.whl.
File metadata
- Download URL: cdifflib-1.2.6-cp39-cp39-macosx_12_0_x86_64.whl
- Upload date:
- Size: 10.4 kB
- Tags: CPython 3.9, macOS 12.0+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 90c3bc02f3812f8def2e5b901345795a73e0547e9ea7aab2153200f4c84cab44
MD5 | bc44d507015e837dac0555fd45a7b93b
BLAKE2b-256 | e595ba3dde1553956e21a2d5674367676827fe115cd44d9a351d7b61d4c8d854
File details
Details for the file cdifflib-1.2.6-cp38-cp38-macosx_12_0_x86_64.whl.
File metadata
- Download URL: cdifflib-1.2.6-cp38-cp38-macosx_12_0_x86_64.whl
- Upload date:
- Size: 10.5 kB
- Tags: CPython 3.8, macOS 12.0+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3b509f3a2b51abe45af36dc074a878ba3309a48968683a14e4104b46c2ef5b44
MD5 | 6951279d13cc34e7837dfbe9c2339901
BLAKE2b-256 | cb618cb01ed8d9145b87f4aad2107df61c77cef0326218fd6a94f2b2865d4841