C implementation of parts of difflib
cdifflib
========
[<img src="https://travis-ci.org/mduggan/cdifflib.svg?branch=master">](https://travis-ci.org/mduggan/cdifflib/)
Python [difflib](http://docs.python.org/2/library/difflib.html) sequence
matcher reimplemented in C.
It only reimplements parts of the matcher: it provides a `CSequenceMatcher` type
that inherits most of its methods from `difflib.SequenceMatcher`.
`cdifflib` is about 4x faster than the pure-Python `difflib` when diffing
large streams.
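The speedup you actually get depends on your input. The following is a minimal timing sketch for checking it on your own machine. It makes some assumptions: the inputs are synthetic random lines, `make_lines` and `time_ratio` are hypothetical helpers, and if `cdifflib` isn't installed it falls back to timing only `difflib`.

```python
import difflib
import random
import timeit

try:
    from cdifflib import CSequenceMatcher
except ImportError:
    # cdifflib isn't installed; only the pure-Python matcher gets timed.
    CSequenceMatcher = None


def make_lines(n, seed):
    """Build n pseudo-random text lines (synthetic test input, an assumption)."""
    rng = random.Random(seed)
    return ["line %d\n" % rng.randrange(n // 2 + 1) for _ in range(n)]


def time_ratio(matcher_cls, a, b, number=3):
    """Return total seconds for `number` runs of construct-then-ratio()."""
    return timeit.timeit(lambda: matcher_cls(None, a, b).ratio(), number=number)


if __name__ == "__main__":
    a = make_lines(2000, seed=1)
    b = make_lines(2000, seed=2)
    py_time = time_ratio(difflib.SequenceMatcher, a, b)
    print("difflib:  %.3fs" % py_time)
    if CSequenceMatcher is not None:
        c_time = time_ratio(CSequenceMatcher, a, b)
        print("cdifflib: %.3fs (%.1fx)" % (c_time, py_time / c_time))
```

Both matchers receive the same lists, so the comparison only measures the matcher itself and not any list conversion.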
Limitations
-----------
The C part of the code can only work on `list` rather than generic iterables,
so anything that isn't a `list` will be converted to `list` in the
`CSequenceMatcher` constructor. This may cause undesirable behavior if you're
not expecting it.
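The conversion described above can be illustrated in pure Python. This is a minimal sketch built on `difflib` alone. `ListConvertingMatcher` is a hypothetical class that mimics the behavior and is not cdifflib's code. It also assumes that the converted list is what ends up stored on the matcher. Under that assumption, `s.a` becomes a list even if you passed a `str` or `tuple`, and a one-shot iterator is consumed.

```python
import difflib


class ListConvertingMatcher(difflib.SequenceMatcher):
    """Hypothetical stand-in that copies non-list inputs into lists,
    mimicking the limitation described above (NOT cdifflib's implementation)."""

    def set_seq1(self, a):
        # Anything that isn't already a list is copied into a new list.
        if not isinstance(a, list):
            a = list(a)
        super().set_seq1(a)

    def set_seq2(self, b):
        if not isinstance(b, list):
            b = list(b)
        super().set_seq2(b)


s = ListConvertingMatcher(None, "abcd", ("a", "b", "x", "d"))
print(type(s.a).__name__, s.a)  # the str input is now a list of characters
print(s.ratio())                # matching itself is unaffected
```

If the type or identity of the stored sequences matters to your code, convert them to lists yourself before constructing the matcher.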
Works with Python 2.7 and 3.6 (and should work on all versions from 3.3 on).
Usage
-----
It can be used just like `difflib.SequenceMatcher`. Arguments that aren't lists, such as the strings below, are converted to lists first (see Limitations). These examples are taken straight from the [difflib docs](http://docs.python.org/2/library/difflib.html):
```Python
>>> from cdifflib import CSequenceMatcher
>>> s = CSequenceMatcher(None, " abcd", "abcd abcd")
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=0, b=4, size=5)
>>> s = CSequenceMatcher(lambda x: x == " ", " abcd", "abcd abcd")
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=1, b=0, size=4)
>>> s = CSequenceMatcher(lambda x: x == " ",
...                      "private Thread currentThread;",
...                      "private volatile Thread currentThread;")
>>> print(round(s.ratio(), 3))
0.866
```
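Since the C code works on lists, a common pattern is to diff two texts line by line by passing lists of lines. The following is a minimal sketch. The sample texts are made up, and the code falls back to `difflib.SequenceMatcher` if `cdifflib` isn't installed so it runs either way.

```python
try:
    from cdifflib import CSequenceMatcher as Matcher
except ImportError:
    # Fallback so the sketch still runs without the C extension.
    from difflib import SequenceMatcher as Matcher

old = "one\ntwo\nthree\n".splitlines()
new = "one\n2\nthree\nfour\n".splitlines()

s = Matcher(None, old, new)
opcodes = s.get_opcodes()
for tag, i1, i2, j1, j2 in opcodes:
    # Each opcode describes how to turn old[i1:i2] into new[j1:j2].
    print("%-7s %r -> %r" % (tag, old[i1:i2], new[j1:j2]))
```

Because `splitlines()` already returns lists, no conversion happens in the constructor.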
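It's fully compatible, so you can replace the difflib version at startup and
other libraries will then use `CSequenceMatcher` too. Patch before importing
those libraries: a module that runs `from difflib import SequenceMatcher` keeps
whichever class it found at import time. For example: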
```Python
from cdifflib import CSequenceMatcher
import difflib
difflib.SequenceMatcher = CSequenceMatcher
import library_that_uses_difflib
# Now the library transparently uses the C SequenceMatcher; everything else
# stays the same.
library_that_uses_difflib.do_some_diffing()
```
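You can confirm that a patch like this reaches code that looks up `difflib.SequenceMatcher` at call time, which includes difflib's own `get_close_matches` and `unified_diff`. The following stdlib-only sketch applies the same patch but swaps in a hypothetical `CountingMatcher` subclass in place of `CSequenceMatcher`. It then restores the original class.

```python
import difflib


class CountingMatcher(difflib.SequenceMatcher):
    """Hypothetical stand-in for CSequenceMatcher that counts instantiations."""
    created = 0

    def __init__(self, *args, **kwargs):
        CountingMatcher.created += 1
        # super() avoids looking up difflib.SequenceMatcher, which is patched below.
        super().__init__(*args, **kwargs)


original = difflib.SequenceMatcher
difflib.SequenceMatcher = CountingMatcher  # same kind of patch as above
try:
    # Both helpers construct SequenceMatcher via the module global at call time.
    matches = difflib.get_close_matches("appel", ["ape", "apple", "peach"])
    diff = list(difflib.unified_diff(["a\n", "b\n"], ["a\n", "c\n"]))
finally:
    difflib.SequenceMatcher = original  # undo the patch

print(matches, CountingMatcher.created)
```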
Making
------
To install:
```
python setup.py install
```
To test:
```
python setup.py test
```
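After installing, a quick way to sanity-check the build is to compare its results with the pure-Python matcher. The following is a minimal sketch. `check_against_difflib` and the sample pairs are hypothetical, and the sketch assumes that both matchers return `get_opcodes()` results that compare equal as lists of tuples.

```python
import difflib

# Hypothetical sample pairs: strings plus lists of lines, including an empty input.
SAMPLES = [
    (" abcd", "abcd abcd"),
    ("private Thread currentThread;", "private volatile Thread currentThread;"),
    (["one\n", "two\n", "three\n"], ["one\n", "2\n", "three\n", "four\n"]),
    ([], ["x"]),
]


def check_against_difflib(matcher_cls, samples=SAMPLES):
    """Return the (a, b) pairs on which matcher_cls disagrees with difflib."""
    mismatches = []
    for a, b in samples:
        expected = difflib.SequenceMatcher(None, a, b)
        actual = matcher_cls(None, a, b)
        if (actual.get_opcodes() != expected.get_opcodes()
                or actual.ratio() != expected.ratio()):
            mismatches.append((a, b))
    return mismatches
```

Call it as `check_against_difflib(CSequenceMatcher)` and expect an empty list. Run it before applying the monkeypatch shown earlier, since otherwise `difflib.SequenceMatcher` would already be the C version and the comparison would test nothing.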
License etc
-----------
This code lives at https://github.com/mduggan. See LICENSE for the license.
Changelog
---------
* 1.2.3 - Repackage again with a changelog update and corrected source package
* 1.2.2 - Repackage to add README.md in a way pypi supports
* 1.2.1 - Fix bug for longer sequences with "autojunk"
* 1.2.0 - Python 3 support for other versions
* 1.1.0 - Added Python 3.6 support (thanks Bclavie)
* 1.0.4 - Changes to make it compile on MSVC++ compiler, no change for other platforms
* 1.0.2 - Bugfix - also replace set_seq1 implementation so `difflib.compare` works with a `CSequenceMatcher`
* 1.0.1 - Implement more bits in C to squeeze out a bit more speed
* 1.0.0 - First release