Edit distance, Similarity and 2 sequence differences printing
Python C extension for comparing two sequences
How to Install?
pip install cdiffer
Requirements
- Python 3.6 or later
- Python 2.7
cdiffer.dist
Compute absolute Levenshtein distance of two strings.
Usage
dist(sequence, sequence)
Examples (it's hard to spell Levenshtein correctly):
>>> from cdiffer import dist
>>> dist('coffee', 'cafe')
4
>>> dist(list('coffee'), list('cafe'))
4
>>> dist(tuple('coffee'), tuple('cafe'))
4
>>> dist(iter('coffee'), iter('cafe'))
4
>>> dist(range(4), range(5))
1
>>> dist('coffee', 'xxxxxx')
12
>>> dist('coffee', 'coffee')
0
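The results above (4 for 'coffee' vs 'cafe', 12 for 'coffee' vs 'xxxxxx') are consistent with a distance in which a substitution counts as one deletion plus one insertion. The following pure-Python sketch reproduces those numbers under that assumption. It is not the library's C implementation, and `indel_distance` is a hypothetical name used only for illustration.

```python
def indel_distance(a, b):
    """Edit distance using insertions and deletions only.

    Hypothetical helper: assumes a substitution costs 2 (delete + insert),
    which matches the documented dist() outputs but is not confirmed
    to be how cdiffer computes them.
    """
    a, b = list(a), list(b)  # accept strings, lists, tuples, iterators, ranges
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]  # deleting the first i items of a yields an empty sequence
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1])  # items match: no extra cost
            else:
                cur.append(min(prev[j], cur[j - 1]) + 1)  # delete or insert
        prev = cur
    return prev[-1]


print(indel_distance('coffee', 'cafe'))
print(indel_distance(range(4), range(5)))
```

The row-by-row table keeps memory proportional to the length of the second sequence, which is enough when only the distance is needed and not the edit script.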
cdiffer.similar
Compute similarity of two strings.
Usage
similar(sequence, sequence)
The similarity is a number between 0 and 1, based on the Levenshtein edit distance.
Examples
>>> from cdiffer import similar
>>>
>>> similar('coffee', 'cafe')
0.6
>>> similar('hoge', 'bar')
0.0
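Both documented values fit a normalisation of the distance by the combined length of the two sequences: (6 + 4 - 4) / (6 + 4) = 0.6 for 'coffee' and 'cafe', and (4 + 3 - 7) / (4 + 3) = 0.0 for 'hoge' and 'bar'. The sketch below shows that calculation in pure Python. The formula is inferred from the examples and is an assumption, as are the helper names and the choice to return 1.0 for two empty inputs.

```python
def indel_distance(a, b):
    """Insert/delete-only edit distance (hypothetical helper, see assumptions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] if x == y else min(prev[j], cur[j - 1]) + 1)
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1], assumed to be (total - distance) / total."""
    a, b = list(a), list(b)
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty sequences are treated as identical
    return (total - indel_distance(a, b)) / float(total)


print(similarity('coffee', 'cafe'))
print(similarity('hoge', 'bar'))
```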
cdiffer.differ
Find sequence of edit operations transforming one string to another.
Usage
differ(source_sequence, destination_sequence, diffonly=False, rep_rate=60)
Examples
>>> from cdiffer import differ
>>>
>>> for x in differ('coffee', 'cafe'):
...     print(x)
...
['equal', 0, 0, 'c', 'c']
['delete', 1, None, 'o', None]
['insert', None, 1, None, 'a']
['equal', 2, 2, 'f', 'f']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
['equal', 5, 3, 'e', 'e']
>>> for x in differ('coffee', 'cafe', diffonly=True):
...     print(x)
...
['delete', 1, None, 'o', None]
['insert', None, 1, None, 'a']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
>>> for x in differ('coffee', 'cafe', rep_rate=0):
...     print(x)
...
['equal', 0, 0, 'c', 'c']
['replace', 1, 1, 'o', 'a']
['equal', 2, 2, 'f', 'f']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
['equal', 5, 3, 'e', 'e']
>>> for x in differ('coffee', 'cafe', diffonly=True, rep_rate=0):
...     print(x)
...
['replace', 1, 1, 'o', 'a']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
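Each row in the output above reads as [operation, source index, destination index, source item, destination item]. As a small illustration of how such rows could be consumed, the sketch below rebuilds both sequences from a full (not diffonly) listing. `rebuild` is a hypothetical helper, and the rows are copied from the example output instead of being produced by calling the library.

```python
def rebuild(rows):
    """Recover (source, destination) item lists from differ-style rows.

    Hypothetical helper: assumes every row has the five fields shown in the
    examples and that 'equal' rows are present (diffonly=False).
    """
    source, destination = [], []
    for tag, _src_index, _dst_index, src_item, dst_item in rows:
        if tag in ('equal', 'delete', 'replace'):
            source.append(src_item)       # item exists in the source
        if tag in ('equal', 'insert', 'replace'):
            destination.append(dst_item)  # item exists in the destination
    return source, destination


# Rows copied from the documented output of differ('coffee', 'cafe')
rows = [
    ['equal', 0, 0, 'c', 'c'],
    ['delete', 1, None, 'o', None],
    ['insert', None, 1, None, 'a'],
    ['equal', 2, 2, 'f', 'f'],
    ['delete', 3, None, 'f', None],
    ['delete', 4, None, 'e', None],
    ['equal', 5, 3, 'e', 'e'],
]
src, dst = rebuild(rows)
print(''.join(src), ''.join(dst))  # coffee cafe
```

The same function handles the rep_rate=0 listing, because a 'replace' row carries both a source item and a destination item.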
cdiffer.compare
Compare two sequences and pretty-print the differences.
Usage
compare(source_sequence, destination_sequence, diffonly=False, rep_rate=60, condition_value=" ---> ")
Examples
>>> from cdiffer import compare
>>> compare('coffee', 'cafe')
[[60, 'insert', 'c', 'a', 'f', 'e'],
[60, 'delete', 'c', 'o', 'f', 'f', 'e', 'e']]
Performance
C:\Windows\system>ipython
Python 3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020, 10:41:24) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.21.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from cdiffer import *
In [2]: %timeit dist('coffee', 'cafe')
...: %timeit dist(list('coffee'), list('cafe'))
...: %timeit dist(tuple('coffee'), tuple('cafe'))
...: %timeit dist(iter('coffee'), iter('cafe'))
...: %timeit dist(range(4), range(5))
...: %timeit dist('coffee', 'xxxxxx')
...: %timeit dist('coffee', 'coffee')
125 ns ± 0.534 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
677 ns ± 2.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
638 ns ± 3.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
681 ns ± 2.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
843 ns ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
125 ns ± 0.417 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
50.5 ns ± 0.338 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [3]: %timeit similar('coffee', 'cafe')
...: %timeit similar(list('coffee'), list('cafe'))
...: %timeit similar(tuple('coffee'), tuple('cafe'))
...: %timeit similar(iter('coffee'), iter('cafe'))
...: %timeit similar(range(4), range(5))
...: %timeit similar('coffee', 'xxxxxx')
...: %timeit similar('coffee', 'coffee')
123 ns ± 0.301 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
680 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
647 ns ± 1.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
680 ns ± 7.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
848 ns ± 4.19 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
130 ns ± 0.595 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
54.8 ns ± 0.691 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [4]: %timeit differ('coffee', 'cafe')
...: %timeit differ(list('coffee'), list('cafe'))
...: %timeit differ(tuple('coffee'), tuple('cafe'))
...: %timeit differ(iter('coffee'), iter('cafe'))
...: %timeit differ(range(4), range(5))
...: %timeit differ('coffee', 'xxxxxx')
...: %timeit differ('coffee', 'coffee')
735 ns ± 4.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.36 µs ± 5.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.31 µs ± 5.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.37 µs ± 5.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.33 µs ± 5.32 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.07 µs ± 6.75 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
638 ns ± 3.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: a = dict(zip('012345', 'coffee'))
...: b = dict(zip('0123', 'cafe'))
...: %timeit dist(a, b)
...: %timeit similar(a, b)
...: %timeit differ(a, b)
524 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
539 ns ± 2.23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.07 µs ± 1.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: %timeit compare("coffee", "cafe")
...: %timeit compare([list("abc"), list("abc")], [list("abc"), list("acc"), list("xtz")], rep_rate=50)
...: %timeit compare(["abc", "abc"], ["abc", "acc", "xtz"], rep_rate=40)
...: %timeit compare(["abc", "abc"], ["abc", "acc", "xtz"], rep_rate=50)
844 ns ± 3.88 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
3.32 µs ± 6.92 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.16 µs ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.3 µs ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
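The measurements above rely on IPython's %timeit magic. Outside IPython, a comparable measurement could be taken with the standard-library timeit module, as in the rough sketch below. `per_call_ns` is a hypothetical helper. It reports the best of several runs instead of the mean and standard deviation, and the lambda wrapper adds a small per-call overhead, so its numbers are not directly comparable with those listed above.

```python
import timeit


def per_call_ns(func, args=(), number=100000, repeat=7):
    """Best observed time per call in nanoseconds (hypothetical helper)."""
    timer = timeit.Timer(lambda: func(*args))
    # Each entry is the total time in seconds for `number` calls
    totals = timer.repeat(repeat=repeat, number=number)
    return min(totals) / number * 1e9


# Stand-in callable so the sketch runs without cdiffer installed.
# With the package available, one might instead try:
#     from cdiffer import dist
#     per_call_ns(dist, ('coffee', 'cafe'))
print('%.1f ns per call' % per_call_ns(len, ('coffee',)))
```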