Edit distance, similarity, and difference printing for two sequences
Python C Extension for Comparing Two Sequences
Computes the edit distance and similarity of two sequences, and prints the differences between them.
How to Install?
pip install cdiffer
Requirements
- Python 3.6 or later
- Python 2.7
cdiffer.dist
Compute the absolute Levenshtein distance between two sequences.
Usage
dist(sequence, sequence)
Examples (it's hard to spell Levenshtein correctly):
Help on built-in function dist in module cdiffer:
dist(...)
Compute absolute Levenshtein distance of two strings.
dist(sequence, sequence)
Examples (it's hard to spell Levenshtein correctly):
>>> dist('coffee', 'cafe')
4
>>> dist(list('coffee'), list('cafe'))
4
>>> dist(tuple('coffee'), tuple('cafe'))
4
>>> dist(iter('coffee'), iter('cafe'))
4
>>> dist(range(4), range(5))
1
>>> dist('coffee', 'xxxxxx')
12
>>> dist('coffee', 'coffee')
0
cdiffer.similar
Compute the similarity of two sequences.
Usage
similar(sequence, sequence)
The similarity is a number between 0 and 1, based on the Levenshtein edit distance.
Examples
>>> from cdiffer import similar
>>>
>>> similar('coffee', 'cafe')
0.6
>>> similar('hoge', 'bar')
0.0
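Both examples are consistent with the formula 2 * LCS / (len(a) + len(b)), where LCS is the length of the longest common subsequence. The sketch below assumes that formula; `indel_similarity` is a hypothetical name, and the value returned for two empty inputs is an assumption, since the README does not cover that case.

```python
def indel_similarity(a, b):
    """Similarity in [0, 1] derived from the longest common subsequence.

    Assumption: similarity = 2 * LCS / (len(a) + len(b)), which matches
    the two README examples but is not taken from the cdiffer source.
    """
    a, b = list(a), list(b)
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumed convention for two empty sequences
    # prev[j] is the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return 2.0 * prev[-1] / total


print(indel_similarity('coffee', 'cafe'))
print(indel_similarity('hoge', 'bar'))
```

For 'coffee' and 'cafe' the common subsequence is 'cfe', so the score is 2 * 3 / 10.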
cdiffer.differ
Find the sequence of edit operations that transforms one sequence into another.
Usage
differ(source_sequence, destination_sequence, diffonly=False, rep_rate=60)
Examples
>>> from cdiffer import differ
>>>
>>> for x in differ('coffee', 'cafe'):
...     print(x)
...
['equal', 0, 0, 'c', 'c']
['delete', 1, None, 'o', None]
['insert', None, 1, None, 'a']
['equal', 2, 2, 'f', 'f']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
['equal', 5, 3, 'e', 'e']
>>> for x in differ('coffee', 'cafe', diffonly=True):
...     print(x)
...
['delete', 1, None, 'o', None]
['insert', None, 1, None, 'a']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
>>> for x in differ('coffee', 'cafe', rep_rate=0):
...     print(x)
...
['equal', 0, 0, 'c', 'c']
['replace', 1, 1, 'o', 'a']
['equal', 2, 2, 'f', 'f']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
['equal', 5, 3, 'e', 'e']
>>> for x in differ('coffee', 'cafe', diffonly=True, rep_rate=0):
...     print(x)
...
['replace', 1, 1, 'o', 'a']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
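Each record in the output above has the shape [tag, source_index, destination_index, source_item, destination_item]. As a sketch of how such records might be consumed, the helper below rebuilds both sequences from a full (not diffonly) operation list. `rebuild` is a hypothetical helper, not part of cdiffer, and the operation list is copied from the first example rather than produced by calling the library.

```python
def rebuild(ops):
    """Rebuild (source, destination) from differ-style operation records.

    Assumes each record is [tag, src_index, dst_index, src_item, dst_item]
    and that 'equal' records are present (i.e. diffonly was not used).
    """
    source, destination = [], []
    for tag, _src_index, _dst_index, src_item, dst_item in ops:
        if tag not in ('equal', 'replace', 'delete', 'insert'):
            raise ValueError('unknown tag: %r' % (tag,))
        if tag != 'insert':   # equal, replace, delete consume a source item
            source.append(src_item)
        if tag != 'delete':   # equal, replace, insert emit a destination item
            destination.append(dst_item)
    return source, destination


# Operation list copied from the differ('coffee', 'cafe') example.
ops = [
    ['equal', 0, 0, 'c', 'c'],
    ['delete', 1, None, 'o', None],
    ['insert', None, 1, None, 'a'],
    ['equal', 2, 2, 'f', 'f'],
    ['delete', 3, None, 'f', None],
    ['delete', 4, None, 'e', None],
    ['equal', 5, 3, 'e', 'e'],
]
src, dst = rebuild(ops)
print(''.join(src), ''.join(dst))
```

The same helper works on the rep_rate=0 output, where the delete and insert pair at index 1 is reported as a single 'replace' record.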
cdiffer.compare
Compare two sequences and pretty-print the result.
Usage
compare(source_sequence, destination_sequence, diffonly=False, rep_rate=60, condition_value=" ---> ")
Examples
>>> from cdiffer import compare
>>> compare('coffee', 'cafe')
[[60, 'insert', 'c', 'a', 'f', 'e'],
[60, 'delete', 'c', 'o', 'f', 'f', 'e', 'e']]
Performance
C:\Windows\system>ipython
Python 3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020, 10:41:24) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.21.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from cdiffer import *
In [2]: %timeit dist('coffee', 'cafe')
...: %timeit dist(list('coffee'), list('cafe'))
...: %timeit dist(tuple('coffee'), tuple('cafe'))
...: %timeit dist(iter('coffee'), iter('cafe'))
...: %timeit dist(range(4), range(5))
...: %timeit dist('coffee', 'xxxxxx')
...: %timeit dist('coffee', 'coffee')
125 ns ± 0.534 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
677 ns ± 2.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
638 ns ± 3.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
681 ns ± 2.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
843 ns ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
125 ns ± 0.417 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
50.5 ns ± 0.338 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [3]: %timeit similar('coffee', 'cafe')
...: %timeit similar(list('coffee'), list('cafe'))
...: %timeit similar(tuple('coffee'), tuple('cafe'))
...: %timeit similar(iter('coffee'), iter('cafe'))
...: %timeit similar(range(4), range(5))
...: %timeit similar('coffee', 'xxxxxx')
...: %timeit similar('coffee', 'coffee')
123 ns ± 0.301 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
680 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
647 ns ± 1.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
680 ns ± 7.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
848 ns ± 4.19 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
130 ns ± 0.595 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
54.8 ns ± 0.691 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [4]: %timeit differ('coffee', 'cafe')
...: %timeit differ(list('coffee'), list('cafe'))
...: %timeit differ(tuple('coffee'), tuple('cafe'))
...: %timeit differ(iter('coffee'), iter('cafe'))
...: %timeit differ(range(4), range(5))
...: %timeit differ('coffee', 'xxxxxx')
...: %timeit differ('coffee', 'coffee')
735 ns ± 4.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.36 µs ± 5.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.31 µs ± 5.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.37 µs ± 5.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.33 µs ± 5.32 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.07 µs ± 6.75 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
638 ns ± 3.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: a = dict(zip('012345', 'coffee'))
...: b = dict(zip('0123', 'cafe'))
...: %timeit dist(a, b)
...: %timeit similar(a, b)
...: %timeit differ(a, b)
524 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
539 ns ± 2.23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.07 µs ± 1.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: %timeit compare("coffee", "cafe")
...: %timeit compare([list("abc"), list("abc")], [list("abc"), list("acc"), list("xtz")], rep_rate=50)
...: %timeit compare(["abc", "abc"], ["abc", "acc", "xtz"], rep_rate=40)
...: %timeit compare(["abc", "abc"], ["abc", "acc", "xtz"], rep_rate=50)
844 ns ± 3.88 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
3.32 µs ± 6.92 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.16 µs ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.3 µs ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
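The timings above come from IPython's %timeit. Outside IPython, a comparable measurement can be sketched with the standard library's timeit module. `time_call` and `difflib_ratio` are hypothetical helpers; `difflib.SequenceMatcher` is used only as a stand-in so the snippet runs without cdiffer installed, and it uses a different algorithm, so its timings and scores are not directly comparable.

```python
import timeit
from difflib import SequenceMatcher


def time_call(func, *args, repeat=5, number=2000):
    """Return the best observed time per call, in seconds.

    Note: %timeit reports mean and standard deviation; this helper
    reports the minimum over `repeat` runs instead.
    """
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def difflib_ratio(a, b):
    """Stand-in workload from the standard library (not cdiffer)."""
    return SequenceMatcher(None, a, b).ratio()


per_call = time_call(difflib_ratio, 'coffee', 'cafe')
print('%.0f ns per call' % (per_call * 1e9))
```

With cdiffer installed, passing `dist`, `similar`, or `differ` as `func` should measure the same calls as in the session above.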
Download files
Source Distribution
- cdiffer-0.5.3.tar.gz (21.2 kB)
Built Distributions
- cdiffer-0.5.3-cp39-cp39-win_amd64.whl (663.1 kB)
- cdiffer-0.5.3-cp38-cp38-win_amd64.whl (663.1 kB)