Edit distance, similarity, and printing of the differences between two sequences
Python C Extension for Comparing Two Sequences
Computes edit distance and similarity, and prints the differences between two sequences.
How to Install?
pip install cdiffer
Requirements
- Python 3.6 or later
cdiffer.dist
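Compute the absolute Levenshtein-style edit distance between two sequences. In the examples below, replacing an element counts as 2 (one deletion plus one insertion), so the result can be larger than the classic Levenshtein distance.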
Usage
dist(sequence, sequence)
Examples (it's hard to spell Levenshtein correctly):
>>> from cdiffer import dist
>>> dist('coffee', 'cafe')
4
>>> dist(list('coffee'), list('cafe'))
4
>>> dist(tuple('coffee'), tuple('cafe'))
4
>>> dist(iter('coffee'), iter('cafe'))
4
>>> dist(range(4), range(5))
1
>>> dist('coffee', 'xxxxxx')
12
>>> dist('coffee', 'coffee')
0
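The documented results above (4 for 'coffee' vs 'cafe', 12 for 'coffee' vs 'xxxxxx') are consistent with an edit distance in which a replacement costs 2. The following pure-Python sketch reproduces those numbers under that assumption. The name `indel_distance` and the `replace_cost` parameter are hypothetical and are not part of cdiffer; this is not the library's C implementation.

```python
def indel_distance(a, b, replace_cost=2):
    """Edit distance with unit insert/delete cost and a configurable replace cost.

    Hypothetical reference helper, not part of cdiffer.
    """
    # Materialize the inputs so strings, lists, tuples, iterators and ranges all work.
    a, b = list(a), list(b)
    # prev[j] = distance between the empty prefix of a and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance between the first i items of a and an empty b
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                                    # delete x
                curr[j - 1] + 1,                                # insert y
                prev[j - 1] + (0 if x == y else replace_cost),  # keep or replace
            ))
        prev = curr
    return prev[-1]


print(indel_distance('coffee', 'cafe'))                  # 4
print(indel_distance('coffee', 'xxxxxx'))                # 12
print(indel_distance(range(4), range(5)))                # 1
print(indel_distance('coffee', 'cafe', replace_cost=1))  # 3 (classic Levenshtein)
```

With `replace_cost=1` the same routine gives the classic Levenshtein distance, which makes the difference between the two conventions easy to see.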
cdiffer.similar
Compute the similarity of two sequences.
Usage
similar(sequence, sequence)
The similarity is a number between 0 and 1, based on the Levenshtein edit distance.
Examples
>>> from cdiffer import similar
>>>
>>> similar('coffee', 'cafe')
0.6
>>> similar('hoge', 'bar')
0.0
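Both documented results are consistent with the formula 1 - distance / (len(a) + len(b)), where the distance counts only insertions and deletions. This formula is inferred from the two examples above and is an assumption, not something the project states. The sketch below checks it in pure Python; `lcs_length` and `similarity` are hypothetical helper names, not cdiffer functions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed formula: 1 - indel_distance / (len(a) + len(b))."""
    a, b = list(a), list(b)
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty sequences are treated as identical
    # With insert/delete only, the distance is the number of unmatched items.
    distance = total - 2 * lcs_length(a, b)
    return 1.0 - distance / total


print(similarity('coffee', 'cafe'))  # 0.6
print(similarity('hoge', 'bar'))     # 0.0
```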
cdiffer.differ
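Find the sequence of edit operations that transforms one sequence into another. As the examples below show, each result row has the form [tag, source_index, destination_index, source_item, destination_item], where tag is 'equal', 'insert', 'delete', or 'replace'.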
Usage
differ(source_sequence, destination_sequence, diffonly=False, rep_rate=60)
Examples
>>> from cdiffer import differ
>>>
>>> for x in differ('coffee', 'cafe'):
...     print(x)
...
['equal', 0, 0, 'c', 'c']
['delete', 1, None, 'o', None]
['insert', None, 1, None, 'a']
['equal', 2, 2, 'f', 'f']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
['equal', 5, 3, 'e', 'e']
>>> for x in differ('coffee', 'cafe', diffonly=True):
...     print(x)
...
['delete', 1, None, 'o', None]
['insert', None, 1, None, 'a']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
>>> for x in differ('coffee', 'cafe', rep_rate=0):
...     print(x)
...
['equal', 0, 0, 'c', 'c']
['replace', 1, 1, 'o', 'a']
['equal', 2, 2, 'f', 'f']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
['equal', 5, 3, 'e', 'e']
>>> for x in differ('coffee', 'cafe', diffonly=True, rep_rate=0):
...     print(x)
...
['replace', 1, 1, 'o', 'a']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
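One way to read the rows shown above is to replay them: 'equal', 'delete' and 'replace' rows contribute an item to the source, while 'equal', 'insert' and 'replace' rows contribute an item to the destination. The sketch below does this for rows copied from the documented output, so it runs without cdiffer installed. The `rebuild` helper is hypothetical, and it only reconstructs both sequences from a full listing (not from `diffonly=True` output, which omits the 'equal' rows).

```python
# Rows copied from the documented output of differ('coffee', 'cafe').
default_rows = [
    ['equal', 0, 0, 'c', 'c'],
    ['delete', 1, None, 'o', None],
    ['insert', None, 1, None, 'a'],
    ['equal', 2, 2, 'f', 'f'],
    ['delete', 3, None, 'f', None],
    ['delete', 4, None, 'e', None],
    ['equal', 5, 3, 'e', 'e'],
]

# Rows copied from the documented output with rep_rate=0.
replace_rows = [
    ['equal', 0, 0, 'c', 'c'],
    ['replace', 1, 1, 'o', 'a'],
    ['equal', 2, 2, 'f', 'f'],
    ['delete', 3, None, 'f', None],
    ['delete', 4, None, 'e', None],
    ['equal', 5, 3, 'e', 'e'],
]


def rebuild(rows):
    """Reconstruct (source, destination) strings from a full list of rows."""
    source, destination = [], []
    for tag, _src_index, _dst_index, src_item, dst_item in rows:
        if tag in ('equal', 'delete', 'replace'):
            source.append(src_item)        # item exists in the source
        if tag in ('equal', 'insert', 'replace'):
            destination.append(dst_item)   # item exists in the destination
    return ''.join(source), ''.join(destination)


print(rebuild(default_rows))  # ('coffee', 'cafe')
print(rebuild(replace_rows))  # ('coffee', 'cafe')
```

Both listings describe the same transformation; the second only merges the adjacent delete/insert pair into a single 'replace' row.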
cdiffer.compare
Compare two sequences and pretty-print the result.
Usage
compare(source_sequence, destination_sequence, diffonly=False, rep_rate=60, condition_value=" ---> ")
Examples
>>> from cdiffer import compare
>>> compare('coffee', 'cafe')
[[60, 'insert', 'c', 'a', 'f', 'e'],
 [60, 'delete', 'c', 'o', 'f', 'f', 'e', 'e']]
Performance
C:\Windows\system>ipython
Python 3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020, 10:41:24) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.21.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from cdiffer import *
In [2]: %timeit dist('coffee', 'cafe')
...: %timeit dist(list('coffee'), list('cafe'))
...: %timeit dist(tuple('coffee'), tuple('cafe'))
...: %timeit dist(iter('coffee'), iter('cafe'))
...: %timeit dist(range(4), range(5))
...: %timeit dist('coffee', 'xxxxxx')
...: %timeit dist('coffee', 'coffee')
125 ns ± 0.534 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
677 ns ± 2.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
638 ns ± 3.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
681 ns ± 2.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
843 ns ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
125 ns ± 0.417 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
50.5 ns ± 0.338 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [3]: %timeit similar('coffee', 'cafe')
...: %timeit similar(list('coffee'), list('cafe'))
...: %timeit similar(tuple('coffee'), tuple('cafe'))
...: %timeit similar(iter('coffee'), iter('cafe'))
...: %timeit similar(range(4), range(5))
...: %timeit similar('coffee', 'xxxxxx')
...: %timeit similar('coffee', 'coffee')
123 ns ± 0.301 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
680 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
647 ns ± 1.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
680 ns ± 7.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
848 ns ± 4.19 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
130 ns ± 0.595 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
54.8 ns ± 0.691 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [4]: %timeit differ('coffee', 'cafe')
...: %timeit differ(list('coffee'), list('cafe'))
...: %timeit differ(tuple('coffee'), tuple('cafe'))
...: %timeit differ(iter('coffee'), iter('cafe'))
...: %timeit differ(range(4), range(5))
...: %timeit differ('coffee', 'xxxxxx')
...: %timeit differ('coffee', 'coffee')
735 ns ± 4.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.36 µs ± 5.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.31 µs ± 5.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.37 µs ± 5.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.33 µs ± 5.32 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.07 µs ± 6.75 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
638 ns ± 3.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: a = dict(zip('012345', 'coffee'))
...: b = dict(zip('0123', 'cafe'))
...: %timeit dist(a, b)
...: %timeit similar(a, b)
...: %timeit differ(a, b)
524 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
539 ns ± 2.23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.07 µs ± 1.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: %timeit compare("coffee", "cafe")
...: %timeit compare([list("abc"), list("abc")], [list("abc"), list("acc"), list("xtz")], rep_rate=50)
...: %timeit compare(["abc", "abc"], ["abc", "acc", "xtz"], rep_rate=40)
...: %timeit compare(["abc", "abc"], ["abc", "acc", "xtz"], rep_rate=50)
844 ns ± 3.88 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
3.32 µs ± 6.92 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.16 µs ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.3 µs ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
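The measurements above use IPython's `%timeit`. Outside IPython, a comparable per-call figure can be obtained with the standard `timeit` module. The sketch below is one way to do that; `time_call` and `baseline_ratio` are hypothetical helpers. It times `cdiffer.dist` when the package is importable and otherwise falls back to the standard library's `difflib.SequenceMatcher`, which is a different algorithm and serves only as a stand-in so the script still runs.

```python
import timeit
from difflib import SequenceMatcher


def baseline_ratio(a, b):
    """Standard-library stand-in; not the same algorithm as cdiffer."""
    return SequenceMatcher(None, a, b).ratio()


def time_call(func, *args, number=10_000, repeat=5):
    """Return the best observed time per call, in seconds."""
    timer = timeit.Timer(lambda: func(*args))
    # The minimum over several repeats is the least noisy estimate.
    return min(timer.repeat(repeat=repeat, number=number)) / number


try:
    from cdiffer import dist as target  # documented above
    label = 'cdiffer.dist'
except ImportError:
    target = baseline_ratio
    label = 'difflib baseline'

seconds = time_call(target, 'coffee', 'cafe')
print(f"{label}: {seconds * 1e9:.0f} ns per call")
```

The lambda wrapper adds a small constant overhead to every call, so absolute figures from this script will be somewhat higher than the `%timeit` results for very fast functions.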