Edit distance, Similarity and 2 sequence differences printing
Python C Extension for Comparing Two Sequences
Computes the edit distance and similarity of two sequences, and prints the differences between them.
How to Install?
pip install cdiffer
Requirements
- Python 3.6 or later
cdiffer.dist
Compute the absolute Levenshtein distance of two sequences. In the examples below a substitution counts as two operations (one delete plus one insert), so the results are larger than the textbook Levenshtein distance (for example, 4 rather than 3 for 'coffee' and 'cafe').
Usage
dist(sequence, sequence)
Examples (it's hard to spell Levenshtein correctly):
>>> from cdiffer import dist
>>>
>>> dist('coffee', 'cafe')
4
>>> dist(list('coffee'), list('cafe'))
4
>>> dist(tuple('coffee'), tuple('cafe'))
4
>>> dist(iter('coffee'), iter('cafe'))
4
>>> dist(range(4), range(5))
1
>>> dist('coffee', 'xxxxxx')
12
>>> dist('coffee', 'coffee')
0
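The values above can be reproduced with a small pure-Python reference. This is only a sketch for checking results by hand, not the C implementation used by cdiffer; the name `indel_distance` is hypothetical, and the rule "insert and delete cost 1, no direct substitution" is an assumption inferred from the example outputs.

```python
def indel_distance(a, b):
    """Edit distance that allows only insertions and deletions (cost 1 each)."""
    # Materialize the inputs so that iterators and ranges work as well.
    a, b = list(a), list(b)
    # prev[j] is the distance between the already processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching items cost nothing.
                curr.append(prev[j - 1])
            else:
                # Either delete x from a, or insert y from b.
                curr.append(1 + min(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(indel_distance('coffee', 'cafe'))      # 4
print(indel_distance(range(4), range(5)))    # 1
print(indel_distance('coffee', 'xxxxxx'))    # 12
```

The sketch runs in O(len(a) * len(b)) time and keeps only two rows of the table in memory.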
cdiffer.similar
Compute similarity of two strings.
Usage
similar(sequence, sequence)
The similarity is a number between 0 and 1, based on the Levenshtein edit distance.
Examples
>>> from cdiffer import similar
>>>
>>> similar('coffee', 'cafe')
0.6
>>> similar('hoge', 'bar')
0.0
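Both example values are consistent with the ratio 2 * LCS / (len(a) + len(b)), where LCS is the length of the longest common subsequence. The sketch below illustrates that ratio in pure Python. The formula is inferred from the two examples only and is not confirmed as the library's definition; the names `lcs_length` and `similarity_sketch` are hypothetical, and the result for two empty inputs is an assumption.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity_sketch(a, b):
    """Ratio in [0, 1]: share of items that belong to a common subsequence."""
    a, b = list(a), list(b)
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty sequences count as identical
    return 2 * lcs_length(a, b) / total


print(similarity_sketch('coffee', 'cafe'))  # 0.6
print(similarity_sketch('hoge', 'bar'))     # 0.0
```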
cdiffer.differ
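Find the sequence of edit operations that transforms one sequence into another. In the examples below, each operation is a list of the form [tag, index_a, index_b, item_a, item_b], with None where a side has no item.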
Usage
differ(source_sequence, destination_sequence, diffonly=False, rep_rate=60)
Examples
>>> from cdiffer import differ
>>>
>>> for x in differ('coffee', 'cafe'):
...     print(x)
...
['equal', 0, 0, 'c', 'c']
['delete', 1, None, 'o', None]
['insert', None, 1, None, 'a']
['equal', 2, 2, 'f', 'f']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
['equal', 5, 3, 'e', 'e']
>>> for x in differ('coffee', 'cafe', diffonly=True):
...     print(x)
...
['delete', 1, None, 'o', None]
['insert', None, 1, None, 'a']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
>>> for x in differ('coffee', 'cafe', rep_rate=0):
...     print(x)
...
['equal', 0, 0, 'c', 'c']
['replace', 1, 1, 'o', 'a']
['equal', 2, 2, 'f', 'f']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
['equal', 5, 3, 'e', 'e']
>>> for x in differ('coffee', 'cafe', diffonly=True, rep_rate=0):
...     print(x)
...
['replace', 1, 1, 'o', 'a']
['delete', 3, None, 'f', None]
['delete', 4, None, 'e', None]
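One way to read these rows is to replay them. The sketch below rebuilds both sequences from a full operation list (the output without diffonly). It does not call cdiffer: the rows are copied from the first example above, and the helper name `rebuild` is hypothetical.

```python
def rebuild(ops):
    """Rebuild (source, destination) from rows of [tag, index_a, index_b, item_a, item_b]."""
    source, destination = [], []
    for tag, _index_a, _index_b, item_a, item_b in ops:
        # Items on the left side appear in every row except inserts.
        if tag in ('equal', 'delete', 'replace'):
            source.append(item_a)
        # Items on the right side appear in every row except deletes.
        if tag in ('equal', 'insert', 'replace'):
            destination.append(item_b)
    return source, destination


# Rows copied from the differ('coffee', 'cafe') example.
ops = [
    ['equal', 0, 0, 'c', 'c'],
    ['delete', 1, None, 'o', None],
    ['insert', None, 1, None, 'a'],
    ['equal', 2, 2, 'f', 'f'],
    ['delete', 3, None, 'f', None],
    ['delete', 4, None, 'e', None],
    ['equal', 5, 3, 'e', 'e'],
]

src, dst = rebuild(ops)
print(''.join(src), ''.join(dst))  # coffee cafe
```

With diffonly=True the 'equal' rows are missing, so the full sequences cannot be rebuilt from that output alone.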
cdiffer.compare
This function compares two sequences and pretty-prints the result.
Usage
compare(source_sequence, destination_sequence, diffonly=False, rep_rate=60, condition_value=" ---> ")
Parameters :
arg1 -> iterable : left comparison target data.
arg2 -> iterable : right comparison target data.
keya -> callable (one-argument function) : key used to sort and compare the items of `a`.
keyb -> callable (one-argument function) : key used to sort and compare the items of `b`.
header -> bool : output data with a header row (True) or without one (False). <default True>
diffonly -> bool : output data including equal rows (False) or without equal rows (True). <default False>
rep_rate -> int : matching-rate threshold for treating a change as a replacement (-1 ~ 100). -1: always replacement.
startidx -> int : starting number of the output record index. <default `0`>
condition_value -> str : conjunction placed between the compared values.
na_value -> str : fill value used when no data is found.
delete_sign_value -> str : sign added to deleted data.
insert_sign_value -> str : sign added to inserted data.
Return : list of lists
1st column -> matching rate (0 ~ 100).
2nd column -> matching tag name (unicode string).
3rd column onward -> compared data.
Examples
In [1]: from cdiffer import compare
...: compare('coffee', 'cafe')
[['tag', 'index_a', 'index_b', 'data'],
['equal', 0, 0, 'c'],
['insert', '-', 1, 'ADD ---> a'],
['delete', 1, '-', 'o ---> DEL'],
['equal', 2, 2, 'f'],
['delete', 3, '-', 'f ---> DEL'],
['equal', 4, 3, 'e'],
['delete', 5, '-', 'e ---> DEL']]
In [2]: compare([list("abc"), list("abc")], [list("abc"), list("acc"), list("xtz")], rep_rate=50)
[['tag', 'index_a', 'index_b', 'COL_00', 'COL_01', 'COL_02', 'COL_03'],
['equal', 0, 0, 'a', 'b', 'c'],
['replace', 1, 1, 'a', 'b ---> DEL', 'ADD ---> c', 'c'],
['insert', '-', 2, 'ADD ---> x', 'ADD ---> t', 'ADD ---> z']]
In [3]: compare(["abc", "abc"], ["abc", "acc", "xtz"], rep_rate=40)
[['tag', 'index_a', 'index_b', 'data'],
['equal', 0, 0, 'abc'],
['replace', 1, 1, 'abc ---> acc'],
['insert', '-', 2, 'ADD ---> xtz']]
In [4]: compare(["abc", "abc"], ["abc", "acc", "xtz"], rep_rate=50)
[['tag', 'index_a', 'index_b', 'data'],
['equal', 0, 0, 'abc'],
['replace', 1, 1, 'abc ---> acc'],
['insert', '-', 2, 'ADD ---> xtz']]
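The 'data' column in these outputs can be pictured as a formatting step applied to one differ-style row. The sketch below is a hypothetical helper, not part of cdiffer. Its defaults ("-", "DEL", "ADD") are read off the example output above and are assumptions about the library's defaults; it covers only the single-column case.

```python
def format_row(op, condition_value=" ---> ", na_value="-",
               delete_sign_value="DEL", insert_sign_value="ADD"):
    """Turn [tag, index_a, index_b, item_a, item_b] into [tag, index_a, index_b, data]."""
    tag, index_a, index_b, item_a, item_b = op
    if tag == 'equal':
        data = str(item_a)
    elif tag == 'insert':
        data = insert_sign_value + condition_value + str(item_b)
    elif tag == 'delete':
        data = str(item_a) + condition_value + delete_sign_value
    elif tag == 'replace':
        data = str(item_a) + condition_value + str(item_b)
    else:
        raise ValueError("unknown tag: %r" % (tag,))
    # A missing index on either side is shown as na_value.
    return [tag,
            na_value if index_a is None else index_a,
            na_value if index_b is None else index_b,
            data]


print(format_row(['insert', None, 1, None, 'a']))   # ['insert', '-', 1, 'ADD ---> a']
print(format_row(['delete', 1, None, 'o', None]))   # ['delete', 1, '-', 'o ---> DEL']
print(format_row(['replace', 1, 1, 'abc', 'acc']))  # ['replace', 1, 1, 'abc ---> acc']
```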
Performance
C:\Windows\system>ipython
Python 3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020, 10:41:24) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.21.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from cdiffer import *
In [2]: %timeit dist('coffee', 'cafe')
...: %timeit dist(list('coffee'), list('cafe'))
...: %timeit dist(tuple('coffee'), tuple('cafe'))
...: %timeit dist(iter('coffee'), iter('cafe'))
...: %timeit dist(range(4), range(5))
...: %timeit dist('coffee', 'xxxxxx')
...: %timeit dist('coffee', 'coffee')
125 ns ± 0.534 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
677 ns ± 2.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
638 ns ± 3.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
681 ns ± 2.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
843 ns ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
125 ns ± 0.417 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
50.5 ns ± 0.338 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [3]: %timeit similar('coffee', 'cafe')
...: %timeit similar(list('coffee'), list('cafe'))
...: %timeit similar(tuple('coffee'), tuple('cafe'))
...: %timeit similar(iter('coffee'), iter('cafe'))
...: %timeit similar(range(4), range(5))
...: %timeit similar('coffee', 'xxxxxx')
...: %timeit similar('coffee', 'coffee')
123 ns ± 0.301 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
680 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
647 ns ± 1.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
680 ns ± 7.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
848 ns ± 4.19 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
130 ns ± 0.595 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
54.8 ns ± 0.691 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [4]: %timeit differ('coffee', 'cafe')
...: %timeit differ(list('coffee'), list('cafe'))
...: %timeit differ(tuple('coffee'), tuple('cafe'))
...: %timeit differ(iter('coffee'), iter('cafe'))
...: %timeit differ(range(4), range(5))
...: %timeit differ('coffee', 'xxxxxx')
...: %timeit differ('coffee', 'coffee')
735 ns ± 4.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.36 µs ± 5.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.31 µs ± 5.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.37 µs ± 5.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.33 µs ± 5.32 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.07 µs ± 6.75 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
638 ns ± 3.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: a = dict(zip('012345', 'coffee'))
...: b = dict(zip('0123', 'cafe'))
...: %timeit dist(a, b)
...: %timeit similar(a, b)
...: %timeit differ(a, b)
524 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
539 ns ± 2.23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.07 µs ± 1.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: %timeit compare("coffee", "cafe")
...: %timeit compare([list("abc"), list("abc")], [list("abc"), list("acc"), list("xtz")], rep_rate=50)
...: %timeit compare(["abc", "abc"], ["abc", "acc", "xtz"], rep_rate=40)
...: %timeit compare(["abc", "abc"], ["abc", "acc", "xtz"], rep_rate=50)
844 ns ± 3.88 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
3.32 µs ± 6.92 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.16 µs ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.3 µs ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)