fuzzysearch is useful for finding approximate subsequence matches
Project description
fuzzysearch is a Python library for fuzzy substring searches. It implements efficient ad-hoc searching for approximate sub-sequences. Matching is done using a generalized Levenshtein Distance metric, with configurable parameters.
Free software: MIT license
Documentation: http://fuzzysearch.rtfd.org.
Installation
Just install using pip:
$ pip install fuzzysearch
Features
Fuzzy sub-sequence search: Find parts of a sequence which match a given sub-sequence.
Easy to use: A single function to call which returns a list of matches.
Set a maximum Levenshtein Distance for matches, including individual limits for the number of substitutions, insertions and/or deletions allowed for near-matches.
Includes optimized implementations for specific use-cases, e.g. allowing only substitutions.
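To illustrate why the substitutions-only case is simpler than the general one, here is a naive sketch: when only substitutions are allowed, a match has the same length as the pattern, so the distance of each window is the number of positions that differ. The function name naive_substitutions_only_search is hypothetical and this is not the library's optimized implementation; unlike the library, it also reports every qualifying window separately.

```python
def naive_substitutions_only_search(pattern, sequence, max_substitutions):
    """Return (start, end, dist) for every window within the mismatch budget.

    Hypothetical helper for illustration only; not part of fuzzysearch.
    """
    n = len(pattern)
    results = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        # With substitutions only, the distance is the count of differing positions.
        dist = sum(1 for a, b in zip(pattern, window) if a != b)
        if dist <= max_substitutions:
            results.append((start, start + n, dist))
    return results


hits = naive_substitutions_only_search('PATTERN', '---PAT-ERN---', max_substitutions=1)
print(hits)  # [(3, 10, 1)]
```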
Simple Examples
Just call find_near_matches() with the sequence to search, the sub-sequence you’re looking for, and the matching parameters:
>>> from fuzzysearch import find_near_matches
# search for 'PATTERN' with a maximum Levenshtein Distance of 1
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT' # distance = 1
>>> find_near_matches(subsequence, sequence, max_l_dist=2)
[Match(start=3, end=24, dist=1)]
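The outputs above show that each match carries start, end and dist. A minimal sketch of turning those indices into the matched text follows. It uses a stand-in namedtuple that mirrors the repr shown above (an assumption, so that the snippet runs without fuzzysearch installed); with the library installed, the list returned by find_near_matches() would take its place. The helper name matched_text is hypothetical.

```python
from collections import namedtuple

# Stand-in mirroring the repr shown in the examples; not the library's own class.
Match = namedtuple('Match', ['start', 'end', 'dist'])


def matched_text(sequence, match):
    """Return the part of `sequence` covered by `match` (end is exclusive)."""
    return sequence[match.start:match.end]


text = '---PATERN---'
matches = [Match(start=3, end=9, dist=1)]  # copied from the first example's output
snippets = [matched_text(text, m) for m in matches]
print(snippets)  # ['PATERN']
```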
Advanced Search Criteria
The search function supports four possible match criteria, which may be supplied in any combination:
maximum Levenshtein distance
maximum number of substitutions
maximum number of deletions (elements appearing in the pattern searched for, which are skipped in the matching sub-sequence)
maximum number of insertions (elements added in the matching sub-sequence which don’t appear in the pattern searched for)
Not supplying a criterion means that there is no limit for it. For this reason, one must always supply max_l_dist and/or all other criteria.
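The rule above can be sketched as a small check. The parameter names match those of find_near_matches(), but the two helper functions are hypothetical and only illustrate the reasoning: a search is bounded if max_l_dist is given, or if all three per-operation limits are given, in which case their sum caps the total distance.

```python
def criteria_are_bounded(max_l_dist=None, max_substitutions=None,
                         max_insertions=None, max_deletions=None):
    """True if the criteria put some finite limit on the total distance."""
    if max_l_dist is not None:
        return True
    return None not in (max_substitutions, max_insertions, max_deletions)


def distance_upper_bound(max_l_dist=None, max_substitutions=None,
                         max_insertions=None, max_deletions=None):
    """Largest total distance a match could have under the given criteria."""
    bounds = []
    if max_l_dist is not None:
        bounds.append(max_l_dist)
    per_operation = (max_substitutions, max_insertions, max_deletions)
    if None not in per_operation:
        # Each edit is one substitution, insertion or deletion, so the
        # total can never exceed the sum of the individual limits.
        bounds.append(sum(per_operation))
    if not bounds:
        raise ValueError('supply max_l_dist and/or all three other criteria')
    return min(bounds)


print(criteria_are_bounded(max_deletions=1, max_insertions=1))  # False
print(distance_upper_bound(max_deletions=1, max_insertions=1,
                           max_substitutions=0))  # 2
```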
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
# this will not match since max-deletions is set to zero
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1, max_deletions=0)
[]
# note that a deletion + insertion may be combined to match a substitution
>>> find_near_matches('PATTERN', '---PAT-ERN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=1)] # the Levenshtein distance is still 1
# ... but deletion + insertion may also match other, non-substitution differences
>>> find_near_matches('PATTERN', '---PATERRN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=2)]
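To make the dist values in these outputs concrete, here is a naive dynamic-programming sketch that computes the smallest plain Levenshtein distance between a pattern and any part of a sequence. It is a textbook approximate-matching recurrence with unit costs, not how fuzzysearch is implemented, and it ignores the per-operation limits; the function name min_substring_distance is hypothetical.

```python
def min_substring_distance(pattern, text):
    """Smallest Levenshtein distance between `pattern` and any substring of `text`."""
    # prev[i]: best distance between pattern[:i] and a substring of the text
    # ending at the previous position. Before any text, pattern[:i] costs i.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # an empty pattern prefix matches anywhere at no cost
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # match, or substitution if they differ
                prev[i] + 1,              # insertion: extra element in the text
                cur[i - 1] + 1,           # deletion: pattern element skipped
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


print(min_substring_distance('PATTERN', '---PATERN---'))   # 1
print(min_substring_distance('PATTERN', '---PAT-ERN---'))  # 1
print(min_substring_distance('PATTERN', '---PATERRN---'))  # 2
```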
History
0.3.0 (2015-02-12)
Added C extensions for several search functions as well as internal functions
Use C extensions if available, or pure-Python implementations otherwise
setup.py attempts to build C extensions, but installs without if build fails
Added --noexts setup.py option to avoid trying to build the C extensions
Greatly improved testing and coverage
0.2.2 (2014-03-27)
Added support for searching through BioPython Seq objects
Added specialized search function allowing only substitutions and insertions
Fixed several bugs
0.2.1 (2014-03-14)
Fixed major match grouping bug
0.2.0 (2014-03-13)
New utility function find_near_matches() for easier use
Additional documentation
0.1.0 (2013-11-12)
Two working implementations
Extensive test suite; all tests passing
Full support for Python 2.6-2.7 and 3.1-3.3
Bumped status from Pre-Alpha to Alpha
0.0.1 (2013-11-01)
First release on PyPI.
Download files
File details
Details for the file fuzzysearch-0.4.0.tar.gz.
File metadata
- Download URL: fuzzysearch-0.4.0.tar.gz
- Upload date:
- Size: 61.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 653f9bcbf86d81123deafcf7f10d883b96060568d39d3322fda091a098c320b8 |
| MD5 | 1f4985d247e86bf5833c81914066a6db |
| BLAKE2b-256 | b44cc6e2a26047a591de2613adaeb9111d069673cdc1d3ffe4e49473678a5e35 |
File details
Details for the file fuzzysearch-0.4.0-cp36-cp36m-win_amd64.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp36-cp36m-win_amd64.whl
- Upload date:
- Size: 57.2 kB
- Tags: CPython 3.6m, Windows x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91ce4b771645e7d1a873cab445ae1ecf3bab196360953d7ba0f0b06df42db668 |
| MD5 | 42613c3e5166f3ec6baac82b3f46c1c2 |
| BLAKE2b-256 | 222c7f92da0d59f78b05f6dd2f30d02b852a02d19f6b3a0c1c31bd76e249003f |
File details
Details for the file fuzzysearch-0.4.0-cp36-cp36m-win32.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp36-cp36m-win32.whl
- Upload date:
- Size: 53.0 kB
- Tags: CPython 3.6m, Windows x86
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f330ddfe8ff45ea13bc54af3ce8cb6209c2e5aae170569e9de3c2c6c7de87897 |
| MD5 | 71145447261856469c17ba6485078d7a |
| BLAKE2b-256 | 421b49f929081a96340ded49981fc58ed2fdf1d6a9cb754eeb80b6273c15a9e9 |
File details
Details for the file fuzzysearch-0.4.0-cp36-cp36m-macosx_10_12_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp36-cp36m-macosx_10_12_x86_64.whl
- Upload date:
- Size: 60.9 kB
- Tags: CPython 3.6m, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1df0e1db5ec765c130ae0fe2191e07bf7847a26fe57dfd31159ae787b18ccab3 |
| MD5 | 362c0bec4027f170b859b0abc24884b3 |
| BLAKE2b-256 | 5a80ddb1b4d904334c4397928535b1596f53e2490014988aa0dacdf494eb4b8c |
File details
Details for the file fuzzysearch-0.4.0-cp35-cp35m-win_amd64.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp35-cp35m-win_amd64.whl
- Upload date:
- Size: 56.7 kB
- Tags: CPython 3.5m, Windows x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 46cc175e908b17c11b0c80de82a5cba13f05408f7f57189804dc5308bf8186c9 |
| MD5 | 1fa61f9cd11681070d73770b89e9e80b |
| BLAKE2b-256 | 882edf5683c097507c3a4b0f1ae5323932580fb6b8e70ae5d2506bfec54ec0d3 |
File details
Details for the file fuzzysearch-0.4.0-cp35-cp35m-win32.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp35-cp35m-win32.whl
- Upload date:
- Size: 53.0 kB
- Tags: CPython 3.5m, Windows x86
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 202a7f3876dc2edb55bd10a59d3b3197b6306a174efd2d6f63d7dae3a199a07a |
| MD5 | 740b211a331d460c20695f167f6a97e8 |
| BLAKE2b-256 | 4b2d01002cc81f5d5b7ad32f5938a1f3217d75f9b23b8895b3734453e6361c4f |
File details
Details for the file fuzzysearch-0.4.0-cp35-cp35m-macosx_10_12_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp35-cp35m-macosx_10_12_x86_64.whl
- Upload date:
- Size: 60.4 kB
- Tags: CPython 3.5m, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 538ecfa346465f42d7ad4e3c6e39995848a0330fd5e9d0731093b2ddc36fe628 |
| MD5 | 8f353bbee4125e62148a185f6242d432 |
| BLAKE2b-256 | afd7005f5511b766930faa076ac044827837aba6487f7e429bd18a41f4192a57 |
File details
Details for the file fuzzysearch-0.4.0-cp34-cp34m-win32.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp34-cp34m-win32.whl
- Upload date:
- Size: 50.7 kB
- Tags: CPython 3.4m, Windows x86
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05db594f75533d767291acc77caf558b9307c5b757f65b94352f26d4947d8cc7 |
| MD5 | 2713ad4e062e22a0cd6083c43b6a7119 |
| BLAKE2b-256 | 0783c097b06b7ff677330bb25df61363dced12b62de74d3e2da611864e9699ef |
File details
Details for the file fuzzysearch-0.4.0-cp34-cp34m-macosx_10_12_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp34-cp34m-macosx_10_12_x86_64.whl
- Upload date:
- Size: 60.9 kB
- Tags: CPython 3.4m, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b51fa9abe0afad6790b88557513c37c40bd62580d3dcda9c81e9437f6f0d941 |
| MD5 | 89fec49a50065afce3116ed3e659c05b |
| BLAKE2b-256 | 47c34977d7a520347e2c112aa2cee43f74d6c4ee83be4a7851fb3df478b1abca |
File details
Details for the file fuzzysearch-0.4.0-cp33-cp33m-win32.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp33-cp33m-win32.whl
- Upload date:
- Size: 50.7 kB
- Tags: CPython 3.3m, Windows x86
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a0cce7d17a94fd5cf40559bb2ce9a26a390cc193c609bd8e3f7bdc7e78336da8 |
| MD5 | b6a95640f010e7e314c9236560f1b84a |
| BLAKE2b-256 | 3bb7124e28c43a233cf73fd460f6b306fa18a38b28ac19a8a7556d4ec96f926b |
File details
Details for the file fuzzysearch-0.4.0-cp33-cp33m-macosx_10_12_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp33-cp33m-macosx_10_12_x86_64.whl
- Upload date:
- Size: 61.1 kB
- Tags: CPython 3.3m, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a9e389b2ff58aacf741927edf184a065446c8de0396b4e5f128dccab8a7477f3 |
| MD5 | b5a1fe05765f268a5b5b68d0b297a99d |
| BLAKE2b-256 | aa1a4bfd41d26fff9348bbf5a955d1db328bc5dcc3bcbe67571fa7b3a42df336 |
File details
Details for the file fuzzysearch-0.4.0-cp32-cp32m-macosx_10_12_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp32-cp32m-macosx_10_12_x86_64.whl
- Upload date:
- Size: 61.0 kB
- Tags: CPython 3.2m, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e65f8e30f6ef2b3fca1e14759903385c8c7496e1c9596147dec5db174b7f3c93 |
| MD5 | 9675d8b6a6376c14d81d3a18a934cf5d |
| BLAKE2b-256 | fd6600fac312df7aabfd48f1eb9a23f9dc148f37ebcd8f703369d6592c197395 |
File details
Details for the file fuzzysearch-0.4.0-cp27-cp27m-win_amd64.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp27-cp27m-win_amd64.whl
- Upload date:
- Size: 51.8 kB
- Tags: CPython 2.7m, Windows x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ba0980ea1810f4dc770ac91a081e81e1072f84e850eca6c9117fee604ae88d2 |
| MD5 | 6a19071a03ec685e3c8b27a73aec74a7 |
| BLAKE2b-256 | c108a1bf2a82ca46c47284cc8aabf703606199e54626f91d655428845019336c |
File details
Details for the file fuzzysearch-0.4.0-cp27-cp27m-win32.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp27-cp27m-win32.whl
- Upload date:
- Size: 50.4 kB
- Tags: CPython 2.7m, Windows x86
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d80bb9b501a11e94408e7921eaa54103cef7dd5958482ccbc07938d64eac59b |
| MD5 | e6b5ea9848490287e44556314c0d5433 |
| BLAKE2b-256 | 6dc701d613e440d5ae35bc55ce6763c1f77647a49d599582bce6c101361bf81e |
File details
Details for the file fuzzysearch-0.4.0-cp27-cp27m-macosx_10_12_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp27-cp27m-macosx_10_12_x86_64.whl
- Upload date:
- Size: 61.0 kB
- Tags: CPython 2.7m, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a344d2cbeb4b50713bdaf6e8213f7a3fd420fc2aff69062c1d45ad423da94c2b |
| MD5 | 022b50c75fef1d1ca45b7737fffee0a0 |
| BLAKE2b-256 | 5d251138fadb633b37b26e2a2cccb309ab3fc9e9997da8cae35dc101906313f3 |
File details
Details for the file fuzzysearch-0.4.0-cp26-cp26m-win_amd64.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp26-cp26m-win_amd64.whl
- Upload date:
- Size: 53.0 kB
- Tags: CPython 2.6m, Windows x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9cc0dbdaddd03b91ba207b927b8e0e7dbeda3b3655d253278baa8df5a3dc4ff9 |
| MD5 | 2103b8b0e5cb3935e7830b592c48f297 |
| BLAKE2b-256 | e6e014084170be52672c8e51208b339e33e96a76939e9bfc066d2d1a4b25c745 |
File details
Details for the file fuzzysearch-0.4.0-cp26-cp26m-win32.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp26-cp26m-win32.whl
- Upload date:
- Size: 50.6 kB
- Tags: CPython 2.6m, Windows x86
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9413ee9baf2e213fe5ace3d4d78bc110a79a5a999cd2fbc422f695ebb3fa1aef |
| MD5 | f31cc3761775756627be40dcff94cc00 |
| BLAKE2b-256 | 051d482965a850aae0cab4dcd7a279f4cdf4bfa541ae0bb5da6763828b3052c8 |
File details
Details for the file fuzzysearch-0.4.0-cp26-cp26m-macosx_10_12_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.4.0-cp26-cp26m-macosx_10_12_x86_64.whl
- Upload date:
- Size: 60.3 kB
- Tags: CPython 2.6m, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a12a89ee09198da64aad1ed2a4b71aec5a8eab3f6c96b854af2e94e01d3c15f2 |
| MD5 | 042fb1fa19f1baf2938f7846a6c10320 |
| BLAKE2b-256 | e9cf661dc3e6735a482d8b8e10755b5c521b924d2fc2992cf8e7db03b7005ab0 |