fuzzysearch is useful for finding approximate subsequence matches
Project description
Free software: MIT license
Documentation: http://fuzzysearch.rtfd.org.
Installation
Just install using pip:
$ pip install fuzzysearch
Features
Fuzzy sub-sequence search: Find parts of a sequence which match a given sub-sequence up to a given maximum Levenshtein distance.
Set individual limits for the number of substitutions, insertions and/or deletions allowed for a near-match.
Includes optimized implementations for specific use-cases, e.g. only allowing substitutions in near-matches.
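To make the substitutions-only case concrete, here is a minimal pure-Python sketch. It is not fuzzysearch's implementation, and the name find_substitution_matches() is hypothetical. Since only substitutions are allowed, every candidate match has the same length as the sub-sequence, so it is enough to slide a window over the sequence and count mismatches:

```python
def find_substitution_matches(subsequence, sequence, max_substitutions):
    """Return (start, end, n_substitutions) for each window within the limit.

    Hypothetical helper for illustration only; not part of fuzzysearch.
    """
    if not subsequence:
        raise ValueError('subsequence must not be empty')
    length = len(subsequence)
    matches = []
    for start in range(len(sequence) - length + 1):
        n_subs = 0
        for expected, actual in zip(subsequence, sequence[start:start + length]):
            if expected != actual:
                n_subs += 1
                if n_subs > max_substitutions:
                    break  # too many mismatches; give up on this window
        else:
            # The inner loop finished without exceeding the limit.
            matches.append((start, start + length, n_subs))
    return matches


print(find_substitution_matches('PATTERN', 'aaaPAXTERNaaa', 1))
# prints [(3, 10, 1)]
```

Allowing insertions or deletions as well makes matches vary in length, which is why the general case needs a different algorithm.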
Simple Example
You can usually just use the find_near_matches() utility function, which chooses a suitable fuzzy search implementation according to the given parameters:
>>> from fuzzysearch import find_near_matches
>>> find_near_matches('PATTERN', 'aaaPATERNaaa', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
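The dist=1 in this result can be checked by hand: the matched slice, sequence[3:9], is 'PATERN', which differs from 'PATTERN' by one deleted 'T'. The following sketch computes the Levenshtein distance with the standard dynamic-programming recurrence. The levenshtein() helper is hypothetical and is not part of fuzzysearch:

```python
def levenshtein(a, b):
    """Return the Levenshtein (edit) distance between sequences a and b.

    Hypothetical helper for illustration only; not part of fuzzysearch.
    """
    # previous[j] holds the distance between the prefix of a processed
    # so far and the first j items of b.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete item_a
                current[j - 1] + 1,                    # insert item_b
                previous[j - 1] + (item_a != item_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


matched = 'aaaPATERNaaa'[3:9]
print(matched, levenshtein('PATTERN', matched))
# prints PATERN 1
```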
Advanced Example
If needed, you can choose a specific search implementation, such as find_near_matches_with_ngrams():
>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT'  # distance 1
>>> max_distance = 2
>>> from fuzzysearch import find_near_matches_with_ngrams
>>> find_near_matches_with_ngrams(subsequence, sequence, max_distance)
[Match(start=3, end=24, dist=1)]
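The function name suggests an n-gram filtering approach. The sketch below shows one common form of that idea, as an assumption about the general technique rather than a description of fuzzysearch's actual code. If at most max_l_dist edits are allowed and the sub-sequence is cut into max_l_dist + 1 disjoint pieces, at least one piece must appear unchanged in any near-match, so exact searches for the pieces yield a small set of candidate positions to verify. The name ngram_candidates() is hypothetical:

```python
def ngram_candidates(subsequence, sequence, max_l_dist):
    """Return (ngram_offset, position) pairs for exact n-gram hits.

    Hypothetical helper for illustration only; not part of fuzzysearch.
    Each hit marks a region of the sequence worth verifying with a full
    edit-distance check.
    """
    n_pieces = max_l_dist + 1
    ngram_len = len(subsequence) // n_pieces
    if ngram_len == 0:
        raise ValueError('subsequence is too short for this max_l_dist')
    candidates = []
    for offset in range(0, ngram_len * n_pieces, ngram_len):
        ngram = subsequence[offset:offset + ngram_len]
        # Collect every exact occurrence of this piece, including overlaps.
        position = sequence.find(ngram)
        while position != -1:
            candidates.append((offset, position))
            position = sequence.find(ngram, position + 1)
    return candidates


print(ngram_candidates('PATTERN', 'aaaPATERNaaa', max_l_dist=1))
# prints [(0, 3), (3, 5)]
```

Here 'PAT' is found at index 3 and 'TER' at index 5, so only the region around those indices needs the more expensive distance computation.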
History
0.3.0 (2015-02-12)
Added C extensions for several search functions as well as internal functions
Use C extensions if available, or pure-Python implementations otherwise
setup.py attempts to build C extensions, but installs without if build fails
Added --noexts setup.py option to avoid trying to build the C extensions
Greatly improved testing and coverage
0.2.2 (2014-03-27)
Added support for searching through BioPython Seq objects
Added specialized search function allowing only substitutions and insertions
Fixed several bugs
0.2.1 (2014-03-14)
Fixed major match grouping bug
0.2.0 (2014-03-13)
New utility function find_near_matches() for easier use
Additional documentation
0.1.0 (2013-11-12)
Two working implementations
Extensive test suite; all tests passing
Full support for Python 2.6-2.7 and 3.1-3.3
Bumped status from Pre-Alpha to Alpha
0.0.1 (2013-11-01)
First release on PyPI.