fuzzysearch is useful for finding approximate subsequence matches
Project description
fuzzysearch is useful for finding approximate subsequence matches
Free software: MIT license
Documentation: http://fuzzysearch.rtfd.org.
Features
Fuzzy sub-sequence search: Find parts of a sequence which match a given sub-sequence up to a given maximum Levenshtein distance.
Simple Example
You can usually just use the find_near_matches() utility function, which chooses a suitable fuzzy search implementation according to the given parameters:
>>> from fuzzysearch import find_near_matches
>>> find_near_matches('PATTERN', 'aaaPATERNaaa', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
Advanced Example
If needed you can choose a specific search implementation, such as find_near_matches_with_ngrams():
>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT' #distance 1
>>> max_distance = 2
>>> from fuzzysearch import find_near_matches_with_ngrams
>>> find_near_matches_with_ngrams(subsequence, sequence, max_distance)
[Match(start=3, end=24, dist=1)]
History
0.1.0 (2013-11-12)
Two working implementations
Extensive test suite; all tests passing
Full support for Python 2.6-2.7 and 3.1-3.3
Bumped status from Pre-Alpha to Alpha
0.0.1 (2013-11-01)
First release on PyPI.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file fuzzysearch-0.2.0.tar.gz.
File metadata
- Download URL: fuzzysearch-0.2.0.tar.gz
- Upload date:
- Size: 7.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
21eb2ab04698c42740160a5fde7edfb14c4d1bbece0a24711c8e7e25e5b1afd1
|
|
| MD5 |
6b411ac88406ffe52f21ff98cbfe61df
|
|
| BLAKE2b-256 |
e9cc798296a5e5a65a10c89bfd298c96f4c936cfc574dd014f60216c823937cd
|