fuzzysearch is useful for finding approximate subsequence matches
Project description
Free software: MIT license
Documentation: http://fuzzysearch.rtfd.org.
Installation
Just install using pip:
$ pip install fuzzysearch
Features
Fuzzy sub-sequence search: Find parts of a sequence which match a given sub-sequence up to a given maximum Levenshtein distance.
Set individual limits for the number of substitutions, insertions and/or deletions allowed for a near-match.
Includes optimized implementations for specific use-cases, e.g. only allowing substitutions in near-matches.
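To make the first feature concrete, here is a minimal pure-Python sketch of fuzzy sub-sequence search by Levenshtein distance, using the classic dynamic-programming approach. It is an illustration only, not the library's implementation; the function name min_fuzzy_distance is made up for this example.

```python
def min_fuzzy_distance(subsequence, sequence):
    """Return (distance, end) for the best approximate match of
    `subsequence` anywhere in `sequence`; `end` is the exclusive end index."""
    m = len(subsequence)
    # prev[i] = minimum edits needed to match subsequence[:i] against some
    # substring of `sequence` ending at the previous position.
    prev = list(range(m + 1))
    best_dist, best_end = prev[m], 0
    for j, char in enumerate(sequence, start=1):
        cur = [0]  # a match may start anywhere, so the empty prefix costs 0
        for i in range(1, m + 1):
            cost = 0 if subsequence[i - 1] == char else 1
            cur.append(min(
                prev[i - 1] + cost,  # exact match or substitution
                prev[i] + 1,         # extra character in the sequence
                cur[i - 1] + 1,      # character missing from the sequence
            ))
        if cur[m] < best_dist:
            best_dist, best_end = cur[m], j
        prev = cur
    return best_dist, best_end


print(min_fuzzy_distance('PATTERN', 'aaaPATERNaaa'))  # (1, 9)
```

A search with a maximum distance then amounts to reporting the positions where the last entry of the row is at or below that maximum.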
Simple Example
You can usually just use the find_near_matches() utility function, which chooses a suitable fuzzy search implementation according to the given parameters:
>>> from fuzzysearch import find_near_matches
>>> find_near_matches('PATTERN', 'aaaPATERNaaa', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
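The start and end values in the result above can be used to slice the matched text out of the sequence. The sketch below assumes that end is exclusive, which is consistent with the output shown (indices 3 to 9 cover the six characters of PATERN). To stay self-contained it uses a stand-in Match namedtuple that mirrors the printed fields rather than importing the library's own type; matched_texts is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for the library's result type, mirroring the fields printed above.
Match = namedtuple('Match', ['start', 'end', 'dist'])


def matched_texts(sequence, matches):
    """Return the slice of `sequence` covered by each match."""
    return [sequence[m.start:m.end] for m in matches]


matches = [Match(start=3, end=9, dist=1)]  # the result shown above
print(matched_texts('aaaPATERNaaa', matches))  # ['PATERN']
```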
Advanced Example
If needed you can choose a specific search implementation, such as find_near_matches_with_ngrams():
>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT'  # distance 1
>>> max_distance = 2
>>> from fuzzysearch import find_near_matches_with_ngrams
>>> find_near_matches_with_ngrams(subsequence, sequence, max_distance)
[Match(start=3, end=24, dist=1)]
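The name find_near_matches_with_ngrams() suggests an n-gram based filter. The general technique can be sketched as follows: if a near-match has at most max_l_dist edits, and the sub-sequence is split into max_l_dist + 1 non-overlapping n-grams, then at least one n-gram must appear unchanged in the sequence. This is a sketch of that idea only; the library's actual code may differ, and ngram_candidates is a hypothetical name.

```python
def ngram_candidates(subsequence, sequence, max_l_dist):
    """Yield (ngram_offset, position) pairs where an n-gram of `subsequence`
    occurs exactly in `sequence`. Each hit marks a region worth verifying
    with a full edit-distance check."""
    ngram_len = len(subsequence) // (max_l_dist + 1)
    if ngram_len == 0:
        raise ValueError('subsequence too short for this max_l_dist')
    # Split the sub-sequence into max_l_dist + 1 non-overlapping n-grams.
    for offset in range(0, ngram_len * (max_l_dist + 1), ngram_len):
        ngram = subsequence[offset:offset + ngram_len]
        pos = sequence.find(ngram)
        while pos != -1:
            yield offset, pos
            pos = sequence.find(ngram, pos + 1)


print(list(ngram_candidates('PATTERN', 'aaaPATERNaaa', 1)))  # [(0, 3), (3, 5)]
```

Only the neighbourhood of each hit needs the more expensive distance computation, which is why this kind of filter tends to pay off for long sequences and small maximum distances.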
History
0.3.0 (2015-02-12)
Added C extensions for several search functions as well as internal functions
Use C extensions if available, or pure-Python implementations otherwise
setup.py attempts to build C extensions, but installs without if build fails
Added --noexts setup.py option to avoid trying to build the C extensions
Greatly improved testing and coverage
0.2.2 (2014-03-27)
Added support for searching through BioPython Seq objects
Added specialized search function allowing only substitutions and insertions
Fixed several bugs
0.2.1 (2014-03-14)
Fixed major match grouping bug
0.2.0 (2014-03-13)
New utility function find_near_matches() for easier use
Additional documentation
0.1.0 (2013-11-12)
Two working implementations
Extensive test suite; all tests passing
Full support for Python 2.6-2.7 and 3.1-3.3
Bumped status from Pre-Alpha to Alpha
0.0.1 (2013-11-01)
First release on PyPI.
Download files
File details
Details for the file fuzzysearch-0.3.0.tar.gz.
File metadata
- Download URL: fuzzysearch-0.3.0.tar.gz
- Upload date:
- Size: 52.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3450da8997f982dfffa77c13fdbac4958f0658a12fc5cafaca880275468d7d79 |
| MD5 | 8da26e7e42aa7ef88638eb94ad43ef6a |
| BLAKE2b-256 | 9821388da53564725e2442de3f6a81dab023bc2d422e7843eda3fae207c1202e |
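A downloaded file can be checked against the SHA256 digest listed above with the standard library's hashlib. This is a minimal sketch; the helper names sha256_of_file and verify_download are made up for this example, and the path passed in is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest listed above for fuzzysearch-0.3.0.tar.gz.
EXPECTED_SHA256 = (
    '3450da8997f982dfffa77c13fdbac4958f0658a12fc5cafaca880275468d7d79'
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of_file(path) == expected
```

For example, verify_download('fuzzysearch-0.3.0.tar.gz') should return True for an intact copy of the source distribution.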
File details
Details for the file fuzzysearch-0.3.0-cp34-cp34m-macosx_10_8_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.3.0-cp34-cp34m-macosx_10_8_x86_64.whl
- Upload date:
- Size: 55.0 kB
- Tags: CPython 3.4m, macOS 10.8+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 356dcc2c0e77df37cd957f1a2e025f07485c0cbf6294aac599f95a043da77418 |
| MD5 | 59a88726871e7ac50418e72b9cbfff7f |
| BLAKE2b-256 | 5658d668d5c0fd4c191cc24e1024fdc8b27a73659878012b8c19a9ca8422b672 |
File details
Details for the file fuzzysearch-0.3.0-cp33-cp33m-macosx_10_8_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.3.0-cp33-cp33m-macosx_10_8_x86_64.whl
- Upload date:
- Size: 54.9 kB
- Tags: CPython 3.3m, macOS 10.8+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7742dd2d9c423b09e36c86748ecb73f738d0a262466e0e05d440e61cd2809950 |
| MD5 | 00ecc9cb6f7d34eaad512bb4d9c070a6 |
| BLAKE2b-256 | 0b607db92eb570a9b2175cb126a6da94a63dd7e873a3d81588d6b4a00189d1ce |
File details
Details for the file fuzzysearch-0.3.0-cp32-cp32m-macosx_10_8_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.3.0-cp32-cp32m-macosx_10_8_x86_64.whl
- Upload date:
- Size: 54.8 kB
- Tags: CPython 3.2m, macOS 10.8+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 261bd36950585b5b599b4551994d6a9bf75b565fb3606b3dc65ef771f13b8785 |
| MD5 | eeeba71fc3780f3122ef4d9b66d522c8 |
| BLAKE2b-256 | 22b69570e8e8a53834975ddeaa829cba2acb8e493ecd6c87f8e951acde82a5cf |
File details
Details for the file fuzzysearch-0.3.0-cp27-none-macosx_10_8_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.3.0-cp27-none-macosx_10_8_x86_64.whl
- Upload date:
- Size: 54.4 kB
- Tags: CPython 2.7, macOS 10.8+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d5555d58edbaad226bbddb05ab6a12de6ce93c384fd27f356ee2bc3cd0ac9e5 |
| MD5 | d0867dafbd5403fdb9656b4f4bc926ef |
| BLAKE2b-256 | 473cee7482967a78f16ab7172a19e34b2cfdabff5b5eb20cfd60a9ba4f9d349f |
File details
Details for the file fuzzysearch-0.3.0-cp26-none-macosx_10_8_x86_64.whl.
File metadata
- Download URL: fuzzysearch-0.3.0-cp26-none-macosx_10_8_x86_64.whl
- Upload date:
- Size: 54.4 kB
- Tags: CPython 2.6, macOS 10.8+ x86-64
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 072b1792c57f28dff8954e08737bba001f82681ac3030d4a5b632c47ff208c99 |
| MD5 | c5339950ee88e5683b99cb34a7206cc4 |
| BLAKE2b-256 | 40c720ae90cd58004a57b19f934ac6a09aee2af9668c4908376df1bfaf8f61ac |