fuzzysearch is useful for finding approximate subsequence matches

Installation

Install using pip:

$ pip install fuzzysearch

Features

  • Fuzzy sub-sequence search: Find parts of a sequence which match a given sub-sequence up to a given maximum Levenshtein distance.

  • Set individual limits for the number of substitutions, insertions and/or deletions allowed for a near-match.

  • Includes optimized implementations for specific use-cases, e.g. only allowing substitutions in near-matches.
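To illustrate what a match "up to a given maximum Levenshtein distance" means, here is a minimal pure-Python sketch of the classic dynamic-programming approach to approximate substring search. It is not fuzzysearch's implementation; the names min_distance_by_end and find_fuzzy_ends are hypothetical, and the sketch reports only the end index and distance of each candidate, not its start.

```python
def min_distance_by_end(pattern, text):
    """For each end index in text, return the smallest Levenshtein distance
    between pattern and any substring of text that ends there."""
    # prev[i] is the best distance for pattern[:i] at the previous text position.
    prev = list(range(len(pattern) + 1))
    distances = []
    for ch in text:
        # Row 0 is always 0: a match may start at any position in the text.
        cur = [0]
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # characters match, or one substitution
                prev[i] + 1,              # text has an extra character
                cur[i - 1] + 1,           # pattern character missing from text
            ))
        distances.append(cur[-1])
        prev = cur
    return distances


def find_fuzzy_ends(pattern, text, max_l_dist):
    """Return (end_index, distance) pairs with distance <= max_l_dist."""
    return [
        (end, dist)
        for end, dist in enumerate(min_distance_by_end(pattern, text), start=1)
        if dist <= max_l_dist
    ]


print(find_fuzzy_ends('PATTERN', 'aaaPATERNaaa', max_l_dist=1))
# prints [(9, 1)]
```

The sketch treats substitutions, insertions and deletions alike; setting a separate limit for each edit type, as the library allows, would require tracking the edit counts individually.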

Simple Example

You can usually just use the find_near_matches() utility function, which chooses a suitable fuzzy search implementation according to the given parameters:

>>> from fuzzysearch import find_near_matches
>>> find_near_matches('PATTERN', 'aaaPATERNaaa', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
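The start and end values in the result above appear to behave like Python slice indices with an exclusive end, since the six-character near-match 'PATERN' spans indices 3 to 9. A small sketch of reading a result that way, using a hypothetical namedtuple stand-in for the library's Match so that it runs without fuzzysearch installed:

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the fields shown in the output above;
# the real Match type comes from fuzzysearch itself.
Match = namedtuple('Match', ['start', 'end', 'dist'])

sequence = 'aaaPATERNaaa'
match = Match(start=3, end=9, dist=1)

# Slice the searched sequence to recover the text of the near-match.
matched_text = sequence[match.start:match.end]
print(matched_text, match.dist)
# prints PATERN 1
```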

Advanced Example

If needed, you can choose a specific search implementation, such as find_near_matches_with_ngrams():

>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT'  # distance 1
>>> max_distance = 2

>>> from fuzzysearch import find_near_matches_with_ngrams
>>> find_near_matches_with_ngrams(subsequence, sequence, max_distance)
[Match(start=3, end=24, dist=1)]
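One common idea behind n-gram based fuzzy search is a pigeonhole argument: if the subsequence is split into max_l_dist + 1 non-overlapping pieces, any match with at most max_l_dist edits must contain at least one piece unchanged. The sketch below shows only this candidate-finding step. It is an assumption about the general technique, not a description of how find_near_matches_with_ngrams() is actually implemented, and ngram_candidates is a hypothetical name.

```python
def ngram_candidates(subsequence, sequence, max_l_dist):
    """Return (offset_in_subsequence, position_in_sequence) pairs for every
    exact occurrence of one of the subsequence's n-grams in the sequence."""
    ngram_len = len(subsequence) // (max_l_dist + 1)
    if ngram_len == 0:
        raise ValueError('subsequence is too short for the given max_l_dist')

    hits = []
    for k in range(max_l_dist + 1):
        offset = k * ngram_len
        ngram = subsequence[offset:offset + ngram_len]
        # Collect every exact occurrence, including overlapping ones.
        pos = sequence.find(ngram)
        while pos != -1:
            hits.append((offset, pos))
            pos = sequence.find(ngram, pos + 1)
    return hits


print(ngram_candidates('PATTERN', 'aaaPATERNaaa', max_l_dist=1))
# prints [(0, 3), (3, 5)]
```

Each hit suggests a region of the sequence around position - offset that is worth checking with a full edit-distance comparison; regions without any hit can be skipped.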

History

0.3.0 (2015-02-12)

  • Added C extensions for several search functions as well as internal functions

  • Use C extensions if available, or pure-Python implementations otherwise

  • setup.py attempts to build C extensions, but installs without if build fails

  • Added --noexts setup.py option to avoid trying to build the C extensions

  • Greatly improved testing and coverage

0.2.2 (2014-03-27)

  • Added support for searching through BioPython Seq objects

  • Added specialized search function allowing only substitutions and insertions

  • Fixed several bugs

0.2.1 (2014-03-14)

  • Fixed major match grouping bug

0.2.0 (2014-03-13)

  • New utility function find_near_matches() for easier use

  • Additional documentation

0.1.0 (2013-11-12)

  • Two working implementations

  • Extensive test suite; all tests passing

  • Full support for Python 2.6-2.7 and 3.1-3.3

  • Bumped status from Pre-Alpha to Alpha

0.0.1 (2013-11-01)

  • First release on PyPI.

Download files

Source Distribution

fuzzysearch-0.3.0.tar.gz (52.7 kB)

Uploaded Source

Built Distributions

fuzzysearch-0.3.0-cp34-cp34m-macosx_10_8_x86_64.whl (55.0 kB)

Uploaded CPython 3.4m, macOS 10.8+ x86-64

fuzzysearch-0.3.0-cp33-cp33m-macosx_10_8_x86_64.whl (54.9 kB)

Uploaded CPython 3.3m, macOS 10.8+ x86-64

fuzzysearch-0.3.0-cp32-cp32m-macosx_10_8_x86_64.whl (54.8 kB)

Uploaded CPython 3.2m, macOS 10.8+ x86-64

fuzzysearch-0.3.0-cp27-none-macosx_10_8_x86_64.whl (54.4 kB)

Uploaded CPython 2.7, macOS 10.8+ x86-64

fuzzysearch-0.3.0-cp26-none-macosx_10_8_x86_64.whl (54.4 kB)

Uploaded CPython 2.6, macOS 10.8+ x86-64

File details

Details for the file fuzzysearch-0.3.0.tar.gz.

File metadata

  • Download URL: fuzzysearch-0.3.0.tar.gz
  • Upload date:
  • Size: 52.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for fuzzysearch-0.3.0.tar.gz
Algorithm Hash digest
SHA256 3450da8997f982dfffa77c13fdbac4958f0658a12fc5cafaca880275468d7d79
MD5 8da26e7e42aa7ef88638eb94ad43ef6a
BLAKE2b-256 9821388da53564725e2442de3f6a81dab023bc2d422e7843eda3fae207c1202e

File details

Details for the file fuzzysearch-0.3.0-cp34-cp34m-macosx_10_8_x86_64.whl.

File hashes

Hashes for fuzzysearch-0.3.0-cp34-cp34m-macosx_10_8_x86_64.whl
Algorithm Hash digest
SHA256 356dcc2c0e77df37cd957f1a2e025f07485c0cbf6294aac599f95a043da77418
MD5 59a88726871e7ac50418e72b9cbfff7f
BLAKE2b-256 5658d668d5c0fd4c191cc24e1024fdc8b27a73659878012b8c19a9ca8422b672

File details

Details for the file fuzzysearch-0.3.0-cp33-cp33m-macosx_10_8_x86_64.whl.

File hashes

Hashes for fuzzysearch-0.3.0-cp33-cp33m-macosx_10_8_x86_64.whl
Algorithm Hash digest
SHA256 7742dd2d9c423b09e36c86748ecb73f738d0a262466e0e05d440e61cd2809950
MD5 00ecc9cb6f7d34eaad512bb4d9c070a6
BLAKE2b-256 0b607db92eb570a9b2175cb126a6da94a63dd7e873a3d81588d6b4a00189d1ce

File details

Details for the file fuzzysearch-0.3.0-cp32-cp32m-macosx_10_8_x86_64.whl.

File hashes

Hashes for fuzzysearch-0.3.0-cp32-cp32m-macosx_10_8_x86_64.whl
Algorithm Hash digest
SHA256 261bd36950585b5b599b4551994d6a9bf75b565fb3606b3dc65ef771f13b8785
MD5 eeeba71fc3780f3122ef4d9b66d522c8
BLAKE2b-256 22b69570e8e8a53834975ddeaa829cba2acb8e493ecd6c87f8e951acde82a5cf

File details

Details for the file fuzzysearch-0.3.0-cp27-none-macosx_10_8_x86_64.whl.

File hashes

Hashes for fuzzysearch-0.3.0-cp27-none-macosx_10_8_x86_64.whl
Algorithm Hash digest
SHA256 2d5555d58edbaad226bbddb05ab6a12de6ce93c384fd27f356ee2bc3cd0ac9e5
MD5 d0867dafbd5403fdb9656b4f4bc926ef
BLAKE2b-256 473cee7482967a78f16ab7172a19e34b2cfdabff5b5eb20cfd60a9ba4f9d349f

File details

Details for the file fuzzysearch-0.3.0-cp26-none-macosx_10_8_x86_64.whl.

File hashes

Hashes for fuzzysearch-0.3.0-cp26-none-macosx_10_8_x86_64.whl
Algorithm Hash digest
SHA256 072b1792c57f28dff8954e08737bba001f82681ac3030d4a5b632c47ff208c99
MD5 c5339950ee88e5683b99cb34a7206cc4
BLAKE2b-256 40c720ae90cd58004a57b19f934ac6a09aee2af9668c4908376df1bfaf8f61ac
