fuzzysearch is useful for finding approximate subsequence matches

Project description

fuzzysearch is a Python library for fuzzy substring searches. It implements efficient ad-hoc searching for approximate sub-sequences. Matching is done using a generalized Levenshtein Distance metric, with configurable parameters.
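For readers unfamiliar with the metric, here is a minimal pure-Python sketch of the classic (non-generalized) Levenshtein distance between two whole sequences. It is an illustration only, not the library's implementation, and the function name `levenshtein` is a hypothetical helper introduced for this example.

```python
def levenshtein(a, b):
    """Classic Levenshtein distance between two sequences (two-row DP)."""
    # previous[j] is the distance between the prefix of `a` handled so far
    # and b[:j]; it starts as the distance from an empty prefix.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and an empty prefix of b
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # drop item_a
                current[j - 1] + 1,      # add item_b
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


print(levenshtein('PATTERN', 'PATERN'))   # one element missing
print(levenshtein('PATTERN', 'PATERRN'))  # two elements differ
```

A fuzzy search differs from this whole-sequence comparison in that it looks for any part of a longer sequence that lies within the allowed distance of the pattern.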

Installation

Just install using pip:

$ pip install fuzzysearch

Features

  • Fuzzy sub-sequence search: Find parts of a sequence which match a given sub-sequence.

  • Easy to use: A single function to call which returns a list of matches.

  • Set a maximum Levenshtein Distance for matches, including individual limits for the number of substitutions, insertions and/or deletions allowed for near-matches.

  • Includes optimized implementations for specific use-cases, e.g. allowing only substitutions.
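To see why the substitutions-only case lends itself to a specialized implementation, consider this rough sketch: with no insertions or deletions, every candidate match has the same length as the pattern, so a sliding window that counts mismatches is enough. This is a naive illustration under that assumption, not the optimized code shipped with the library; `find_substitution_only_matches` is a hypothetical name.

```python
def find_substitution_only_matches(subsequence, sequence, max_substitutions):
    """Return (start, end, n_substitutions) for each window within the limit."""
    n = len(subsequence)
    if n == 0:
        raise ValueError('subsequence must not be empty')
    matches = []
    # Only windows of exactly len(subsequence) can match when
    # insertions and deletions are not allowed.
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        mismatches = sum(1 for x, y in zip(subsequence, window) if x != y)
        if mismatches <= max_substitutions:
            matches.append((start, start + n, mismatches))
    return matches


print(find_substitution_only_matches('PATTERN', '---PAXTERN---', 1))
```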

Simple Examples

Just call find_near_matches() with the sub-sequence you’re looking for, the sequence to search, and the matching parameters, in that order:

>>> from fuzzysearch import find_near_matches
# search for 'PATTERN' with a maximum Levenshtein Distance of 1
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT' # distance = 1
>>> find_near_matches(subsequence, sequence, max_l_dist=2)
[Match(start=3, end=24, dist=1)]
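Judging by the results above, start and end behave like ordinary slice indices (end is exclusive), so the matched text can be recovered by slicing. The sketch below assumes exactly that and uses a stand-in `Match` namedtuple that mirrors the repr shown above, so it runs without the library installed; `matched_slices` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for the library's result type, mirroring the repr shown above.
Match = namedtuple('Match', ['start', 'end', 'dist'])


def matched_slices(sequence, matches):
    """Pair the slice of `sequence` covered by each match with its distance."""
    return [(sequence[m.start:m.end], m.dist) for m in matches]


# The result from the first example above, written out by hand.
matches = [Match(start=3, end=9, dist=1)]
print(matched_slices('---PATERN---', matches))
```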

Advanced Search Criteria

The search function supports four possible match criteria, which may be supplied in any combination:

  • maximum Levenshtein distance

  • maximum # of substitutions

  • maximum # of deletions (elements appearing in the pattern searched for, which are skipped in the matching sub-sequence)

  • maximum # of insertions (elements added in the matching sub-sequence which don’t appear in the pattern searched for)

Not supplying a criterion means that there is no limit for it. For this reason, every search must supply max_l_dist, or all three of the other criteria, or both; otherwise the number of allowed differences would be unbounded.
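The rule can be expressed as a small check. This is only a sketch of the rule as stated here, not the library's own validation code; the function name `search_is_bounded` is hypothetical, while the keyword names follow those used in the examples.

```python
def search_is_bounded(max_substitutions=None, max_insertions=None,
                      max_deletions=None, max_l_dist=None):
    """Return True if the criteria put an upper bound on the total distance."""
    # A maximum Levenshtein distance bounds everything by itself.
    if max_l_dist is not None:
        return True
    # Otherwise each individual kind of difference needs its own limit.
    limits = (max_substitutions, max_insertions, max_deletions)
    return all(limit is not None for limit in limits)


print(search_is_bounded(max_l_dist=1))
print(search_is_bounded(max_deletions=1, max_insertions=1, max_substitutions=0))
print(search_is_bounded(max_deletions=0))
```

The first two calls correspond to combinations used in the examples; the third leaves substitutions and insertions unlimited, so it is not a valid set of criteria.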

>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]

# this will not match since max-deletions is set to zero
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1, max_deletions=0)
[]

# note that a deletion + insertion may be combined to match a substitution
>>> find_near_matches('PATTERN', '---PAT-ERN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=1)] # the Levenshtein distance is still 1

# ... but deletion + insertion may also match other, non-substitution differences
>>> find_near_matches('PATTERN', '---PATERRN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=2)]

History

0.3.0 (2015-02-12)

  • Added C extensions for several search functions as well as internal functions

  • Use C extensions if available, or pure-Python implementations otherwise

  • setup.py attempts to build C extensions, but installs without if build fails

  • Added --noexts setup.py option to avoid trying to build the C extensions

  • Greatly improved testing and coverage

0.2.2 (2014-03-27)

  • Added support for searching through BioPython Seq objects

  • Added specialized search function allowing only substitutions and insertions

  • Fixed several bugs

0.2.1 (2014-03-14)

  • Fixed major match grouping bug

0.2.0 (2014-03-13)

  • New utility function find_near_matches() for easier use

  • Additional documentation

0.1.0 (2013-11-12)

  • Two working implementations

  • Extensive test suite; all tests passing

  • Full support for Python 2.6-2.7 and 3.1-3.3

  • Bumped status from Pre-Alpha to Alpha

0.0.1 (2013-11-01)

  • First release on PyPI.
