fuzzysearch is useful for finding approximate subsequence matches

Project description

fuzzysearch is a Python library for fuzzy substring searches. It implements efficient ad-hoc searching for approximate sub-sequences. Matching is done using a generalized Levenshtein Distance metric, with configurable parameters.
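For readers new to the metric, here is a minimal pure-Python sketch of the plain (non-generalized) Levenshtein distance between two whole sequences. The function name `levenshtein` is chosen for illustration only; this is not fuzzysearch's own implementation, which searches for sub-sequences and supports per-operation limits.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # previous[j] holds the distance between the items of `a` consumed so far
    # and the first j items of `b`.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning i items of `a` into an empty sequence costs i
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # drop item_a
                current[j - 1] + 1,                     # add item_b
                previous[j - 1] + (item_a != item_b),   # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein('PATTERN', 'PATERN'))  # 1: one 'T' is missing
```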

Installation

Just install using pip:

$ pip install fuzzysearch

Features

  • Fuzzy sub-sequence search: Find parts of a sequence which match a given sub-sequence.

  • Easy to use: A single function to call which returns a list of matches.

  • Set a maximum Levenshtein Distance for matches, including individual limits for the number of substitutions, insertions and/or deletions allowed for near-matches.

  • Includes optimized implementations for specific use-cases, e.g. allowing only substitutions.
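To show why the substitutions-only case lends itself to a specialized implementation, the sketch below handles it with a plain sliding window: every candidate has exactly the pattern's length, so counting mismatches per window is enough. The function name and the tuple-based return value are assumptions made for this illustration and are not part of the fuzzysearch API.

```python
def find_substitution_only_matches(pattern, text, max_substitutions):
    """Return (start, end, dist) for every window of `text` that differs
    from `pattern` in at most `max_substitutions` positions."""
    n = len(pattern)
    matches = []
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        # With no insertions or deletions allowed, the distance is simply
        # the number of mismatching positions.
        dist = sum(1 for x, y in zip(pattern, window) if x != y)
        if dist <= max_substitutions:
            matches.append((start, start + n, dist))
    return matches


print(find_substitution_only_matches('PATTERN', '---PATTURN---', 1))
# [(3, 10, 1)]
```

A missing element, as in '---PATERN---', shifts the alignment, so this sketch finds nothing there with `max_substitutions=1`.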

Simple Examples

Just call find_near_matches() with the sequence to search, the sub-sequence you’re looking for, and the matching parameters:

>>> from fuzzysearch import find_near_matches
# search for 'PATTERN' with a maximum Levenshtein Distance of 1
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT' # distance = 1
>>> find_near_matches(subsequence, sequence, max_l_dist=2)
[Match(start=3, end=24, dist=1)]
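The first result above (distance 1, ending at index 9) can be cross-checked with a short dynamic-programming sketch of approximate substring matching. It is the classic textbook approach in which a match may start anywhere in the text at no cost. It is offered only as an illustration under that assumption, it is not how fuzzysearch is implemented, and the name `best_approximate_end` is hypothetical. For brevity it reports only the best distance and the end index, not the start.

```python
def best_approximate_end(pattern, text):
    """Return (dist, end) for the closest approximate occurrence of
    `pattern` in `text`, where `end` is an exclusive end index."""
    # Row 0: an empty pattern matches at any position with distance 0,
    # which is what lets a match begin anywhere in the text.
    previous = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        current = [i]  # i pattern items matched against nothing costs i
        for j, t in enumerate(text, start=1):
            current.append(min(
                previous[j] + 1,            # pattern item skipped in the text
                current[j - 1] + 1,         # extra text item inside the match
                previous[j - 1] + (p != t), # substitution, or exact item match
            ))
        previous = current
    # The last row holds, for each end index, the best distance of any
    # substring ending there; pick the smallest, preferring earlier ends.
    return min((dist, end) for end, dist in enumerate(previous))


print(best_approximate_end('PATTERN', '---PATERN---'))  # (1, 9)
```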

Advanced Search Criteria

The search function supports four possible match criteria, which may be supplied in any combination:

  • maximum Levenshtein distance

  • maximum # of substitutions

  • maximum # of deletions (elements appearing in the pattern being searched for, which are skipped in the matching sub-sequence)

  • maximum # of insertions (elements added in the matching sub-sequence which don’t appear in the pattern being searched for)

Not supplying a criterion means that there is no limit for it. For this reason, one must always supply max_l_dist and/or all other criteria.
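The rule above can be sketched as a small check. The helper below is hypothetical and not part of fuzzysearch; it only illustrates the reasoning that a search is bounded when either max_l_dist is given or all three per-operation limits are given, and it computes the tightest overall bound implied by the arguments.

```python
def effective_max_l_dist(max_l_dist=None, max_substitutions=None,
                         max_insertions=None, max_deletions=None):
    """Return the tightest total-distance bound implied by the criteria.

    None means "no limit" for that criterion. Raises ValueError when the
    combination leaves the search unbounded.
    """
    per_operation = (max_substitutions, max_insertions, max_deletions)
    if any(v is not None and v < 0 for v in (max_l_dist,) + per_operation):
        raise ValueError('limits must be non-negative')
    if max_l_dist is None and any(v is None for v in per_operation):
        # At least one kind of edit could be applied without limit.
        raise ValueError('supply max_l_dist and/or all other criteria')

    bounds = []
    if max_l_dist is not None:
        bounds.append(max_l_dist)
    if all(v is not None for v in per_operation):
        bounds.append(sum(per_operation))
    return min(bounds)


print(effective_max_l_dist(max_l_dist=1, max_deletions=0))  # 1
print(effective_max_l_dist(max_substitutions=0, max_insertions=1, max_deletions=1))  # 2
```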

>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]

# this will not match since max-deletions is set to zero
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1, max_deletions=0)
[]

# note that a deletion + insertion may be combined to match a substitution
>>> find_near_matches('PATTERN', '---PAT-ERN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=1)] # the Levenshtein distance is still 1

# ... but deletion + insertion may also match other, non-substitution differences
>>> find_near_matches('PATTERN', '---PATERRN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=2)]
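To see how many insertions and deletions are needed when substitutions are forbidden, one can count them from the longest common subsequence (LCS) of the two sequences. This is a standalone sketch of that idea with a hypothetical function name; it compares two whole sequences and is not taken from fuzzysearch.

```python
def indel_distance(a, b):
    """Number of insertions plus deletions needed to turn a into b
    when substitutions are not allowed."""
    # previous[j] is the LCS length of the items of `a` consumed so far
    # and the first j items of `b`.
    previous = [0] * (len(b) + 1)
    for item_a in a:
        current = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    lcs_length = previous[-1]
    # Items of `a` outside the LCS are deleted, items of `b` outside it inserted.
    return len(a) + len(b) - 2 * lcs_length


print(indel_distance('PATTERN', 'PAT-ERN'))  # 2: drop one 'T', add '-'
print(indel_distance('PATTERN', 'PATERRN'))  # 2: drop one 'T', add one 'R'
```

In both cases one deletion plus one insertion is enough, which is why the two searches above succeed with max_deletions=1 and max_insertions=1.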

History

0.3.0 (2015-02-12)

  • Added C extensions for several search functions as well as internal functions

  • Use C extensions if available, or pure-Python implementations otherwise

  • setup.py attempts to build C extensions, but installs without if build fails

  • Added --noexts setup.py option to avoid trying to build the C extensions

  • Greatly improved testing and coverage

0.2.2 (2014-03-27)

  • Added support for searching through BioPython Seq objects

  • Added specialized search function allowing only substitutions and insertions

  • Fixed several bugs

0.2.1 (2014-03-14)

  • Fixed major match grouping bug

0.2.0 (2014-03-13)

  • New utility function find_near_matches() for easier use

  • Additional documentation

0.1.0 (2013-11-12)

  • Two working implementations

  • Extensive test suite; all tests passing

  • Full support for Python 2.6-2.7 and 3.1-3.3

  • Bumped status from Pre-Alpha to Alpha

0.0.1 (2013-11-01)

  • First release on PyPI.

Download files

Download the file for your platform.

Source Distribution

fuzzysearch-0.5.0.tar.gz (61.7 kB)

Uploaded Source

Built Distributions

fuzzysearch-0.5.0.1-cp36-cp36m-win32.whl (53.0 kB)

Uploaded CPython 3.6m, Windows x86

fuzzysearch-0.5.0-cp36-cp36m-win_amd64.whl (57.1 kB)

Uploaded CPython 3.6m, Windows x86-64

fuzzysearch-0.5.0-cp36-cp36m-macosx_10_12_x86_64.whl (60.7 kB)

Uploaded CPython 3.6m, macOS 10.12+ x86-64

fuzzysearch-0.5.0-cp35-cp35m-win_amd64.whl (56.6 kB)

Uploaded CPython 3.5m, Windows x86-64

fuzzysearch-0.5.0-cp35-cp35m-win32.whl (53.0 kB)

Uploaded CPython 3.5m, Windows x86

fuzzysearch-0.5.0-cp35-cp35m-macosx_10_12_x86_64.whl (60.2 kB)

Uploaded CPython 3.5m, macOS 10.12+ x86-64

fuzzysearch-0.5.0-cp34-cp34m-win32.whl (50.7 kB)

Uploaded CPython 3.4m, Windows x86

fuzzysearch-0.5.0-cp34-cp34m-macosx_10_12_x86_64.whl (60.7 kB)

Uploaded CPython 3.4m, macOS 10.12+ x86-64

fuzzysearch-0.5.0-cp33-cp33m-win32.whl (50.7 kB)

Uploaded CPython 3.3m, Windows x86

fuzzysearch-0.5.0-cp33-cp33m-macosx_10_12_x86_64.whl (60.9 kB)

Uploaded CPython 3.3m, macOS 10.12+ x86-64

fuzzysearch-0.5.0-cp27-cp27m-win_amd64.whl (51.7 kB)

Uploaded CPython 2.7m, Windows x86-64

fuzzysearch-0.5.0-cp27-cp27m-win32.whl (50.3 kB)

Uploaded CPython 2.7m, Windows x86

fuzzysearch-0.5.0-cp27-cp27m-macosx_10_12_x86_64.whl (60.8 kB)

Uploaded CPython 2.7m, macOS 10.12+ x86-64

fuzzysearch-0.5.0-cp26-cp26m-win_amd64.whl (52.9 kB)

Uploaded CPython 2.6m, Windows x86-64

fuzzysearch-0.5.0-cp26-cp26m-win32.whl (50.5 kB)

Uploaded CPython 2.6m, Windows x86

fuzzysearch-0.5.0-cp26-cp26m-macosx_10_12_x86_64.whl (60.1 kB)

Uploaded CPython 2.6m, macOS 10.12+ x86-64

File details

Details for the file fuzzysearch-0.5.0.tar.gz.

File metadata

  • Download URL: fuzzysearch-0.5.0.tar.gz
  • Upload date:
  • Size: 61.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for fuzzysearch-0.5.0.tar.gz
Algorithm Hash digest
SHA256 52db01818c1adceb6a9e740e59c7ed1f4f349441d4cae24e0f8473ed02170662
MD5 13b13fd0f2d9efe889d8f888d7653f45
BLAKE2b-256 dab005df3ec469906acfb1ce15d7ac4b4e201e6e2b0f159ab239caad1b5f0887

See more details on using hashes here.

File details

Details for the file fuzzysearch-0.5.0.1-cp36-cp36m-win32.whl.

File hashes

Hashes for fuzzysearch-0.5.0.1-cp36-cp36m-win32.whl
Algorithm Hash digest
SHA256 5fda967c4d64082d6f4cf42f01dea16511ddb7f0f2d8b5da8a1785a4bb0b2625
MD5 20c93b374a441fa278345a47f6d455f0
BLAKE2b-256 dcb02a6c1706e139283a4c4d2d296c38b54b5d53c2622fc3b680770d8a3479bb

File details

Details for the file fuzzysearch-0.5.0-cp36-cp36m-win_amd64.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 288a3ec43ced224575799b455022fdd5bc0759cbb96694531b5691e3870ad56b
MD5 d7b0cb693a7c92ef7055232381fa8457
BLAKE2b-256 96115bfbfbbf930c9f20bcc465045e9f3d95cced3369812ea30086baca24d8bf

File details

Details for the file fuzzysearch-0.5.0-cp36-cp36m-macosx_10_12_x86_64.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp36-cp36m-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 2fd952a8436d746aca18bdb96d713b1c8637ec2496598c4c59dc9844ac9bd89e
MD5 2fdcd1ec0f47cad6a2c2107bab558788
BLAKE2b-256 0a20be1655203203f9c79319bcb93968b6ddefa9b83cd494c32145b63996ed95

File details

Details for the file fuzzysearch-0.5.0-cp35-cp35m-win_amd64.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 8aed2a7fb056421934d64c6b8237ac45670c32bcd43c550bf6a3b96e3543e8af
MD5 b75ae139aab9b6094864d87932f02114
BLAKE2b-256 ba12c4dc8c22c7b8e6bc00d906c6a771b28182a5cb1641ce87ae253f34b976b0

File details

Details for the file fuzzysearch-0.5.0-cp35-cp35m-win32.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp35-cp35m-win32.whl
Algorithm Hash digest
SHA256 b7fb231cd742484a1815f592661f2d694eaf5f3d8b591c8b5420437aa5b29b89
MD5 9df92bacaecbb2532720ddbd17f40687
BLAKE2b-256 e372362f3c506a26ae5c66dc11b4577dc1b04d750764c3363029ddd6bcf74717

File details

Details for the file fuzzysearch-0.5.0-cp35-cp35m-macosx_10_12_x86_64.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp35-cp35m-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 f7802828ec4ab4fa7d21312bd9cc6e501d43b4ca0f233b3c8a631f375a227441
MD5 9a238c0f0dda11eaf8df6840f5dc419c
BLAKE2b-256 61edd5bbebcbfd2f83b9391dd13e670acbfbe7ae8d390098e07d7edf4070e8a7

File details

Details for the file fuzzysearch-0.5.0-cp34-cp34m-win32.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 9865f2d28b83459905c473176d785d842dbbf0ce49936886112cb164d76eb413
MD5 8f03f2d24e5b945ced7d29bca6cf57b1
BLAKE2b-256 a10ecc29fb7995c1690eee120c692f7ac695d8d350ed6d42937b488910a9d6d8

File details

Details for the file fuzzysearch-0.5.0-cp34-cp34m-macosx_10_12_x86_64.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp34-cp34m-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 c6c6e6ca4c21c0e222dc758c1d28c2923051c5be5c927b33cc381ee5e5534413
MD5 7cb6c96b22944574b028bdfa32d3c3dc
BLAKE2b-256 3c6f7a1f064061ec7e31542c004a59b942f25fe89d08cae65f001ecf852e85ee

File details

Details for the file fuzzysearch-0.5.0-cp33-cp33m-win32.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp33-cp33m-win32.whl
Algorithm Hash digest
SHA256 a4f291e5c06bfcb507546c49e43005563cdcd598ddf5972a0112e6b97d7c60ca
MD5 4aada2e12ceaad33fde3780e340a4956
BLAKE2b-256 e02a343033cb6a8c2c48e15da28aa38db8e2174c66430e5f29ea41898d3bd558

File details

Details for the file fuzzysearch-0.5.0-cp33-cp33m-macosx_10_12_x86_64.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp33-cp33m-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 51f1a1bac5097cfa81937ebae41425d01e99a53ccdd099044cb6501c765f49c3
MD5 3a748a442e54013b54a0ce0d17477891
BLAKE2b-256 cab4ec4ae2f5b870a2e18f356385b56be69fb8c563220ab46d86b5f9c476b7d5

File details

Details for the file fuzzysearch-0.5.0-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 379a31674f9c852c361624a1928758b4cb465fc4a5acb8d6dfc4314f2068cc3d
MD5 aa9b0504b330a617f12c86af348ad31a
BLAKE2b-256 83759496d0eecccd97ac190bb3b179e9764f1ba5fdceb8cdea276ada6eed93b4

File details

Details for the file fuzzysearch-0.5.0-cp27-cp27m-win32.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 206a6030caa3625de89a0885fb444e8ebfb33731dfb6c4a54a3cbba2eceb6e90
MD5 f5b2093f52e423716ace8986755dbd0a
BLAKE2b-256 2dac21c5323d46f2bec09f6d1f51e980a4457e4d9a6bfff3ee86a1bcd0e3b511

File details

Details for the file fuzzysearch-0.5.0-cp27-cp27m-macosx_10_12_x86_64.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp27-cp27m-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 8bb91cf1bcbc59e7a22dc0800aa1fea168b6c19a0894f31bec17e0f1ca8c9e02
MD5 f8cfeffe9e62dbdf47623d60435f5d88
BLAKE2b-256 ef6c5acc429b31ff544bec0d15cdbcc3430dd7b9f6e7e2c20f7e3e22f6780311

File details

Details for the file fuzzysearch-0.5.0-cp26-cp26m-win_amd64.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp26-cp26m-win_amd64.whl
Algorithm Hash digest
SHA256 4ee248840f9644836357d5298e55b5b908e4db96a57a3b270a43c5c9c36309d0
MD5 830c38fa52e6f2422c2637fb0764b053
BLAKE2b-256 4a885011547b0e661b9c53a21d0aa225c1612c327bdf1b305aa4e04e078cbce5

File details

Details for the file fuzzysearch-0.5.0-cp26-cp26m-win32.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp26-cp26m-win32.whl
Algorithm Hash digest
SHA256 b78aeeb77faad02d542eef935fd351f9486f360c6356a9d90288cd5b7dbbab52
MD5 a88effc847debc2f4e0feeb1e480fe73
BLAKE2b-256 61675563dc6bb6ad6c9442953f83885414d51d7837631d28d4ca8e4023874121

File details

Details for the file fuzzysearch-0.5.0-cp26-cp26m-macosx_10_12_x86_64.whl.

File hashes

Hashes for fuzzysearch-0.5.0-cp26-cp26m-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 1e0a660f133cba7d158eb7cb7cd37ba1b223dc0287c92d1efdd8a6f6f7addd91
MD5 beedd4df1d2249c0a75e0b9c05b8021a
BLAKE2b-256 c1de0729201aad5d3bc648f08501f83cb0b1c89cb190160a19dc7fe4e601818e
