
fuzzysearch is useful for finding approximate subsequence matches



fuzzysearch is a Python library for fuzzy substring searches. It implements efficient ad-hoc searching for approximate sub-sequences. Matching is done using a generalized Levenshtein Distance metric, with configurable parameters.

Installation

Just install using pip:

$ pip install fuzzysearch

Features

  • Fuzzy sub-sequence search: Find parts of a sequence which match a given sub-sequence.

  • Easy to use: A single function to call which returns a list of matches.

  • Set a maximum Levenshtein Distance for matches, including individual limits for the number of substitutions, insertions and/or deletions allowed for near-matches.

  • Includes optimized implementations for specific use-cases, e.g. allowing only substitutions.
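To illustrate why the substitutions-only case lends itself to a specialized implementation, here is a minimal pure-Python sketch: when only substitutions are allowed, every match has the same length as the pattern, so a sliding window with a mismatch counter is enough. This is not fuzzysearch's own code; the function name find_substitution_matches and its tuple return format are hypothetical.

```python
def find_substitution_matches(pattern, sequence, max_substitutions):
    """Return (start, end, mismatches) for every window of `sequence`
    that differs from `pattern` in at most `max_substitutions` positions.

    Illustrative sketch only; not part of the fuzzysearch API.
    """
    m = len(pattern)
    matches = []
    for start in range(len(sequence) - m + 1):
        mismatches = 0
        for a, b in zip(pattern, sequence[start:start + m]):
            if a != b:
                mismatches += 1
                if mismatches > max_substitutions:
                    break  # too many differences, give up on this window
        else:
            # the inner loop finished without exceeding the limit
            matches.append((start, start + m, mismatches))
    return matches


print(find_substitution_matches('PATTERN', '---PAXTERN---', 1))
# [(3, 10, 1)]
```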

Simple Examples

Just call find_near_matches() with the sequence to search, the sub-sequence you’re looking for, and the matching parameters:

>>> from fuzzysearch import find_near_matches
# search for 'PATTERN' with a maximum Levenshtein Distance of 1
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT' # distance = 1
>>> find_near_matches(subsequence, sequence, max_l_dist=2)
[Match(start=3, end=24, dist=1)]
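
To make the idea of a maximum Levenshtein distance more concrete, the following is a minimal pure-Python sketch of approximate substring search using the classic dynamic-programming table in which a match may start at any position of the sequence. It is not how fuzzysearch is implemented; the function name near_match_ends is hypothetical, and it reports only (end, distance) pairs rather than Match objects.

```python
def near_match_ends(pattern, sequence, max_l_dist):
    """Return (end, dist) pairs such that some substring of `sequence`
    ending at index `end` (exclusive) is within `max_l_dist` edits of
    `pattern`.

    Illustrative sketch only; not part of the fuzzysearch API.
    """
    m = len(pattern)
    # prev[i]: cheapest way to align pattern[:i] with a substring ending
    # at the previous sequence position. Before any element is read,
    # aligning pattern[:i] with nothing costs i.
    prev = list(range(m + 1))
    results = []
    for end, element in enumerate(sequence, start=1):
        cur = [0] * (m + 1)  # cur[0] == 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == element else 1
            cur[i] = min(
                prev[i - 1] + cost,  # exact match or substitution
                prev[i] + 1,         # extra element in the sequence
                cur[i - 1] + 1,      # pattern element skipped
            )
        if cur[m] <= max_l_dist:
            results.append((end, cur[m]))
        prev = cur
    return results


print(near_match_ends('PATTERN', '---PATERN---', 1))
# [(9, 1)]
```

For the first example above, this sketch reports a single hit ending at index 9 with distance 1, which agrees with the end and dist fields of the Match shown.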

Advanced Search Criteria

The search function supports four possible match criteria, which may be supplied in any combination:

  • maximum Levenshtein distance

  • maximum # of substitutions

  • maximum # of deletions (elements appearing in the pattern searched for, which are skipped in the matching sub-sequence)

  • maximum # of insertions (elements added in the matching sub-sequence which don’t appear in the pattern searched for)

Omitting a criterion means that there is no limit for it. To keep the search bounded, one must therefore always supply max_l_dist, all three of the other criteria, or both.

>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]

# this will not match since max-deletions is set to zero
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1, max_deletions=0)
[]

# note that a deletion + insertion may be combined to match a substitution
>>> find_near_matches('PATTERN', '---PAT-ERN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=1)] # the Levenshtein distance is still 1

# ... but deletion + insertion may also match other, non-substitution differences
>>> find_near_matches('PATTERN', '---PATERRN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=2)]
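
The way the four criteria interact can be sketched with a small pure-Python check that decides whether one candidate string can be aligned with the pattern without exceeding any limit. This is only an illustration of the definitions above, not fuzzysearch's algorithm: the helper name within_limits is hypothetical, it tests a single candidate rather than searching a sequence, and it counts each deletion and insertion as one edit toward max_l_dist.

```python
from functools import lru_cache


def within_limits(pattern, candidate, max_substitutions=None,
                  max_insertions=None, max_deletions=None, max_l_dist=None):
    """Return True if `candidate` can be aligned with `pattern` without
    exceeding any supplied limit. A limit of None means "no limit".

    Illustrative sketch only; not part of the fuzzysearch API.
    """
    unlimited = len(pattern) + len(candidate)  # more edits than ever needed
    max_s = unlimited if max_substitutions is None else max_substitutions
    max_i = unlimited if max_insertions is None else max_insertions
    max_d = unlimited if max_deletions is None else max_deletions
    max_l = unlimited if max_l_dist is None else max_l_dist

    @lru_cache(maxsize=None)
    def search(i, j, subs, ins, dels):
        if (subs > max_s or ins > max_i or dels > max_d
                or subs + ins + dels > max_l):
            return False
        if i == len(pattern) and j == len(candidate):
            return True
        if i < len(pattern) and j < len(candidate):
            # equal elements cost nothing; unequal ones use a substitution
            step = 0 if pattern[i] == candidate[j] else 1
            if search(i + 1, j + 1, subs + step, ins, dels):
                return True
        # insertion: an element of the candidate that is not in the pattern
        if j < len(candidate) and search(i, j + 1, subs, ins + 1, dels):
            return True
        # deletion: an element of the pattern that the candidate skips
        if i < len(pattern) and search(i + 1, j, subs, ins, dels + 1):
            return True
        return False

    return search(0, 0, 0, 0, 0)


print(within_limits('PATTERN', 'PATERN', max_l_dist=1))                   # True
print(within_limits('PATTERN', 'PATERN', max_l_dist=1, max_deletions=0))  # False
print(within_limits('PATTERN', 'PAT-ERN', max_substitutions=0,
                    max_insertions=1, max_deletions=1))                   # True
```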

History

0.3.0 (2015-02-12)

  • Added C extensions for several search functions as well as internal functions

  • Use C extensions if available, or pure-Python implementations otherwise

  • setup.py attempts to build C extensions, but installs without if build fails

  • Added --noexts setup.py option to avoid trying to build the C extensions

  • Greatly improved testing and coverage

0.2.2 (2014-03-27)

  • Added support for searching through BioPython Seq objects

  • Added specialized search function allowing only substitutions and insertions

  • Fixed several bugs

0.2.1 (2014-03-14)

  • Fixed major match grouping bug

0.2.0 (2014-03-13)

  • New utility function find_near_matches() for easier use

  • Additional documentation

0.1.0 (2013-11-12)

  • Two working implementations

  • Extensive test suite; all tests passing

  • Full support for Python 2.6-2.7 and 3.1-3.3

  • Bumped status from Pre-Alpha to Alpha

0.0.1 (2013-11-01)

  • First release on PyPI.
