fuzzysearch is useful for finding approximate subsequence matches

Project Description

fuzzysearch is a Python library for fuzzy substring searches. It implements efficient ad-hoc searching for approximate sub-sequences. Matching is done using a generalized Levenshtein Distance metric, with configurable parameters.
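
To make "approximate sub-sequence" concrete, here is a minimal pure-Python sketch of the classic dynamic-programming approach to fuzzy substring matching. It is an illustration only and is not claimed to be how fuzzysearch is implemented; the helper name min_fuzzy_distance is hypothetical and not part of the library.

```python
def min_fuzzy_distance(pattern, text):
    """Smallest Levenshtein distance between `pattern` and any substring of `text`.

    Hypothetical helper for illustration; not part of fuzzysearch.
    """
    # prev[i] = cheapest cost of aligning pattern[:i] with a substring
    # of the text ending at the previous text position
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        # A match may start at any text position, so the empty prefix costs 0
        cur = [0]
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # exact match or substitution
                prev[i] + 1,              # extra element in the text (insertion)
                cur[i - 1] + 1,           # pattern element skipped (deletion)
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


print(min_fuzzy_distance('PATTERN', '---PATERN---'))  # 1
```

The sketch only reports the best distance; a real search also has to track where each match starts and ends.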

Installation

Just install using pip:

$ pip install fuzzysearch

Features

  • Fuzzy sub-sequence search: Find parts of a sequence which match a given sub-sequence.
  • Easy to use: A single function to call which returns a list of matches.
  • Set a maximum Levenshtein Distance for matches, including individual limits for the number of substitutions, insertions and/or deletions allowed for near-matches.
  • Includes optimized implementations for specific use-cases, e.g. allowing only substitutions.
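
As a rough illustration of why the substitutions-only case can be handled by a simpler routine, the sketch below compares the pattern against every same-length window of the sequence and counts mismatches. It is a naive stand-in under that assumption, not the library's optimized implementation, and find_with_substitutions is a hypothetical name.

```python
def find_with_substitutions(pattern, text, max_substitutions):
    """Return (start, n_substitutions) for every window within the limit.

    Hypothetical helper for illustration; not part of fuzzysearch.
    """
    n = len(pattern)
    results = []
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        # With substitutions only, the match has the same length as the
        # pattern, so the distance is a position-by-position mismatch count
        mismatches = sum(a != b for a, b in zip(pattern, window))
        if mismatches <= max_substitutions:
            results.append((start, mismatches))
    return results


print(find_with_substitutions('PATTERN', '---PAT-ERN---', 1))  # [(3, 1)]
```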

Simple Examples

Just call find_near_matches() with the sub-sequence you’re looking for, the sequence to search, and the matching parameters:

>>> from fuzzysearch import find_near_matches
>>> # search for 'PATTERN' with a maximum Levenshtein Distance of 1
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT' # distance = 1
>>> find_near_matches(subsequence, sequence, max_l_dist=2)
[Match(start=3, end=24, dist=1)]
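
The start and end values in the results above can be used to slice the matched text out of the searched sequence; in the first example, end=9 for a six-element match starting at 3 indicates that end is exclusive. The sketch below uses a namedtuple as a stand-in for the returned Match objects, assuming only the start, end and dist fields shown in the output; matched_text is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for the Match objects shown in the examples (an assumption:
# only the start, end and dist fields visible in the output are modelled)
Match = namedtuple('Match', ['start', 'end', 'dist'])


def matched_text(sequence, match):
    """Return the part of `sequence` covered by `match` (end is exclusive)."""
    return sequence[match.start:match.end]


match = Match(start=3, end=9, dist=1)
print(matched_text('---PATERN---', match))  # PATERN
```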

Advanced Search Criteria

The search function supports four possible match criteria, which may be supplied in any combination:

  • maximum Levenshtein distance
  • maximum # of substitutions
  • maximum # of deletions (elements appearing in the pattern searched for, which are skipped in the matching sub-sequence)
  • maximum # of insertions (elements added in the matching sub-sequence which don’t appear in the pattern searched for)

Not supplying a criterion means that there is no limit for it. For this reason, one must always supply max_l_dist and/or all other criteria.

>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]

# this will not match since max-deletions is set to zero
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1, max_deletions=0)
[]

# note that a deletion + insertion may be combined to match a substitution
>>> find_near_matches('PATTERN', '---PAT-ERN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=1)] # the Levenshtein distance is still 1

# ... but deletion + insertion may also match other, non-substitution differences
>>> find_near_matches('PATTERN', '---PATERRN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=2)]
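
The rule that one must supply max_l_dist and/or all of the other criteria can be sketched as a small check. This is only an illustration of the rule as stated in this section; the function name criteria_are_bounded is hypothetical, and how the library itself validates its arguments is not shown here.

```python
def criteria_are_bounded(max_l_dist=None, max_substitutions=None,
                         max_insertions=None, max_deletions=None):
    """Return True if the given criteria put a finite bound on the search.

    Hypothetical helper for illustration; not part of fuzzysearch.
    """
    # An overall Levenshtein limit bounds every kind of edit at once
    if max_l_dist is not None:
        return True
    # Otherwise each kind of edit needs its own limit (0 is a valid limit)
    return None not in (max_substitutions, max_insertions, max_deletions)


print(criteria_are_bounded(max_l_dist=1))     # True
print(criteria_are_bounded(max_deletions=0))  # False
```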

History

0.3.0 (2015-02-12)

  • Added C extensions for several search functions as well as internal functions
  • Use C extensions if available, or pure-Python implementations otherwise
  • setup.py attempts to build C extensions, but installs without if build fails
  • Added --noexts setup.py option to avoid trying to build the C extensions
  • Greatly improved testing and coverage

0.2.2 (2014-03-27)

  • Added support for searching through BioPython Seq objects
  • Added specialized search function allowing only substitutions and insertions
  • Fixed several bugs

0.2.1 (2014-03-14)

  • Fixed major match grouping bug

0.2.0 (2014-03-13)

  • New utility function find_near_matches() for easier use
  • Additional documentation

0.1.0 (2013-11-12)

  • Two working implementations
  • Extensive test suite; all tests passing
  • Full support for Python 2.6-2.7 and 3.1-3.3
  • Bumped status from Pre-Alpha to Alpha

0.0.1 (2013-11-01)

  • First release on PyPI.

Release History

  • 0.5.0 (this version)
  • 0.4.0
  • 0.3.0
  • 0.2.2
  • 0.2.1
  • 0.2.0
  • 0.1.0

Download Files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

File Name                                             Size     Version  File Type  Upload Date
fuzzysearch-0.5.0.1-cp36-cp36m-win32.whl              53.0 kB  cp36     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp26-cp26m-macosx_10_12_x86_64.whl  60.1 kB  cp26     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp26-cp26m-win32.whl                50.5 kB  cp26     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp26-cp26m-win_amd64.whl            52.9 kB  cp26     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp27-cp27m-macosx_10_12_x86_64.whl  60.8 kB  cp27     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp27-cp27m-win32.whl                50.3 kB  cp27     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp27-cp27m-win_amd64.whl            51.7 kB  cp27     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp33-cp33m-macosx_10_12_x86_64.whl  60.9 kB  cp33     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp33-cp33m-win32.whl                50.7 kB  cp33     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp34-cp34m-macosx_10_12_x86_64.whl  60.7 kB  cp34     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp34-cp34m-win32.whl                50.7 kB  cp34     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp35-cp35m-macosx_10_12_x86_64.whl  60.2 kB  cp35     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp35-cp35m-win32.whl                53.0 kB  cp35     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp35-cp35m-win_amd64.whl            56.6 kB  cp35     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp36-cp36m-macosx_10_12_x86_64.whl  60.7 kB  cp36     Wheel      Sep 7, 2017
fuzzysearch-0.5.0-cp36-cp36m-win_amd64.whl            57.1 kB  cp36     Wheel      Sep 7, 2017
fuzzysearch-0.5.0.tar.gz                              61.7 kB           Source     Sep 7, 2017
