
fuzzysearch is useful for finding approximate subsequence matches

Project description


fuzzysearch is a Python library for fuzzy substring searches. It implements efficient ad-hoc searching for approximate sub-sequences. Matching is done using a generalized Levenshtein Distance metric, with configurable parameters.

Installation

Just install using pip:

$ pip install fuzzysearch

Features

  • Fuzzy sub-sequence search: Find parts of a sequence which match a given sub-sequence.

  • Easy to use: A single function to call which returns a list of matches.

  • Set a maximum Levenshtein Distance for matches, including individual limits for the number of substitutions, insertions and/or deletions allowed for near-matches.

  • Includes optimized implementations for specific use-cases, e.g. allowing only substitutions.

Simple Examples

Just call find_near_matches() with the sequence to search, the sub-sequence you’re looking for, and the matching parameters:

>>> from fuzzysearch import find_near_matches
# search for 'PATTERN' with a maximum Levenshtein Distance of 1
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT' # distance = 1
>>> find_near_matches(subsequence, sequence, max_l_dist=2)
[Match(start=3, end=24, dist=1)]
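
Each result carries start, end and dist fields, as the outputs above show. Judging from those outputs, end appears to be exclusive, so a match can be sliced straight out of the searched sequence. The following is a minimal sketch; the namedtuple is only a stand-in for the library's Match, used here as an assumption so that the snippet runs without fuzzysearch installed:

```python
from collections import namedtuple

# Stand-in for the Match objects shown above; not the library's own class.
Match = namedtuple('Match', ['start', 'end', 'dist'])

sequence = '---PATERN---'
match = Match(start=3, end=9, dist=1)  # values copied from the first example

# Slice the matched region out of the searched sequence (end is exclusive).
matched_text = sequence[match.start:match.end]
print(matched_text, match.dist)
```

With real search results, the same slicing would be applied to each element of the list returned by find_near_matches().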

Advanced Search Criteria

The search function supports four possible match criteria, which may be supplied in any combination:

  • maximum Levenshtein distance

  • maximum # of substitutions

  • maximum # of deletions (elements appearing in the pattern searched for, which are skipped in the matching sub-sequence)

  • maximum # of insertions (elements added in the matching sub-sequence which don’t appear in the pattern searched for)

Not supplying a criterion means that there is no limit for it. For this reason, one must always supply max_l_dist and/or all other criteria.
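
The rule above can be spelled out as a small check. This is only an illustrative sketch: the helper name search_is_bounded is hypothetical and not part of fuzzysearch, and how the library itself reacts to an unbounded combination is not covered here. Only the keyword names are taken from the examples in this document.

```python
def search_is_bounded(max_l_dist=None, max_substitutions=None,
                      max_insertions=None, max_deletions=None):
    """Return True if the given criteria put a limit on every kind of edit.

    Hypothetical helper, not part of fuzzysearch. None means "no limit".
    """
    # An overall Levenshtein limit caps every kind of edit at once.
    if max_l_dist is not None:
        return True
    # Otherwise each of the three individual limits must be given.
    return None not in (max_substitutions, max_insertions, max_deletions)


print(search_is_bounded(max_l_dist=1))
print(search_is_bounded(max_substitutions=0, max_insertions=1, max_deletions=1))
print(search_is_bounded(max_substitutions=2))
```

The first two calls describe a bounded search; the last one leaves insertions and deletions unlimited, so it does not.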

>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
[Match(start=3, end=9, dist=1)]

# this will not match since max-deletions is set to zero
>>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1, max_deletions=0)
[]

# note that a deletion + insertion may be combined to match a substitution
>>> find_near_matches('PATTERN', '---PAT-ERN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=1)] # the Levenshtein distance is still 1

# ... but deletion + insertion may also match other, non-substitution differences
>>> find_near_matches('PATTERN', '---PATERRN---', max_deletions=1, max_insertions=1, max_substitutions=0)
[Match(start=3, end=10, dist=2)]
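
To make the plain Levenshtein criterion (max_l_dist) more concrete, here is a rough pure-Python sketch of the textbook dynamic-programming approach to approximate substring matching. It is not the library's implementation, it ignores the separate substitution/insertion/deletion limits, and the function name min_substring_distance is made up for this illustration.

```python
def min_substring_distance(pattern, text):
    """Smallest Levenshtein distance between pattern and any substring of text."""
    # prev[i]: best distance between pattern[:i] and a substring of the text
    # ending at the previous position (initially: the empty substring).
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # an empty pattern prefix matches anywhere at zero cost
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # exact match or substitution
                prev[i] + 1,              # extra element in the text (insertion)
                cur[i - 1] + 1,           # pattern element skipped (deletion)
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


print(min_substring_distance('PATTERN', '---PATERN---'))
```

A search with max_l_dist=k can only report matches when this minimum is at most k, which agrees with the first example above: 'PATTERN' is within distance 1 of 'PATERN'.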

History

0.6.1 (2018-12-08)

  • Fixed some C compiler warnings for the C and Cython modules

0.6.0 (2018-12-07)

  • Dropped support for Python versions 2.6, 3.2 and 3.3

  • Added support and testing for Python 3.7

  • Optimized the n-grams Levenshtein search for long sub-sequences

  • Further optimized the n-grams Levenshtein search

  • Cython versions of the optimized parts of the n-grams Levenshtein search

0.5.0 (2017-09-05)

  • Fixed search_exact_byteslike() to support supplying start and end indexes

  • Added support for lists, tuples and other Sequence types to search_exact()

  • Fixed a bug where find_near_matches() could return a wrong Match.end with max_l_dist=0

  • Added more tests and improved some existing ones.

0.4.0 (2017-07-06)

  • Added support and testing for Python 3.5 and 3.6

  • Many small improvements to README, setup.py and CI testing

0.3.0 (2015-02-12)

  • Added C extensions for several search functions as well as internal functions

  • Use C extensions if available, or pure-Python implementations otherwise

  • setup.py attempts to build C extensions, but installs without if build fails

  • Added --noexts setup.py option to avoid trying to build the C extensions

  • Greatly improved testing and coverage

0.2.2 (2014-03-27)

  • Added support for searching through BioPython Seq objects

  • Added specialized search function allowing only substitutions and insertions

  • Fixed several bugs

0.2.1 (2014-03-14)

  • Fixed major match grouping bug

0.2.0 (2014-03-13)

  • New utility function find_near_matches() for easier use

  • Additional documentation

0.1.0 (2013-11-12)

  • Two working implementations

  • Extensive test suite; all tests passing

  • Full support for Python 2.6-2.7 and 3.1-3.3

  • Bumped status from Pre-Alpha to Alpha

0.0.1 (2013-11-01)

  • First release on PyPI.

Download files

Download the file for your platform.

Source Distribution

  • fuzzysearch-0.6.1.tar.gz (99.2 kB): Source

Built Distributions

  • fuzzysearch-0.6.1-cp37-cp37m-win_amd64.whl (77.7 kB): CPython 3.7m, Windows x86-64

  • fuzzysearch-0.6.1-cp37-cp37m-win32.whl (70.4 kB): CPython 3.7m, Windows x86

  • fuzzysearch-0.6.1-cp37-cp37m-macosx_10_14_x86_64.whl (75.5 kB): CPython 3.7m, macOS 10.14+ x86-64

  • fuzzysearch-0.6.1-cp36-cp36m-win_amd64.whl (77.5 kB): CPython 3.6m, Windows x86-64

  • fuzzysearch-0.6.1-cp36-cp36m-win32.whl (70.2 kB): CPython 3.6m, Windows x86

  • fuzzysearch-0.6.1-cp36-cp36m-macosx_10_14_x86_64.whl (75.1 kB): CPython 3.6m, macOS 10.14+ x86-64

  • fuzzysearch-0.6.1-cp35-cp35m-win_amd64.whl (76.3 kB): CPython 3.5m, Windows x86-64

  • fuzzysearch-0.6.1-cp35-cp35m-win32.whl (69.1 kB): CPython 3.5m, Windows x86

  • fuzzysearch-0.6.1-cp35-cp35m-macosx_10_14_x86_64.whl (73.3 kB): CPython 3.5m, macOS 10.14+ x86-64

  • fuzzysearch-0.6.1-cp34-cp34m-win32.whl (64.7 kB): CPython 3.4m, Windows x86

  • fuzzysearch-0.6.1-cp34-cp34m-macosx_10_14_x86_64.whl (73.2 kB): CPython 3.4m, macOS 10.14+ x86-64

  • fuzzysearch-0.6.1-cp27-cp27m-win_amd64.whl (66.9 kB): CPython 2.7m, Windows x86-64

  • fuzzysearch-0.6.1-cp27-cp27m-win32.whl (63.9 kB): CPython 2.7m, Windows x86

  • fuzzysearch-0.6.1-cp27-cp27m-macosx_10_14_x86_64.whl (72.0 kB): CPython 2.7m, macOS 10.14+ x86-64

