A library to match and compare strings.

Stringmatch

Yet another small, lightweight string matching library written in Python, based on the Levenshtein distance and the Levenshtein Python C Extension.
Inspired by seatgeek/thefuzz, which did not quite fit my needs, so I am building this library primarily for myself.

Requirements

  • Python 3.9 or later.

Installation

Via pip:

pip install stringmatch

Via git:

pip install -U git+https://github.com/atomflunder/stringmatch

Usage

from stringmatch import Match, Ratio, Strings

# Basic usage:
Match().match("searchlib", "srchlib")               # returns True
Match().match("searchlib", "something else")        # returns False

# Matching lists:
searches = ["searchli", "searhli", "search", "lib", "whatever", "s"]
Match().get_best_match("searchlib", searches)       # returns "searchli"
Match().get_best_matches("searchlib", searches)     # returns ['searchli', 'searhli', 'search']

# Ratios:
Ratio().ratio("searchlib", "searchlib")             # returns 100
Ratio().ratio("searchlib", "srechlib")              # returns 82
ratios = ["searchlib", "srechlib"]
Ratio().ratio_list("searchlib", ratios)             # returns [100, 82]

# Modify strings:
Strings().latinise("Héllö, world!")                 # returns "Hello, world!"
Strings().remove_punctuation("wh'at;, ever")        # returns "what ever"
Strings().only_letters("Héllö, world!")             # returns "Hll world"
Strings().ignore_case("test test!", lower=False)    # returns "TEST TEST!"
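
To give a feel for where a score like 82 comes from, here is a minimal standard-library sketch. It assumes the default score is a normalised insertion/deletion similarity (twice the longest common subsequence, divided by the combined length). That assumption reproduces the numbers above, but the library itself delegates to the Levenshtein extension, and the helper name indel_ratio is made up for this example.

```python
def indel_ratio(a: str, b: str) -> int:
    """Similarity from 0 to 100 based on the longest common subsequence.

    Hypothetical helper for illustration only, not part of stringmatch.
    """
    if not a and not b:
        return 100

    # Row-by-row dynamic programme for the LCS length.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr

    lcs = prev[-1]
    return round(100 * 2 * lcs / (len(a) + len(b)))


print(indel_ratio("searchlib", "searchlib"))  # 100
print(indel_ratio("searchlib", "srechlib"))   # 82
```

The second call finds a common subsequence of length 7 in strings of length 9 and 8, so the score is 14 / 17, which rounds to 82.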

You can pass in additional arguments for the Match() functions to customise your search further:

score=int

The score cutoff for matching, by default set to 70.

Match().match("searchlib", "srechlib", score=85)    # returns False
Match().match("searchlib", "srechlib", score=70)    # returns True

limit=int

The maximum number of matches to return. Only available for Match().get_best_matches(). By default this is set to 5.

searches = ["limit 5", "limit 4", "limit 3", "limit 2", "limit 1", "limit 0"]
Match().get_best_matches("limit 5", searches, limit=2)  # returns ["limit 5", "limit 4"]
Match().get_best_matches("limit 5", searches, limit=1)  # returns ["limit 5"]
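
For comparison, the standard library offers a similar pairing of a cutoff and a result limit in difflib.get_close_matches. This is only an analogy: difflib uses a different similarity measure than stringmatch, and its cutoff is a float between 0 and 1 rather than an integer score.

```python
from difflib import get_close_matches

searches = ["limit 5", "limit 4", "limit 3", "limit 2", "limit 1", "limit 0"]

# n caps how many results come back; cutoff drops anything scoring below it.
top_two = get_close_matches("limit 5", searches, n=2, cutoff=0.7)
top_one = get_close_matches("limit 5", searches, n=1, cutoff=0.7)

print(top_two)  # two entries, best match first
print(top_one)  # ['limit 5']
```

In both cases the candidates are filtered by the cutoff first and then trimmed to the best n, which is the same order of operations the score and limit arguments suggest.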

latinise=bool

Replaces special Unicode characters with their Latin alphabet equivalents. By default turned off.

Match().match("séärçh", "search", latinise=True)    # returns True
Match().match("séärçh", "search", latinise=False)   # returns False
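
One common way to get this effect with only the standard library is Unicode decomposition. The sketch below is an assumption about the general technique, not a description of how stringmatch does it internally, and strip_accents is a hypothetical name.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after decomposing accented characters."""
    # NFKD splits e.g. an accented e into a plain e plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("séärçh"))         # search
print(strip_accents("Héllö, world!"))  # Hello, world!
```

Characters without a decomposition, such as "ø" or "ß", pass through this sketch unchanged, so a dedicated transliteration step may handle more cases.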

ignore_case=bool

Ignores case when comparing the strings. By default turned off.

Match().match("test", "TEST", ignore_case=True)     # returns True
Match().match("test", "TEST", ignore_case=False)    # returns False
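
As a rough illustration of case-insensitive comparison in plain Python (not necessarily what stringmatch does under the hood), str.casefold() is a slightly more aggressive alternative to str.lower(). The function name below is hypothetical.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after case folding both of them."""
    return a.casefold() == b.casefold()


print(same_ignoring_case("test", "TEST"))       # True
print(same_ignoring_case("Straße", "STRASSE"))  # True
print("Straße".lower() == "STRASSE".lower())    # False
```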

remove_punctuation=bool

Removes commonly used punctuation symbols from the strings, like .,;:!? and so on. Be careful when using this, because if you pass in a string that is only made up of punctuation symbols, you will get an EmptySearchException. By default turned off.

Match().match("test,---....", "test", remove_punctuation=True)  # returns True
Match().match("test,---....", "test", remove_punctuation=False) # returns False
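
The empty-string pitfall is easy to reproduce without the library. This sketch strips ASCII punctuation with str.translate and raises when nothing is left; EmptyResultError is a hypothetical stand-in for EmptySearchException, and the exact set of symbols stringmatch removes may differ from string.punctuation.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


class EmptyResultError(ValueError):
    """Hypothetical stand-in for the library's EmptySearchException."""


def strip_punctuation(text: str) -> str:
    cleaned = text.translate(_PUNCTUATION_TABLE)
    if not cleaned.strip():
        # Nothing searchable remains, so fail loudly instead of matching "".
        raise EmptyResultError(f"nothing left after cleaning {text!r}")
    return cleaned


print(strip_punctuation("test,---...."))  # test
print(strip_punctuation("wh'at;, ever"))  # what ever
```

Calling strip_punctuation("?!...") raises EmptyResultError, which mirrors the situation the warning above describes.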

only_letters=bool

Removes every character that is not in the Latin alphabet; a more extreme version of remove_punctuation. The same caveat applies here: if nothing is left of a string after cleaning, you will get an EmptySearchException. By default turned off.

Match().match("»»ᅳtestᅳ►", "test", only_letters=True)   # returns True
Match().match("»»ᅳtestᅳ►", "test", only_letters=False)  # returns False
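
A regular expression is one simple way to sketch this kind of filter. The version below assumes that spaces are kept and that accented letters count as non-Latin, which is what the "Hll world" example in the Usage section suggests; keep_ascii_letters is a made-up name.

```python
import re

# Anything that is not an unaccented ASCII letter or a space gets removed.
_NOT_LETTER_OR_SPACE = re.compile(r"[^A-Za-z ]")


def keep_ascii_letters(text: str) -> str:
    return _NOT_LETTER_OR_SPACE.sub("", text)


print(keep_ascii_letters("»»ᅳtestᅳ►"))      # test
print(keep_ascii_letters("Héllö, world!"))  # Hll world
```

Running latinise first would keep the accented letters as their plain equivalents instead of dropping them.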

scorer=str

The scoring algorithm to use. The available options are "levenshtein", "jaro" and "jaro_winkler". Different algorithms can produce noticeably different scores for the same pair of strings. By default set to "levenshtein".

Match().match("test", "th test", scorer="levenshtein")  # returns True (score = 73)
Match().match("test", "th test", scorer="jaro_winkler") # returns False (score = 60)
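
To show why the scorers disagree, here is a small pure-Python sketch of the plain Jaro similarity, without the Winkler prefix adjustment. It is written from the textbook definition rather than taken from the library, so treat the function as an illustration only.

```python
def jaro(a: str, b: str) -> float:
    """Plain Jaro similarity between 0.0 and 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0

    # Characters only count as matching if they are at most this far apart.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0

    for i, ch in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order are transpositions.
    a_seq = [ch for ch, hit in zip(a, a_hit) if hit]
    b_seq = [ch for ch, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2

    return (
        matches / len(a)
        + matches / len(b)
        + (matches - transpositions) / matches
    ) / 3


print(round(100 * jaro("test", "th test")))  # 60
```

Jaro only pairs up characters that sit close to each other, so the "e" and "s" of "test" find no partner in "th test" and the score drops to 60, while an edit-based score treats the extra "th " as three cheap insertions.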
