A library to match and compare strings.
Stringmatch
Yet another small, lightweight string matching library written in Python, based on the Levenshtein distance and the Levenshtein Python C Extension.
Inspired by seatgeek/thefuzz, which did not quite fit my needs, so I am building this library primarily for myself.
Requirements
- Python 3.9 or later.
Installation
Via pip:
pip install stringmatch
Via git:
pip install -U git+https://github.com/atomflunder/stringmatch
Usage
from stringmatch import Match, Ratio, Strings
# Basic usage:
Match().match("searchlib", "srchlib") # returns True
Match().match("searchlib", "something else") # returns False
# Matching lists:
searches = ["searchli", "searhli", "search", "lib", "whatever", "s"]
Match().get_best_match("searchlib", searches) # returns "searchli"
Match().get_best_matches("searchlib", searches) # returns ['searchli', 'searhli', 'search']
# Ratios:
Ratio().ratio("searchlib", "searchlib") # returns 100
Ratio().ratio("searchlib", "srechlib") # returns 82
ratios = ["searchlib", "srechlib"]
Ratio().ratio_list("searchlib", ratios) # returns [100, 82]
# Modify strings:
Strings().latinise("Héllö, world!") # returns "Hello, world!"
Strings().remove_punctuation("wh'at;, ever") # returns "what ever"
Strings().only_letters("Héllö, world!") # returns "Hll world"
Strings().ignore_case("test test!", lower=False) # returns "TEST TEST!"
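The ratios shown above (100 and 82) are consistent with a similarity of the form 2 * LCS / (len(a) + len(b)), where LCS is the length of the longest common subsequence. The following is a minimal pure-Python sketch under that assumption. It is not stringmatch's own code, and the name indel_ratio is hypothetical.

```python
def indel_ratio(a: str, b: str) -> int:
    """Similarity from 0 to 100, based on the longest common subsequence."""
    if not a and not b:
        return 100  # two empty strings are treated as identical here

    # Dynamic programme for the LCS length, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr

    lcs = prev[-1]
    return round(200 * lcs / (len(a) + len(b)))


print(indel_ratio("searchlib", "searchlib"))  # 100
print(indel_ratio("searchlib", "srechlib"))   # 82
```

For "searchlib" and "srechlib" the longest common subsequence has 7 characters, so the sketch gives 2 * 7 / 17, which rounds to 82.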
You can pass additional arguments to the Match() functions to customise your search further:
score=int
The score cutoff for matching, by default set to 70.
Match().match("searchlib", "srechlib", score=85) # returns False
Match().match("searchlib", "srechlib", score=70) # returns True
limit=int
The maximum number of matches to return. Only available for Match().get_best_matches(). By default this is set to 5.
searches = ["limit 5", "limit 4", "limit 3", "limit 2", "limit 1", "limit 0"]
Match().get_best_matches("limit 5", searches, limit=2) # returns ["limit 5", "limit 4"]
Match().get_best_matches("limit 5", searches, limit=1) # returns ["limit 5"]
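To illustrate how a score cutoff and a limit can work together, here is a small sketch that scores every candidate, drops those below the cutoff, and keeps the top few. It uses difflib.SequenceMatcher from the standard library as a stand-in scorer, so its scores will not always agree with stringmatch's. The function best_matches is hypothetical and not part of the library.

```python
from difflib import SequenceMatcher


def best_matches(query: str, choices: list[str], score: int = 70, limit: int = 5) -> list[str]:
    """Return up to `limit` choices scoring at least `score`, best first."""
    scored = []
    for choice in choices:
        # Stand-in scorer: difflib's ratio scaled to 0-100.
        value = round(100 * SequenceMatcher(None, query, choice).ratio())
        if value >= score:
            scored.append((value, choice))
    # sorted() is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [choice for _, choice in scored[:limit]]


searches = ["limit 5", "limit 4", "limit 3", "limit 2", "limit 1", "limit 0"]
print(best_matches("limit 5", searches, limit=2))  # ['limit 5', 'limit 4']
print(best_matches("limit 5", searches, limit=1))  # ['limit 5']
```

Filtering before truncating means that raising the cutoff can return fewer results than the limit allows.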
latinise=bool
Replaces special Unicode characters with their Latin alphabet equivalents. By default turned off.
Match().match("séärçh", "search", latinise=True) # returns True
Match().match("séärçh", "search", latinise=False) # returns False
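One common way to latinise text with only the standard library is to decompose each character and drop the combining accents. The sketch below does this with unicodedata. Whether stringmatch works this way internally is an assumption, and latinise_sketch is a hypothetical name.

```python
import unicodedata


def latinise_sketch(text: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(latinise_sketch("séärçh"))         # search
print(latinise_sketch("Héllö, world!"))  # Hello, world!
```

This approach only handles characters that decompose into a base letter plus accents. Letters such as "ø" or "ß" have no such decomposition and would pass through unchanged.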
ignore_case=bool
Ignores case while searching. By default turned off.
Match().match("test", "TEST", ignore_case=True) # returns True
Match().match("test", "TEST", ignore_case=False) # returns False
remove_punctuation=bool
Removes commonly used punctuation symbols from the strings, like .,;:!? and so on. Be careful when using this, because if you pass in a string that is only made up of punctuation symbols, you will get an EmptySearchException. By default turned off.
Match().match("test,---....", "test", remove_punctuation=True) # returns True
Match().match("test,---....", "test", remove_punctuation=False) # returns False
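The empty-string pitfall described above can be sketched as follows. This version removes the ASCII punctuation listed in string.punctuation and raises a plain ValueError as a stand-in for the library's EmptySearchException. The helper strip_punctuation is hypothetical, and the exact set of symbols stringmatch removes may differ.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation; refuse to return an empty search string."""
    cleaned = text.translate(_PUNCTUATION_TABLE)
    if not cleaned:
        # Stand-in for the library's EmptySearchException.
        raise ValueError("nothing left to search after removing punctuation")
    return cleaned


print(strip_punctuation("test,---...."))  # test
print(strip_punctuation("wh'at;, ever"))  # what ever
```

Calling strip_punctuation(".,;:!?") raises the ValueError, which mirrors why a punctuation-only input cannot be matched.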
only_letters=bool
Removes every character that is not in the Latin alphabet, a more extreme version of remove_punctuation. The same rules apply here: be careful when you use it, or you might get an EmptySearchException. By default turned off.
Match().match("»»ᅳtestᅳ►", "test", only_letters=True) # returns True
Match().match("»»ᅳtestᅳ►", "test", only_letters=False) # returns False
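A rough equivalent can be written with a regular expression that keeps only the letters a to z, in either case. Judging by the Strings().only_letters() example earlier, spaces appear to be kept as well, so this sketch keeps them too. That is an assumption, and only_letters_sketch is a hypothetical name.

```python
import re

# Anything that is not an ASCII letter or a space gets removed.
_NOT_LETTER_OR_SPACE = re.compile(r"[^a-zA-Z ]")


def only_letters_sketch(text: str) -> str:
    """Keep ASCII letters and spaces, drop everything else."""
    return _NOT_LETTER_OR_SPACE.sub("", text)


print(only_letters_sketch("»»ᅳtestᅳ►"))      # test
print(only_letters_sketch("Héllö, world!"))  # Hll world
```

Accented letters are dropped here, not converted, so combining this with latinisation first would preserve more of the input.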
scorer=str
The scoring algorithm to use. The available options are "levenshtein", "jaro" and "jaro_winkler". Different algorithms will produce different results. By default set to "levenshtein".
Match().match("test", "th test", scorer="levenshtein") # returns True (score = 73)
Match().match("test", "th test", scorer="jaro_winkler") # returns False (score = 60)
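To show why the scorers can disagree, here is a textbook sketch of the plain Jaro similarity. Jaro only counts characters as matching when they sit within a small window of each other, so the inserted "th " pushes most of "test" out of range. This is the standard definition, not stringmatch's implementation, and it leaves out the Winkler prefix bonus, which many implementations apply only to pairs that already score highly.

```python
def jaro(a: str, b: str) -> float:
    """Plain Jaro similarity between 0.0 and 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0

    # Characters match only if they are at most `window` positions apart.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that line up in a different order count as
    # half a transposition each.
    a_seq = [ch for ch, hit in zip(a, a_hit) if hit]
    b_seq = [ch for ch, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2

    return (
        matches / len(a)
        + matches / len(b)
        + (matches - transpositions) / matches
    ) / 3


print(round(100 * jaro("test", "th test")))  # 60
```

Only the two "t" characters fall inside the window for this pair, which is why the score lands below the default cutoff of 70.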