Fuzzy string matching in Python
Fuzzy-Match
Fuzzy string matching in Python. By default, it uses trigrams: each string is split into n-grams of length 3, and these are used to calculate a similarity score and find matches. The n-gram length can be changed if desired. Cosine, Levenshtein distance, and Jaro-Winkler distance algorithms are available as alternatives.
Usage
>>> from fuzzy_match import match
>>> from fuzzy_match import algorithims
Trigram
>>> algorithims.trigram("this is a test string", "this is another test string")
0.703704
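To illustrate the idea behind n-gram matching, here is a minimal sketch that splits each string into character n-grams and compares the resulting sets with Jaccard similarity. The names ngrams and ngram_similarity are hypothetical and are not part of fuzzy_match; the library's own padding, casing, and counting rules are not documented here, so the scores from this sketch may differ from the one shown above.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of length n in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity of the two n-gram sets (an assumed scoring rule)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        # Both strings are shorter than n; fall back to exact comparison.
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    # n=3 gives trigrams; pass a different n to change the n-gram length.
    print(ngram_similarity("this is a test string", "this is another test string"))
    print(ngram_similarity("this is a test string", "this is another test string", n=2))
```

Using sets keeps the sketch short, but it ignores repeated n-grams; a multiset (for example collections.Counter) would be a reasonable alternative.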
Cosine
>>> algorithims.cosine("this is a test string", "this is another test string")
0.7999999999999998
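As a rough illustration of cosine similarity on strings, the sketch below builds word-count vectors from whitespace-separated tokens and computes the cosine of the angle between them. The function name cosine_similarity and the tokenization are assumptions made for this example, not the library's confirmed implementation; this choice does happen to give about 0.8 for the two strings above, which is consistent with the score shown.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between word-count vectors of two strings."""
    # Assumption: tokens are lowercase, whitespace-separated words.
    counts_a = Counter(a.lower().split())
    counts_b = Counter(b.lower().split())
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty string has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("this is a test string", "this is another test string"))
```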
Levenshtein
>>> algorithims.levenshtein("this is a test string", "this is another test string")
0.7777777777777778
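The score above is a similarity between 0 and 1 rather than a raw edit count. One way to get such a value, sketched below, is to compute the Levenshtein distance with the standard dynamic-programming recurrence and divide it by the length of the longer string. The function names and the normalization are assumptions for illustration; the normalization is consistent with the scores shown in this README (6 edits over 27 characters gives about 0.778), but the library's actual code is not confirmed here.

```python
def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    print(levenshtein_similarity("this is a test string", "this is another test string"))
```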
Jaro-Winkler
>>> algorithims.jaro_winkler("this is a test string", "this is another test string")
0.798941798941799
Match
>>> choices = ["simple strings", "strings are simple", "sim string", "string to match", "matching simple strings", "matching strings again"]
>>> match.extract("simple string", choices, limit=2)
[('simple strings', 0.8), ('sim string', 0.642857)]
>>> match.extractOne("simple string", choices)
('simple strings', 0.8)
You can also pass additional arguments to extract and extractOne to set a score cutoff value or to use one of the other algorithms mentioned above. Here is an example:
>>> match.extract("simple string", choices, match_type='levenshtein', score_cutoff=0.7)
[('simple strings', 0.9285714285714286), ('sim string', 0.7692307692307693)]
match_type options include trigram, cosine, levenshtein, and jaro_winkler.
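To show how limit and score_cutoff interact with a scoring function, here is a small stand-alone sketch of the ranking step. It is not the library's implementation: rank_matches, best_match, and ratio are hypothetical names, the scorer is difflib.SequenceMatcher from the standard library rather than any of the algorithms above, and treating the cutoff as inclusive is an assumption.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Stand-in scorer returning a similarity between 0 and 1."""
    return SequenceMatcher(None, a, b).ratio()


def rank_matches(query, choices, scorer=ratio, limit=None, score_cutoff=0.0):
    """Score every choice, drop those below the cutoff, and sort best first."""
    scored = [(choice, scorer(query, choice)) for choice in choices]
    # Assumption: a score equal to the cutoff is kept.
    kept = [pair for pair in scored if pair[1] >= score_cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept if limit is None else kept[:limit]


def best_match(query, choices, scorer=ratio, score_cutoff=0.0):
    """Return the single highest-scoring (choice, score) pair, or None."""
    ranked = rank_matches(query, choices, scorer, limit=1, score_cutoff=score_cutoff)
    return ranked[0] if ranked else None


if __name__ == "__main__":
    choices = ["simple strings", "strings are simple", "sim string",
               "string to match", "matching simple strings", "matching strings again"]
    print(rank_matches("simple string", choices, limit=2))
    print(best_match("simple string", choices, score_cutoff=0.7))
```

Passing the scorer as an argument mirrors the role match_type plays above: the ranking logic stays the same while the similarity measure changes.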
Source Distribution
fuzzy-match-0.0.1.tar.gz (4.2 kB)
Built Distribution
Hashes for fuzzy_match-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b0cc8eede335bfd7ab18509da593fef5b5336e2eec0757f7bb886c828d1ff849
MD5 | b1ac251e92c7a58060c94538eb7bd271
BLAKE2b-256 | e3aebe76d0df7d70f5912b0475bd06b48920d87fa18254666c86d8bdd4911678