A smart match package
Introduction
The smart-match module contains functions for calculating string and set similarity.
Concept
- similarity: A value in the range [0, 1] that represents how similar two strings are. The larger the value, the more similar the strings.
- dissimilarity: A value in the range [0, 1] that represents how dissimilar two strings are. The larger the value, the more dissimilar the strings. For any pair of strings, similarity = 1 - dissimilarity.
- distance: How far apart two strings are. Note that not all methods support distance.
- score: The larger the score, the more similar the two strings are. Note that not all methods support score.
We support three levels of string matching.
- char: Similarity computation based on characters in the strings.
- term: Similarity computation based on terms in the strings.
- gram: Similarity computation based on q-grams in the strings.
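To make the three levels concrete, here is a minimal sketch of how a string could be split into characters, terms, and q-grams before a set-based measure such as Jaccard is applied. The helper names (`char_tokens`, `term_tokens`, `qgram_tokens`, `jaccard`) are hypothetical and are not part of the smart-match API; the library's own tokenization (for example, whether it pads q-grams) may differ.

```python
def char_tokens(s):
    """Char level: every character is a token."""
    return list(s)


def term_tokens(s):
    """Term level: whitespace-separated words are tokens (an assumption)."""
    return s.split()


def qgram_tokens(s, q=2):
    """Gram level: overlapping substrings of length q, without padding."""
    if len(s) < q:
        return [s] if s else []
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def jaccard(a, b):
    """Jaccard similarity of two token collections, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty inputs are treated as identical
    return len(sa & sb) / len(sa | sb)


# The same pair of strings compared at each level.
print(jaccard(char_tokens("hello"), char_tokens("hero")))
print(jaccard(qgram_tokens("hello"), qgram_tokens("hero")))
print(jaccard(term_tokens("hello world"), term_tokens("hello there")))
```

The same measure gives different values depending on the level, which is why the choice of char, term, or gram matters for a given matching task.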
Methods
We support the following methods.
Method | similarity | dissimilarity | distance | score |
---|---|---|---|---|
Levenshtein (default) | ✅ | ✅ | ✅ | ❌ |
Euclidean | ✅ | ✅ | ✅ | ❌ |
Damerau Levenshtein | ✅ | ✅ | ✅ | ❌ |
Block Distance | ✅ | ✅ | ✅ | ❌ |
Cosine | ✅ | ✅ | ❌ | ❌ |
Tanimoto Coefficient | ✅ | ✅ | ❌ | ❌ |
Dice | ✅ | ✅ | ❌ | ❌ |
Simon White | ✅ | ✅ | ❌ | ❌ |
Longest Common Substring | ✅ | ✅ | ✅ | ✅ |
Longest Common Subsequence | ✅ | ✅ | ✅ | ✅ |
Overlap Coefficient | ✅ | ✅ | ❌ | ❌ |
Generalized Overlap Coefficient | ✅ | ✅ | ❌ | ❌ |
Jaccard | ✅ | ✅ | ❌ | ❌ |
Generalized Jaccard | ✅ | ✅ | ❌ | ❌ |
Hamming | ✅ | ✅ | ✅ | ❌ |
Jaro | ✅ | ✅ | ❌ | ❌ |
Jaro Winkler | ✅ | ✅ | ❌ | ❌ |
Needleman Wunsch | ✅ | ✅ | ❌ | ✅ |
Smith Waterman | ✅ | ✅ | ❌ | ✅ |
Smith Waterman Gotoh | ✅ | ✅ | ❌ | ✅ |
Monge Elkan | ✅ | ✅ | ❌ | ❌ |
Installation
pip install smart-match
Usage
import smart_match
print(smart_match.similarity('hello', 'hero'))
print(smart_match.dissimilarity('hello', 'hero'))
print(smart_match.distance('hello', 'hero'))
Output:
0.6
0.4
2
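The numbers above are consistent with the default Levenshtein method: turning 'hello' into 'hero' takes two edits (substitute one 'l' with 'r', delete the other 'l'), and dividing by the longer length (5) gives a dissimilarity of 0.4. The following self-contained sketch reproduces those values. It is not the smart-match implementation; the function names and the normalization by the longer string's length are assumptions chosen to match the output shown.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_dissimilarity(a, b):
    """Distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def normalized_similarity(a, b):
    """Complement of the dissimilarity: similarity = 1 - dissimilarity."""
    return 1.0 - normalized_dissimilarity(a, b)


print(levenshtein("hello", "hero"))
print(round(normalized_dissimilarity("hello", "hero"), 3))
print(round(normalized_similarity("hello", "hero"), 3))
```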
Check the Wiki for more details.
License
smart-match is free software. See the LICENSE file for the full text.
Authors