A smart match package
Introduction
The smart-match module contains functions for calculating the similarity of strings and sets.
Concept
- similarity: A value in the range [0, 1] that represents how similar two strings are. The larger the value, the more similar the strings.
- dissimilarity: A value in the range [0, 1] that represents how dissimilar two strings are. The larger the value, the more dissimilar the strings. For any pair of strings, similarity = 1 - dissimilarity.
- distance: How far apart two strings are. Note that not all methods support distance.
- score: The larger the score, the more similar the two strings are. Note that not all methods provide a score.
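The relationship between distance, dissimilarity, and similarity can be illustrated with a small standalone sketch. It does not use smart-match itself: the function names below are hypothetical, Hamming distance is chosen only because it is short to write, and normalizing by the string length is an assumption about how a raw distance might be mapped into [0, 1].

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_dissimilarity(a: str, b: str) -> float:
    """Map the distance into [0, 1] (assumption: divide by the length)."""
    distance = hamming_distance(a, b)  # also validates the lengths
    return distance / len(a) if a else 0.0


def hamming_similarity(a: str, b: str) -> float:
    """similarity = 1 - dissimilarity, as defined in the list above."""
    return 1.0 - hamming_dissimilarity(a, b)


print(hamming_distance("karolin", "kathrin"))       # raw distance
print(hamming_dissimilarity("karolin", "kathrin"))  # value in [0, 1]
print(hamming_similarity("karolin", "kathrin"))     # 1 - dissimilarity
```

The distance is unbounded in general, while the two normalized values always sum to 1.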
We support three levels of string matching.
- char: Similarity computation based on characters in the strings.
- term: Similarity computation based on terms in the strings.
- gram: Similarity computation based on q-grams in the strings.
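To make the three levels concrete, here is a minimal sketch of how one string could be broken into units at each level. The helper names are hypothetical and are not part of smart-match. Splitting terms on whitespace and building q-grams without padding are assumptions, and the library may tokenize differently.

```python
from typing import List


def char_tokens(text: str) -> List[str]:
    """char level: every character is a unit."""
    return list(text)


def term_tokens(text: str) -> List[str]:
    """term level: whitespace-separated words are the units (assumption)."""
    return text.split()


def gram_tokens(text: str, q: int = 2) -> List[str]:
    """gram level: overlapping substrings of length q, with no padding."""
    if q < 1:
        raise ValueError("q must be at least 1")
    return [text[i:i + q] for i in range(len(text) - q + 1)]


print(char_tokens("hero"))         # ['h', 'e', 'r', 'o']
print(term_tokens("hello world"))  # ['hello', 'world']
print(gram_tokens("hello", q=2))   # ['he', 'el', 'll', 'lo']
```

A similarity method then compares the two resulting token sequences or sets instead of the raw strings.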
Methods
We support the following methods.
| Method | similarity | dissimilarity | distance | score |
|---|---|---|---|---|
| Levenshtein (default) | ✅ | ✅ | ✅ | ❌ |
| Euclidean | ✅ | ✅ | ✅ | ❌ |
| Damerau Levenshtein | ✅ | ✅ | ✅ | ❌ |
| Block Distance | ✅ | ✅ | ✅ | ❌ |
| Cosine | ✅ | ✅ | ❌ | ❌ |
| Tanimoto Coefficient | ✅ | ✅ | ❌ | ❌ |
| Dice | ✅ | ✅ | ❌ | ❌ |
| Simon White | ✅ | ✅ | ❌ | ❌ |
| Longest Common Substring | ✅ | ✅ | ✅ | ✅ |
| Longest Common Subsequence | ✅ | ✅ | ✅ | ✅ |
| Overlap Coefficient | ✅ | ✅ | ❌ | ❌ |
| Generalized Overlap Coefficient | ✅ | ✅ | ❌ | ❌ |
| Jaccard | ✅ | ✅ | ❌ | ❌ |
| Generalized Jaccard | ✅ | ✅ | ❌ | ❌ |
| Hamming | ✅ | ✅ | ✅ | ❌ |
| Jaro | ✅ | ✅ | ❌ | ❌ |
| Jaro Winkler | ✅ | ✅ | ❌ | ❌ |
| Needleman Wunsch | ✅ | ✅ | ❌ | ✅ |
| Smith Waterman | ✅ | ✅ | ❌ | ✅ |
| Smith Waterman Gotoh | ✅ | ✅ | ❌ | ✅ |
| Monge Elkan | ✅ | ✅ | ❌ | ❌ |
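Several methods in the table (Jaccard, Dice, Overlap Coefficient) compare sets of tokens rather than aligned characters. The sketch below shows the textbook definitions of those three coefficients on term sets. It is an independent illustration with hypothetical function names, not the smart-match implementation, and the handling of empty sets is an assumption.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|"""
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(a | b)


def dice(a: set, b: set) -> float:
    """2 * |A ∩ B| / (|A| + |B|)"""
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def overlap(a: set, b: set) -> float:
    """|A ∩ B| / min(|A|, |B|)"""
    if not a or not b:
        return 1.0 if a == b else 0.0  # assumption for empty input
    return len(a & b) / min(len(a), len(b))


# Term-level sets: three shared terms, five distinct terms in total.
x = set("the quick brown fox".split())
y = set("the quick red fox".split())

print(jaccard(x, y))  # 3 shared / 5 in the union
print(dice(x, y))     # 2 * 3 / (4 + 4)
print(overlap(x, y))  # 3 / min(4, 4)
```

All three return values in [0, 1], which matches the table: these methods offer similarity and dissimilarity but no raw distance.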
Installation
pip install smart-match
Usage
import smart_match
print(smart_match.similarity('hello', 'hero'))
print(smart_match.dissimilarity('hello', 'hero'))
print(smart_match.distance('hello', 'hero'))
Output:
0.6
0.4
2
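The numbers above are consistent with the default Levenshtein method if the edit distance is normalized by the length of the longer string. That normalization is an inference from the output, not something the README states. The following standalone sketch, with hypothetical function names, reproduces the same three values without importing smart-match.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def levenshtein_dissimilarity(a: str, b: str) -> float:
    """Assumption: divide by the longer length to land in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein_distance(a, b) / longest if longest else 0.0


def levenshtein_similarity(a: str, b: str) -> float:
    return 1.0 - levenshtein_dissimilarity(a, b)


print(levenshtein_similarity("hello", "hero"))
print(levenshtein_dissimilarity("hello", "hero"))
print(levenshtein_distance("hello", "hero"))  # 2
```

Turning 'hello' into 'hero' takes one deletion and one substitution, so the distance is 2, and 2 / 5 gives the dissimilarity of 0.4.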
See the project wiki for more details.
License
smart-match is free software. See the LICENSE file for the full text.
Authors