Fast fuzzy string matching.
FastFuzzy
A Python module for fuzzy string matching and similarity measuring with constant lookup time by building an in-memory index.
This package relies on abydos, which provides a large collection of distance metrics.
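To illustrate the general idea behind a q-gram index, here is a minimal sketch in plain Python. It is not FastFuzzy's actual implementation: the helper names `qgrams`, `build_index` and `candidates` are hypothetical, and no padding characters are added to the tokens. The point is only that a query touches the dictionary entries for its own q-grams instead of scanning every stored token.

```python
from collections import defaultdict


def qgrams(token, q=2):
    """Return the set of overlapping substrings of length q in token."""
    return {token[i:i + q] for i in range(len(token) - q + 1)}


def build_index(tokens, q=2):
    """Map every q-gram to the set of tokens that contain it."""
    index = defaultdict(set)
    for token in tokens:
        for gram in qgrams(token, q):
            index[gram].add(token)
    return index


def candidates(index, query, q=2):
    """Collect the tokens sharing at least one q-gram with the query."""
    found = set()
    for gram in qgrams(query, q):
        # .get avoids creating empty entries for unknown q-grams.
        found |= index.get(gram, set())
    return found


index = build_index(["word", "ward", "sword", "fuzzy"])
print(sorted(candidates(index, "wore")))  # ['sword', 'word']
```

Only the candidates found this way would then need to be scored with a distance metric.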
Installation
pip install fastfuzzy
Usage
Create Index
from fastfuzzy import QGramIndex
index = QGramIndex(tokens=["word1", "word2"], q=2)
Or use the class methods to read the tokens from a file:
with open("file.txt") as f:
    index = QGramIndex.from_file(f)
Or:
index = QGramIndex.from_path("file.txt")
In both cases, the input file is expected to be a list of tokens line by line.
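The expected file layout can be sketched as follows. The example writes a small token file with one token per line and reads it back using only the standard library. Stripping whitespace and skipping blank lines is an assumption made for this sketch, not confirmed FastFuzzy behaviour.

```python
import os
import tempfile

tokens = ["word1", "word2", "word3"]

# Write one token per line, the layout described above.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as f:
    f.write("\n".join(tokens) + "\n")
    path = f.name

# Read the file back the way a line-based loader might: strip the newline
# and skip blank lines (an assumption, not confirmed FastFuzzy behaviour).
with open(path, encoding="utf-8") as f:
    loaded = [line.strip() for line in f if line.strip()]

os.remove(path)  # clean up the temporary file
print(loaded)  # ['word1', 'word2', 'word3']
```

A file laid out like this is the kind of input that QGramIndex.from_file and QGramIndex.from_path are described as expecting.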
Query for most similar token
index.max_sim("word1")
("word1", 1.0)
If no token in the index has any overlap with the input token, it returns (None, 0.0).
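Because the result can be (None, 0.0), callers usually want to guard against the no-match case. The helper below is a hypothetical wrapper, not part of FastFuzzy. It assumes only that the index object has a max_sim method returning a (token, similarity) pair as shown above, and that higher similarity means a closer match. The `min_sim` threshold and `default` parameter are illustrative.

```python
def best_match(index, query, min_sim=0.5, default=None):
    """Return the closest indexed token, or default if nothing is similar enough."""
    token, sim = index.max_sim(query)
    # max_sim returns (None, 0.0) when no indexed token overlaps with the query.
    if token is None or sim < min_sim:
        return default
    return token
```

With an index built as in the earlier examples, `best_match(index, "word1")` would return the matched token, while a query with no overlap would return `default`.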
Merge two indices
index1 = QGramIndex(tokens=["token1"])
index2 = QGramIndex(tokens=["token2"])
index3 = index1 + index2
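When more than two indices need to be combined, the + operator can be applied repeatedly. The function below is a small sketch of that pattern. `merge_indices` is a hypothetical helper, and it assumes only that adding two indices yields a merged index, as in the example above.

```python
from functools import reduce
import operator


def merge_indices(indices):
    """Combine an iterable of indices into one by repeated use of +."""
    indices = list(indices)
    if not indices:
        raise ValueError("need at least one index to merge")
    # reduce applies operator.add pairwise from left to right:
    # ((i1 + i2) + i3) + ...
    return reduce(operator.add, indices)
```

For example, `merge_indices([index1, index2, index3])` would be equivalent to `index1 + index2 + index3`.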
Alternative distance metrics
By default, the index uses the standard QGram distance metric.
Alternatively, any other metric defined in the abydos distance package can be specified with the cmp argument:
QGramIndex(tokens=[...], cmp=abydos.distance.PositionalQGramDice)
Testing
In order to run the tests locally, install the test dependencies:
pip install -e .
pip install -e .[test]
And run the tests:
pytest -v --cov=src tests/
File details
Details for the file fastfuzzy-0.0.6.tar.gz.
File metadata
- Download URL: fastfuzzy-0.0.6.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.0 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 39bcf3d8b7628176675de8879d49027b4dfe6a44ebf081631738f325bcd2dc09
MD5 | 541ddc8305b81d0e7d0d82c3c150a01e
BLAKE2b-256 | b16589475642c89fdde8a30e4142b199bc4a45e79ac379e10918f219a710f13f
File details
Details for the file fastfuzzy-0.0.6-py3-none-any.whl.
File metadata
- Download URL: fastfuzzy-0.0.6-py3-none-any.whl
- Upload date:
- Size: 5.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.0 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2d2e84ef72574a31e483e2aecd214a1f99c9a8f8ae8121a1ed352d027cd64b7f
MD5 | 6a8fe249f0364756e45fe638483a44e3
BLAKE2b-256 | 16d1d05b0a3f9a0954348b2ed77cc9b656897378ccace831a54a0f991201eaf5