Fast fuzzy string matching.

FastFuzzy

A Python module for fuzzy string matching and similarity measurement. It builds an in-memory index so that lookups take constant time.

This package relies on abydos, which provides a very large collection of distance metrics.

Installation

pip install fastfuzzy

Usage

Create Index

from fastfuzzy import QGramIndex

index = QGramIndex(tokens=["word1", "word2"], q=2)
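To give an intuition for what a q-gram index holds, here is a minimal pure-Python sketch. It is not FastFuzzy's implementation: the helper names `qgrams` and `build_index` are hypothetical, and the layout (a mapping from each q-gram to the tokens containing it) is an assumption about how such an index is commonly organised.

```python
from collections import defaultdict


def qgrams(token, q=2):
    """Return the set of contiguous substrings of length q in token."""
    return {token[i:i + q] for i in range(len(token) - q + 1)}


def build_index(tokens, q=2):
    """Map each q-gram to the set of tokens that contain it (hypothetical layout)."""
    index = defaultdict(set)
    for token in tokens:
        for gram in qgrams(token, q):
            index[gram].add(token)
    return index


index = build_index(["word1", "word2"], q=2)
print(sorted(index["wo"]))  # ['word1', 'word2']
print(sorted(index["d1"]))  # ['word1']
```

With q=2, both tokens share the q-grams "wo", "or" and "rd", and differ only in the last one.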

Or use the class methods to read the tokens from a file:

with open("file.txt") as f:
    index = QGramIndex.from_file(f)

Or:

index = QGramIndex.from_path("file.txt")

In both cases, the input file is expected to contain one token per line.
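The expected file layout can be illustrated without the library. The sketch below writes a two-token file and reads it back; the helper `read_tokens` is hypothetical, and stripping whitespace and skipping blank lines are assumptions, not documented behaviour of from_file or from_path.

```python
import tempfile
from pathlib import Path


def read_tokens(path):
    """Read one token per line; blank lines are skipped (assumed behaviour)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.txt"
    # One token per line, as the README describes.
    path.write_text("word1\nword2\n", encoding="utf-8")
    tokens = read_tokens(path)

print(tokens)  # ['word1', 'word2']
```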

Query for most similar token

index.max_sim("word1")
("word1", 1.0)

If no token in the index has any overlap with the input token, it returns (None, 0.0).
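The lookup behaviour described above, including the (None, 0.0) result, can be sketched in plain Python. This is an illustration under assumptions rather than the library's code: the functions are hypothetical, and the Dice coefficient over q-gram sets stands in for whatever metric the index actually applies.

```python
def qgrams(token, q=2):
    """Return the set of substrings of length q contained in token."""
    return {token[i:i + q] for i in range(len(token) - q + 1)}


def build_index(tokens, q=2):
    """Toy inverted index: q-gram -> set of tokens containing it."""
    index = {}
    for token in tokens:
        for gram in qgrams(token, q):
            index.setdefault(gram, set()).add(token)
    return index


def max_sim(index, query, q=2):
    """Return (best_token, score), or (None, 0.0) if nothing overlaps."""
    query_grams = qgrams(query, q)
    # Only tokens sharing at least one q-gram with the query are scored.
    candidates = set()
    for gram in query_grams:
        candidates |= index.get(gram, set())
    best, best_score = None, 0.0
    for token in candidates:
        token_grams = qgrams(token, q)
        shared = len(query_grams & token_grams)
        # Dice coefficient, used here as a stand-in metric (assumption).
        score = 2 * shared / (len(query_grams) + len(token_grams))
        if score > best_score:
            best, best_score = token, score
    return best, best_score


index = build_index(["word1", "word2"])
print(max_sim(index, "word1"))  # ('word1', 1.0)
print(max_sim(index, "xyz"))    # (None, 0.0)
```

Because candidates are gathered through shared q-grams, a query with no q-gram in the index never reaches the scoring loop, which is why the fallback value is returned.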

Merge two indices

index1 = QGramIndex(tokens=["token1"])
index2 = QGramIndex(tokens=["token2"])
index3 = index1 + index2
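Conceptually, merging two q-gram indices amounts to taking the union of their entries. The sketch below shows this on plain dictionaries; `merge_indices` is a hypothetical helper, and whether the + operator of QGramIndex works this way internally is an assumption.

```python
def merge_indices(a, b):
    """Return a new index whose token sets are the union of those in a and b."""
    # Copy the sets so that neither input is modified.
    merged = {gram: set(tokens) for gram, tokens in a.items()}
    for gram, tokens in b.items():
        merged.setdefault(gram, set()).update(tokens)
    return merged


# Hand-written toy indices (only a few q-grams shown for brevity).
index1 = {"to": {"token1"}, "n1": {"token1"}}
index2 = {"to": {"token2"}, "n2": {"token2"}}
index3 = merge_indices(index1, index2)

print(sorted(index3["to"]))  # ['token1', 'token2']
print(sorted(index3))        # ['n1', 'n2', 'to']
```

Returning a new mapping instead of updating one in place mirrors the README example, where index1 and index2 remain usable after index3 is created.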

Alternative distance metrics

By default, the index uses the standard QGram distance metric. Alternatively, any other metric defined in the abydos distance package can be specified with the cmp argument:

import abydos.distance

QGramIndex(tokens=[...], cmp=abydos.distance.PositionalQGramDice)

Testing

In order to run the tests locally, install the test dependencies:

pip install -e .
pip install -e ".[test]"

And run the tests:

pytest -v --cov=src tests/

