Fast fuzzy string matching.

FastFuzzy

A Python module for fuzzy string matching and similarity measuring with constant lookup time by building an in-memory index.
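To illustrate the idea of an in-memory q-gram index, here is a minimal sketch in plain Python. It is not FastFuzzy's implementation: the class name `ToyQGramIndex`, the helper `qgrams`, and the choice of Dice similarity over q-gram sets are assumptions made for illustration. The point is that each q-gram of the query is resolved with a single dictionary access, so only tokens sharing at least one q-gram are ever scored.

```python
from collections import Counter, defaultdict


def qgrams(token, q=2):
    """Return the overlapping substrings of length q in token."""
    return [token[i:i + q] for i in range(len(token) - q + 1)]


class ToyQGramIndex:
    """Illustrative inverted index from q-grams to tokens (hypothetical)."""

    def __init__(self, tokens, q=2):
        self.q = q
        self.postings = defaultdict(set)  # q-gram -> tokens containing it
        for token in tokens:
            for gram in qgrams(token, q):
                self.postings[gram].add(token)

    def max_sim(self, token):
        """Return (best_token, score) using Dice similarity over q-gram sets."""
        query = set(qgrams(token, self.q))
        # Count shared q-grams per candidate; tokens without overlap never appear.
        shared = Counter()
        for gram in query:
            for candidate in self.postings.get(gram, ()):
                shared[candidate] += 1
        best, best_score = None, 0.0
        for candidate, overlap in shared.items():
            cand_grams = set(qgrams(candidate, self.q))
            score = 2 * overlap / (len(query) + len(cand_grams))
            if score > best_score:
                best, best_score = candidate, score
        return best, best_score


toy = ToyQGramIndex(["word1", "word2"], q=2)
```

In this sketch the work per query depends on how many indexed tokens share a q-gram with it, not on the total size of the index.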

This package relies on abydos, which provides a large collection of distance metrics.

Installation

pip install fastfuzzy

Usage

Create Index

from fastfuzzy import QGramIndex

index = QGramIndex(tokens=["word1", "word2"], q=2)

Or use the class methods to read the tokens from a file:

with open("file.txt") as f:
    index = QGramIndex.from_file(f)

Or:

index = QGramIndex.from_path("file.txt")

In both cases, the input file is expected to contain one token per line.
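A token file in this shape can be produced with a few lines of standard-library Python. The helper name `write_token_file` and the whitespace stripping are assumptions for illustration; whether FastFuzzy itself normalizes whitespace or skips blank lines is not stated here, so the sketch cleans the tokens before writing.

```python
import os
import tempfile


def write_token_file(tokens, path):
    """Write each non-empty token on its own line, stripping surrounding whitespace."""
    with open(path, "w", encoding="utf-8") as f:
        for token in tokens:
            token = token.strip()
            if token:
                f.write(token + "\n")


# Write a small example file into a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tokens.txt")
write_token_file(["word1", " word2 ", ""], path)

# Read it back to confirm the one-token-per-line layout.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

The resulting path can then be handed to QGramIndex.from_path as shown above.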

Query for most similar token

>>> index.max_sim("word1")
("word1", 1.0)

If no token in the index has any overlap with the input token, it returns (None, 0.0).
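Because the result can be (None, 0.0), callers usually want to guard against a missing match, and often against weak matches as well. The wrapper below is a minimal sketch: `best_match` and its `threshold` parameter are hypothetical names, and `StubIndex` is only a stand-in that mimics the documented return shape of max_sim so the example runs without the package installed.

```python
def best_match(index, token, threshold=0.5):
    """Return the closest indexed token, or None if nothing scores at least threshold."""
    match, score = index.max_sim(token)
    if match is None or score < threshold:
        return None
    return match


class StubIndex:
    """Stand-in with a max_sim method, for demonstration only (hypothetical)."""

    def __init__(self, results):
        self.results = results

    def max_sim(self, token):
        # Mirror the documented fallback when nothing overlaps.
        return self.results.get(token, (None, 0.0))


stub = StubIndex({"word1": ("word1", 1.0), "wrd": ("word1", 0.4)})
```

With a real QGramIndex, the same wrapper would be called as best_match(index, "word1"); a suitable threshold depends on the metric in use.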

Merge two indices

index1 = QGramIndex(tokens=["token1"])
index2 = QGramIndex(tokens=["token2"])
index3 = index1 + index2
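Since merging is expressed through the + operator, more than two indices can be folded together with functools.reduce. This is a sketch under assumptions: `merge_all` is a hypothetical helper, and `StubIndex` is a stand-in that supports + so the example is self-contained. Whether indices built with different q values or metrics can be merged is not covered here.

```python
from functools import reduce
from operator import add


def merge_all(indices):
    """Fold a non-empty sequence of indices into one using the + operator."""
    return reduce(add, indices)


class StubIndex:
    """Stand-in that supports +, for demonstration only (hypothetical)."""

    def __init__(self, tokens):
        self.tokens = set(tokens)

    def __add__(self, other):
        # Merging yields a new object holding the tokens of both operands.
        return StubIndex(self.tokens | other.tokens)


merged = merge_all([
    StubIndex(["token1"]),
    StubIndex(["token2"]),
    StubIndex(["token3"]),
])
```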

Alternative distance metrics

By default, the index uses the standard QGram distance metric. Alternatively, any other metric defined in the abydos.distance package can be passed via the cmp argument:

import abydos.distance
QGramIndex(tokens=[...], cmp=abydos.distance.PositionalQGramDice)

Testing

To run the tests locally, install the package together with its test dependencies:

pip install -e .
pip install -e ".[test]"

And run the tests:

pytest -v --cov=src tests/
