PolyFuzz performs fuzzy string matching, grouping, and evaluation.

PolyFuzz performs fuzzy string matching and string grouping, and includes extensive evaluation functions. It is meant to bring fuzzy string matching techniques together within a single framework.

The currently supported methods are Levenshtein distance (via RapidFuzz), character-based n-gram TF-IDF, word embedding techniques such as FastText and GloVe, and 🤗 transformers embeddings.
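For intuition about the first method in that list, here is a minimal plain-Python sketch of Levenshtein (edit) distance. It is not how PolyFuzz computes it (PolyFuzz delegates to RapidFuzz); the function names `levenshtein` and `similarity` are hypothetical, and normalizing by the longer string's length is an assumption made for illustration, not RapidFuzz's scoring formula.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Illustrative similarity in [0, 1]: 1 minus the edit distance
    divided by the longer length (an assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


for left, right in [("appl", "apple"), ("house", "mouse"), ("recal", "apple")]:
    print(left, right, levenshtein(left, right), round(similarity(left, right), 3))
```

An edit-distance score rewards strings that differ by a few characters regardless of where the differences fall, which is why it tends to suit typos and short variants.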

You can also use your own custom models for both fuzzy string matching and string grouping.

A corresponding Medium post accompanies this project.

Getting Started

from polyfuzz import PolyFuzz

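# Strings to match, and the candidate strings to match them against
from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

# Map every entry in from_list to its most similar entry in to_list using TF-IDF
model = PolyFuzz("TF-IDF").match(from_list, to_list)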

The resulting string matches can be accessed through model.get_matches():

>>> model.get_matches()
      From      To  Similarity
     apple   apple    1.000000
    apples  apples    1.000000
      appl   apple    0.783751
     recal    None    0.000000
     house   mouse    0.587927
similarity    None    0.000000
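To make the scores in the table above less opaque, the following is a minimal plain-Python sketch of character n-gram TF-IDF matching with cosine similarity. It is an illustration under stated assumptions, not PolyFuzz's implementation: the helper names (`char_ngrams`, `tfidf_vectors`, `best_matches`) are hypothetical, trigrams (`n=3`) are assumed, the IDF is the smoothed form used by scikit-learn's defaults, and the vocabulary is assumed to be fitted on both lists together.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams; strings shorter than n yield themselves."""
    if len(text) < n:
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(strings, n=3):
    """L2-normalized TF-IDF vectors (as dicts) over character n-grams.
    Assumption: smoothed IDF, idf = ln((1 + N) / (1 + df)) + 1."""
    docs = [Counter(char_ngrams(s, n)) for s in strings]
    doc_freq = Counter(gram for doc in docs for gram in doc)
    total = len(docs)
    vectors = []
    for doc in docs:
        vec = {
            gram: tf * (math.log((1 + total) / (1 + doc_freq[gram])) + 1.0)
            for gram, tf in doc.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({gram: w / norm for gram, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two L2-normalized sparse vectors."""
    return sum(w * b.get(gram, 0.0) for gram, w in a.items())


def best_matches(from_list, to_list, n=3):
    """For each string in from_list, return (string, best match, score).
    A string that shares no n-gram with any candidate gets (string, None, 0.0)."""
    vectors = tfidf_vectors(from_list + to_list, n)
    from_vecs = vectors[:len(from_list)]
    to_vecs = vectors[len(from_list):]
    results = []
    for s, vec in zip(from_list, from_vecs):
        scores = [cosine(vec, t) for t in to_vecs]
        best = max(range(len(to_list)), key=scores.__getitem__)
        if scores[best] > 0.0:
            results.append((s, to_list[best], scores[best]))
        else:
            results.append((s, None, 0.0))
    return results


from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]
for source, match, score in best_matches(from_list, to_list):
    print(source, match, round(score, 6))
```

Under these assumptions, `appl` pairs with `apple` and `house` with `mouse`, while `recal` and `similarity` share no trigram with any candidate and therefore get no match, mirroring the `None` rows in the table.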
