A library for fuzzymatching

floof: simple fuzzymatching / comparison library

What is it?

floof is a Python package that makes fuzzymatching easy. Fuzzymatching is a common data task, needed whenever two strings that should refer to the same thing don't match exactly. There are many algorithms for calculating string similarity, with dozens of disparate implementations. floof aims to collect these in one easy-to-use package and to reduce the boilerplate needed to apply them to your data.

Usage:

Dependencies

  • pandas - Output
  • scikit-learn - Used to implement TFIDF
  • sparse_dot_topn - Fast sparse matrix multiplication

Installing

The easiest way to install floof is from PyPI using pip:

pip install floof

Running

First, import the library.

import floof

floof provides two classes: Comparer and Matcher. Both are instantiated the same way, taking two pandas Series as arguments: an "original" and a "lookup". In practice, the order doesn't matter.

matcher = floof.Matcher(original, lookup)
comparer = floof.Comparer(original, lookup)

All functions in the Matcher class return a crosswalk of the original strings and the best k matches from the lookup strings. The primary convenience function is floof.Matcher().match(), which applies several different similarity algorithms and produces a composite score. Given an example input of:

original_names = ["apple", "pear"]
lookup_names = ["appl", "apil", "prear"]

A matcher function would return something like:

original_name  lookup_name  levenshtein_score  tfidf_score  final_score
apple          appl         90                 80           85
apple          apil         70                 85           77.5
pear           prear        95                 90           92.5
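To make the shape of that crosswalk concrete, here is a minimal standard-library sketch of a best-k match using a single Levenshtein-based scorer, plus a composite that averages several scores. It does not use floof and is not floof's implementation: the function names, the 0-100 scaling, and the use of a plain mean are assumptions for illustration (the mean is merely consistent with the final_score column above), so the numbers it produces will not match the illustrative table.

```python
def levenshtein_distance(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_score(a, b):
    """Scale edit distance to 0-100, where 100 means identical (assumed scaling)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - levenshtein_distance(a, b)) / longest


def composite_score(scores):
    """Combine several 0-100 scores with a plain mean (an assumption)."""
    return sum(scores) / len(scores)


def best_k_matches(original, lookup, k=2):
    """Return (original, lookup, score) rows: the k best lookups per original."""
    rows = []
    for name in original:
        scored = [(name, cand, levenshtein_score(name, cand)) for cand in lookup]
        scored.sort(key=lambda row: row[2], reverse=True)
        rows.extend(scored[:k])
    return rows


if __name__ == "__main__":
    for row in best_k_matches(["apple", "pear"], ["appl", "apil", "prear"], k=2):
        print(row)
```

Every original string is scored against every lookup string here, which is the quadratic cost discussed under Performance.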

The Comparer class is meant to compare strings one-to-one. That is to say, given an input of:

original_names = ["apple", "pear"]
lookup_names = ["appl", "apil"]

A comparer function would return something like:

levenshtein_score
90
95
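The one-to-one idea can be sketched without floof as a pairwise walk over two equal-length lists. This is only an illustration: compare_pairs is a hypothetical helper, and difflib's SequenceMatcher ratio is swapped in as a stand-in scorer, so its values differ from a Levenshtein score and from the illustrative numbers above.

```python
from difflib import SequenceMatcher


def compare_pairs(original, lookup):
    """Score original[i] against lookup[i] only; returns one 0-100 score per pair."""
    if len(original) != len(lookup):
        raise ValueError("one-to-one comparison needs inputs of equal length")
    return [
        100.0 * SequenceMatcher(None, a, b).ratio()
        for a, b in zip(original, lookup)
    ]


if __name__ == "__main__":
    print(compare_pairs(["apple", "pear"], ["appl", "apil"]))
```

Unlike matching, this is linear in the number of rows, because each original string is compared with exactly one lookup string.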

Performance

Fuzzymatching can be computationally expensive, as many algorithms are quadratic by nature: each original string must be compared against every lookup string. floof is therefore concurrent by default. It can also apply common-sense speedups, such as first removing exact matches from the pool and using a non-quadratic algorithm (TFIDF) to filter the pool.
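The speedups described above can be sketched with the standard library alone. This is not floof's code, and all names are hypothetical. Two stand-ins are used: a shared character-bigram check plays the role of the TFIDF filter (it still scans every lookup string, so it shows the shape of the pipeline rather than the sparse-matrix approach), and a thread pool illustrates the fan-out, although threads in CPython do not give CPU-bound work true parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def bigrams(text):
    """Set of character bigrams; a cheap stand-in for a TFIDF n-gram vector."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def split_exact(original, lookup):
    """Step 1: pull out originals that already appear verbatim in lookup."""
    lookup_set = set(lookup)
    exact = {name: name for name in original if name in lookup_set}
    remaining = [name for name in original if name not in lookup_set]
    return exact, remaining


def candidates(name, lookup, min_shared=1):
    """Step 2: keep lookups sharing at least min_shared bigrams with name."""
    grams = bigrams(name)
    return [cand for cand in lookup if len(grams & bigrams(cand)) >= min_shared]


def best_match(name, lookup):
    """Step 3: run the expensive scorer only on the filtered candidates."""
    pool = candidates(name, lookup)
    if not pool:
        return name, None
    best = max(pool, key=lambda cand: SequenceMatcher(None, name, cand).ratio())
    return name, best


def match_all(original, lookup, workers=4):
    """Map each original string to its best lookup string, or None."""
    exact, remaining = split_exact(original, lookup)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        fuzzy = dict(executor.map(lambda n: best_match(n, lookup), remaining))
    return {**exact, **fuzzy}


if __name__ == "__main__":
    print(match_all(["apple", "pear", "kiwi"], ["apple", "appl", "prear"]))
```

Removing exact matches first is the cheapest win, since set membership is a constant-time check per string and those rows never reach the expensive scorer.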

TODO:

  • Allow custom scorers
