
Extension for `pandas` to use `rapidfuzz` for fuzzy matching.

Project description

pandas-fuzz





Installation

pip install pandas_fuzz

Usage

To register the extension, import pandas_fuzz before using it.

import pandas as pd
import pandas_fuzz

Alternatively, you can import pandas from pandas_fuzz directly.

from pandas_fuzz import pandas as pd

rapidfuzz.fuzz

pandas_fuzz integrates the following functions from rapidfuzz.fuzz into pandas. These functions are available in the fuzz namespace for both pandas.Series and pandas.DataFrame.

  • rapidfuzz.fuzz.ratio
  • rapidfuzz.fuzz.partial_ratio
  • rapidfuzz.fuzz.partial_ratio_alignment
  • rapidfuzz.fuzz.token_sort_ratio
  • rapidfuzz.fuzz.token_set_ratio
  • rapidfuzz.fuzz.token_ratio
  • rapidfuzz.fuzz.partial_token_sort_ratio
  • rapidfuzz.fuzz.partial_token_set_ratio
  • rapidfuzz.fuzz.partial_token_ratio
  • rapidfuzz.fuzz.WRatio
  • rapidfuzz.fuzz.QRatio

pandas.Series

Apply fuzz.ratio element-wise to a pd.Series.

>>> pd.Series(["this is a test", "this is a test!"]).fuzz.ratio("this is a test!")
0     96.551724
1    100.000000
dtype: float64

pandas.DataFrame

Apply fuzz.ratio row-wise to columns s1 and s2.

>>> pd.DataFrame({
    "s1": ["this is a test", "this is a test!"],
    "s2": ["this is a test", "this is a test!"]
}).fuzz.ratio("s1", "s2")
0    100.0
1    100.0
dtype: float64

