Fuzzy String Matching with custom objects in Python

TheFuzz

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.
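The description mentions Levenshtein Distance. As a point of reference, here is a minimal pure-Python sketch of the classic edit distance, where each insertion, deletion, or substitution costs 1. The function name levenshtein is our own illustration, not part of the package's API, and this is not the package's own implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b (illustrative sketch)."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the dynamic-programming table are kept at a time, so memory grows with the length of b rather than with len(a) * len(b).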

Requirements

For testing

  • pycodestyle

  • hypothesis

  • pytest

Installation

Using pip via PyPI

pip install thefuzz

Using pip via GitHub

pip install git+https://github.com/seatgeek/thefuzz.git@0.19.0#egg=thefuzz

Adding to your requirements.txt file (run pip install -r requirements.txt afterwards)

git+ssh://git@github.com/seatgeek/thefuzz.git@0.19.0#egg=thefuzz

Manually via Git

git clone https://github.com/seatgeek/thefuzz.git thefuzz
cd thefuzz
pip install .

Usage

>>> from thefuzz import fuzz
>>> from thefuzz import process

Simple Ratio

>>> fuzz.ratio("this is a test", "this is a test!")
    97
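A distance still has to be turned into a score from 0 to 100. The sketch below assumes the score is a normalized similarity that counts only insertions and deletions, which simplifies to 200 * LCS / (len(a) + len(b)), where LCS is the length of the longest common subsequence, rounded to an integer. The helper names lcs_length and ratio are hypothetical; this illustrates the idea and is not the package's source.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a char in a or b
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> int:
    """Assumed 0-100 score: 200 * LCS / total length, rounded."""
    total = len(a) + len(b)
    if total == 0:
        return 0
    return round(200 * lcs_length(a, b) / total)


print(ratio("this is a test", "this is a test!"))  # 97
```

For this pair the LCS is all 14 characters of the shorter string and the total length is 29, so 200 * 14 / 29 is about 96.6, which rounds to 97.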

Partial Ratio

>>> fuzz.partial_ratio("this is a test", "this is a test!")
    100
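Partial Ratio scores the shorter string against the best-matching part of the longer one. A simple way to sketch that idea is to slide a window the length of the shorter string across the longer one and keep the best ratio. This sliding-window approach is our simplification and may not agree with the package on every input; the ratio helper repeats the LCS-based assumption from the simple ratio, and all names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> int:
    """Assumed 0-100 score: 200 * LCS / (len(a) + len(b)), rounded."""
    total = len(a) + len(b)
    return round(200 * lcs_length(a, b) / total) if total else 0


def partial_ratio(a: str, b: str) -> int:
    """Best ratio of the shorter string against each equal-length window of the longer."""
    shorter, longer = sorted((a, b), key=len)
    width = len(shorter)
    if width == 0:
        return 0
    best = 0
    for start in range(len(longer) - width + 1):
        best = max(best, ratio(shorter, longer[start:start + width]))
        if best == 100:  # an exact window match cannot be beaten
            break
    return best


print(partial_ratio("this is a test", "this is a test!"))  # 100
```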

Token Sort Ratio

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100
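Token Sort Ratio ignores word order: both strings are split into words, the words are sorted, and the rejoined strings are compared with the simple ratio. The sketch assumes a tokenizer that lowercases and treats any run of non-alphanumeric characters as a separator; that preprocessing detail, like the helper names, is an assumption rather than the package's exact behaviour.

```python
import re


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> int:
    """Assumed 0-100 score: 200 * LCS / (len(a) + len(b)), rounded."""
    total = len(a) + len(b)
    return round(200 * lcs_length(a, b) / total) if total else 0


def tokens(s: str):
    """Hypothetical tokenizer: lowercase, split on runs of non-alphanumerics."""
    return re.sub(r"[\W_]+", " ", s.lower()).split()


def token_sort_ratio(a: str, b: str) -> int:
    """Compare the two strings after sorting their words alphabetically."""
    return ratio(" ".join(sorted(tokens(a))), " ".join(sorted(tokens(b))))


print(token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```

Sorting turns both example strings into "a bear fuzzy was wuzzy", so the score is 100 even though the plain ratio of the unsorted strings is 91.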

Token Set Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100
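Token Set Ratio goes one step further and compares sets of words, so a repeated word such as the second "fuzzy" no longer counts against the match. The sketch below follows the commonly described recipe: build the sorted intersection of the two word sets, append each string's leftover words to it, and take the best pairwise ratio. Helper names and tokenization are assumptions, as in the token sort sketch.

```python
import re


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> int:
    """Assumed 0-100 score: 200 * LCS / (len(a) + len(b)), rounded."""
    total = len(a) + len(b)
    return round(200 * lcs_length(a, b) / total) if total else 0


def tokens(s: str):
    """Hypothetical tokenizer: lowercase, split on runs of non-alphanumerics."""
    return re.sub(r"[\W_]+", " ", s.lower()).split()


def token_set_ratio(a: str, b: str) -> int:
    """Best ratio among the shared words and each side's shared-plus-leftover words."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    common = " ".join(sorted(set_a & set_b))
    combined_a = f"{common} {' '.join(sorted(set_a - set_b))}".strip()
    combined_b = f"{common} {' '.join(sorted(set_b - set_a))}".strip()
    return max(
        ratio(common, combined_a),
        ratio(common, combined_b),
        ratio(combined_a, combined_b),
    )


print(token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```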

Partial Token Sort Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
    84
>>> fuzz.partial_token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
    100
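Partial Token Sort Ratio combines the two previous ideas: sort the words of each string, then apply the partial ratio to the results. Once sorted, the shorter string "a bear fuzzy was" appears verbatim at the start of "a bear fuzzy was wuzzy", which explains the score of 100. The sketch reuses the same hypothetical helpers and assumptions as the earlier sketches.

```python
import re


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> int:
    """Assumed 0-100 score: 200 * LCS / (len(a) + len(b)), rounded."""
    total = len(a) + len(b)
    return round(200 * lcs_length(a, b) / total) if total else 0


def partial_ratio(a: str, b: str) -> int:
    """Best ratio of the shorter string against each equal-length window of the longer."""
    shorter, longer = sorted((a, b), key=len)
    width = len(shorter)
    if width == 0:
        return 0
    return max(ratio(shorter, longer[i:i + width]) for i in range(len(longer) - width + 1))


def tokens(s: str):
    """Hypothetical tokenizer: lowercase, split on runs of non-alphanumerics."""
    return re.sub(r"[\W_]+", " ", s.lower()).split()


def partial_token_sort_ratio(a: str, b: str) -> int:
    """Partial ratio of the two strings after sorting their words."""
    return partial_ratio(" ".join(sorted(tokens(a))), " ".join(sorted(tokens(b))))


print(partial_token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```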

Process

>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
    ('Dallas Cowboys', 90)
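Conceptually, extract scores every choice against the query, sorts by score, and keeps the best limit results, while extractOne keeps only the top one. The sketch below assumes that behaviour. It uses the simple LCS-based ratio as its scorer rather than the package's default scorer, so apart from the exact match its scores will not necessarily equal the ones shown above. The names extract, extract_one, and default_process are hypothetical stand-ins, not the package's functions.

```python
import re


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> int:
    """Assumed 0-100 score: 200 * LCS / (len(a) + len(b)), rounded."""
    total = len(a) + len(b)
    return round(200 * lcs_length(a, b) / total) if total else 0


def default_process(s: str) -> str:
    """Hypothetical preprocessing: lowercase, collapse non-alphanumerics to spaces."""
    return re.sub(r"[\W_]+", " ", s).lower().strip()


def extract(query, choices, scorer=ratio, processor=default_process, limit=5):
    """Return up to `limit` (choice, score) pairs, best first."""
    processed_query = processor(query)
    scored = [(choice, scorer(processed_query, processor(choice))) for choice in choices]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable, so ties keep input order
    return scored[:limit]


def extract_one(query, choices, scorer=ratio, processor=default_process):
    """Return the single best (choice, score) pair, or None if there are no choices."""
    best = extract(query, choices, scorer=scorer, processor=processor, limit=1)
    return best[0] if best else None


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract("new york jets", choices, limit=2))  # first entry: ('New York Jets', 100)
print(extract_one("cowboys", choices))  # best choice: 'Dallas Cowboys'
```

The original choice objects are returned untouched; only the processed copies are scored, which is why the results keep their original capitalization.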

You can also pass additional parameters to the extractOne method to make it use a specific scorer. A typical use case is matching file paths (in the example below, songs is a list of file paths):

>>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
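To see why swapping the scorer changes the winner, here is a toy sketch with made-up strings, not the songs from the example above. The character-level ratio favours the choice whose letters line up in order, while a token-sort scorer ignores word order and favours the choice containing the same words. All helpers are the same hypothetical sketches as before, bundled so the block runs on its own; extract_one here simply lowercases and takes the maximum.

```python
import re


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> int:
    """Assumed 0-100 score: 200 * LCS / (len(a) + len(b)), rounded."""
    total = len(a) + len(b)
    return round(200 * lcs_length(a, b) / total) if total else 0


def tokens(s: str):
    """Hypothetical tokenizer: lowercase, split on runs of non-alphanumerics."""
    return re.sub(r"[\W_]+", " ", s.lower()).split()


def token_sort_ratio(a: str, b: str) -> int:
    """Compare the two strings after sorting their words alphabetically."""
    return ratio(" ".join(sorted(tokens(a))), " ".join(sorted(tokens(b))))


def extract_one(query, choices, scorer=ratio):
    """Hypothetical extractOne: highest-scoring (choice, score) pair after lowercasing."""
    q = query.lower()
    return max(((c, scorer(q, c.lower())) for c in choices), key=lambda pair: pair[1])


query = "hypnotize heroin"
choices = ["heroin hypnotize", "hypnotic heron"]
print(extract_one(query, choices))                           # ('hypnotic heron', 87)
print(extract_one(query, choices, scorer=token_sort_ratio))  # ('heroin hypnotize', 100)
```

Because the scorer is just a function taking two strings and returning a score, any such callable can be passed in this sketch.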

Download files

Source Distribution

the_fuzz_with_custom_object-0.22.5.tar.gz (20.1 kB)

Built Distribution

the_fuzz_with_custom_object-0.22.5-py3-none-any.whl

File hashes

Hashes for the_fuzz_with_custom_object-0.22.5.tar.gz
Algorithm Hash digest
SHA256 25841b75372e3ed65973e5669de3331853ee2fb0979321e33b45480e0d7f6bcb
MD5 da17cf0d2c876fad5cb7b93832ea0fe9
BLAKE2b-256 2f89e0b427219c76b2a9d976fbab52d26f01ec6856a4db8a050d520cf17e4490

Hashes for the_fuzz_with_custom_object-0.22.5-py3-none-any.whl
Algorithm Hash digest
SHA256 13c4d88a5aafa219332fed08eaf407ec4167046aa47a06b13b4a455fd1add827
MD5 ef9720337d513c33f48c65a32416bbb7
BLAKE2b-256 02f9e5aae4ec29e7d9e2eed64f9f4dbbb95ea8770aa751855c910e5e42364b53
