Fuzzy Comparison Utilities for DataFrame Columns
pip install fuzzypandaswuzzy
Tested against Windows 10 / Python 3.10 / Anaconda
This module provides a function, fuzzy_compare, that performs a fuzzy comparison between two columns of a DataFrame using the RapidFuzz library.
Calling pd_add_fuzzy_all() also extends the pandas DataFrame class with a d_fuzzy2cols method for the same kind of comparison.
Module dependencies:
- pandas (pd)
- numpy (np)
- RapidFuzz (from rapidfuzz import process, fuzz)
Usage:
import pandas as pd
from rapidfuzz import fuzz
from fuzzypandaswuzzy import pd_add_fuzzy_all
pd_add_fuzzy_all()
df = pd.read_csv(r"arcore_devicelist.csv")
df2 = df.d_fuzzy2cols(scorer=fuzz.QRatio) # compares the first 2 columns
           aa_value1   aa_match  aa_index_v2     aa_value2
0            Mobicel  82.352943         1978    Mobicel_R1
1            Hyundai  66.666664         5425         Cunda
2               OPPO  66.666664        10102         P7PRO
3            samsung  80.000000          745      samseong
4               DEXP  66.666664         1174            EP
...              ...        ...          ...           ...
22523          TECNO  76.923080          587      TECNO-i5
22524          STYLO  83.333336         7272       STYLOF1
22525  GarantiaMOVIL  52.631580        16788        armani
22526  Cherry_Mobile  72.000000         3510  Cherry_Comet
22527         SANSUI  53.333332         3465     ASUS_P00I
Note:
The 'scorer' parameter in the fuzzy_compare function and d_fuzzy2cols method accepts a scoring function from the RapidFuzz library
(e.g., fuzz.WRatio, fuzz.QRatio, etc.). If no scorer is specified, the default scorer used is fuzz.QRatio.
For more information on the RapidFuzz library, visit https://github.com/maxbachmann/rapidfuzz.