Fuzzy Comparison Utilities for DataFrame Columns

pip install fuzzypandaswuzzy

Tested against Windows 10 / Python 3.10 / Anaconda

This module provides a function for fuzzy comparison between two columns of a DataFrame, built on the RapidFuzz library.
It also extends the pandas DataFrame class with a method that performs the same comparison.

Module dependencies:
	- pandas (pd)
	- numpy (np)
	- RapidFuzz (from rapidfuzz import process, fuzz)

Usage:
	import pandas as pd
	from rapidfuzz import fuzz
	from fuzzypandaswuzzy import pd_add_fuzzy_all
	pd_add_fuzzy_all() # adds the d_fuzzy2cols method to the DataFrame class

	df = pd.read_csv(r"arcore_devicelist.csv")
	df2 = df.d_fuzzy2cols(scorer=fuzz.QRatio) # compares the first 2 columns

	           aa_value1   aa_match  aa_index_v2     aa_value2
	0            Mobicel  82.352943         1978    Mobicel_R1
	1            Hyundai  66.666664         5425         Cunda
	2               OPPO  66.666664        10102         P7PRO
	3            samsung  80.000000          745      samseong
	4               DEXP  66.666664         1174            EP
	                 ...        ...          ...           ...
	22523          TECNO  76.923080          587      TECNO-i5
	22524          STYLO  83.333336         7272       STYLOF1
	22525  GarantiaMOVIL  52.631580        16788        armani
	22526  Cherry_Mobile  72.000000         3510  Cherry_Comet
	22527         SANSUI  53.333332         3465     ASUS_P00I
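
The aa_match column holds the similarity score (RapidFuzz scorers in the fuzz module return values from 0 to 100), so a natural follow-up is to keep only the rows above a cutoff. A minimal sketch, using three rows copied from the output above; the threshold of 80 is an arbitrary value chosen for illustration, not a recommendation from the package.

```python
import pandas as pd

# Three rows copied from the example output
df2 = pd.DataFrame({
    "aa_value1": ["Mobicel", "samsung", "SANSUI"],
    "aa_match": [82.352943, 80.0, 53.333332],
    "aa_index_v2": [1978, 745, 3465],
    "aa_value2": ["Mobicel_R1", "samseong", "ASUS_P00I"],
})

threshold = 80  # arbitrary cutoff, tune for your data
# Keep rows at or above the cutoff, best matches first
strong = df2[df2["aa_match"] >= threshold].sort_values("aa_match", ascending=False)
print(strong)
```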


Note:
	The 'scorer' parameter in the fuzzy_compare function and d_fuzzy2cols method accepts a scoring function from the RapidFuzz library
	(e.g., fuzz.WRatio, fuzz.QRatio, etc.). If no scorer is specified, the default scorer used is fuzz.QRatio.

	For more information on the RapidFuzz library, visit https://github.com/maxbachmann/rapidfuzz.	
