Merges two DataFrames using fuzzy matching on specified columns

Tested against Windows / Python 3.11 / Anaconda

pip install a-pandas-ex-fuzzymerge

This function performs fuzzy matching between two DataFrames `df1` and `df2`
based on the columns specified in `left_on` and `right_on`. Fuzzy matching finds
similar, not just identical, values in these columns, which makes it useful for
matching data with small variations such as typos or abbreviations.
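
To give a feel for what "similar values" means, here is a minimal sketch that scores a few string pairs. It uses `difflib.SequenceMatcher` from the standard library purely as a stand-in; the package itself defaults to a rapidfuzz scorer, and the helper name `similarity` and the sample strings are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical sample pairs: a typo, an abbreviation, and an unrelated pair.
pairs = [
    ("Jon Smith", "John Smith"),
    ("Intl. Business Machines", "International Business Machines"),
    ("Jon Smith", "Mary Jones"),
]

for a, b in pairs:
    print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

An exact join would treat all three pairs as non-matches. A fuzzy join can accept the first two if their score is high enough and still reject the third.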

Parameters:
df1 (DataFrame): The first DataFrame to be merged.
df2 (DataFrame): The second DataFrame to be merged.
right_on (str): The column name in `df2` to be used for matching.
left_on (str): The column name in `df1` to be used for matching.
usedtype (numpy.dtype, optional): The data type to use for the distance matrix.
	Defaults to `np.uint8`.
scorer (function, optional): The scoring function to use for fuzzy matching.
	Defaults to `fuzz.WRatio`.
concat_value (bool, optional): Whether to add a 'concat_value' column in the result DataFrame,
	containing the similarity scores. Defaults to `True`.
**kwargs: Additional keyword arguments to pass to the `pandas.merge` function.

Returns:
DataFrame: A merged DataFrame with rows that matched based on the specified fuzzy criteria.
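
To make the roles of `scorer` and `usedtype` more concrete, the sketch below builds one row of a score matrix per left-hand value and picks the best-scoring right-hand value. It is not the package's implementation: `score` and `best_matches` are hypothetical helpers, `difflib` stands in for a rapidfuzz scorer, and `array("B")` stands in for a `np.uint8` array. The point it illustrates is that scores in the range 0 to 100 fit into one unsigned byte each, which keeps a large matrix compact.

```python
from array import array
from difflib import SequenceMatcher


def score(a: str, b: str) -> int:
    """Return a similarity score from 0 to 100 (stand-in for a rapidfuzz scorer)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_matches(left, right):
    """For each left value, return (left value, best right value, score)."""
    results = []
    for left_value in left:
        # One row of the score matrix. Typecode "B" is an unsigned 8-bit
        # integer, which is enough because every score lies in 0..100.
        row = array("B", (score(left_value, r) for r in right))
        best = max(range(len(right)), key=row.__getitem__)
        results.append((left_value, right[best], row[best]))
    return results


# Hypothetical sample data with small spelling differences.
left = ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"]
right = ["Heikkinen, Miss Laina", "Braund, Mr Owen Haris", "Allen, Mr. William Henry"]

for left_value, right_value, s in best_matches(left, right):
    print(f"{left_value!r} -> {right_value!r} (score {s})")
```

With the pairs resolved this way, the remaining work is an ordinary merge on the matched values, which is presumably why extra keyword arguments are forwarded to `pandas.merge`.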

Example:
	from a_pandas_ex_fuzzymerge import pd_add_fuzzymerge
	import pandas as pd
	import numpy as np
	from rapidfuzz import fuzz

	# Make the d_fuzzy_merge method available on DataFrames.
	pd_add_fuzzymerge()
	df1 = pd.read_csv(
		"https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv"
	)
	# Triple both frames and append a random number to every name, so the
	# two Name columns hold similar but not identical strings.
	df2 = df1.copy()
	df2 = pd.concat([df2 for _ in range(3)], ignore_index=True)
	df2.Name = (df2.Name + np.random.uniform(1, 2000, len(df2)).astype("U"))
	df1 = pd.concat([df1 for _ in range(3)], ignore_index=True)
	df1.Name = (df1.Name + np.random.uniform(1, 2000, len(df1)).astype("U"))

	# Fuzzy-merge on the Name columns, scoring pairs with partial_ratio.
	df3 = df1.d_fuzzy_merge(df2, right_on='Name', left_on='Name', usedtype=np.uint8, scorer=fuzz.partial_ratio,
							concat_value=True)
	print(df3)

Version: 0.10
