Merges two DataFrames using fuzzy matching on specified columns
Tested against Windows / Python 3.11 / Anaconda
pip install a-pandas-ex-fuzzymerge
This function fuzzy-matches two DataFrames, `df1` and `df2`, on the columns
named in `left_on` and `right_on`. Fuzzy matching finds similar rather than
identical values in those columns, which makes it useful for matching data
with small variations, such as typos or abbreviations.
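To make "similar values" concrete, here is a minimal sketch of string similarity scoring. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for a rapidfuzz scorer, so the exact numbers will differ from what `fuzz.WRatio` returns. The `similarity` helper and the sample names are illustrative assumptions, not part of this package.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score (stdlib stand-in for a rapidfuzz scorer)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


names = ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"]
query = "Braund, Mr. Owen Haris"  # one letter dropped, as in a typo

# Score the query against every candidate and keep the closest one.
scores = {name: similarity(query, name) for name in names}
best = max(scores, key=scores.get)
print(best, scores[best])
```

An exact-match join would find nothing for the misspelled query, while a similarity score still ranks the intended name first.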
Parameters:
df1 (DataFrame): The first DataFrame to be merged.
df2 (DataFrame): The second DataFrame to be merged.
right_on (str): The column name in `df2` to be used for matching.
left_on (str): The column name in `df1` to be used for matching.
usedtype (numpy.dtype, optional): The data type to use for the distance matrix. Defaults to `np.uint8`.
scorer (function, optional): The scoring function to use for fuzzy matching. Defaults to `fuzz.WRatio`.
concat_value (bool, optional): Whether to add a 'concat_value' column, containing the similarity scores, to the result DataFrame. Defaults to `True`.
**kwargs: Additional keyword arguments to pass to the `pandas.merge` function.
Returns:
DataFrame: A merged DataFrame with rows that matched based on the specified fuzzy criteria.
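The sketch below shows how the parameters above could fit together: a scorer fills a matrix of integer scores (0 to 100, which is why a small type such as `np.uint8` is enough), and each left row is paired with its best-scoring right row. It is a pure-Python illustration on lists of dicts, not this package's implementation. The `score` and `fuzzy_merge` helpers are hypothetical, and picking the single best match per row is only one plausible strategy; the package's actual selection may differ.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> int:
    """Hypothetical scorer: integer similarity in 0..100, so it fits in a uint8."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def fuzzy_merge(left, right, left_on, right_on, scorer=score):
    """Pair every left row with its best-scoring right row (right must be non-empty)."""
    # Score matrix: one row per left record, one column per right record.
    matrix = [[scorer(l[left_on], r[right_on]) for r in right] for l in left]
    merged = []
    for l, row in zip(left, matrix):
        best_col = max(range(len(row)), key=row.__getitem__)
        r = right[best_col]
        # Suffix every key with _x / _y to keep the two sides apart
        # (pandas.merge itself only suffixes overlapping column names).
        combined = {f"{k}_x": v for k, v in l.items()}
        combined.update({f"{k}_y": v for k, v in r.items()})
        combined["concat_value"] = row[best_col]  # similarity of the chosen pair
        merged.append(combined)
    return merged


left = [{"Name": "Allen, Mr. William Henry", "Age": 35}]
right = [
    {"Name": "Moran, Mr. James", "Fare": 8.46},
    {"Name": "Allen, Mr. Wiliam Henry", "Fare": 8.05},
]
result = fuzzy_merge(left, right, left_on="Name", right_on="Name")
print(result[0]["Name_y"], result[0]["concat_value"])
```

The score matrix has one cell per pair of rows, so its size grows with the product of the two row counts; a one-byte element type keeps that manageable for larger inputs.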
Example:
from a_pandas_ex_fuzzymerge import pd_add_fuzzymerge
import pandas as pd
import numpy as np
from rapidfuzz import fuzz

# Register the d_fuzzy_merge method on pandas DataFrames.
pd_add_fuzzymerge()

df1 = pd.read_csv(
    "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv"
)

# Triple both frames and append a random number to every name,
# so the two Name columns are similar but no longer identical.
df2 = df1.copy()
df2 = pd.concat([df2 for x in range(3)], ignore_index=True)
df2.Name = df2.Name + np.random.uniform(1, 2000, len(df2)).astype("U")
df1 = pd.concat([df1 for x in range(3)], ignore_index=True)
df1.Name = df1.Name + np.random.uniform(1, 2000, len(df1)).astype("U")

# Fuzzy-merge on the Name columns; concat_value=True adds the similarity scores.
df3 = df1.d_fuzzy_merge(
    df2,
    right_on="Name",
    left_on="Name",
    usedtype=np.uint8,
    scorer=fuzz.partial_ratio,
    concat_value=True,
)
print(df3)