Merge two pandas DataFrames using closest match
## merge_closest ### Merge column(s) into DataFrame using closest-without-going-over lookup.
The merge_closest function mimics Excel’s VLOOKUP function in approximate match (range lookup) mode, with added benefits of ensuring the lookup table is sorted and merging any subset of columns from the lookup table. It’s similar to a left join based on the lookup_field value that is closest to the data_field value without going over. Only the first matching row from lookup is merged; if lookup_field contains duplicates, it’s up to the user to drop duplicates in advance as appropriate. For any row in data whose data_field value is less than all values in lookup_field, lookup values in result will be missing.
By default, all columns in lookup other than lookup_field will be merged with data. A specific list of columns from lookup to include or exclude can be passed. If both include_cols and exclude_cols are provided, exclude_cols is ignored. To include lookup_field in result, lookup_field must be passed in include_cols, perhaps as include_cols=lookup.columns.
### This is my first Python package and first GitHub repo…
…and any comments or suggestions are most welcome.
from merge_closest import merge_closest
- merge_closest(data, lookup, data_field, lookup_field,
- include_cols=None, exclude_cols=None, presorted=False)
`data` : pandas DataFrame
`lookup` : pandas DataFrame
`data_field` : name of column in data
`lookup_field` : name of column in lookup
`result` : pandas DataFrame