Identify corresponding fields between datasets
field-match
Overview
field-match is a Python library that compares the fields of two datasets, scores how similar they are, and matches up the corresponding field names.
Installation
To install the package, run:
pip install field-match
Usage
Import and use the package as follows:
import pandas as pd
from field_match import field_similarity_report, generate_column_rename
# Load your datasets
df1 = pd.read_csv('dataset1.csv')
df2 = pd.read_csv('dataset2.csv')
# Use the field_similarity_report function to match up fields by similarity score
results = field_similarity_report(df1, df2)
# Generate the column rename code snippet
rename_snippet = generate_column_rename(results)
For more detailed examples, please refer to the examples folder in this repository.
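To give a feel for what "matching fields by similarity score" can mean, here is a minimal sketch that scores field names against each other with the standard library's difflib. It is an illustration only: this README does not describe how field-match computes its scores, and the helper names name_similarity and best_name_matches are hypothetical, not part of the package.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a field name and drop underscores and spaces."""
    return name.lower().replace("_", "").replace(" ", "")


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two field names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_name_matches(fields_a, fields_b):
    """Map each field in fields_a to (most similar field in fields_b, score).

    Hypothetical helper for illustration; not the field-match API.
    """
    matches = {}
    for a in fields_a:
        best = max(fields_b, key=lambda b: name_similarity(a, b))
        matches[a] = (best, name_similarity(a, best))
    return matches


if __name__ == "__main__":
    old_fields = ["cust_id", "Zip Code"]
    new_fields = ["customer_id", "zipcode", "total"]
    for field, (match, score) in best_name_matches(old_fields, new_fields).items():
        print(f"{field} -> {match} ({score:.2f})")
```

Name-based scoring like this only helps when both datasets have headers. When headers are missing, the column values themselves have to be compared instead.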
Example Applications
- Clarifying Ambiguous Dataset Fields: When merging or integrating a new or updated dataset (such as data from another year or source) with an existing dataset or workflow, it may be unclear how the fields correspond to each other because headers are missing or field names differ. field_match can help identify which fields in the new dataset correspond to the expected fields.
- Integrating an External Dataset with an Existing Model: When feeding an external dataset into a pre-existing model, it's crucial to ensure that the data aligns correctly with the model's expected input format. field_match can help you identify which fields in your new dataset correspond to the fields your model expects.
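For the headerless case mentioned above, one possible approach is to compare the values in each column rather than the names. The sketch below uses the Jaccard overlap of distinct values. It is an assumption-laden illustration, not a description of what field-match does internally, and value_overlap and match_by_values are hypothetical names.

```python
def value_overlap(values_a, values_b) -> float:
    """Jaccard overlap of the distinct values in two columns, in [0, 1]."""
    set_a, set_b = set(values_a), set(values_b)
    if not set_a and not set_b:
        return 0.0  # two empty columns carry no evidence of a match
    return len(set_a & set_b) / len(set_a | set_b)


def match_by_values(expected, incoming):
    """Pick, for each expected field, the incoming column with the highest overlap.

    expected: dict mapping field name -> sample values from the known dataset
    incoming: dict mapping column label (e.g. a position) -> values
    Hypothetical helper for illustration; not the field-match API.
    """
    result = {}
    for name, values in expected.items():
        result[name] = max(
            incoming, key=lambda col: value_overlap(values, incoming[col])
        )
    return result


if __name__ == "__main__":
    expected = {"state": ["CA", "NY", "TX"], "year": [2020, 2021]}
    incoming = {0: [2021, 2022, 2020], 1: ["NY", "CA", "WA"]}
    print(match_by_values(expected, incoming))
```

Value overlap works best for categorical fields with a shared vocabulary. Continuous numeric fields would likely need a different comparison, such as comparing distributions.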
License
field-match is released under the MIT License.
Hashes for field_match-0.1.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6c5e065e9d399a8010837abb85e888a2ad3d45494f886428e17e080daef68489 |
MD5 | 206ac24e503f79c96fe720e8fcaa5ac5 |
BLAKE2b-256 | 1755208f514143723821fba61c0b3c71270bffd512c05f18532334406cf06e6b |