Identify corresponding fields between datasets
field-match
Overview
field-match is a Python library that compares the fields of two datasets, scores how similar they are, and matches up the corresponding field names.
Installation
To install the package, run:
pip install field-match
Usage
Import and use the package as follows:
import pandas as pd
from field_match import field_similarity_report, generate_column_rename
# Load your datasets
df1 = pd.read_csv('dataset1.csv')
df2 = pd.read_csv('dataset2.csv')
# Use the field_similarity_report function to match up fields by similarity score
results = field_similarity_report(df1, df2)
# Generate the column rename code snippet
rename_snippet = generate_column_rename(results)
For more detailed examples, see the examples folder in this repository.
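To make the idea of matching fields by similarity score more concrete, here is a minimal standard-library sketch. It is not field-match's implementation, and the source does not say how the library computes its scores; this version only compares field names with `difflib.SequenceMatcher`. The helpers `name_similarity` and `suggest_renames`, the `threshold` value, and the sample field names are all hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two field names.

    Hypothetical helper: names are lowercased and stripped of
    separators before comparison.
    """
    def normalize(s: str) -> str:
        return s.lower().replace("_", "").replace("-", "").replace(" ", "")

    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def suggest_renames(new_fields, expected_fields, threshold=0.6):
    """Map each new field to its most similar expected field.

    Fields whose best score falls below the (assumed) threshold are
    left out of the mapping so they can be reviewed by hand.
    """
    renames = {}
    for new in new_fields:
        best = max(expected_fields, key=lambda exp: name_similarity(new, exp))
        if name_similarity(new, best) >= threshold:
            renames[new] = best
    return renames


# Illustrative field names only; not taken from any real dataset.
mapping = suggest_renames(
    ["Cust_ID", "order date", "amt"],
    ["customer_id", "order_date", "amount"],
)
print(mapping)
```

A mapping of this shape can be passed to pandas as `df.rename(columns=mapping)`. Name-only matching is a deliberate simplification; it cannot help when headers are missing altogether.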
Example Applications
- Clarifying Ambiguous Dataset Fields: When merging or integrating a new or updated dataset (such as data from another year or source) with an existing dataset or workflow, it may be unclear how fields correspond to each other because headers are missing or field names differ. field_match can help identify which fields in the new dataset correspond to the expected fields.
- Integrating External Dataset with Existing Model: When feeding an external dataset into a pre-existing model, it's crucial to ensure that the data aligns correctly with the model's expected input format. field_match can help you identify which fields in your new dataset correspond to the fields your model expects.
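For the missing-headers case described above, names are not available, so any match has to come from the values themselves. The sketch below shows one possible approach, assuming a Jaccard overlap between the distinct values of each column. It is an illustration only, not a description of how field-match scores fields; `jaccard`, `match_by_values`, and the sample data are hypothetical.

```python
def jaccard(values_a, values_b) -> float:
    """Jaccard overlap of the distinct values in two columns (0..1)."""
    set_a, set_b = set(values_a), set(values_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def match_by_values(unlabeled, reference):
    """For each unlabeled column, return (best reference field, score).

    Both arguments map a column key to a list of values.
    """
    matches = {}
    for key, values in unlabeled.items():
        scores = {name: jaccard(values, ref) for name, ref in reference.items()}
        best = max(scores, key=scores.get)
        matches[key] = (best, scores[best])
    return matches


# Illustrative data: a reference dataset with known field names and a
# new dataset whose columns are identified only by position.
reference = {
    "state": ["CA", "NY", "TX", "WA"],
    "status": ["active", "inactive", "pending"],
}
unlabeled = {
    0: ["pending", "active", "active"],
    1: ["NY", "CA", "OR"],
}

matches = match_by_values(unlabeled, reference)
for position, (field, score) in matches.items():
    print(position, field, round(score, 2))
```

Value overlap works best for categorical fields with a shared vocabulary. Numeric or free-text columns would likely need a different measure, such as comparing data types or value distributions.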
License
field-match is released under the MIT License.