
Identify corresponding fields between datasets

Project description

field-match

Overview

field-match is a Python library that compares the fields of two datasets using similarity scores and matches up the corresponding field names.
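The matching method field-match uses is not described on this page. As a rough illustration of similarity-based field matching, here is a minimal sketch that does not depend on headers. It makes these assumptions: each dataset is a plain dict mapping column name to a list of values, and two columns are scored by the Jaccard similarity of their distinct values. The helper names (jaccard, similarity_scores, best_matches) and the sample data are hypothetical, and this is not the library's implementation.

```python
# Hypothetical sketch of value-based field matching; NOT field-match's algorithm.
# Assumes each dataset is a dict: column name -> list of values.

def jaccard(a, b):
    """Jaccard similarity of the distinct values in two columns (0.0 to 1.0)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def similarity_scores(left, right):
    """Score every (left column, right column) pair."""
    return {
        (lcol, rcol): jaccard(lvals, rvals)
        for lcol, lvals in left.items()
        for rcol, rvals in right.items()
    }


def best_matches(left, right):
    """For each left column, pick the right column with the highest score."""
    scores = similarity_scores(left, right)
    matches = {}
    for lcol in left:
        rcol = max(right, key=lambda r: scores[(lcol, r)])
        matches[lcol] = (rcol, scores[(lcol, rcol)])
    return matches


# Hypothetical example: the newer file has uninformative headers.
data_2022 = {
    "state": ["CA", "NY", "TX", "WA"],
    "year": [2022, 2022, 2022, 2022],
}
data_2023 = {
    "col_a": [2023, 2023, 2022, 2022],
    "col_b": ["NY", "TX", "WA", "OR"],
}

print(best_matches(data_2022, data_2023))
# {'state': ('col_b', 0.6), 'year': ('col_a', 0.5)}
```

Value overlap is only one possible signal. A real tool might also compare column names, data types, or summary statistics.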

Installation

To install the package, run:

pip install field-match

Usage

Import and use the package as follows:

import pandas as pd
from field_match import field_similarity_report, generate_column_rename

# Load your datasets
df1 = pd.read_csv('dataset1.csv')
df2 = pd.read_csv('dataset2.csv')

# Use the field_similarity_report function to match up fields by similarity score
results = field_similarity_report(df1, df2)

# Generate the column rename code snippet
rename_snippet = generate_column_rename(results)
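This page does not document the format of results or of the snippet that generate_column_rename produces. As a sketch of how pairwise scores could become a rename snippet, the following code pairs columns greedily, starting with the highest score, so that no column is used twice. It then renders the pairs as a pandas DataFrame.rename call. The function names, the (new column, expected column) score format, and the 0.3 threshold are all hypothetical and are not part of the field-match API.

```python
# Hypothetical sketch; NOT the field-match API or its output format.

def build_rename_mapping(scores, threshold=0.3):
    """Greedily pair columns, highest score first, so no column is used twice.

    scores maps (new_column, expected_column) -> similarity in [0, 1].
    Pairs scoring below threshold are left unmatched.
    """
    mapping = {}
    used_targets = set()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (source, target), score in ranked:
        if score < threshold:
            break  # every remaining pair scores no higher, so stop here
        if source in mapping or target in used_targets:
            continue  # one side is already matched
        mapping[source] = target
        used_targets.add(target)
    return mapping


def render_rename_snippet(mapping, df_name="df2"):
    """Render the mapping as one line of pandas code that renames the columns."""
    pairs = ", ".join(f"{src!r}: {dst!r}" for src, dst in mapping.items())
    return f"{df_name} = {df_name}.rename(columns={{{pairs}}})"


scores = {
    ("col_a", "year"): 0.5,
    ("col_a", "state"): 0.0,
    ("col_b", "state"): 0.6,
    ("col_b", "year"): 0.0,
    ("col_c", "state"): 0.2,
}
mapping = build_rename_mapping(scores)
print(mapping)  # {'col_b': 'state', 'col_a': 'year'}
print(render_rename_snippet(mapping))
# df2 = df2.rename(columns={'col_b': 'state', 'col_a': 'year'})
```

With the greedy approach, a weaker candidate (col_c for state at 0.2) is dropped rather than forced into a match. A globally optimal assignment, such as the Hungarian algorithm, could give different pairings when scores are close.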

For more detailed examples, refer to the examples folder in the project repository.

Example Applications

  1. Clarifying Ambiguous Dataset Fields: When merging or integrating a new or updated dataset (such as data from another year or source) with an existing dataset or workflow, it may be unclear how the fields correspond to each other because headers are missing or field names differ. field_match can help identify which fields in the new dataset correspond to the expected fields.

  2. Integrating an External Dataset with an Existing Model: When feeding an external dataset into a pre-existing model, the data must align with the model's expected input format. field_match can help identify which fields in the new dataset correspond to the fields the model expects.
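Once a mapping has been chosen, the check in the second application amounts to renaming the columns, putting them in the order the model expects, and reporting any expected fields that are still absent. The following minimal sketch rests on these assumptions: the dataset is a dict of column name to list of values, and the mapping has already been decided (for example, from similarity scores). The function align_to_expected and the field names are hypothetical and are not part of field-match.

```python
# Hypothetical sketch of aligning an external dataset to a model's expected
# input fields; NOT part of field-match.

def align_to_expected(dataset, mapping, expected_fields):
    """Rename columns via mapping, then order them to match expected_fields.

    Returns the aligned dataset and the expected fields that are still missing.
    Columns not in expected_fields are dropped.
    """
    renamed = {mapping.get(col, col): values for col, values in dataset.items()}
    missing = [f for f in expected_fields if f not in renamed]
    aligned = {f: renamed[f] for f in expected_fields if f in renamed}
    return aligned, missing


expected = ["state", "year", "population"]
external = {"col_b": ["NY", "TX"], "col_a": [2023, 2023], "notes": ["", ""]}
aligned, missing = align_to_expected(
    external, {"col_b": "state", "col_a": "year"}, expected
)

print(list(aligned))  # ['state', 'year']
print(missing)        # ['population']
```

Extra columns such as notes are dropped. Any field listed in missing has to be supplied before the data is passed to the model.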

License

field-match is released under the MIT License.


Download files


Source Distribution

field_match-0.1.0.tar.gz (10.4 kB)

Uploaded Source

Built Distribution

field_match-0.1.0-py3-none-any.whl (5.9 kB)

Uploaded Python 3

File details

Details for the file field_match-0.1.0.tar.gz.

File metadata

  • Download URL: field_match-0.1.0.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.1

File hashes

Hashes for field_match-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7da4985d834954041366dde144bd83ecf931f2656b8070d4a0f6a6c4a87bf8d0
MD5 6cffe487aab55cf1bc54033cc82f332c
BLAKE2b-256 1feb37ca545cb9a1e646f3abb870fe4171c654ca16367fa7a168a007408a3df6


File details

Details for the file field_match-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: field_match-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.1

File hashes

Hashes for field_match-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6c5e065e9d399a8010837abb85e888a2ad3d45494f886428e17e080daef68489
MD5 206ac24e503f79c96fe720e8fcaa5ac5
BLAKE2b-256 1755208f514143723821fba61c0b3c71270bffd512c05f18532334406cf06e6b

