Create data quality rules and apply them to datasets.

Project description

Data Harmonization Package

This package merges CSV files and harmonizes data using string-similarity algorithms and GPT-based models. It also supports a sample-based approach, in which data is harmonized against a sample file using GPT-based models.

Installation

You can install the package from PyPI using pip:

pip install data-harmonization-ai

Usage

DataHarmonizer Class

The DataHarmonizer class merges CSV files using one of the following merge options:

  • ChatGPT
  • GPT4
  • Fuzzy Wuzzy
  • Rapidfuzz
  • Jaro Winkler
  • JW Layered with ChatGPT
  • JW Layered with GPT4
  • FW Layered with GPT4
  • Recursive Data Harmonization
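The package does not document how each option pairs up columns or values. As a rough illustration of what string-similarity options such as Fuzzy Wuzzy, Rapidfuzz, and Jaro Winkler do in general, the sketch below pairs column names by similarity using the standard-library difflib module. It is a stand-in technique, not the package's implementation; the helper names `similarity` and `match_columns` and the 0.6 threshold are assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_columns(left, right, threshold=0.6):
    """Pair each left column with its most similar right column.

    Hypothetical helper: pairs scoring below `threshold` are dropped.
    """
    matches = {}
    for col in left:
        # Pick the right-hand column with the highest similarity score.
        best = max(right, key=lambda other: similarity(col, other), default=None)
        if best is not None and similarity(col, best) >= threshold:
            matches[col] = best
    return matches


if __name__ == "__main__":
    print(match_columns(["first_name", "zip"], ["FirstName", "postal_code", "zip_code"]))
```

The layered options in the list presumably combine a similarity pass like this with a GPT model, but the source does not say how, so that part is left out of the sketch.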

Example usage:

    from utility import DataHarmonizer

    # Create an instance of DataHarmonizer
    key = 'openai-key'
    harmonizer = DataHarmonizer(key, 'file1.csv', 'file2.csv', 'ChatGPT')

    # Merge the files based on the specified option
    result = harmonizer.merge_files()

    print(result)
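The example above passes the OpenAI key as a string literal. A common alternative is to read it from an environment variable so that it never lands in source control. The sketch below shows one way to do that; the helper name `load_api_key` and the variable name `OPENAI_API_KEY` are assumptions for illustration and are not part of this package.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is unset or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage with the class shown above (constructor arguments as in the example):
# from utility import DataHarmonizer
# harmonizer = DataHarmonizer(load_api_key(), "file1.csv", "file2.csv", "ChatGPT")
```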

DataHarmonizationWithSuggestion Class

The DataHarmonizationWithSuggestion class harmonizes data using a sample-based approach. It takes a sample file and two data files as input.

Example usage:

    from utility import DataHarmonizationWithSuggestion

    # Create an instance of DataHarmonizationWithSuggestion
    key = 'openai-key'
    harmonizer = DataHarmonizationWithSuggestion(key, "sample_harmonized_data.csv", "file1.csv", "file2.csv")

    # Harmonize the data based on the sample
    result = harmonizer.harmonize_data()

    print(result)
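Before sending files to a model, it can help to see which columns in a data file do not already appear in the sample file, since those are the ones that would need mapping. The sketch below is a pre-flight check written with the standard-library csv module; it is not part of the package, and the helper names `read_header` and `columns_missing_from_sample` are assumptions for this example.

```python
import csv


def read_header(path):
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="", encoding="utf-8") as fh:
        return next(csv.reader(fh), [])


def columns_missing_from_sample(sample_path, data_path):
    """List data-file columns that do not appear verbatim in the sample header."""
    sample_cols = set(read_header(sample_path))
    return [col for col in read_header(data_path) if col not in sample_cols]


# Example call, using the file names from the usage example above:
# columns_missing_from_sample("sample_harmonized_data.csv", "file1.csv")
```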

Download files

Source Distribution

data-harmonization-ai-dp-2.0.0.tar.gz (10.6 kB), Source

Built Distribution

data_harmonization_ai_dp-2.0.0-py3-none-any.whl (18.6 kB), Python 3

File details

Details for the file data-harmonization-ai-dp-2.0.0.tar.gz.

File hashes

Hashes for data-harmonization-ai-dp-2.0.0.tar.gz

Algorithm    Hash digest
SHA256       dd8d1c41500c16b7f57350364e66ed3f03c678833f52150860261c3b7dd08f65
MD5          fa18d63523779d87a772f3d1503001fd
BLAKE2b-256  9583268df0b072c7e4bfb636ec20a0157faf94427524ea551d11b36c4aa3abe6

File details

Details for the file data_harmonization_ai_dp-2.0.0-py3-none-any.whl.

File hashes

Hashes for data_harmonization_ai_dp-2.0.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       2036f5b2e142e96b8674f8ac493076c8a1ab945893673887b7c4508d6acc9ace
MD5          d9fee21997a1c209e6913bedc71860ce
BLAKE2b-256  482f8482b1087e50f0473a16f36db469fd37e0d43b96e1355858be964e8159fe
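The SHA256 digests listed above can be used to check that a downloaded file was not corrupted or altered. A minimal sketch with the standard-library hashlib module follows; the helper names `sha256_of` and `matches_published_hash` are assumptions for this example, and the file path in the usage comment is wherever the download was saved.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, with the digest copied from the listing above:
# matches_published_hash(
#     "data_harmonization_ai_dp-2.0.0-py3-none-any.whl",
#     "2036f5b2e142e96b8674f8ac493076c8a1ab945893673887b7c4508d6acc9ace",
# )
```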
