Create data quality rules and apply them to datasets.

Project description

Data Harmonization Package

This package merges CSV files and harmonizes data using string-similarity algorithms and GPT-based models. It can also harmonize two data files against a sample of already harmonized data.

Installation

You can install the package from PyPI using pip:

pip install dp-ai-data-harmonization

Usage

DataHarmonizer Class

The DataHarmonizer class provides the capability to merge CSV files based on different options. It supports the following merge options:

  • ChatGPT
  • GPT4
  • Fuzzy Wuzzy
  • Rapidfuzz
  • Jaro Winkler
  • JW Layered with ChatGPT
  • JW Layered with GPT4
  • FW Layered with GPT4
  • Recursive Data Harmonization
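
Several of these options (Fuzzy Wuzzy, Rapidfuzz, Jaro Winkler) are named after string-similarity techniques. How the package applies them internally is not described here, so the following is only a rough illustration of similarity-based matching in general. It uses the standard library's difflib rather than any of the libraries named above, and `similarity`, `match_columns`, and `threshold` are hypothetical names, not part of this package.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_columns(left, right, threshold=0.6):
    """Map each column in `left` to its most similar column in `right`.

    Hypothetical helper: a column whose best score falls below
    `threshold` (or that has no candidates at all) is mapped to None.
    """
    mapping = {}
    for col in left:
        # Pick the candidate with the highest similarity score, if any.
        best = max(right, key=lambda other: similarity(col, other), default=None)
        if best is not None and similarity(col, best) >= threshold:
            mapping[col] = best
        else:
            mapping[col] = None
    return mapping
```

With this sketch, `match_columns(["customer_name", "zip"], ["Customer Name", "postal_code"])` pairs `customer_name` with `Customer Name` and leaves `zip` unmatched, which is the kind of case where a GPT-based option might do better than pure string similarity.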

Example usage:

from utility import DataHarmonizer

# Create an instance of DataHarmonizer
key = 'openai-key'
harmonizer = DataHarmonizer(key, 'file1.csv', 'file2.csv', 'ChatGPT')

# Merge the files based on the specified option
result = harmonizer.merge_files()

print(result)
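
The example passes the OpenAI key and the merge option as plain strings. A small wrapper, sketched below, could read the key from the environment and reject a misspelled option before any file is touched. This is an assumption-laden sketch: `load_api_key`, `check_option`, `MERGE_OPTIONS`, and the `OPENAI_API_KEY` variable name are hypothetical, and the option strings are simply copied from the list above (only 'ChatGPT' is confirmed by the usage example).

```python
import os

# Option names copied from the documented list; the exact strings the
# package accepts are an assumption apart from 'ChatGPT'.
MERGE_OPTIONS = (
    "ChatGPT",
    "GPT4",
    "Fuzzy Wuzzy",
    "Rapidfuzz",
    "Jaro Winkler",
    "JW Layered with ChatGPT",
    "JW Layered with GPT4",
    "FW Layered with GPT4",
    "Recursive Data Harmonization",
)


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the OpenAI key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def check_option(option):
    """Return `option` unchanged, or raise ValueError if it is not listed."""
    if option not in MERGE_OPTIONS:
        raise ValueError(f"Unknown merge option: {option!r}")
    return option
```

The constructor call would then become `DataHarmonizer(load_api_key(), 'file1.csv', 'file2.csv', check_option('ChatGPT'))`.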

DataHarmonizationWithSuggestion Class

The DataHarmonizationWithSuggestion class allows you to harmonize data using a sample-based approach.

It takes a sample file and two data files as input.

Example usage:

from utility import DataHarmonizationWithSuggestion

# Create an instance of DataHarmonizationWithSuggestion
key = 'openai-key'
harmonizer = DataHarmonizationWithSuggestion(key, "sample_harmonized_data.csv", "file1.csv", "file2.csv")

# Harmonize the data based on the sample
result = harmonizer.harmonize_data()

print(result)
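
Before calling the class, it can help to see how the column names of a data file differ from those of the harmonized sample. The sketch below is not part of the package and says nothing about how `harmonize_data()` works internally; it uses only the standard library, and `read_header` and `missing_columns` are hypothetical helper names.

```python
import csv


def read_header(path):
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # An empty file yields an empty list instead of raising StopIteration.
        return next(csv.reader(handle), [])


def missing_columns(sample_path, data_path):
    """List sample columns that have no exact-name match in the data file."""
    data_columns = set(read_header(data_path))
    return [col for col in read_header(sample_path) if col not in data_columns]
```

For example, `missing_columns("sample_harmonized_data.csv", "file1.csv")` lists the target columns that file1.csv does not yet provide under the same name.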

Download files

Source Distribution

dp-ai-data-harmonization-1.0.2.tar.gz (14.8 kB, uploaded as Source)

Built Distribution

dp_ai_data_harmonization-1.0.2-py3-none-any.whl (31.5 kB, uploaded for Python 3)

File details

Details for the file dp-ai-data-harmonization-1.0.2.tar.gz.

Hashes for dp-ai-data-harmonization-1.0.2.tar.gz

Algorithm    Hash digest
SHA256       28669adff4a6905bea8a12ae36ef6dd00d62627e760aa66b1f199d1cb38f06aa
MD5          5f91eb458e6cd92c9910949d16e216ec
BLAKE2b-256  3f5677366b69aab72f0cc0c5e3fb1d136d166f40d5e0d3a23696a1814387fe83

File details

Details for the file dp_ai_data_harmonization-1.0.2-py3-none-any.whl.

Hashes for dp_ai_data_harmonization-1.0.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       694845809eb72fb8a2d08355c092f6305483a17838350dc3173998594d69a9a4
MD5          bb316dc5b8d5f2f0da36520d3b75dfd7
BLAKE2b-256  a4f51e23dbaffcc5542df96d4cd61f98060f5884e4e3eadb0abc7b5f368d6b78
