Create data quality rules and apply them to datasets.
Data Harmonization Package
This package merges CSV files and harmonizes their data using fuzzy string-matching algorithms and GPT-based models. It can also harmonize data with a sample-based approach, in which a sample of already harmonized data guides a GPT-based model.
Installation
You can install the package from PyPI using pip:
    pip install data-harmonization-ai
Usage
DataHarmonizer Class
The DataHarmonizer class merges two CSV files using a merge option that you choose.
It supports the following merge options:
- ChatGPT
- GPT4
- Fuzzy Wuzzy
- Rapidfuzz
- Jaro Winkler
- JW Layered with ChatGPT
- JW Layered with GPT4
- FW Layered with GPT4
- Recursive Data Harmonization
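To give a feel for what the fuzzy-matching options do, here is a minimal sketch of matching column names between two files by string similarity. It is not the package's implementation: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for Fuzzy Wuzzy, Rapidfuzz, or Jaro Winkler scoring, and the function names and the 0.8 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio between two column names, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_columns(left, right, threshold=0.8):
    """Map each column in `left` to its most similar column in `right`.

    Columns whose best score is below `threshold` map to None. The threshold
    is a hypothetical value chosen for this sketch.
    """
    mapping = {}
    for col in left:
        best = max(right, key=lambda other: similarity(col, other), default=None)
        if best is not None and similarity(col, best) >= threshold:
            mapping[col] = best
        else:
            mapping[col] = None
    return mapping


columns_a = ["Customer Name", "email", "Phone"]
columns_b = ["customer_name", "E-mail", "zip"]
mapping = match_columns(columns_a, columns_b)
print(mapping)
```

In a layered option, the columns left unmatched (mapped to None here) are presumably the ones handed to a GPT-based model, though the source does not spell out how the layering works.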
Example usage:

    from utility import DataHarmonizer

    # Create an instance of DataHarmonizer
    key = 'openai-key'
    harmonizer = DataHarmonizer(key, 'file1.csv', 'file2.csv', 'ChatGPT')

    # Merge the files based on the specified option
    result = harmonizer.merge_files()
    print(result)
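The example above hardcodes the key as a string. A common alternative is to read it from the environment so it stays out of source control. The sketch below assumes the key is stored in a variable named `OPENAI_API_KEY`; that name and the helper `load_api_key` are illustrative and not part of this package.

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Read the API key from an environment variable, failing loudly if unset."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

The returned value can then be passed as the first argument to DataHarmonizer in place of the literal 'openai-key'.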
DataHarmonizationWithSuggestion Class
The DataHarmonizationWithSuggestion class allows you to harmonize data using a sample-based approach.
It takes a sample file and two data files as input.
Example usage:

    from utility import DataHarmonizationWithSuggestion

    # Create an instance of DataHarmonizationWithSuggestion
    key = 'openai-key'
    harmonizer = DataHarmonizationWithSuggestion(key, "sample_harmonized_data.csv", "file1.csv", "file2.csv")

    # Harmonize the data based on the sample
    result = harmonizer.harmonize_data()
    print(result)
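One way to picture the sample-based approach is to treat the sample file as the target schema and reshape each data file to it. The sketch below does only that, using exact case-insensitive header matching with the standard `csv` module. The package itself relies on GPT-based models, which this sketch does not use, and `align_to_sample` is a hypothetical helper, not part of the package.

```python
import csv
import io


def align_to_sample(sample_text, data_text):
    """Reshape the rows of one CSV to the columns of a sample CSV.

    Headers are matched case-insensitively. Columns missing from the data
    are filled with an empty string; extra columns are dropped.
    """
    sample_columns = next(csv.reader(io.StringIO(sample_text)))
    rows = []
    for row in csv.DictReader(io.StringIO(data_text)):
        lookup = {name.strip().lower(): value for name, value in row.items()}
        rows.append({col: lookup.get(col.strip().lower(), "") for col in sample_columns})
    return rows


sample = "name,email\nAda,ada@example.com\n"
data = "Email,Name,age\nbob@example.com,Bob,41\n"
aligned = align_to_sample(sample, data)
print(aligned)
```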
Hashes for data-harmonization-ai-dp-2.0.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | dd8d1c41500c16b7f57350364e66ed3f03c678833f52150860261c3b7dd08f65
MD5 | fa18d63523779d87a772f3d1503001fd
BLAKE2b-256 | 9583268df0b072c7e4bfb636ec20a0157faf94427524ea551d11b36c4aa3abe6
Hashes for data_harmonization_ai_dp-2.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2036f5b2e142e96b8674f8ac493076c8a1ab945893673887b7c4508d6acc9ace
MD5 | d9fee21997a1c209e6913bedc71860ce
BLAKE2b-256 | 482f8482b1087e50f0473a16f36db469fd37e0d43b96e1355858be964e8159fe
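The digests above can be used to check a downloaded file before installing it. The following is a minimal sketch using `hashlib` from the Python standard library; the helper names are illustrative, and the path you pass in is wherever you saved the download.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Call `matches_published_digest` with the local file path and the SHA256 value from the matching table; a False result means the file differs from the published one and should not be installed.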