DataFingerprint

DataFingerprint is a Python package designed to compare two datasets and generate a detailed report highlighting the differences between them. This tool is particularly useful for data validation, quality assurance, and ensuring data consistency across different sources.

Features

  • Column Name Differences: Identify columns that are present in one dataset but missing in the other.
  • Column Data Type Differences: Detect discrepancies in data types between corresponding columns in the two datasets.
  • Row Differences: Find rows that are present in one dataset but missing in the other, or rows that have different values in corresponding columns.
  • Paired Row Differences: Compare rows that have the same primary key or unique identifier in both datasets and identify differences in their values.
  • Data Report: Generate a comprehensive report summarizing all the differences found between the two datasets.
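The two schema-level checks in the list above can be pictured with a small, dependency-free sketch. It is illustrative only and is not DataFingerprint's implementation: plain dicts stand in for dataset schemas, and the function names `column_name_differences` and `column_dtype_differences` are hypothetical.

```python
# Illustrative only: plain dicts (column name -> type name) stand in for
# dataset schemas. This is NOT how DataFingerprint is implemented, and the
# function names below are hypothetical.

def column_name_differences(schema_a, schema_b):
    """Return the columns that appear in only one of the two schemas."""
    only_a = sorted(set(schema_a) - set(schema_b))
    only_b = sorted(set(schema_b) - set(schema_a))
    return only_a, only_b


def column_dtype_differences(schema_a, schema_b):
    """Return {column: (type_a, type_b)} for shared columns whose types differ."""
    shared = set(schema_a) & set(schema_b)
    return {
        col: (schema_a[col], schema_b[col])
        for col in sorted(shared)
        if schema_a[col] != schema_b[col]
    }


schema_a = {"id": "int", "name": "str", "age": "int"}
schema_b = {"id": "int", "name": "str", "age": "float", "email": "str"}

only_a, only_b = column_name_differences(schema_a, schema_b)
dtype_diffs = column_dtype_differences(schema_a, schema_b)

print(only_a, only_b)  # [] ['email']
print(dtype_diffs)     # {'age': ('int', 'float')}
```

Keeping the name check separate from the type check mirrors the feature list: a column has to exist in both datasets before its types can be compared.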

Installation

To install DataFingerprint, you can use pip:

pip install data-fingerprint

Usage

Here's a basic example of how to use DataFingerprint to compare two datasets:

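import polars as pl

# Assumption: get_data_report is taken to live in data_fingerprint.src.comparator;
# adjust this import if your installed version exposes it elsewhere.
from data_fingerprint.src.comparator import get_data_report
from data_fingerprint.src.models import DataReport
from data_fingerprint.src.utils import get_dataframe

# Create two sample datasets that share ids 1 and 2 but differ in the third row
df_0 = pl.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
})
df_1 = pl.DataFrame({
    'id': [1, 2, 4],
    'name': ['Alice', 'Bob', 'David'],
    'age': [25, 30, 40]
})

# Generate a data report comparing the two datasets, pairing rows on the 'id' column
report: DataReport = get_data_report(
    df_0, df_1, "df_0", "df_1", pairing_columns=["id"]
)

# Print the report as JSON, then the result of converting it with get_dataframe
print(report.model_dump_json(indent=4))
print(get_dataframe(report))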
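
To see which row-level differences to expect from the sample data, the comparison can be mimicked by hand. The sketch below is a plain-Python illustration, not DataFingerprint's implementation: it assumes rows are paired on the `id` column, that `id` values are unique within each dataset, and the helper name `compare_rows` is hypothetical.

```python
# Illustrative only: rows are plain dicts and are paired on a key column.
# This is NOT DataFingerprint's implementation; compare_rows is a
# hypothetical helper, and key values are assumed unique per dataset.

rows_a = [
    {"id": 1, "name": "Alice", "age": 25},
    {"id": 2, "name": "Bob", "age": 30},
    {"id": 3, "name": "Charlie", "age": 35},
]
rows_b = [
    {"id": 1, "name": "Alice", "age": 25},
    {"id": 2, "name": "Bob", "age": 30},
    {"id": 4, "name": "David", "age": 40},
]


def compare_rows(rows_a, rows_b, key):
    """Return (keys only in a, keys only in b, per-key value differences)."""
    by_key_a = {row[key]: row for row in rows_a}
    by_key_b = {row[key]: row for row in rows_b}

    # Rows whose key exists on one side only.
    only_a = sorted(by_key_a.keys() - by_key_b.keys())
    only_b = sorted(by_key_b.keys() - by_key_a.keys())

    # Paired rows: same key on both sides, compared column by column.
    changed = {}
    for k in sorted(by_key_a.keys() & by_key_b.keys()):
        row_a, row_b = by_key_a[k], by_key_b[k]
        diffs = {
            col: (row_a[col], row_b[col])
            for col in sorted(row_a.keys() & row_b.keys())
            if row_a[col] != row_b[col]
        }
        if diffs:
            changed[k] = diffs
    return only_a, only_b, changed


only_in_a, only_in_b, changed = compare_rows(rows_a, rows_b, key="id")
print(only_in_a, only_in_b, changed)  # [3] [4] {}
```

With these inputs, ids 1 and 2 pair up with identical values, so the only differences are the unpaired rows: id 3 exists only in the first dataset and id 4 only in the second.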

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

Contact

For any questions or feedback, please contact [your email].

Download files

Source Distribution

data_fingerprint-0.1.0.tar.gz (22.7 kB)

Uploaded Source

Built Distribution

data_fingerprint-0.1.0-py3-none-any.whl (24.4 kB)

Uploaded Python 3

File details

Details for the file data_fingerprint-0.1.0.tar.gz.

File metadata

  • Download URL: data_fingerprint-0.1.0.tar.gz
  • Upload date:
  • Size: 22.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.3 Linux/6.8.0-56-generic

File hashes

Hashes for data_fingerprint-0.1.0.tar.gz

Algorithm     Hash digest
SHA256        c2a99e278ce913d93dd41bf2c76dc12dc99f475b6b2be415e57bb7365351aeb0
MD5           e17f3f3cb71366f6d651ba15ac4a822e
BLAKE2b-256   90516ccd9b30779781e1d935f3be7b95b984982b3509a066562984f9ab6b957d
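
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The following is a minimal sketch: the helper names `sha256_of` and `matches_expected` are hypothetical, and the archive is assumed to sit in the current working directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for data_fingerprint-0.1.0.tar.gz
EXPECTED_SHA256 = "c2a99e278ce913d93dd41bf2c76dc12dc99f475b6b2be415e57bb7365351aeb0"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of(path) == expected.lower()


# Assumed download location; nothing happens if the file is not there.
archive = Path("data_fingerprint-0.1.0.tar.gz")
if archive.exists():
    print("hash OK" if matches_expected(archive) else "hash MISMATCH")
```

Reading in chunks keeps memory use flat regardless of file size; the same approach works for the wheel by substituting its file name and its listed SHA256 digest.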

File details

Details for the file data_fingerprint-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: data_fingerprint-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 24.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.3 Linux/6.8.0-56-generic

File hashes

Hashes for data_fingerprint-0.1.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        c2184a936e4e0ed3c107bc77b1c946f62e8d104484516288388676380204a3b3
MD5           01f1c026ce6efd57dd81a43c36e9b415
BLAKE2b-256   5090beb109715ba256eb7b87ccd849b06fc3d0fb1af9cd13dde88c863753bbc8
