High-performance data diffing for pandas/pyarrow DataFrames

Project description

DataForge Diff

High-performance data diffing for Python.

Installation

pip install dataforge-diff

Usage

from dataforge_diff import diff
import pandas as pd

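# Two small frames that share the key column "id"; the row with id 3 differs in "name"
df_a = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df_b = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "changed"]})

# Diff the frames, matching rows on "id"
result = diff(df_a, df_b, "id")
print(f"Modified: {result['modified_count']}")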
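
To make the idea of a keyed diff concrete, here is a minimal sketch in plain pandas. It is not DataForge Diff's implementation and says nothing about its internals or performance. The function name `keyed_diff_sketch` and the keys `added_count` and `removed_count` are assumptions made for illustration; only `modified_count` appears in the usage above.

```python
import pandas as pd


def keyed_diff_sketch(left: pd.DataFrame, right: pd.DataFrame, key: str) -> dict:
    """Count added, removed, and modified rows by matching on `key`.

    Hypothetical helper for illustration; not part of dataforge_diff.
    """
    # An outer merge keeps every key from both sides; `indicator=True`
    # adds a `_merge` column saying which side each row came from.
    merged = left.merge(
        right, on=key, how="outer", suffixes=("_a", "_b"), indicator=True
    )

    removed = merged[merged["_merge"] == "left_only"]
    added = merged[merged["_merge"] == "right_only"]
    both = merged[merged["_merge"] == "both"]

    # Compare only the non-key columns that exist on both sides.
    value_cols = [c for c in left.columns if c != key and c in right.columns]
    changed = pd.Series(False, index=both.index)
    for col in value_cols:
        a, b = both[f"{col}_a"], both[f"{col}_b"]
        # A value counts as unchanged if it is equal or missing on both sides.
        changed |= ~((a == b) | (a.isna() & b.isna()))

    return {
        "added_count": len(added),
        "removed_count": len(removed),
        "modified_count": int(changed.sum()),
    }


left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [1, 3, 4], "name": ["a", "changed", "d"]})

summary = keyed_diff_sketch(left, right, "id")
print(summary)
```

In this sketch, a key present on only one side counts as added or removed, and only keys present on both sides can count as modified. Whether `dataforge_diff.diff` classifies rows the same way is not stated in this description.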

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

dataforge_diff-0.1.0-cp39-cp39-macosx_11_0_arm64.whl (598.3 kB)

Uploaded CPython 3.9, macOS 11.0+ ARM64
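
Because this release lists a single wheel and no source distribution, installation is likely to succeed only on an interpreter that matches that wheel's tags. The sketch below is a rough, simplified check under that assumption. It does not reproduce pip's actual tag matching and does not check the macOS version, and the function name `matches_published_wheel` is hypothetical.

```python
import platform
import sys


def matches_published_wheel() -> bool:
    """Roughly check for CPython 3.9 on macOS ARM64 (the cp39 / arm64 wheel)."""
    is_cpython39 = (
        platform.python_implementation() == "CPython"
        and sys.version_info[:2] == (3, 9)
    )
    is_macos_arm64 = sys.platform == "darwin" and platform.machine() == "arm64"
    return is_cpython39 and is_macos_arm64


if __name__ == "__main__":
    if matches_published_wheel():
        print("This interpreter looks compatible with the published wheel.")
    else:
        print("No matching wheel is listed for this interpreter.")
```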

File details

Details for the file dataforge_diff-0.1.0-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for dataforge_diff-0.1.0-cp39-cp39-macosx_11_0_arm64.whl

Algorithm    Hash digest
SHA256       2bb7a2ed95aac015ede9f01bc35971824bc2388e4e39b3909810dfc6e1bdfbcb
MD5          b336f240a3b9ea76a76ca474f39f3a02
BLAKE2b-256  baac33ca7abfbb8c47019e631139f5d499c36cc7262098c7fccfb33aa37684b2
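
A downloaded wheel can be checked against the SHA256 digest listed above with the standard library's `hashlib`. This is a minimal sketch: the helper names `sha256_of` and `verify_wheel` are hypothetical, and the local file path in the usage part is an assumption about where the wheel was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the hash listing for the 0.1.0 wheel.
EXPECTED_SHA256 = "2bb7a2ed95aac015ede9f01bc35971824bc2388e4e39b3909810dfc6e1bdfbcb"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the wheel was saved.
    wheel = Path("dataforge_diff-0.1.0-cp39-cp39-macosx_11_0_arm64.whl")
    if wheel.exists():
        print("hash OK" if verify_wheel(wheel) else "hash MISMATCH")
```

pip can perform the same check at install time through its hash-checking mode, where each requirement line carries a `--hash=sha256:...` option.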
