Compare two dataframes and return the column-wise differences and the additional records

Project description

dataframe_diff

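dataframe_diff is a micro-library that takes two dataframes as input, compares them on a set of key columns, and returns two dataframes: one with the column-wise differences for records whose key appears in both inputs, and one with the additional records whose key appears in only one of them.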

Installation

pip install dataframe-diff

Examples

>>> import pandas as pd
>>> from dataframe_diff import dataframe_diff
>>> df1 = pd.read_csv('students_1.csv')
>>> df2 = pd.read_csv('students_2.csv')
>>> df1.head()
      Name Subjects  Marks Grade
0  Leonard      Eng     70     B
1  Leonard     Math     80     B
2  Leonard  Physics     90     A
3  Sheldon      Eng     90     A
4  Sheldon     Math     99     A
>>> df2.head()
      Name Subjects  Marks Grade
0  Leonard      Eng     75     A
1  Leonard     Math     85     A
2  Leonard  Physics     90     A
3  Sheldon      Eng     99     A
4  Sheldon     Math     99     A
>>> d1_column, d2_additional = dataframe_diff(df1, df2, key=['Name', 'Subjects'])
>>> d1_column
      Name Subjects value_x value_y column_name
0  Leonard      Eng      70      75       Marks
1  Leonard      Eng       B       A       Grade
2  Leonard     Math      80      85       Marks
3  Leonard     Math       B       A       Grade
4  Sheldon      Eng      90      99       Marks
5    Penny  Physics      65      75       Marks
6    Penny  Physics       C       B       Grade
>>> d2_additional
     Name   Subjects  Marks Grade  sets
0  Rajesh       Math     93     A  df_x
1  Howard  Chemistry     83     B  df_y
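
To make the shape of the two results easier to reason about, here is a minimal sketch of how a similar comparison could be written with plain pandas. It is not the library's actual implementation: the function name `sketch_dataframe_diff` and the helper `_only_in` are hypothetical, the sample rows are a small subset of the data shown above, and the reading of `sets` (`df_x` for records found only in the first dataframe, `df_y` for records found only in the second) is an assumption based on the example output. The sketch also assumes that both dataframes share the same columns, that the key is unique within each dataframe, and that there is at least one non-key column.

```python
import pandas as pd


def _only_in(frame, other, key, label):
    """Rows of `frame` whose key is absent from `other`, tagged with `label`."""
    flagged = frame.merge(other[key], on=key, how="left", indicator=True)
    return (
        flagged[flagged["_merge"] == "left_only"]
        .drop(columns="_merge")
        .assign(sets=label)
    )


def sketch_dataframe_diff(left, right, key):
    """Illustrative only; not the code used by dataframe_diff."""
    value_cols = [c for c in left.columns if c not in key]

    # Rows whose key exists in both frames; value columns get _x / _y suffixes.
    both = left.merge(right, on=key, suffixes=("_x", "_y"))

    # One long-format piece per value column, keeping only the changed cells.
    pieces = []
    for col in value_cols:
        changed = both[both[f"{col}_x"] != both[f"{col}_y"]]
        piece = changed[key].copy()
        piece["value_x"] = changed[f"{col}_x"]
        piece["value_y"] = changed[f"{col}_y"]
        piece["column_name"] = col
        pieces.append(piece)

    # A stable sort on the row index groups all changes of one record together.
    column_diff = (
        pd.concat(pieces).sort_index(kind="mergesort").reset_index(drop=True)
    )

    # Records whose key appears in only one of the two frames.
    additional = pd.concat(
        [_only_in(left, right, key, "df_x"), _only_in(right, left, key, "df_y")],
        ignore_index=True,
    )
    return column_diff, additional


df1 = pd.DataFrame({
    "Name": ["Leonard", "Sheldon", "Rajesh"],
    "Subjects": ["Eng", "Eng", "Math"],
    "Marks": [70, 90, 93],
    "Grade": ["B", "A", "A"],
})
df2 = pd.DataFrame({
    "Name": ["Leonard", "Sheldon", "Howard"],
    "Subjects": ["Eng", "Eng", "Chemistry"],
    "Marks": [75, 99, 83],
    "Grade": ["A", "A", "B"],
})

column_diff, additional = sketch_dataframe_diff(df1, df2, key=["Name", "Subjects"])
print(column_diff)
print(additional)
```

Because the comparison uses `!=`, a cell that is missing (NaN) in both dataframes would be reported as a difference in this sketch; whether the library behaves the same way is not stated in its description.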

Download files

Source Distribution

dataframe_diff-0.5.tar.gz (2.8 kB)

Uploaded Source

File details

Details for the file dataframe_diff-0.5.tar.gz.

File metadata

  • Download URL: dataframe_diff-0.5.tar.gz
  • Upload date:
  • Size: 2.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.1.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.5

File hashes

Hashes for dataframe_diff-0.5.tar.gz
Algorithm    Hash digest
SHA256       f9c069138a0337d2e16a1c00a5c6f18d1161182d08977e8a07622cf1af685259
MD5          d5a51a37e2ea94db625d477958f23c3a
BLAKE2b-256  059e5c8439ec8aa92ff591a9af5837396eca290529d3a224c6ca9f9437e41ffc
