Tabular data comparison wrapper for DataComPy

These details have not been verified by PyPI

Project links

Project description

TabularCompare

Code style: black

Tabular data comparison wrapper for DataComPy.

Quick Install

pip install tabularcompare

Basic Usage

Comparison object

import pandas as pd
from tabularcompare import Comparison

df1 = pd.DataFrame(
    {
        "idx1": ["A", "B", "B", "C"],
        "idx2": ["01", "01", "02", "03"],
        "colA": ["AA", "BA", "BB", "CA"],
        "colB": [100, 200, 200, 300]
     }
)
df2 = pd.DataFrame(
    {
        "idx1": ["A", "B", "C"],
        "idx2": ["01", "01", "03"],
        "colA": ["AA", "XA", "CA"],
        "colB": [101, 200, 300],
        "colC": ["foo", "bar", "baz"]
     }
)

comparison = Comparison(
    df1, df2, join_columns=["idx1", "idx2"]
)

Added Functionalities

Diverging Subset

This method introduces an enhanced look at the changes identified at the intersection of compared DataFrames, following the notation {df1} --> {df2}.

comparison.diverging_subset()

	idx1	idx2	colA	colB
0	A	01	NaN	{100} --> {101}
1	B	01	{BA} --> {XA}	NaN

Rows that are unique to either DataFrame can be called via .df1_unq_rows() and .df2_unq_rows() methods.

comparison.df1_unq_rows()

	idx1	idx2	colA	colB
2	B	02	BB	200

Columns that are unique to either DataFrame can be called via .df1_unq_columns() and .df2_unq_columns() methods.

comparison.df2_unq_columns()

	idx1	idx2	colC
0	A	01	foo
1	B	01	bar
2	C	03	baz

Enhanced Reporting

The report functionality is now callable from the comparison object. It includes a .txt, .html, and .xlsx version.

comparison.report_to_txt("./results/Report.txt")
comparison.report_to_html("./results/Report.html")
comparison.report_to_xlsx("./results/Report.xlsx", write_originals=True)

The Excel report will output complete comparison results, with tabs dedicated to:

Original dataframes (when write_originals=True).
Columns present only on df1 and/or df2.
Rows present only on df1 and/or df2.
Diverging subset showing all the changes identified from df1 to df2.

The HTML report will also output a rendered table of the diverging subset on top of the file, alongside the native DataComPy summary report.

Fully-fledged Command Line Interface

DataComPy Methods

Most methods native to datacompy.Compare functionality are still present, including;

.report()
.df1_unq_rows() / .df2_unq_rows()
.df1_unq_columns() / .df2_unq_columns()
.intersect_columns()
.intersect_rows()

The native datacompy.Compare method is also callable from the tabularcompare core module:

from tabularcompare import Compare

# datacompy.Compare method
comparison = Compare(
    df1, df2, join_columns=["idx1", "idx2"]
)
print(comparison.report())

For a complete documentation on DataComPy you can head to DataComPy.

CLI

Command Line Interface to output an Excel .xlsx report and, optionally, html and txt summaries.

tabularcompare --help

Usage: tabularcompare [OPTIONS] DF1 DF2

Options:
  -c, --columns, --on TEXT    Comma-separated list of key column(s) to compare
                              df1 against df2. When not provided, df1 and df2
                              will be matched on index.
  -ic, --ignore_columns TEXT  Comma-separated list of column(s) to ignore from
                              df1 and df2.
  -n1, --df1_name TEXT        Alias for Data frame 1. Default = df1
  -n2, --df2_name TEXT        Alias for Data frame 2. Default = df2
  -is, --ignore_spaces        Flag to strip and ignore whitespaces from string
                              columns.
  -ci, --case_insensitive     Flag to compare string columns on a case-
                              insensitive manner.
  -cl, --cast_lowercase       Flag to cast column names to lower case before
                              comparison.
  -at, --abs_tol FLOAT        Absolute tolerance between two numeric values.
  -rt, --rel_tol FLOAT        Relative tolerance between two numeric values.
  -txt, --txt                 Flag to output a .txt report with a comparison
                              summary.
  -html, --html               Flag to output an HTML report with a comparison
                              summary.
  -od, --only_deltas          Flag to suppress original dataframes from the
                              output .xlsx report.
  -o, --output, --out PATH    Output location for report files. Defaults to
                              current location.
  -e, --encoding TEXT         Character encoding to read df1 and df2.
  -v, --verbose               Verbosity.
  --help                      Show this message and exit.

Sample usage

The application reads from two file paths input for csv, json, or excel files.

cd ./data/
tabularcompare ./df1.csv ./df2.csv -c 'idx1,idx2' -n1 myTable1 -n2 myTable2 -o ../results/

Caveat

The comparison results will take into account data types across columns. I.E. If we update our sample dataframe df2 to include a missing value on colB, it will now be of dtype object, as oposed to Int64 in df1. This might lead to miss-leading interpretations of the results to users without information on data types.

import pandas as pd
from tabularcompare import Comparison

df1 = pd.DataFrame(
    {
        "idx1": ["A", "B", "B", "C"],
        "idx2": ["01", "01", "02", "03"],
        "colA": ["AA", "BA", "BB", "CA"],
        "colB": [100, 200, 200, 300]
     }
)
df2 = pd.DataFrame(
    {
        "idx1": ["A", "B", "C"],
        "idx2": ["01", "01", "03"],
        "colA": ["AA", "XA", "CA"],
        "colB": [pd.NA, 200, 300],
        "colC": ["foo", "bar", "baz"]
     }
)

comparison = Comparison(
    df1, df2, join_columns=["idx1", "idx2"]
)
comparison.diverging_subset()

Which will return:

	idx1	idx2	colA	colB
0	A	01	NaN	{100} --> {}
1	B	01	{BA} --> {XA}	{200} --> {200}
3	C	03	NaN	{300} --> {300}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.0.4

Apr 22, 2023

0.0.3

Apr 22, 2023

0.0.2

Apr 20, 2023

0.0.1

Apr 20, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tabularcompare-0.0.4.tar.gz (19.7 kB view details)

Uploaded Apr 22, 2023 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

tabularcompare-0.0.4-py3-none-any.whl (14.8 kB view details)

Uploaded Apr 22, 2023 Python 3

File details

Details for the file tabularcompare-0.0.4.tar.gz.

File metadata

Download URL: tabularcompare-0.0.4.tar.gz
Upload date: Apr 22, 2023
Size: 19.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.12

File hashes

Hashes for tabularcompare-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`b4041418eead523bfe73de75481632332dfda77725b202fec471c01dd5d0db2d`
MD5	`4e849861fc8c2eb4bb5a5755958913b3`
BLAKE2b-256	`cc4ac8570559c5876a122814cad5b1abac315d0d73bc364475cc3953e48d256a`

See more details on using hashes here.

File details

Details for the file tabularcompare-0.0.4-py3-none-any.whl.

File metadata

Download URL: tabularcompare-0.0.4-py3-none-any.whl
Upload date: Apr 22, 2023
Size: 14.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.12

File hashes

Hashes for tabularcompare-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a870403a30b004b8ba4a9cded52d13795798ea2d9c9828b6a46d57022854bb83`
MD5	`b79e7c984ccc06c3f8dbbecb4a8d6ed3`
BLAKE2b-256	`c111cee5fdddaca4405fc04ec18c6a0c24975ee49ab1985cf4eefd9e1dab456d`

See more details on using hashes here.

tabularcompare 0.0.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

TabularCompare

Quick Install

Basic Usage

Comparison object

Added Functionalities

Diverging Subset

Enhanced Reporting

Fully-fledged Command Line Interface

DataComPy Methods

CLI

Sample usage

Caveat

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes