comparesv

Python CSV Comparison on steroids

Installation

pip install comparesv

Usage

comparesv [-h] [-v] [--enc1 ENCODING] [--enc2 ENCODING] [-i]
              [-rm ROW_MATCH] [-cm COLUMN_MATCH] [-sm STRING_MATCH] [-ir]
              [-ic] [-is] [-s]
              [FILE1] [FILE2]

CSV files comparison

positional arguments:
  FILE1                 the first CSV file
  FILE2                 the second CSV file

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --enc1 ENCODING       encoding of the first file (default is to autodetect)
  --enc2 ENCODING       encoding of the second file (default is to autodetect)
  -i, --ignore-case     ignore case (default is case-sensitive)
  -rm ROW_MATCH, --row-match ROW_MATCH
                        Logic to be used to identify the rows. Possible
                        options 'order', 'fuzzy', 'deep' (default is order)
  -cm COLUMN_MATCH, --column-match COLUMN_MATCH
                        Logic to be used to identify the columns. Possible
                        options 'exact','fuzzy' (default is exact)
  -sm STRING_MATCH, --string-match STRING_MATCH
                        Logic to be used to identify the columns. Possible
                        options 'exact','fuzzy' (default is exact)
  -ir, --include-addnl-rows
                        Include added additional added rows from second file
                        (default is false)
  -ic, --include-addnl-columns
                        Include added additional columns from second file
                        (default is false)
  -is, --include-stats  Include stats (default is false)
  -s, --save-output     Save output to file

Examples

Scenario 1: Simple direct comparison

id first last age
432 Roy Aguilar 46
914 Janie Bowman 24
021 Grace Copeland 53
708 Louise Franklin 25
850 Gertrude Carr 60

vs

id first last age
432 Roy Aguilar 46
914 Janie Bowman 24
021 Grace Copeland 53
708 Louise Franklin 25
850 Gertrude Carr 60

comparesv file1.csv file2.csv

Will provide the comparison result:

S.No id first last age
1 True True True True
2 True True True True
3 True True True True
4 True True True True
5 True True True True

and the compared values:

S.No id first last age
1 [432]:[432] [Roy]:[Roy] [Aguilar]:[Aguilar] [46]:[46]
2 [914]:[914] [Janie]:[Janie] [Bowman]:[Bowman] [24]:[24]
3 [021]:[021] [Grace]:[Grace] [Copeland]:[Copeland] [53]:[53]
4 [708]:[708] [Louise]:[Louise] [Franklin]:[Franklin] [25]:[25]
5 [850]:[850] [Gertrude]:[Gertrude] [Carr]:[Carr] [60]:[60]
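The two outputs above can be approximated with a few lines of standard-library Python. This is a minimal sketch of a position-based comparison, not the comparesv implementation: the function name `compare_in_order` and the inline sample data are assumptions made for illustration, and the second file deliberately differs in one cell to show a False result.

```python
import csv
import io

def compare_in_order(text1, text2):
    """Compare two CSV texts row by row (by position) and column by header.

    Hypothetical helper for illustration; not part of comparesv.
    """
    rows1 = list(csv.DictReader(io.StringIO(text1)))
    rows2 = list(csv.DictReader(io.StringIO(text2)))
    results, values = [], []
    for row1, row2 in zip(rows1, rows2):
        # Only columns whose headers appear in both files are compared.
        shared = [name for name in row1 if name in row2]
        results.append({name: row1[name] == row2[name] for name in shared})
        values.append({name: f"[{row1[name]}]:[{row2[name]}]" for name in shared})
    return results, values

file1 = "id,first,last,age\n432,Roy,Aguilar,46\n914,Janie,Bowman,24\n"
file2 = "id,first,last,age\n432,Roy,Aguilar,46\n914,Janie,Bowman,25\n"

results, values = compare_in_order(file1, file2)
print(results[1])         # the age column differs in the second row
print(values[1]["age"])   # prints [24]:[25]
```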

Scenario 2: Fuzzy column names

id first last age of student
432 Roy Aguilar 46
914 Janie Bowman 24

and

id first last age
432 Roy Aguilar 46
914 Janie Bowman 24

comparesv file1.csv file2.csv --column-match 'fuzzy'

will provide

S.No id first last age
1 True True True True
2 True True True True

Scenario 3: Fuzzy row order - Differently ordered textual data

id first last age
432 Roy Aguilar 46
914 Janie Bowman 24
021 Grace Copeland 53

and

id first last age of student
021 Grace Copeland 53
432 Roy Aguilar 46
914 Janie Bowman 24

comparesv file1.csv file2.csv --column-match 'fuzzy' --row-match 'fuzzy'

will provide

S.No id first last age
1 True True True True
2 True True True True
3 True True True True

Scenario 4: Deep row order - Differently ordered numerical data

year1 year2 year3 year
751 609 590 930
417 501 441 763
691 621 941 563
179 781 335 225
961 530 433 571

and

year1 year2 year3 year
961 530 433 571
751 609 590 930
691 621 941 563
179 781 335 225
417 501 441 763

comparesv file1.csv file2.csv --row-match 'deep'

will provide

S.No year1 year2 year3 year
1 True True True True
2 True True True True
3 True True True True
4 True True True True
5 True True True True

Scenario n: Unlimited options. Please explore the options below


Description

The first file is treated as the source file and is compared against the second file. Refer to the options below to fine-tune how the comparison works.

Row Match (-rm)

This defines how rows in the two files are paired up for comparison.

order - This is the default option. Rows are compared by their position in the files. Use this if the records in both files are in the same order.

fuzzy - This uses fuzzy logic to identify the matching row in the second file. Use this if the records are not in the same order and most of the data is text.

deep - This uses fuzzy logic to identify the matching row in the second file. Use this if the records are not in the same order and the data is numeric. Each row in file1 is checked against all the rows in file2 to find a potential match.
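The idea of pairing rows by similarity instead of by position can be sketched as follows. This is an illustration under stated assumptions, not the logic comparesv uses internally: it scores rows with `difflib.SequenceMatcher` from the standard library, and the function name `match_rows` and the 0.8 threshold are hypothetical choices.

```python
from difflib import SequenceMatcher

def match_rows(rows1, rows2, threshold=0.8):
    """For each row in rows1, return the index of the most similar
    not-yet-used row in rows2, or None if nothing reaches the threshold.

    Hypothetical helper for illustration; not part of comparesv.
    """
    unused = set(range(len(rows2)))
    matches = []
    for row1 in rows1:
        key1 = " ".join(row1)
        best_index, best_score = None, 0.0
        # Check this row against every remaining row of the second file.
        for j in sorted(unused):
            score = SequenceMatcher(None, key1, " ".join(rows2[j])).ratio()
            if score > best_score:
                best_index, best_score = j, score
        if best_index is not None and best_score >= threshold:
            unused.discard(best_index)
            matches.append(best_index)
        else:
            matches.append(None)
    return matches

rows1 = [["432", "Roy", "Aguilar", "46"],
         ["914", "Janie", "Bowman", "24"],
         ["021", "Grace", "Copeland", "53"]]
rows2 = [["021", "Grace", "Copeland", "53"],
         ["432", "Roy", "Aguilar", "46"],
         ["914", "Janie", "Bowman", "24"]]

print(match_rows(rows1, rows2))  # prints [1, 2, 0]
```

Once rows are paired this way, each pair can be compared cell by cell exactly as in the position-based case.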

Column Match (-cm)

This defines how columns in the two files are paired up for comparison.

exact - This is the default option. Columns are selected for comparison when their headers match exactly, e.g. the 'Age' and 'Age' columns across the files will be selected for comparison.

fuzzy - This uses fuzzy logic to identify the matching column in the second file. Use this if the column headers across the files are not exactly the same but are close, e.g. the 'age' and 'age of student' columns may be selected for comparison.
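A rough sketch of fuzzy header matching, assuming the standard-library `difflib.get_close_matches` as the similarity measure (comparesv may well use something different). The function name `match_columns` and the cutoff of 0.3 are hypothetical; exact matches are claimed first so that fuzzy matching only considers headers that are still unpaired.

```python
from difflib import get_close_matches

def match_columns(headers1, headers2, cutoff=0.3):
    """Map each header of the first file to a header of the second file.

    Hypothetical helper for illustration; not part of comparesv.
    """
    # Exact matches take priority.
    mapping = {h: h for h in headers1 if h in headers2}
    unclaimed = [h for h in headers2 if h not in mapping]
    for header in headers1:
        if header in mapping:
            continue
        # Pick the closest remaining header, if any is similar enough.
        close = get_close_matches(header, unclaimed, n=1, cutoff=cutoff)
        if close:
            mapping[header] = close[0]
            unclaimed.remove(close[0])
        else:
            mapping[header] = None
    return mapping

column_map = match_columns(["id", "first", "last", "age of student"],
                           ["id", "first", "last", "age"])
print(column_map["age of student"])  # prints age
```

The cutoff is a trade-off: a low value pairs loosely related headers such as 'age of student' and 'age', but it can also pair headers that are unrelated.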

String Match (-sm)

This defines how textual data is compared.

exact - This is the default option. Two texts match only if they are exactly the same.

fuzzy - This uses fuzzy logic to check whether the texts are close to each other and, if so, treats them as a match.
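The difference between the two modes can be sketched like this. The similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold, and the function name `strings_match` are assumptions for illustration and are not taken from comparesv. The `ignore_case` parameter mirrors the effect of the -i flag.

```python
from difflib import SequenceMatcher

def strings_match(a, b, mode="exact", ignore_case=False, threshold=0.8):
    """Return True if two cell values are considered a match.

    Hypothetical helper for illustration; not part of comparesv.
    """
    if ignore_case:
        a, b = a.lower(), b.lower()
    if mode == "exact":
        return a == b
    # Fuzzy mode: accept texts whose similarity ratio reaches the threshold.
    return SequenceMatcher(None, a, b).ratio() >= threshold

print(strings_match("Copeland", "copeland"))                    # prints False
print(strings_match("Copeland", "copeland", ignore_case=True))  # prints True
print(strings_match("Copeland", "Copland", mode="fuzzy"))       # prints True
```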

Include Additional Rows (-ir)

If the second file contains more rows than the first file, this option makes the comparison output include the remaining (uncompared) rows.

Include Additional Columns (-ic)

If the second file contains more columns than the first file, this option makes the comparison output include the remaining columns.

Ignore case (-i)

This option ignores case when comparing strings.

Include Stats (-is)

This option outputs the comparison stats (as percentages) on the console. It is disabled by default.
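The exact layout of the stats is not shown here, so the following is only a sketch of how per-column match percentages could be derived from a True/False result table. The function name `match_percentages` and the input shape (a list of dicts mapping column name to a boolean) are assumptions, not the comparesv output format.

```python
def match_percentages(results):
    """Return the percentage of matching cells per column.

    Hypothetical helper for illustration; not part of comparesv.
    """
    if not results:
        return {}
    stats = {}
    for column in results[0]:
        matched = sum(1 for row in results if row[column])
        stats[column] = 100.0 * matched / len(results)
    return stats

sample_results = [
    {"id": True, "age": True},
    {"id": True, "age": False},
]
print(match_percentages(sample_results))  # prints {'id': 100.0, 'age': 50.0}
```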

Save Output (-s)

This option saves the result and values comparisons in the current directory. This is enabled by default.
