csvdiff

Generate a diff between two CSV files.

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: BSD License
Natural Language
- English

Project description

Overview

Generate a diff between two CSV files on the command line.

csvdiff allows you to compare the semantic contents of two CSV files, ignoring things like row and column ordering in order to get to what’s actually changed. This is useful if you’re comparing the output of an automatic system from one day to the next, so that you can look at just what’s changed.

It’s also useful for maintaining patches to third-party data. Diffs generated by csvdiff are a subset of JSON and can be stored and applied using the matching csvpatch command. If upstream data changes, you can fetch the new version and re-apply your changes to it easily.

Installing

You’ll first need Python and pip. Then run:

pip install csvdiff

Examples

For example, suppose we have a.csv:

id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10

After some changes and corrections to the data, we now have b.csv:

id,name,amount
1,bob,23       <--- changed
3,sarah,7
4,jeff,19
5,mira,81      <--- added
6,fred,13      <--- changed

Now we can ask for a summary of differences:

$ csvdiff --style=summary id a.csv b.csv
1 rows removed (20.0%)
1 rows added (20.0%)
2 rows changed (40.0%)

Or look at the full diff pretty printed, to make it more readable:

$ csvdiff --style=pretty --output=diff.json id a.csv b.csv
$ cat diff.json
{
  "_index": [
    "id"
  ],
  "added": [
    {
      "amount": "81",
      "id": "5",
      "name": "mira"
    }
  ],
  "changed": [
    {
      "fields": {
        "amount": {
          "from": "20",
          "to": "23"
        }
      },
      "key": [
        "1"
      ]
    },
    {
      "fields": {
        "amount": {
          "from": "10",
          "to": "13"
        }
      },
      "key": [
        "6"
      ]
    }
  ],
  "removed": [
    {
      "amount": "63",
      "id": "2",
      "name": "eva"
    }
  ]
}
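
To make the idea of a key-based comparison concrete, here is a minimal Python sketch that produces a dictionary shaped like the diff above. It is not csvdiff's own implementation: the helper names load_indexed and diff_records are hypothetical, and the sketch assumes both files share the same columns and that index values are unique.

```python
import csv
import io


def load_indexed(text, index_columns):
    """Parse CSV text into a dict mapping key tuples to row dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return {tuple(row[c] for c in index_columns): dict(row) for row in reader}


def diff_records(from_text, to_text, index_columns):
    """Build a diff dict shaped like the output above (hypothetical helper)."""
    old = load_indexed(from_text, index_columns)
    new = load_indexed(to_text, index_columns)

    changed = []
    # Rows present on both sides are compared field by field.
    for key in sorted(old.keys() & new.keys()):
        fields = {
            name: {"from": old[key][name], "to": new[key].get(name)}
            for name in old[key]
            if old[key][name] != new[key].get(name)
        }
        if fields:
            changed.append({"key": list(key), "fields": fields})

    return {
        "_index": list(index_columns),
        # Keys only in the new file were added; keys only in the old were removed.
        "added": [new[k] for k in sorted(new.keys() - old.keys())],
        "removed": [old[k] for k in sorted(old.keys() - new.keys())],
        "changed": changed,
    }


# The contents of a.csv and b.csv from the example above.
A_CSV = "id,name,amount\n1,bob,20\n2,eva,63\n3,sarah,7\n4,jeff,19\n6,fred,10\n"
B_CSV = "id,name,amount\n1,bob,23\n3,sarah,7\n4,jeff,19\n5,mira,81\n6,fred,13\n"

diff = diff_records(A_CSV, B_CSV, ["id"])
```

Because rows are matched by their index key rather than by position, reordering the rows of either file leaves the result unchanged.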

If you want to exclude columns from the comparison, you can do so by specifying a comma-separated list of column names to ignore. For example:

$ csvdiff --style=summary --ignore-columns=amount id a.csv b.csv
1 rows removed (20.0%)
1 rows added (20.0%)
0 rows changed (0%)

You can also choose to compare numeric fields only up to a certain number of significant figures with the --significance option. Use negative values for orders of magnitude. In the example below, c.csv (not shown) differs from a.csv in two rows:

$ csvdiff --style=summary id a.csv c.csv
0 rows removed (0.0%)
0 rows added (0.0%)
2 rows changed (40.0%)
$ csvdiff --style=summary id --significance=-1 a.csv c.csv
files are identical
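
One way to picture a negative significance is as rounding before comparison. The Python sketch below assumes the value behaves like the ndigits argument of Python's built-in round(), so that -1 rounds to the nearest ten. That is an assumption made for illustration, not a statement of csvdiff's actual rule, and same_at_significance is a hypothetical helper name.

```python
def same_at_significance(a, b, significance):
    """Compare two CSV field strings, rounding numeric values first.

    Assumption: `significance` is treated like round()'s `ndigits`,
    so -1 rounds to the nearest ten. csvdiff's real rule may differ.
    """
    try:
        x, y = float(a), float(b)
    except ValueError:
        # Non-numeric fields fall back to a plain string comparison.
        return a == b
    return round(x, significance) == round(y, significance)


# Amounts taken from a.csv and b.csv above.
print(same_at_significance("20", "23", -1))  # both round to 20.0
print(same_at_significance("20", "23", 0))   # 20.0 versus 23.0
```

Note that rounding is not the same as a tolerance: under this assumption 24 and 26 would round to 20 and 30 and still count as different, even though they are only 2 apart.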

Diffs generated this way contain all the data that’s changed, and can be reapplied later if the original data changes. For example, suppose more data gets added to a.csv, giving us a-plus.csv:

id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10
8,henry,9

We can reapply our changes with the csvpatch command:

$ csvpatch --input=diff.json --output=b-plus.csv a-plus.csv
$ cat b-plus.csv
id,name,amount
1,bob,23
3,sarah,7
4,jeff,19
5,mira,81
6,fred,13
8,henry,9
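
The patching step can be sketched in Python as well. This is a simplified illustration under stated assumptions, not csvpatch's implementation: apply_patch is a hypothetical helper, rows are written back sorted by their key as strings, and a missing row for a "changed" entry simply raises KeyError where a real tool would report a conflict.

```python
import csv
import io


def apply_patch(diff, csv_text):
    """Apply a csvdiff-style diff dict to CSV text (hypothetical helper)."""
    index = diff["_index"]
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    rows = {tuple(r[c] for c in index): dict(r) for r in reader}

    # Removed rows are dropped, added rows inserted, changed fields overwritten.
    for rec in diff["removed"]:
        rows.pop(tuple(rec[c] for c in index), None)
    for rec in diff["added"]:
        rows[tuple(rec[c] for c in index)] = dict(rec)
    for change in diff["changed"]:
        row = rows[tuple(change["key"])]
        for name, delta in change["fields"].items():
            row[name] = delta["to"]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for key in sorted(rows):  # string ordering of keys, a simplification
        writer.writerow(rows[key])
    return out.getvalue()


# The diff from diff.json above, written out as a Python dict.
DIFF = {
    "_index": ["id"],
    "added": [{"amount": "81", "id": "5", "name": "mira"}],
    "changed": [
        {"fields": {"amount": {"from": "20", "to": "23"}}, "key": ["1"]},
        {"fields": {"amount": {"from": "10", "to": "13"}}, "key": ["6"]},
    ],
    "removed": [{"amount": "63", "id": "2", "name": "eva"}],
}

# The contents of a-plus.csv from the example above.
A_PLUS = (
    "id,name,amount\n1,bob,20\n2,eva,63\n3,sarah,7\n"
    "4,jeff,19\n6,fred,10\n8,henry,9\n"
)

patched = apply_patch(DIFF, A_PLUS)
print(patched)
```

Since the patch addresses rows by key, the extra upstream row for henry passes through untouched while the recorded additions, removals and changes are replayed around it.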

This can be useful if you’re using csvdiff to transform data that’s outside your control. In this case, you maintain the patch file and simply reapply it when the upstream data provider gives you a fresh file.

For more usage options, run csvdiff --help or csvpatch --help.

License

BSD license

History

0.3.3 (2017-07-20)

Add the --significance option to limit to significant figures.

0.3.2 (2017-07-20)

Add the --sep option for different delimiters.
Fix a bug when a patched document becomes empty (#29).

0.3.1 (2016-04-20)

Fix a bug in summary mode.
Check for rows bleeding into one another.

0.3.0 (2015-01-07)

Standardise patch format with a JSON schema.
Provide a matching csvpatch command applying diffs.
Add a man page and docs for csvpatch.
Use exit codes to indicate difference.
Add a --quiet option to csvdiff.

0.2.0 (2014-12-30)

Uses click for the command-line interface.
Drop YAML support in favour of pretty-printed JSON.
Uses --style option to change output style.
Provides a full man page.

0.1.0 (2014-03-15)

First release on PyPI.
Generates a JSON or YAML difference between two CSV files
Specify multiple key components with -k
Can provide a difference summary
Assumes files use standard comma-separation, double-quoting and a header row with field names

Download files

Source Distribution

csvdiff-0.3.3.tar.gz (27.2 kB)

Uploaded Jul 20, 2017 Source

Built Distribution

csvdiff-0.3.3-py2.py3-none-any.whl (12.9 kB)

Uploaded Jul 20, 2017 Python 2, Python 3

File details

Details for the file csvdiff-0.3.3.tar.gz.

File metadata

Download URL: csvdiff-0.3.3.tar.gz
Upload date: Jul 20, 2017
Size: 27.2 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for csvdiff-0.3.3.tar.gz
Algorithm	Hash digest
SHA256	`7fd35e4bcc437b71281ed443990ca6909fb36bda5fcea035e50bff62ba15e06d`
MD5	`197580c7a04a58cb1d34ea80bca9ad3c`
BLAKE2b-256	`b3c6b9b476eaa841cbb49deb51038ac8981be226f09a6a03ab9feeab2a513d66`

File details

Details for the file csvdiff-0.3.3-py2.py3-none-any.whl.

File metadata

Download URL: csvdiff-0.3.3-py2.py3-none-any.whl
Upload date: Jul 20, 2017
Size: 12.9 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for csvdiff-0.3.3-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`773fffdf8fd10bea0f99a3cba0b2471e7ae6f8594186e611bbf7bdc1cc747c02`
MD5	`8d91c4e60134ea818c7eefb718238fb6`
BLAKE2b-256	`8805c31aa05264a0d825ae243d2aa44d73da7d17572923bc573d1447b992de2c`
