This is a pre-production deployment of Warehouse, however changes made here WILL affect the production instance of PyPI.
Latest Version Dependencies status unknown Test status unknown Test coverage unknown
Project Description

Overview

Generate a diff between two CSV files on the command-line.

csvdiff allows you to compare the semantic contents of two CSV files, ignoring things like row and column ordering in order to get to what’s actually changed. This is useful if you’re comparing the output of an automatic system from one day to the next, so that you can look at just what’s changed.

It’s also useful for maintaining patches to third-party data. Diffs generated by csvdiff are a subset of JSON and can be stored and applied using the matching csvpatch command. If upstream data changes, you can fetch the new version and re-apply your changes to it easily.

Installing

You’ll firstly need Python and pip. Then run:

pip install csvdiff

Examples

For example, suppose we have a.csv:

id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10

After some changes and corrections to the data, we now have b.csv:

id,name,amount
1,bob,23       <--- changed
3,sarah,7
4,jeff,19
5,mira,81      <--- added
6,fred,13      <--- changed

Now we can ask for a summary of differences:

$ csvdiff --style=summary id a.csv b.csv
1 rows removed (20.0%)
1 rows added (20.0%)
2 rows changed (40.0%)

Or look at the full diff pretty printed, to make it more readable:

$ csvdiff --style=pretty --output=diff.json id a.csv b.csv
$ cat diff.json
{
  "_index": [
    "id"
  ],
  "added": [
    {
      "amount": "81",
      "id": "5",
      "name": "mira"
    }
  ],
  "changed": [
    {
      "fields": {
        "amount": {
          "from": "20",
          "to": "23"
        }
      },
      "key": [
        "1"
      ]
    },
    {
      "fields": {
        "amount": {
          "from": "10",
          "to": "13"
        }
      },
      "key": [
        "6"
      ]
    }
  ],
  "removed": [
    {
      "amount": "63",
      "id": "2",
      "name": "eva"
    }
  ]
}

Diffs generated this way contain all the data that’s changed, and can be reapplied later if the original data changes. For example, suppose more data gets added to a.csv, giving us a-plus.csv:

id,name,amount
1,bob,20
2,eva,63
3,sarah,7
4,jeff,19
6,fred,10
8,henry,9

We can reapply our changes with the csvpatch command:

$ csvpatch --input=diff.json --output=b-plus.csv a-plus.csv
$ cat b-plus.csv
id,name,amount
1,bob,23
3,sarah,7
4,jeff,19
5,mira,81
6,fred,13
8,henry,9

This can be useful if you’re using csvdiff to transform data that’s outside your control. In this case, you maintain the patch file and simply reapply it when the upstream data provider gives you a fresh file.

For more usage options, run csvdiff --help or csvpatch --help.

License

BSD license

History

0.3.1 (2016-04-20)

  • Fix a bug in summary mode.
  • Check for rows bleeding into one another.

0.3.0 (2015-01-07)

  • Standardise patch format with a JSON schema.
  • Provide a matching csvpatch command applying diffs.
  • Add a man page and docs for csvpatch.
  • Use exit codes to indicate difference.
  • Add a –quiet option to csvdiff.

0.2.0 (2014-12-30)

  • Uses click for the command-line interface.
  • Drop YAML support in favour of pretty-printed JSON.
  • Uses –style option to change output style.
  • Provides a full man page.

0.1.0 (2014-03-15)

  • First release on PyPI.
  • Generates a JSON or YAML difference between two CSV files
  • Specify multiple key components with -k
  • Can provide a difference summary
  • Assumes files use standard comma-separation, double-quoting and a header row with field names
Release History

Release History

0.3.1

This version

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.3.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

Download Files

Download Files

TODO: Brief introduction on what you do with files - including link to relevant help section.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
csvdiff-0.3.1-py2.py3-none-any.whl (11.7 kB) Copy SHA256 Checksum SHA256 3.5 Wheel Apr 20, 2016
csvdiff-0.3.1.tar.gz (10.6 kB) Copy SHA256 Checksum SHA256 Source Apr 20, 2016

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS HPE HPE Development Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting