
Diff and patch tables

[![Build Status](https://travis-ci.org/paulfitz/daff.svg?branch=master)](https://travis-ci.org/paulfitz/daff)
[![NPM version](https://badge.fury.io/js/daff.svg)](http://badge.fury.io/js/daff)
[![Gem Version](https://badge.fury.io/rb/daff.svg)](http://badge.fury.io/rb/daff)
[![PyPI version](https://badge.fury.io/py/daff.svg)](http://badge.fury.io/py/daff)
[![PHP version](https://badge.fury.io/ph/paulfitz%2Fdaff-php.svg)](http://badge.fury.io/ph/paulfitz%2Fdaff-php)
[![Bower version](https://badge.fury.io/bo/daff.svg)](http://badge.fury.io/bo/daff)
![Badge count](http://img.shields.io/:badges-7/7-33aa33.svg)

daff: data diff
===============

This is a library for comparing tables, producing a summary of their
differences, and using such a summary as a patch file. It is
optimized for comparing tables that share a common origin, in other
words multiple versions of the "same" table.

For a live demo, see:
> http://paulfitz.github.com/daff/

Install the library for your favorite language:
````sh
npm install daff -g # node/javascript
pip install daff # python
gem install daff # ruby
composer require paulfitz/daff-php # php
install.packages('daff') # R wrapper by Edwin de Jonge
bower install daff # web/javascript
````

Other translations are available here:
> https://github.com/paulfitz/daff/releases

Or use the library to view CSV diffs on GitHub via a Chrome extension:
> https://github.com/theodi/csvhub

The diff format used by `daff` is specified here:
> http://paulfitz.github.io/daff-doc/spec.html

This library is a stripped-down version of the coopy toolbox (see
http://share.find.coop). To compare tables from different origins,
tables with automatically generated IDs, or tables with other
complications, check out the coopy toolbox.

The program
-----------

You can run `daff`/`daff.py`/`daff.rb` as a utility program:
````
$ daff
daff can produce and apply tabular diffs.
Call as:
  daff [--color] [--no-color] [--output OUTPUT.csv] a.csv b.csv
  daff [--output OUTPUT.html] a.csv b.csv
  daff [--output OUTPUT.csv] parent.csv a.csv b.csv
  daff [--output OUTPUT.ndjson] a.ndjson b.ndjson
  daff [--www] a.csv b.csv
  daff patch [--inplace] [--output OUTPUT.csv] a.csv patch.csv
  daff merge [--inplace] [--output OUTPUT.csv] parent.csv a.csv b.csv
  daff trim [--output OUTPUT.csv] source.csv
  daff render [--output OUTPUT.html] diff.csv
  daff copy in.csv out.tsv
  daff in.csv
  daff git
  daff version

The --inplace option to patch and merge will result in modification of a.csv.

If you need more control, here is the full list of flags:
  daff diff [--output OUTPUT.csv] [--context NUM] [--all] [--act ACT] a.csv b.csv
     --act ACT: show only a certain kind of change (update, insert, delete)
     --all: do not prune unchanged rows or columns
     --all-rows: do not prune unchanged rows
     --all-columns: do not prune unchanged columns
     --color: highlight changes with terminal colors (default in terminals)
     --context NUM: show NUM rows of context
     --id: specify column to use as primary key (repeat for multi-column key)
     --ignore: specify column to ignore completely (can repeat)
     --index: include row/columns numbers from original tables
     --input-format [csv|tsv|ssv|psv|json]: set format to expect for input
     --eol [crlf|lf|cr|auto]: separator between rows of csv output.
     --no-color: make sure terminal colors are not used
     --ordered: assume row order is meaningful (default for CSV)
     --output-format [csv|tsv|ssv|psv|json|copy|html]: set format for output
     --padding [dense|sparse|smart]: set padding method for aligning columns
     --table NAME: compare the named table, used with SQL sources
     --unordered: assume row order is meaningless (default for json formats)
     -w / --ignore-whitespace: ignore changes in leading/trailing whitespace
     -i / --ignore-case: ignore differences in case

  daff render [--output OUTPUT.html] [--css CSS.css] [--fragment] [--plain] diff.csv
     --css CSS.css: generate a suitable css file to go with the html
     --fragment: generate just a html fragment rather than a page
     --plain: do not use fancy utf8 characters to make arrows prettier
     --www: send output to a browser
````

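Formats supported are CSV, TSV, and ndjson. The `--input-format` and
`--output-format` flags in the help text above list the further options
the program accepts (SSV, PSV, JSON, and HTML for output).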
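
To make the `-w` and `-i` flags from the help text more concrete, here is a small sketch of the kind of cell normalization they describe. It is an illustration only, not daff's implementation; `normalizeCell` and `cellsMatch` are hypothetical names that do not exist in the library.

```js
// Illustration only: NOT daff's implementation of -w / -i.
// normalizeCell and cellsMatch are hypothetical helper names.
function normalizeCell(value, opts) {
  var text = String(value);
  if (opts.ignoreWhitespace) text = text.trim(); // leading/trailing whitespace only
  if (opts.ignoreCase) text = text.toLowerCase();
  return text;
}

// Two cells count as unchanged if they agree after normalization.
function cellsMatch(a, b, opts) {
  return normalizeCell(a, opts) === normalizeCell(b, opts);
}

var bothFlags = cellsMatch(' Dublin ', 'dublin', { ignoreWhitespace: true, ignoreCase: true });
var whitespaceOnly = cellsMatch(' Dublin ', 'dublin', { ignoreWhitespace: true, ignoreCase: false });
console.log(bothFlags, whitespaceOnly); // true false
```

With both options on, the two spellings of `Dublin` are treated as the same value; with only whitespace ignored, the case difference still counts as a change.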

Using with git
--------------

Run `daff git csv` to install daff as a diff and merge handler
for `*.csv` files in your repository. Run `daff git` for instructions
on doing this manually. Your CSV diffs and merges will get smarter,
because git will then work in terms of rows and columns, not just lines:

![Example CSV diff](http://paulfitz.github.io/daff-doc/images/daff_vs_diff.png)

The library
-----------

You can use `daff` as a library from any supported language. We use
JavaScript as the example here. To use `daff` on a webpage,
first include `daff.js`:
```html
<script src="daff.js"></script>
```
Or if using node outside the browser:
```js
var daff = require('daff');
```

For concreteness, assume we have two versions of a table,
`data1` and `data2`:
```js
var data1 = [
    ['Country','Capital'],
    ['Ireland','Dublin'],
    ['France','Paris'],
    ['Spain','Barcelona']
];
var data2 = [
    ['Country','Code','Capital'],
    ['Ireland','ie','Dublin'],
    ['France','fr','Paris'],
    ['Spain','es','Madrid'],
    ['Germany','de','Berlin']
];
```

To make those tables accessible to the library, we wrap them
in `daff.TableView`:
```js
var table1 = new daff.TableView(data1);
var table2 = new daff.TableView(data2);
```

We can now compute the alignment between the rows and columns
in the two tables:
```js
var alignment = daff.compareTables(table1,table2).align();
```

To produce a diff from the alignment, we first need a table
for the output:
```js
var data_diff = [];
var table_diff = new daff.TableView(data_diff);
```

Using default options for the diff:
```js
var flags = new daff.CompareFlags();
var highlighter = new daff.TableDiff(alignment,flags);
highlighter.hilite(table_diff);
```

The diff is now in `data_diff` in highlighter format. See the
specification here:
> http://paulfitz.github.io/daff-doc/spec.html

```js
[ [ '!', '', '+++', '' ],
  [ '@@', 'Country', 'Code', 'Capital' ],
  [ '+', 'Ireland', 'ie', 'Dublin' ],
  [ '+', 'France', 'fr', 'Paris' ],
  [ '->', 'Spain', 'es', 'Barcelona->Madrid' ],
  [ '+++', 'Germany', 'de', 'Berlin' ] ]
```
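
As a rough illustration of how the first column of this output can be read, the sketch below tallies rows by their action tag. `summarizeDiff` is a hypothetical helper, not part of daff, and the tag meanings in the comments reflect one reading of the specification linked above, so check them against it before relying on them.

```js
// Hypothetical helper (not part of daff): tally the rows of a
// highlighter-format diff by the action tag in the first column.
function summarizeDiff(rows) {
  var summary = { inserted: 0, deleted: 0, updated: 0, other: 0 };
  rows.forEach(function (row) {
    var tag = String(row[0]);
    if (tag === '!' || tag === '@@') return;              // schema and header rows
    if (tag === '+++') summary.inserted++;                // whole row added
    else if (tag === '---') summary.deleted++;            // whole row removed
    else if (tag.indexOf('->') !== -1) summary.updated++; // at least one cell changed
    else summary.other++;                                 // e.g. '+': row only gained a cell in a new column
  });
  return summary;
}

var exampleDiff = [
  ['!', '', '+++', ''],
  ['@@', 'Country', 'Code', 'Capital'],
  ['+', 'Ireland', 'ie', 'Dublin'],
  ['+', 'France', 'fr', 'Paris'],
  ['->', 'Spain', 'es', 'Barcelona->Madrid'],
  ['+++', 'Germany', 'de', 'Berlin']
];
var diffSummary = summarizeDiff(exampleDiff);
console.log(diffSummary);
```

For the diff above this counts one inserted row (Germany), one updated row (Spain), and two rows that changed only because the `Code` column was added.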

For visualization, you may want to convert this to an HTML table
with appropriate classes on cells so you can color-code inserts,
deletes, updates, etc. You can do this with:
```js
var diff2html = new daff.DiffRender();
diff2html.render(table_diff);
var table_diff_html = diff2html.html();
```

For 3-way differences (that is, comparing two tables given knowledge
of a common ancestor) use `daff.compareTables3` (give ancestor
table as the first argument).
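
The idea behind a 3-way comparison can be sketched at the level of a single cell. The rule below is the generic three-way merge rule, shown only for intuition; it is not daff's actual algorithm, and `mergeCell` is a hypothetical name that does not exist in the library.

```js
// Generic three-way merge rule for one cell (illustration only, not daff's code).
// parent: value in the common ancestor; a, b: values in the two descendants.
function mergeCell(parent, a, b) {
  if (a === b) return { value: a, conflict: false };      // both sides agree
  if (a === parent) return { value: b, conflict: false }; // only b changed
  if (b === parent) return { value: a, conflict: false }; // only a changed
  return { value: a, conflict: true };                    // both changed, differently
}

var onlyOneSideChanged = mergeCell('Barcelona', 'Madrid', 'Barcelona');
var bothSidesChanged = mergeCell('Barcelona', 'Madrid', 'Seville');
console.log(onlyOneSideChanged, bothSidesChanged);
```

Knowing the ancestor is what lets the first call resolve cleanly: without it, `Madrid` versus `Barcelona` would look like an unexplained disagreement.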

Here is how to apply the two-way diff computed earlier (`table_diff`) as a patch to `table1`:
```js
var patcher = new daff.HighlightPatch(table1,table_diff);
patcher.apply();
// table1 should now equal table2
```
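
To check the claim in that last comment, one option is to compare the underlying arrays cell by cell. The helper below is hypothetical and not part of daff. It also assumes that `TableView` writes changes through to the array it wraps, so that `data1` reflects the patch. The data here is a stand-in so the snippet runs on its own.

```js
// Hypothetical helper (not part of daff): cell-by-cell comparison of
// two tables stored as arrays of rows.
function tablesEqual(a, b) {
  if (a.length !== b.length) return false;
  for (var y = 0; y < a.length; y++) {
    if (a[y].length !== b[y].length) return false;
    for (var x = 0; x < a[y].length; x++) {
      if (a[y][x] !== b[y][x]) return false;
    }
  }
  return true;
}

// Stand-in data; in the walkthrough above you would pass data1 and data2
// after calling patcher.apply().
var patchedRows = [['Country', 'Code', 'Capital'], ['Spain', 'es', 'Madrid']];
var expectedRows = [['Country', 'Code', 'Capital'], ['Spain', 'es', 'Madrid']];
var staleRows = [['Country', 'Capital'], ['Spain', 'Barcelona']];
console.log(tablesEqual(patchedRows, expectedRows)); // true
console.log(tablesEqual(staleRows, expectedRows));   // false
```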

For other languages, you should find sample code in
the packages on the [Releases](https://github.com/paulfitz/daff/releases) page.

Supported languages
-------------------

The `daff` library is written in [Haxe](http://haxe.org/), which
can be translated reasonably well into at least the following languages:

* Javascript
* Python
* Java
* C#
* C++
* Ruby (using an [unofficial haxe target](https://github.com/paulfitz/haxe) developed for `daff`)
* PHP

Some translations are done for you on the
[Releases](https://github.com/paulfitz/daff/releases) page.
To make another translation, or to compile from source,
first follow the [Haxe language introduction](https://haxe.org/documentation/introduction/language-introduction.html) for the
language you care about. At the time of writing, if you are on OS X, you should
install Haxe using `brew install haxe`. Then do one of:

```
make js
make php
make py
make java
make cs
make cpp
```

For each language, the `daff` library expects to be handed an interface to tables you create, rather than creating them
itself. This is to avoid inefficient copies from one format to another. If you find this awkward, there is a
`SimpleTable` class you can use instead.

Other possibilities:

* There's a daff wrapper for R written by [Edwin de Jonge](https://github.com/edwindj), see https://github.com/edwindj/daff and http://cran.r-project.org/web/packages/daff
* There's a hand-written ruby port by [James Smith](https://github.com/Floppy), see https://github.com/theodi/coopy-ruby

API documentation
-----------------

* You can browse the `daff` classes at http://paulfitz.github.io/daff-doc/

Sponsors
--------

<img src="http://datacommons.coop/images/the_zen_of_venn.png" alt="the zen of venn" height="100">
The <a href="https://datacommons.coop">Data Commons Co-op</a>, "perhaps the geekiest of all cooperative organizations on the planet," has given great moral support during the development of `daff`.
Donate a multiple of `42.42` in your currency to let them know you care: <a href="https://datacommons.coop/donate/">https://datacommons.coop/donate/</a>.

Reading material
----------------

* http://dataprotocols.org/tabular-diff-format/ : a specification of the diff format we use (appears to have gone away; see https://paulfitz.github.io/daff-doc/spec.html now).
* http://theodi.org/blog/csvhub-github-diffs-for-csv-files : using this library with github.
* https://github.com/ropensci/unconf/issues/19 : a thread about diffing data in which daff shows up in at least four guises (see if you can spot them all).
* http://theodi.org/blog/adapting-git-simple-data : using this library with gitlab.
* http://okfnlabs.org/blog/2013/08/08/diffing-and-patching-data.html : a summary of where the library came from.
* http://blog.okfn.org/2013/07/02/git-and-github-for-data/ : a post about storing small data in git/github.
* http://blog.ouseful.info/2013/08/27/diff-or-chop-github-csv-data-files-and-openrefine/ : counterpoint - a post discussing tracked-changes rather than diffs.
* http://blog.byronjsmith.com/makefile-shortcuts.html : a tutorial on using `make` for data, with daff in the mix. "Since git considers changes on a per-line basis,
looking at diffs of comma-delimited and tab-delimited files can get obnoxious. The program daff fixes this problem."

## License

daff is distributed under the MIT License.

Download files

| Filename | Size | File type | Python version | Upload date |
|----------|------|-----------|----------------|-------------|
| daff-1.3.38.tar.gz | 152.4 kB | Source | None | Oct 13, 2018 |