Skip to main content

Python CLI tool and library for diffing CSV and JSON files

Project description

csv-diff

PyPI Changelog Tests License

Tool for viewing the difference between two CSV, TSV or JSON files. See Generating a commit log for San Francisco’s official list of trees (and the sf-tree-history repo commit log) for background information on this project.

Installation

pip install csv-diff

Usage

Consider two CSV files:

one.csv

id,name,age
1,Cleo,4
2,Pancakes,2

two.csv

id,name,age
1,Cleo,5
3,Bailey,1

csv-diff can show a human-readable summary of differences between the files:

$ csv-diff one.csv two.csv --key=id
1 row changed, 1 row added, 1 row removed

1 row changed

  Row 1
    age: "4" => "5"

1 row added

  id: 3
  name: Bailey
  age: 1

1 row removed

  id: 2
  name: Pancakes
  age: 2

The --key=id option means that the id column should be treated as the unique key, to identify which records have changed.

The tool will automatically detect if your files are comma- or tab-separated. You can over-ride this automatic detection and force the tool to use a specific format using --format=tsv or --format=csv.

You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use --format=json if your input files are JSON.

Use --show-unchanged to include full details of the unchanged values for rows with at least one change in the diff output:

% csv-diff one.csv two.csv --key=id --show-unchanged
1 row changed

  id: 1
    age: "4" => "5"

    Unchanged:
      name: "Cleo"

You can use the --json option to get a machine-readable difference:

$ csv-diff one.csv two.csv --key=id --json
{
    "added": [
        {
            "id": "3",
            "name": "Bailey",
            "age": "1"
        }
    ],
    "removed": [
        {
            "id": "2",
            "name": "Pancakes",
            "age": "2"
        }
    ],
    "changed": [
        {
            "key": "1",
            "changes": {
                "age": [
                    "4",
                    "5"
                ]
            }
        }
    ],
    "columns_added": [],
    "columns_removed": []
}

As a Python library

You can also import the Python library into your own code like so:

from csv_diff import load_csv, compare
diff = compare(
    load_csv(open("one.csv"), key="id"),
    load_csv(open("two.csv"), key="id")
)

diff will now contain the same data structure as the output in the --json example above.

If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

csv-diff-1.1.tar.gz (7.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

csv_diff-1.1-py3-none-any.whl (12.6 kB view details)

Uploaded Python 3

File details

Details for the file csv-diff-1.1.tar.gz.

File metadata

  • Download URL: csv-diff-1.1.tar.gz
  • Upload date:
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.1

File hashes

Hashes for csv-diff-1.1.tar.gz
Algorithm Hash digest
SHA256 ff94117992c67dd8bc4f917374f4dc198945ff85b1d67a11bd148b635f576bea
MD5 2c9cf505b45db4cb3c586c84cfef6f2e
BLAKE2b-256 df3e1873856cd1cdb2b373f592e4985d5de99e485daff425c37d6916c89adb3c

See more details on using hashes here.

File details

Details for the file csv_diff-1.1-py3-none-any.whl.

File metadata

  • Download URL: csv_diff-1.1-py3-none-any.whl
  • Upload date:
  • Size: 12.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.1

File hashes

Hashes for csv_diff-1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f41447fb69165c4b4ae04bb0081884162502ff932e3471e78c97b053dc7ce0ef
MD5 5d2bb522ddb9d5354d82f4e1fdf446d2
BLAKE2b-256 bbcf53d23ae469f2727a5bdb7d6442573084136d2d713129e420cb554ff5506c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page