Python CLI tool and library for diffing CSV files

csv-diff

Tool for viewing the difference between two CSV files. See Generating a commit log for San Francisco’s official list of trees (and the sf-tree-history repo commit log) for background information on this project.

Installation

pip install csv-diff

Usage

Consider two CSV files:

one.csv

id,name,age
1,Cleo,4
2,Pancakes,2

two.csv

id,name,age
1,Cleo,5
3,Bailey,1

csv-diff can show a human-readable summary of differences between the files:

$ csv-diff one.csv two.csv --key=id
1 row changed, 1 row added, 1 row removed

1 row changed

  Row 1
    age: "4" => "5"

1 row added

  id: 3
  name: Bailey
  age: 1

1 row removed

  id: 2
  name: Pancakes
  age: 2

The --key=id option means that the id column should be treated as the unique key, to identify which records have changed.
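To illustrate what treating a column as the unique key means, here is a minimal sketch using only the standard library. It is not csv-diff's actual implementation; the helper names load_keyed and keyed_diff are hypothetical and exist only for this example.

```python
import csv
import io


def load_keyed(text, key):
    """Parse CSV text into a dict mapping each row's key value to the row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def keyed_diff(previous, current):
    """Classify rows as added, removed or changed by comparing their keys."""
    added = [current[k] for k in current if k not in previous]
    removed = [previous[k] for k in previous if k not in current]
    changed = {}
    for k in previous:
        if k in current and previous[k] != current[k]:
            # Record (old, new) for every column present in both versions
            changed[k] = {
                col: (previous[k][col], current[k][col])
                for col in previous[k]
                if col in current[k] and previous[k][col] != current[k][col]
            }
    return added, removed, changed


# The same data as one.csv and two.csv above
one = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
two = "id,name,age\n1,Cleo,5\n3,Bailey,1\n"

added, removed, changed = keyed_diff(load_keyed(one, "id"), load_keyed(two, "id"))
print(added)
print(removed)
print(changed)
```

Because rows are matched on id, row 1 is reported as changed rather than as one removal plus one addition.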

The tool automatically detects whether your files are comma- or tab-separated. You can override this detection and force a specific format using --format=tsv or --format=csv.
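As a rough sketch of how this kind of detection can work, the standard library's csv.Sniffer can guess a delimiter from a sample of the text. The pick_delimiter function and its fmt parameter are hypothetical stand-ins for the --format option, and this is an assumption about the general approach, not a copy of the tool's code.

```python
import csv


def pick_delimiter(sample, fmt=None):
    """Return the delimiter to use for a text sample.

    fmt mimics an explicit override ("csv" or "tsv"); when it is None the
    delimiter is guessed from the sample.
    """
    if fmt == "csv":
        return ","
    if fmt == "tsv":
        return "\t"
    try:
        # Only consider commas and tabs as candidate delimiters
        return csv.Sniffer().sniff(sample, delimiters=",\t").delimiter
    except csv.Error:
        # Detection failed, so fall back to comma-separated
        return ","


comma_sample = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
tab_sample = "id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n"

print(repr(pick_delimiter(comma_sample)))
print(repr(pick_delimiter(tab_sample)))
print(repr(pick_delimiter(comma_sample, fmt="tsv")))
```

An explicit format always wins over the guess, which is useful when a file is too small or too irregular for detection to be reliable.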

Use --show-unchanged to include full details of the unchanged rows in the diff output:

$ csv-diff one.csv two.csv --key=id --show-unchanged
1 row changed

  id: 1
    age: "4" => "5"

    Unchanged:
      name: "Cleo"

You can use the --json option to get a machine-readable difference:

$ csv-diff one.csv two.csv --key=id --json
{
    "added": [
        {
            "id": "3",
            "name": "Bailey",
            "age": "1"
        }
    ],
    "removed": [
        {
            "id": "2",
            "name": "Pancakes",
            "age": "2"
        }
    ],
    "changed": [
        {
            "key": "1",
            "changes": {
                "age": [
                    "4",
                    "5"
                ]
            }
        }
    ],
    "columns_added": [],
    "columns_removed": []
}
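The JSON output is straightforward to consume from a script. The sketch below assumes the output shown above has been captured into a string (for example by redirecting it to a file and reading it back), then rebuilds a one-line summary from it. The summarize helper is hypothetical and not part of csv-diff.

```python
import json

# The --json output from the example above, captured as a string
output = """
{
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": []
}
"""

diff = json.loads(output)


def summarize(diff):
    """Build a short summary line from the parsed diff structure."""
    parts = []
    for section in ("changed", "added", "removed"):
        count = len(diff[section])
        if count:
            plural = "" if count == 1 else "s"
            parts.append(f"{count} row{plural} {section}")
    return ", ".join(parts)


print(summarize(diff))

# Each entry in "changed" maps a column name to [old value, new value]
for entry in diff["changed"]:
    for column, (old, new) in entry["changes"].items():
        print(f'Row {entry["key"]}: {column} {old} -> {new}')
```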

As a Python library

You can also import the Python library into your own code like so:

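from csv_diff import load_csv, compare

# Load both files keyed on the id column; the with block closes them afterwards
with open("one.csv", newline="") as one, open("two.csv", newline="") as two:
    diff = compare(load_csv(one, key="id"), load_csv(two, key="id"))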

diff will now contain the same data structure as the output in the --json example above.

If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.
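The following sketch shows one way this behaviour could be implemented: column differences are reported separately, and row-level changes are computed only over the columns both files share. It uses plain dictionaries rather than the csv_diff library, the compare_shared_columns name is hypothetical, and the weight column is made up for illustration.

```python
def compare_shared_columns(previous, current):
    """Diff two dicts of keyed rows, ignoring columns that exist in only one."""
    prev_cols = {col for row in previous.values() for col in row}
    curr_cols = {col for row in current.values() for col in row}
    shared = prev_cols & curr_cols

    changed = []
    for key, old_row in previous.items():
        new_row = current.get(key)
        if new_row is None:
            continue  # removed rows are not handled in this sketch
        changes = {
            col: [old_row.get(col), new_row.get(col)]
            for col in shared
            if old_row.get(col) != new_row.get(col)
        }
        if changes:
            changed.append({"key": key, "changes": changes})

    return {
        "changed": changed,
        "columns_added": sorted(curr_cols - prev_cols),
        "columns_removed": sorted(prev_cols - curr_cols),
    }


# Illustrative data: the second version gains a made-up "weight" column
previous = {"1": {"id": "1", "name": "Cleo", "age": "4"}}
current = {"1": {"id": "1", "name": "Cleo", "age": "5", "weight": "12"}}

result = compare_shared_columns(previous, current)
print(result)
```

The new weight column appears under columns_added, but it does not show up as a change to row 1; only the age difference does.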
