
Python CLI tool and library for diffing CSV files


csv-diff


Tool for viewing the difference between two CSV files. See Generating a commit log for San Francisco’s official list of trees (and the sf-tree-history repo commit log) for background information on this project.

Consider two CSV files:

one.csv

id,name,age
1,Cleo,4
2,Pancakes,2

two.csv

id,name,age
1,Cleo,5
3,Bailey,1

csv-diff can show a human-readable summary of differences between the files:

$ csv-diff one.csv two.csv --key=id
1 row changed, 1 row added, 1 row removed

1 row changed

  Row 1
    age: "4" => "5"

1 row added

  id: 3
  name: Bailey
  age: 1

1 row removed

  id: 2
  name: Pancakes
  age: 2

The --key=id option means that the id column should be treated as the unique key, to identify which records have changed.
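To illustrate what treating a column as the unique key means, here is a minimal sketch using only the standard library. It is not csv-diff's actual implementation; the helper names index_by_key and classify_keys are hypothetical, and the two CSV files from above are inlined as strings.

```python
import csv
import io

# The two example files from above, inlined as strings
ONE = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
TWO = "id,name,age\n1,Cleo,5\n3,Bailey,1\n"


def index_by_key(text, key):
    """Map each row's key value to the row (a dict of column -> string)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def classify_keys(old, new):
    """Hypothetical helper: split key values into added, removed and changed."""
    added = sorted(new.keys() - old.keys())      # keys only in the new file
    removed = sorted(old.keys() - new.keys())    # keys only in the old file
    changed = sorted(
        k for k in old.keys() & new.keys() if old[k] != new[k]
    )                                            # keys in both, rows differ
    return added, removed, changed


added, removed, changed = classify_keys(
    index_by_key(ONE, "id"), index_by_key(TWO, "id")
)
print(added, removed, changed)
```

Because rows are matched by their id value rather than by position, row 3 counts as added and row 2 as removed even though they occupy the same line in each file.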

You can also run it using the --json option to get a machine-readable difference:

$ csv-diff one.csv two.csv --key=id --json
{
    "added": [
        {
            "id": "3",
            "name": "Bailey",
            "age": "1"
        }
    ],
    "removed": [
        {
            "id": "2",
            "name": "Pancakes",
            "age": "2"
        }
    ],
    "changed": [
        {
            "key": "1",
            "changes": {
                "age": [
                    "4",
                    "5"
                ]
            }
        }
    ],
    "columns_added": [],
    "columns_removed": []
}
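As a rough sketch of how a script might consume this output, the following parses the JSON shown above and rebuilds a one-line summary. The helpers count_phrase and summarize are hypothetical, and the pluralization rule is an assumption rather than csv-diff's own logic.

```python
import json

# The JSON from the --json example above, condensed to one line per section
DIFF_JSON = """
{
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": []
}
"""


def count_phrase(count, noun):
    """Hypothetical helper: '1 row' or '2 rows' (assumed pluralization)."""
    return f"{count} {noun}" if count == 1 else f"{count} {noun}s"


def summarize(diff):
    """Build a summary line from the non-empty row sections of the diff."""
    parts = []
    for section in ("changed", "added", "removed"):
        rows = diff[section]
        if rows:
            parts.append(f"{count_phrase(len(rows), 'row')} {section}")
    return ", ".join(parts)


diff = json.loads(DIFF_JSON)
summary = summarize(diff)
print(summary)
```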

You can also import the Python library into your own code like so:

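from csv_diff import load_csv, compare

# Open both files with context managers so the handles are closed afterwards
with open("one.csv") as one, open("two.csv") as two:
    diff = compare(
        load_csv(one, key="id"),
        load_csv(two, key="id")
    )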

diff will now contain the same data structure as the output in the --json example above.
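A short sketch of walking that structure follows. To keep it self-contained, the dictionary is written out by hand to match the --json example instead of calling the library, and describe_changes is a hypothetical helper.

```python
# Hand-written copy of the structure shown in the --json example
diff = {
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": [],
}


def describe_changes(diff):
    """Hypothetical helper: list each changed row and its per-column changes."""
    lines = []
    for entry in diff["changed"]:
        lines.append(f"Row {entry['key']}")
        # Each change is a two-item list: [old value, new value]
        for column, (before, after) in entry["changes"].items():
            lines.append(f'  {column}: "{before}" => "{after}"')
    return lines


lines = describe_changes(diff)
print("\n".join(lines))
```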

If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.
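One way to picture this behaviour is to compare only the columns that both versions of a row share. The sketch below is an assumption about the idea, not csv-diff's code; the weight column and the helper changes_in_shared_columns are invented for illustration.

```python
# The same row before and after a hypothetical "weight" column was added
old_row = {"id": "1", "name": "Cleo", "age": "4"}
new_row = {"id": "1", "name": "Cleo", "age": "4", "weight": "12"}


def changes_in_shared_columns(old, new):
    """Hypothetical helper: compare only the columns present in both rows."""
    shared = old.keys() & new.keys()
    return {
        col: [old[col], new[col]]
        for col in sorted(shared)
        if old[col] != new[col]
    }


# Columns present only in the new row are reported separately
columns_added = sorted(new_row.keys() - old_row.keys())
row_changes = changes_in_shared_columns(old_row, new_row)
print(columns_added, row_changes)
```

Here the new column shows up in columns_added, while the row itself has no changes because every shared column still holds the same value.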
