
Compare delimited files that share a common key.

Project description


csvcomparer


Table of Contents
  1. Overview
  2. Basic Usage
  3. Examples
  4. Installation
  5. Roadmap
  6. Contributing
  7. License
  8. Contact
  9. Acknowledgments

Overview

csvcomparer is an open-source Python project used for determining differences between two delimited files (referred to here as "left" and "right" files) that share a common key, or index. Specifically, csvcomparer determines:

  • Columns exclusive to the left and right files, respectively.
  • Rows exclusive to the left and right files, respectively.
  • Field-level differences for rows/columns in common between files.
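The sketch below is not csvcomparer's implementation. It is a minimal, standard-library-only illustration of how each of the three categories above can be computed, assuming comma-delimited text and a single key column. `read_keyed` and `compare` are hypothetical names. Unlike the library output shown in the examples, this sketch keeps every value as a string and leaves the key column inside each row dict.

```python
import csv
import io


def read_keyed(text, key):
    """Parse CSV text into (column names, {key value: row dict})."""
    reader = csv.DictReader(io.StringIO(text))
    rows = {row[key]: row for row in reader}
    return reader.fieldnames, rows


def compare(left_text, right_text, key):
    """Hypothetical sketch of the three kinds of differences listed above."""
    left_cols, left = read_keyed(left_text, key)
    right_cols, right = read_keyed(right_text, key)

    # 1. Columns exclusive to the left and right files.
    cols_added = [c for c in right_cols if c not in left_cols]
    cols_removed = [c for c in left_cols if c not in right_cols]

    # 2. Rows exclusive to the left and right files, matched on the key value.
    rows_added = {k: row for k, row in right.items() if k not in left}
    rows_removed = {k: row for k, row in left.items() if k not in right}

    # 3. Field-level differences in shared rows, checking shared non-key columns only.
    shared_cols = [c for c in left_cols if c in right_cols and c != key]
    rows_changed = {}
    for k, left_row in left.items():
        if k not in right:
            continue
        changes = [
            (c, left_row[c], right[k][c])
            for c in shared_cols
            if left_row[c] != right[k][c]
        ]
        if changes:
            rows_changed[k] = changes

    return {
        "cols_added": cols_added,
        "cols_removed": cols_removed,
        "rows_added": rows_added,
        "rows_changed": rows_changed,
        "rows_removed": rows_removed,
    }
```

Each category needs only set-style membership checks on column names and key values, which is why a shared key is required: without it there is no way to decide which left row corresponds to which right row.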


Basic Usage

As a Python module:

from csvcomparer import CsvCompare

diffs = CsvCompare(
  "<path/to/left_file.csv>",
  "<path/to/right_file.csv>",
  "<key>").diffs

As a command line utility:

> python csvcomparer left_csv_filepath right_csv_filepath key

Examples

Given the following file data:
menu_l.csv

id name pic price score togo
1A beer 🍺 $6.00 3.9 N
1B wine 🍷 $7.25 4.5 N
2A cheese 🧀 $4.10 4.0 Y
3A bacon 🥓 $3.33 4.9 Y

menu_r.csv

id name pic price stars
1A beer 🍻 $5.25 3.9
1B wine 🍷 $7.25 4.8
2A cheese 🧀 $3.95 4.1
4C taco 🌮 $8.33 3.1
5B pizza 🍕 $9.99 2.4
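To reproduce the example locally, the two files can be generated with the standard `csv` module. This is a sketch that assumes comma-delimited, UTF-8 encoded files (the tables above do not show the actual delimiter); `write_csv`, `write_example_files`, `MENU_L` and `MENU_R` are hypothetical names, and the rows are copied from the tables above.

```python
import csv
import os

# Rows copied from the menu_l.csv table above.
MENU_L = [
    ["id", "name", "pic", "price", "score", "togo"],
    ["1A", "beer", "🍺", "$6.00", "3.9", "N"],
    ["1B", "wine", "🍷", "$7.25", "4.5", "N"],
    ["2A", "cheese", "🧀", "$4.10", "4.0", "Y"],
    ["3A", "bacon", "🥓", "$3.33", "4.9", "Y"],
]

# Rows copied from the menu_r.csv table above.
MENU_R = [
    ["id", "name", "pic", "price", "stars"],
    ["1A", "beer", "🍻", "$5.25", "3.9"],
    ["1B", "wine", "🍷", "$7.25", "4.8"],
    ["2A", "cheese", "🧀", "$3.95", "4.1"],
    ["4C", "taco", "🌮", "$8.33", "3.1"],
    ["5B", "pizza", "🍕", "$9.99", "2.4"],
]


def write_csv(path, rows):
    # newline="" lets the csv module control line endings; UTF-8 keeps the emoji intact.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def write_example_files(directory="."):
    """Write menu_l.csv and menu_r.csv into directory and return their paths."""
    left_path = os.path.join(directory, "menu_l.csv")
    right_path = os.path.join(directory, "menu_r.csv")
    write_csv(left_path, MENU_L)
    write_csv(right_path, MENU_R)
    return left_path, right_path
```

Calling `write_example_files()` in an empty directory produces the two inputs used by the commands below.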

Usage as a Python module...

>>> from csvcomparer import CsvCompare
>>> CsvCompare("menu_l.csv", "menu_r.csv", "id").diffs

... or as a command-line utility:

> python csvcomparer menu_l.csv menu_r.csv id

Returns:

{'cols_added': ['stars'],
 'cols_removed': ['score', 'togo'],
 'rows_added': {'4C': {'name': 'taco',
                       'pic': '🌮',
                       'price': '$8.33',
                       'stars': 3.1},
                '5B': {'name': 'pizza',
                       'pic': '🍕',
                       'price': '$9.99',
                       'stars': 2.4}},
 'rows_changed': {'1A': [('pic', '🍺', '🍻'), ('price', '$6.00', '$5.25')],
                  '2A': [('price', '$4.10', '$3.95')]},
 'rows_removed': {'3A': {'name': 'bacon',
                         'pic': '🥓',
                         'price': '$3.33',
                         'score': 4.9,
                         'togo': 'Y'}}}
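Because the result is a plain dict, it can be post-processed with ordinary Python. The sketch below is an illustration, not part of csvcomparer: the hypothetical `summarize` function turns a dict shaped like the output above into one line per difference. `EXAMPLE_DIFFS` is a copy of that output so the sketch runs without the package installed.

```python
# Copy of the diffs returned for menu_l.csv / menu_r.csv keyed on "id".
EXAMPLE_DIFFS = {
    "cols_added": ["stars"],
    "cols_removed": ["score", "togo"],
    "rows_added": {
        "4C": {"name": "taco", "pic": "🌮", "price": "$8.33", "stars": 3.1},
        "5B": {"name": "pizza", "pic": "🍕", "price": "$9.99", "stars": 2.4},
    },
    "rows_changed": {
        "1A": [("pic", "🍺", "🍻"), ("price", "$6.00", "$5.25")],
        "2A": [("price", "$4.10", "$3.95")],
    },
    "rows_removed": {
        "3A": {"name": "bacon", "pic": "🥓", "price": "$3.33", "score": 4.9, "togo": "Y"},
    },
}


def summarize(diffs):
    """Turn a diffs dict into readable lines: + added, - removed, ~ changed."""
    lines = []
    for col in diffs["cols_added"]:
        lines.append(f"+ column {col}")
    for col in diffs["cols_removed"]:
        lines.append(f"- column {col}")
    for key in diffs["rows_added"]:
        lines.append(f"+ row {key}")
    for key in diffs["rows_removed"]:
        lines.append(f"- row {key}")
    for key, changes in diffs["rows_changed"].items():
        # Each change is a (column, left value, right value) tuple.
        for col, left_val, right_val in changes:
            lines.append(f"~ row {key}: {col} {left_val} -> {right_val}")
    return lines


for line in summarize(EXAMPLE_DIFFS):
    print(line)
```

With the package installed, the same function would be called as `summarize(CsvCompare("menu_l.csv", "menu_r.csv", "id").diffs)`. It also handles tuple keys from multi-column comparisons, since f-strings format tuples as `('4C', 'taco')`.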

Multi-column keys are also supported. For the same file data, passing a list of column names:

>>> CsvCompare("menu_l.csv", "menu_r.csv", ["id", "name"]).diffs

Returns:

{'cols_added': ['stars'],
 'cols_removed': ['score', 'togo'],
 'rows_added': {('4C', 'taco'): {'pic': '🌮', 'price': '$8.33', 'stars': 3.1},
                ('5B', 'pizza'): {'pic': '🍕', 'price': '$9.99', 'stars': 2.4}},
 'rows_changed': {('1A', 'beer'): [('pic', '🍺', '🍻'),
                                   ('price', '$6.00', '$5.25')],
                  ('2A', 'cheese'): [('price', '$4.10', '$3.95')]},
 'rows_removed': {('3A', 'bacon'): {'pic': '🥓',
                                    'price': '$3.33',
                                    'score': 4.9,
                                    'togo': 'Y'}}}
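The two outputs differ only in the shape of the row keys: a single key column yields string keys, while a list of key columns yields tuple keys, and in both cases the key columns are left out of the row data. The sketch below reproduces that indexing step with the standard `csv` module; `index_rows` is a hypothetical helper, not csvcomparer's code, and it assumes comma-delimited text with unique keys (a repeated key silently overwrites the earlier row).

```python
import csv
import io


def index_rows(text, key):
    """Index CSV rows by key, where key is one column name or a list of names.

    A single column gives string keys; several columns give tuple keys.
    Key columns are removed from each row dict, matching the output above.
    """
    key_cols = [key] if isinstance(key, str) else list(key)
    index = {}
    for row in csv.DictReader(io.StringIO(text)):
        # pop() pulls the key values out of the row as it builds the key.
        values = tuple(row.pop(col) for col in key_cols)
        index[values[0] if isinstance(key, str) else values] = row
    return index


text = "id,name,price\n1A,beer,$6.00\n2A,cheese,$4.10\n"
print(index_rows(text, "id"))
print(index_rows(text, ["id", "name"]))
```

Tuples work as composite keys because they are hashable and compare element by element, so `("1A", "beer")` from one file matches `("1A", "beer")` from the other.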

See the docs for detailed usage and examples.


Installation

Prerequisites

Install the package from PyPI with pip:

pip install csvcomparer

or add it to a Poetry-managed project:

poetry add csvcomparer


Roadmap

csvcomparer is in its infancy, and there are high hopes for this project!

The ultimate goal is to compare, as efficiently as possible, any two data sets that can be loaded as a "dataframe", regardless of size. That goal comes with many challenges, but I'm confident the project will get there.

See the open issues for a full list of proposed features (and known issues).


Contributing

Any contributions are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request


License

Distributed under the MIT License. See LICENSE.txt for more information.


Contact

Ryan Bergsmith - LinkedIn - ryguydev@gmail.com
Project Link: Github


Acknowledgments



Download files


Source Distribution

csvcomparer-0.1.0.tar.gz (9.2 kB)

Uploaded Source

Built Distribution


csvcomparer-0.1.0-py3-none-any.whl (8.4 kB)

Uploaded Python 3

File details

Details for the file csvcomparer-0.1.0.tar.gz.

File metadata

  • Download URL: csvcomparer-0.1.0.tar.gz
  • Upload date:
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.10.4 Darwin/21.4.0

File hashes

Hashes for csvcomparer-0.1.0.tar.gz
Algorithm Hash digest
SHA256 94385aa6d943056a5eac1236e426dded3bb5f6dde2bc4578db13bcc351d246a0
MD5 81d1fe547e239c58f439e9c34f4288d5
BLAKE2b-256 222593197432386d2575ff9610151fdebeb78189573665d0ac17316d6177a7b5
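To check a downloaded file against the SHA256 digest listed above, Python's standard `hashlib` module is enough. This is a minimal sketch; `sha256_of`, `verify` and `EXPECTED_SHA256` are hypothetical names, and the expected value is the sdist digest from the table above (substitute the wheel's digest when checking the `.whl`).

```python
import hashlib

# SHA256 digest published above for csvcomparer-0.1.0.tar.gz.
EXPECTED_SHA256 = "94385aa6d943056a5eac1236e426dded3bb5f6dde2bc4578db13bcc351d246a0"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("csvcomparer-0.1.0.tar.gz")` should return `True` for an untampered copy of the source distribution.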


File details

Details for the file csvcomparer-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: csvcomparer-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.10.4 Darwin/21.4.0

File hashes

Hashes for csvcomparer-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 34ead392987a4aa8dc2fd0f6e00b9e0bb26dabe30a0d63ee2704ec51ddf46d90
MD5 c6afde7411b13c55c4db3cfc69d8e904
BLAKE2b-256 36ab69a18b0da3929a782321b23644297bf448d042fd16cfa6a2eeef64ddd024

