Compare delimited files that share a common key.

csvcomparer


Table of Contents
  1. Overview
  2. Example
  3. Installation
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

Overview

csvcomparer is an open-source Python project that finds the differences between two delimited files (referred to here as the "left" and "right" files) that share a common key, or index. Specifically, csvcomparer determines:

  • Columns exclusive to the left and right files, respectively.
  • Rows exclusive to the left and right files, respectively.
  • Field-level differences for rows/columns in common between files.
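To make the three kinds of difference concrete, here is a minimal standard-library sketch of the same idea. It is not csvcomparer's implementation: the helpers `read_keyed` and `sketch_diff` and the result key names are hypothetical, chosen only for illustration, and the data is a trimmed-down in-memory sample.

```python
import csv
import io


def read_keyed(text, key):
    """Parse delimited text into (column names, {key value: row dict})."""
    reader = csv.DictReader(io.StringIO(text))
    rows = {row[key]: row for row in reader}
    return list(reader.fieldnames or []), rows


def sketch_diff(left_text, right_text, key):
    """Hypothetical helper: NOT csvcomparer's actual implementation."""
    left_cols, left = read_keyed(left_text, key)
    right_cols, right = read_keyed(right_text, key)

    # Only columns present in both files can be compared field by field.
    shared_cols = [c for c in left_cols if c in right_cols and c != key]

    changed = {}
    for k in left.keys() & right.keys():
        deltas = [
            (c, left[k][c], right[k][c])
            for c in shared_cols
            if left[k][c] != right[k][c]
        ]
        if deltas:
            changed[k] = deltas

    return {
        "cols_left_only": [c for c in left_cols if c not in right_cols],
        "cols_right_only": [c for c in right_cols if c not in left_cols],
        "rows_left_only": sorted(left.keys() - right.keys()),
        "rows_right_only": sorted(right.keys() - left.keys()),
        "rows_changed": changed,
    }


left_text = "id,name,price\n1A,beer,6.00\n3A,bacon,3.33\n"
right_text = "id,name,price,stars\n1A,beer,5.25,3.9\n4C,taco,8.33,3.1\n"
result = sketch_diff(left_text, right_text, "id")
print(result)
```

The sketch treats every value as a string and holds both files in memory, so it says nothing about how csvcomparer handles types or large inputs.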

Basic Usage

As a Python module:

from csvcomparer import CsvCompare

diffs = CsvCompare(
  "<path/to/left_file.csv>",
  "<path/to/right_file.csv>",
  "<key>").diffs

As a command line utility:

> python csvcomparer left_csv_filepath right_csv_filepath key

Examples

Provided the following file data:
menu_l.csv

id  name    pic  price  score  togo
1A  beer    🍺   $6.00  3.9    N
1B  wine    🍷   $7.25  4.5    N
2A  cheese  🧀   $4.10  4.0    Y
3A  bacon   🥓   $3.33  4.9    Y

menu_r.csv

id  name    pic  price  stars
1A  beer    🍻   $5.25  3.9
1B  wine    🍷   $7.25  4.8
2A  cheese  🧀   $3.95  4.1
4C  taco    🌮   $8.33  3.1
5B  pizza   🍕   $9.99  2.4

Usage as a Python module...

>>> from csvcomparer import CsvCompare
>>> CsvCompare("menu_l.csv", "menu_r.csv", "id").diffs

... or as a command-line utility:

> python csvcomparer menu_l.csv menu_r.csv id

Returns:

{'cols_added': ['stars'],
 'cols_removed': ['score', 'togo'],
 'rows_added': {'4C': {'name': 'taco',
                       'pic': '🌮',
                       'price': '$8.33',
                       'stars': 3.1},
                '5B': {'name': 'pizza',
                       'pic': '🍕',
                       'price': '$9.99',
                       'stars': 2.4}},
 'rows_changed': {'1A': [('pic', '🍺', '🍻'), ('price', '$6.00', '$5.25')],
                  '2A': [('price', '$4.10', '$3.95')]},
 'rows_removed': {'3A': {'name': 'bacon',
                         'pic': '🥓',
                         'price': '$3.33',
                         'score': 4.9,
                         'togo': 'Y'}}}
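Once you have a dictionary of this shape, turning it into a readable report takes only a few lines. The sketch below assumes nothing beyond the structure shown above; the `diffs` literal is copied from that output (with the added and removed rows trimmed to their names), and `summarize` is a hypothetical helper, not part of csvcomparer.

```python
# Literal copied from the example output above; row details trimmed for brevity.
diffs = {
    "cols_added": ["stars"],
    "cols_removed": ["score", "togo"],
    "rows_added": {"4C": {"name": "taco"}, "5B": {"name": "pizza"}},
    "rows_changed": {
        "1A": [("pic", "🍺", "🍻"), ("price", "$6.00", "$5.25")],
        "2A": [("price", "$4.10", "$3.95")],
    },
    "rows_removed": {"3A": {"name": "bacon"}},
}


def summarize(diffs):
    """Hypothetical helper: flatten a diffs dict into report lines."""
    lines = []
    # Each change is a (column, left value, right value) tuple.
    for key, changes in diffs["rows_changed"].items():
        for column, left_value, right_value in changes:
            lines.append(f"{key}.{column}: {left_value} -> {right_value}")
    lines.append("added rows: " + ", ".join(map(str, diffs["rows_added"])))
    lines.append("removed rows: " + ", ".join(map(str, diffs["rows_removed"])))
    return lines


report = summarize(diffs)
print("\n".join(report))
```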

Multi-column keys are also supported: pass a list of column names as the key. For the same file data:

>>> CsvCompare("menu_l.csv", "menu_r.csv", ["id", "name"]).diffs

Returns:

{'cols_added': ['stars'],
 'cols_removed': ['score', 'togo'],
 'rows_added': {('4C', 'taco'): {'pic': '🌮', 'price': '$8.33', 'stars': 3.1},
                ('5B', 'pizza'): {'pic': '🍕', 'price': '$9.99', 'stars': 2.4}},
 'rows_changed': {('1A', 'beer'): [('pic', '🍺', '🍻'),
                                   ('price', '$6.00', '$5.25')],
                  ('2A', 'cheese'): [('price', '$4.10', '$3.95')]},
 'rows_removed': {('3A', 'bacon'): {'pic': '🥓',
                                    'price': '$3.33',
                                    'score': 4.9,
                                    'togo': 'Y'}}}
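The tuple keys in this output suggest one natural way to think about a multi-column key: combine the key columns of each row into a tuple and index the rows by it. The sketch below illustrates that idea with the standard library only. It is an assumption about the general technique, not a description of csvcomparer's internals, and `index_rows` is a hypothetical name.

```python
import csv
import io


def index_rows(text, key_cols):
    """Hypothetical helper: index rows by a tuple built from key_cols."""
    reader = csv.DictReader(io.StringIO(text))
    indexed = {}
    for row in reader:
        # Tuples are hashable, so they can serve as dictionary keys.
        composite = tuple(row[c] for c in key_cols)
        # Store only the non-key columns as the row's payload.
        indexed[composite] = {c: v for c, v in row.items() if c not in key_cols}
    return indexed


key_cols = ["id", "name"]
left = index_rows("id,name,price\n1A,beer,$6.00\n3A,bacon,$3.33\n", key_cols)
right = index_rows("id,name,price\n1A,beer,$5.25\n4C,taco,$8.33\n", key_cols)

removed = sorted(left.keys() - right.keys())
added = sorted(right.keys() - left.keys())
print(removed, added)
```

With tuple keys, the set operations used for single-column keys work unchanged, which is why the composite result has the same shape as the single-key one.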

See the docs for detailed usage and examples.

Installation

Prerequisites

Then add csvcomparer to a Poetry-managed project:

poetry add csvcomparer

Alternatively, install it from PyPI with pip:

pip install csvcomparer

Roadmap

csvcomparer is in its infancy, and there are high hopes for this project!

The ultimate goal is to compare any two data sets that can be consumed as a "dataframe", regardless of size, as efficiently as possible. This comes with many challenges, but I'm confident it will get there.

See the open issues for a full list of proposed features (and known issues).

Contributing

Any contributions are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Ryan Bergsmith - LinkedIn - ryguydev@gmail.com
Project Link: GitHub

Acknowledgments
