csvcomparer
Compare delimited files that share a common key.
Overview
csvcomparer is an open-source Python project used for determining differences between two delimited files (referred to here as "left" and "right" files) that share a common key, or index. Specifically, csvcomparer determines:
- Columns exclusive to the left and right files, respectively.
- Rows exclusive to the left and right files, respectively.
- Field-level differences for rows/columns in common between files.
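To make these three kinds of difference concrete, here is a minimal standard-library sketch of the same idea applied to two small in-memory tables. It is not csvcomparer's implementation: the function name `compare_rows`, the list-of-dicts input, and the result field names are assumptions made purely for illustration.

```python
def compare_rows(left, right, key):
    """Compare two lists of row dicts that share a key column (illustrative only)."""
    # Index each side by its key value so rows can be matched up.
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # Column names come from the first row on each side (assumes non-empty input).
    left_cols = [c for c in left[0] if c != key]
    right_cols = [c for c in right[0] if c != key]
    common_cols = [c for c in left_cols if c in right_cols]

    # Field-level differences, only for keys and columns present on both sides.
    changed = {}
    for k in left_by_key.keys() & right_by_key.keys():
        diffs = [
            (c, left_by_key[k][c], right_by_key[k][c])
            for c in common_cols
            if left_by_key[k][c] != right_by_key[k][c]
        ]
        if diffs:
            changed[k] = diffs

    return {
        "cols_only_left": [c for c in left_cols if c not in right_cols],
        "cols_only_right": [c for c in right_cols if c not in left_cols],
        "rows_only_left": sorted(left_by_key.keys() - right_by_key.keys()),
        "rows_only_right": sorted(right_by_key.keys() - left_by_key.keys()),
        "rows_changed": changed,
    }


left = [
    {"id": "1A", "name": "beer", "price": "$6.00", "togo": "N"},
    {"id": "3A", "name": "bacon", "price": "$3.33", "togo": "Y"},
]
right = [
    {"id": "1A", "name": "beer", "price": "$5.25"},
    {"id": "4C", "name": "taco", "price": "$8.33"},
]
result = compare_rows(left, right, "id")
```

Here `result` reports `togo` as a left-only column, `3A` and `4C` as left-only and right-only rows, and a single `price` change for `1A`.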
Basic Usage
As a Python module:
from csvcomparer import CsvCompare
diffs = CsvCompare(
    "<path/to/left_file.csv>",
    "<path/to/right_file.csv>",
    "<key>").diffs
As a command line utility:
> python csvcomparer left_csv_filepath right_csv_filepath key
Examples
Provided the following file data:
menu_l.csv
id | name | pic | price | score | togo |
---|---|---|---|---|---|
1A | beer | 🍺 | $6.00 | 3.9 | N |
1B | wine | 🍷 | $7.25 | 4.5 | N |
2A | cheese | 🧀 | $4.10 | 4.0 | Y |
3A | bacon | 🥓 | $3.33 | 4.9 | Y |
menu_r.csv
id | name | pic | price | stars |
---|---|---|---|---|
1A | beer | 🍻 | $5.25 | 3.9 |
1B | wine | 🍷 | $7.25 | 4.8 |
2A | cheese | 🧀 | $3.95 | 4.1 |
4C | taco | 🌮 | $8.33 | 3.1 |
5B | pizza | 🍕 | $9.99 | 2.4 |
Usage as a Python module...
>>> from csvcomparer import CsvCompare
>>> CsvCompare("menu_l.csv", "menu_r.csv", "id").diffs
... or as a command-line utility:
> python csvcomparer menu_l.csv menu_r.csv id
Returns:
{'cols_added': ['stars'],
 'cols_removed': ['score', 'togo'],
 'rows_added': {'4C': {'name': 'taco',
                       'pic': '🌮',
                       'price': '$8.33',
                       'stars': 3.1},
                '5B': {'name': 'pizza',
                       'pic': '🍕',
                       'price': '$9.99',
                       'stars': 2.4}},
 'rows_changed': {'1A': [('pic', '🍺', '🍻'), ('price', '$6.00', '$5.25')],
                  '2A': [('price', '$4.10', '$3.95')]},
 'rows_removed': {'3A': {'name': 'bacon',
                         'pic': '🥓',
                         'price': '$3.33',
                         'score': 4.9,
                         'togo': 'Y'}}}
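Because the result is a plain dictionary, it can be post-processed with ordinary Python. The sketch below hard-codes a copy of the result shown above rather than calling csvcomparer, and the helper name `summarize` is hypothetical. It assumes each entry in `rows_changed` is a `(column, left value, right value)` tuple, as the example output suggests.

```python
def summarize(diffs):
    """Return one human-readable line per difference in a diffs dictionary."""
    lines = []
    for col in diffs["cols_added"]:
        lines.append(f"column added: {col}")
    for col in diffs["cols_removed"]:
        lines.append(f"column removed: {col}")
    for key in diffs["rows_added"]:
        lines.append(f"row added: {key}")
    for key in diffs["rows_removed"]:
        lines.append(f"row removed: {key}")
    # Each change is assumed to be (column, left value, right value).
    for key, changes in diffs["rows_changed"].items():
        for column, old, new in changes:
            lines.append(f"row {key}: {column} changed from {old} to {new}")
    return lines


# Copied from the example result above, not produced by running csvcomparer.
diffs = {
    "cols_added": ["stars"],
    "cols_removed": ["score", "togo"],
    "rows_added": {
        "4C": {"name": "taco", "pic": "🌮", "price": "$8.33", "stars": 3.1},
        "5B": {"name": "pizza", "pic": "🍕", "price": "$9.99", "stars": 2.4},
    },
    "rows_changed": {
        "1A": [("pic", "🍺", "🍻"), ("price", "$6.00", "$5.25")],
        "2A": [("price", "$4.10", "$3.95")],
    },
    "rows_removed": {
        "3A": {"name": "bacon", "pic": "🥓", "price": "$3.33",
               "score": 4.9, "togo": "Y"},
    },
}

lines = summarize(diffs)
for line in lines:
    print(line)
```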
Multi-column keys are also supported. So for the same file data:
>>> CsvCompare("menu_l.csv", "menu_r.csv", ["id", "name"]).diffs
Returns:
{'cols_added': ['stars'],
 'cols_removed': ['score', 'togo'],
 'rows_added': {('4C', 'taco'): {'pic': '🌮', 'price': '$8.33', 'stars': 3.1},
                ('5B', 'pizza'): {'pic': '🍕', 'price': '$9.99', 'stars': 2.4}},
 'rows_changed': {('1A', 'beer'): [('pic', '🍺', '🍻'),
                                   ('price', '$6.00', '$5.25')],
                  ('2A', 'cheese'): [('price', '$4.10', '$3.95')]},
 'rows_removed': {('3A', 'bacon'): {'pic': '🥓',
                                    'price': '$3.33',
                                    'score': 4.9,
                                    'togo': 'Y'}}}
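With a multi-column key, each row is identified by a tuple of its key values, and the key columns no longer appear in the row data. The following sketch shows one way such an index could be built. The helper `index_by_key` is hypothetical and is not taken from csvcomparer's source.

```python
def index_by_key(rows, key):
    """Index row dicts by one column name or by a list of column names."""
    names = [key] if isinstance(key, str) else list(key)
    index = {}
    for row in rows:
        values = tuple(row[n] for n in names)
        # A single-column key stays a scalar; several columns form a tuple.
        k = values[0] if len(names) == 1 else values
        # Key columns are dropped from the stored row data.
        index[k] = {c: v for c, v in row.items() if c not in names}
    return index


rows = [
    {"id": "1A", "name": "beer", "price": "$6.00"},
    {"id": "1B", "name": "wine", "price": "$7.25"},
]
single = index_by_key(rows, "id")
multi = index_by_key(rows, ["id", "name"])
```

In this sketch `single` is keyed by `'1A'` and `'1B'`, while `multi` is keyed by `('1A', 'beer')` and `('1B', 'wine')`.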
See the docs for detailed usage and examples.
Installation
Prerequisites
Then simply run:
poetry add csvcomparer
Roadmap
csvcomparer is in its infancy, and there are high hopes for this project!
The ultimate goal is to compare any two data sets that can be consumed as a "dataframe", regardless of size, as efficiently as possible. This comes with many challenges, but I'm confident it will get there.
See the open issues for a full list of proposed features (and known issues).
Contributing
Any contributions are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE.txt for more information.
Contact
Ryan Bergsmith - LinkedIn - ryguydev@gmail.com
Project Link: GitHub
Acknowledgments
- Robin Zaubeerer for being a great PDM coach and providing roadmap inspiration.