diff commits to your kedro pipeline
kedro-diff
kedro-diff aims to be a familiar interface for comparing two points in history. Git diffs are fantastic tools, but they are often too granular to show what has changed inside the pipeline. kedro-diff works at a higher level so we can see changes to nodes (names, inputs, outputs, tags).
Installation
pip install kedro-diff
Example
kedro diff --stat develop..master
M __default__ | 6 ++++-
M data_science | 3 +++
M data_engineering | 3 ++-
?? new_pipeline
4 pipelines changed, 5 insertions(+), 4 deletions(-)
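To make the shape of the --stat output concrete, here is a minimal sketch of how such lines could be assembled from per-pipeline counts. The helper names stat_line and stat_summary are hypothetical and are not taken from kedro-diff; the layout simply imitates the example above.

```python
from typing import Dict, Tuple


def stat_line(status: str, name: str, added: int, removed: int) -> str:
    """Format one pipeline row: status, name, total changes, +/- markers."""
    total = added + removed
    return f"{status} {name} | {total} {'+' * added}{'-' * removed}"


def stat_summary(stats: Dict[str, Tuple[int, int]]) -> str:
    """Summarise (added, removed) counts across all changed pipelines."""
    insertions = sum(added for added, _ in stats.values())
    deletions = sum(removed for _, removed in stats.values())
    return (
        f"{len(stats)} pipelines changed, "
        f"{insertions} insertions(+), {deletions} deletions(-)"
    )


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    stats = {"data_science": (3, 0), "data_engineering": (2, 1)}
    for name, (added, removed) in stats.items():
        print(stat_line("M", name, added, removed))
    print(stat_summary(stats))
```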
Usage
# diff develop into master
kedro diff develop..master
kedro diff develop master
# diff current state with main
kedro diff main
# diff current state with main
kedro diff ..main
# comparing pipelines from two branches
kedro diff master new_branch data_science
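The argument forms above can be reduced to a pair of commits. The following is a rough sketch of one way to do that, not kedro-diff's actual commit parser: parse_commits is a hypothetical helper, and it assumes that a missing side of a range means the current state (written here as HEAD).

```python
from typing import Sequence, Tuple


def parse_commits(args: Sequence[str], default: str = "HEAD") -> Tuple[str, str]:
    """Return (commit1, commit2) from CLI-style arguments.

    Assumption: an omitted commit refers to the current state (`default`).
    """
    if not args:
        raise ValueError("expected at least one commit or range")
    first = args[0]
    if ".." in first:
        # "develop..master" or "..main": split on the range separator.
        left, _, right = first.partition("..")
        return (left or default, right or default)
    if len(args) >= 2:
        # "develop master": two commits given as separate arguments.
        return (first, args[1])
    # "main": compare the current state with a single commit.
    return (default, first)


if __name__ == "__main__":
    for example in (["develop..master"], ["develop", "master"], ["main"], ["..main"]):
        print(example, "->", parse_commits(example))
```

Any trailing argument, such as the pipeline name in the last usage line, is ignored by this sketch and would need separate handling.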
More examples
kedro diff develop..master
╭──────────────────────────────────────────────────────────────────────────────╮
│ modified: data_engineering │
╰──────────────────────────────────────────────────────────────────────────────╯
+ strip_whitespace
+ lowercase_columns
+ get_trains
- get_tains
╭──────────────────────────────────────────────────────────────────────────────╮
│ modified: data_science │
╰──────────────────────────────────────────────────────────────────────────────╯
+ split_data
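The added and removed node names shown above amount to a set difference between two versions of a pipeline. Here is a minimal sketch, assuming each version is available as a list of node dictionaries with a "name" key (for example, parsed from JSON); diff_node_names is a hypothetical helper and not part of kedro-diff.

```python
from typing import Dict, Iterable, List, Tuple


def diff_node_names(
    old_nodes: Iterable[Dict], new_nodes: Iterable[Dict]
) -> Tuple[List[str], List[str]]:
    """Return (added, removed) node names between two pipeline versions."""
    old = {node["name"] for node in old_nodes}
    new = {node["name"] for node in new_nodes}
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__":
    # Illustrative data mirroring the data_engineering example above.
    old_nodes = [{"name": "get_tains"}]
    new_nodes = [
        {"name": "strip_whitespace"},
        {"name": "lowercase_columns"},
        {"name": "get_trains"},
    ]
    added, removed = diff_node_names(old_nodes, new_nodes)
    for name in added:
        print("+", name)
    for name in removed:
        print("-", name)
```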
Roadmap
1.0.0
- commit parser
- get pipeline.to_json() for __default__ for two different commits
- get pipeline.to_json() for all pipelines for two different commits
- --stat compares the number of nodes added or dropped in __default__
- --stat compares the number of nodes added or dropped in all pipelines
- --stat compares attribute changes (inputs, outputs, tags) in all pipelines
- compare input names
- compare output names
- speed up getting repeat pipelines from the same commit (no need to reload a new session)
- speed up getting repeat commits by checking commit hash (reuse existing json)
- minimize untested code
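The attribute comparison items above could look roughly like the following. This is a sketch under assumptions, not the project's implementation: it assumes each node is a dictionary with "name", "inputs", "outputs", and "tags" keys, and diff_attributes is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Sequence

ATTRS = ("inputs", "outputs", "tags")


def diff_attributes(
    old_nodes: Iterable[Dict],
    new_nodes: Iterable[Dict],
    attrs: Sequence[str] = ATTRS,
) -> Dict[str, List[str]]:
    """Map each node present in both versions to its changed attributes."""
    old = {node["name"]: node for node in old_nodes}
    new = {node["name"]: node for node in new_nodes}
    changes: Dict[str, List[str]] = {}
    # Only nodes that exist on both sides can have attribute changes;
    # added or dropped nodes are a separate comparison.
    for name in sorted(old.keys() & new.keys()):
        changed = [a for a in attrs if old[name].get(a) != new[name].get(a)]
        if changed:
            changes[name] = changed
    return changes


if __name__ == "__main__":
    # Illustrative node data, not taken from a real project.
    old_nodes = [{"name": "split_data", "inputs": ["raw"], "outputs": ["train"], "tags": []}]
    new_nodes = [{"name": "split_data", "inputs": ["clean"], "outputs": ["train"], "tags": ["ds"]}]
    print(diff_attributes(old_nodes, new_nodes))
```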
2.0.0
super-size pipeline.to_json()
- compare all attributes on a node (not just inputs, outputs, tags)
- allow users to specify custom to_json method
- function names
- function hashes
- catalog _filepath
- catalog _sql
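For the "function hashes" item, one possible approach is to hash a node function's source so that a change to its body shows up as a changed attribute. The sketch below is only an illustration of that idea; function_hash is a hypothetical helper, and falling back to the compiled bytecode when the source is unavailable is an assumption rather than anything the roadmap specifies.

```python
import hashlib
import inspect
from typing import Callable


def function_hash(func: Callable) -> str:
    """Return a SHA-256 hex digest that changes when the function changes."""
    try:
        payload = inspect.getsource(func).encode("utf-8")
    except (OSError, TypeError):
        # Source is not always available (e.g. interactively defined
        # functions), so fall back to the raw bytecode.
        payload = func.__code__.co_code
    return hashlib.sha256(payload).hexdigest()


def add(a, b):
    return a + b


def multiply(a, b):
    return a * b


if __name__ == "__main__":
    print(function_hash(add) == function_hash(add))
    print(function_hash(add) == function_hash(multiply))
```

Hashing the source also reacts to whitespace and comment changes, while the bytecode fallback ignores constants, so a real implementation would need to decide which kinds of change should count.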
Testing
This project strives for 100% test coverage where it makes sense. Other kedro plugins I have created have suffered in development speed from the complexity of testing against a full kedro project. There are so many pieces to get into place that it becomes difficult to test across multiple versions of kedro or to keep the tests working as kedro changes. Only minimal functionality will be placed in modules that require a full kedro project to work.