diff commits to your kedro pipeline

kedro-diff

kedro-diff aims to be a familiar interface for comparing a kedro pipeline at two points in history. Git diffs are fantastic tools, but they are often too granular to show what has changed inside the pipeline. kedro-diff works at a higher level, so we can see changes to nodes (names, inputs, outputs, tags).

Installation

pip install kedro-diff

Example

kedro diff --stat develop..master
M  __default__      | 6 ++++-
M  data_science     | 3 +++
M  data_engineering | 3 ++-
?? new_pipeline

4 pipelines changed, 5 insertions(+), 4 deletions(-)
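
To make the --stat summary concrete, here is a minimal sketch of how per-pipeline counts could be derived from two snapshots of node names. It is not kedro-diff's actual implementation: the `stat` function, the `Pipelines` data shape, and the `D` status for a dropped pipeline (borrowed from git's convention) are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data shape: pipeline name -> set of node names.
Pipelines = Dict[str, Set[str]]


def stat(old: Pipelines, new: Pipelines) -> List[Tuple[str, str, int, int]]:
    """Return (status, pipeline, insertions, deletions) for each changed pipeline."""
    rows = []
    for name in sorted(set(old) | set(new)):
        before = old.get(name, set())
        after = new.get(name, set())
        added = len(after - before)
        dropped = len(before - after)
        if name not in old:
            status = "??"  # pipeline only exists on the new side
        elif name not in new:
            status = "D"  # assumption: git-style marker for a dropped pipeline
        elif added or dropped:
            status = "M"
        else:
            continue  # unchanged pipelines are not reported
        rows.append((status, name, added, dropped))
    return rows


old = {"data_engineering": {"get_tains"}, "data_science": set()}
new = {
    "data_engineering": {"get_trains", "strip_whitespace"},
    "data_science": {"split_data"},
    "new_pipeline": {"train_model"},
}
rows = stat(old, new)
for status, name, added, dropped in rows:
    print(f"{status:<2} {name:<16} | {added + dropped} {'+' * added}{'-' * dropped}")
```

The sketch only counts nodes by name; attribute changes (inputs, outputs, tags) would need a richer data shape.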

Usage

# diff develop into master
kedro diff develop..master

kedro diff develop master

# diff current state with main
kedro diff main

# diff current state with main
kedro diff ..main

# compare the data_science pipeline between two branches
kedro diff master new_branch data_science
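
The argument forms above could be normalized into a pair of commits along the following lines. This is a sketch under stated assumptions, not the plugin's real parser: `parse_commits` is a hypothetical name, `HEAD` is used as a stand-in for "current state", and the ordering of the returned pair is a guess. The optional pipeline-name argument is deliberately left out.

```python
from typing import Tuple

CURRENT = "HEAD"  # assumption: stand-in for the "current state" of the project


def parse_commits(*args: str) -> Tuple[str, str]:
    """Turn the positional commit arguments into a (commit1, commit2) pair."""
    if len(args) == 2:
        # kedro diff develop master
        return args[0], args[1]
    if len(args) != 1:
        raise ValueError("expected one range or two commits")
    spec = args[0]
    if ".." in spec:
        # kedro diff develop..master, or kedro diff ..main
        left, _, right = spec.partition("..")
        return left or CURRENT, right or CURRENT
    # kedro diff main
    return CURRENT, spec


print(parse_commits("develop..master"))
print(parse_commits("..main"))
```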

More examples

kedro diff develop..master
╭──────────────────────────────────────────────────────────────────────────────╮
│ modified: data_engineering                                                   │
╰──────────────────────────────────────────────────────────────────────────────╯
+ strip_whitespace
+ lowercase_columns
+ get_trains
- get_tains
╭──────────────────────────────────────────────────────────────────────────────╮
│ modified: data_science                                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
+ split_data
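
The added and removed lines in output like this can be produced by a plain comparison of node names. The following sketch assumes each side of the diff is available as an ordered list of node names; `diff_nodes` is a hypothetical helper, not part of kedro-diff's public API.

```python
from typing import List


def diff_nodes(old_nodes: List[str], new_nodes: List[str]) -> List[str]:
    """List nodes added on the new side, then nodes dropped from the old side."""
    added = [n for n in new_nodes if n not in old_nodes]
    removed = [n for n in old_nodes if n not in new_nodes]
    return [f"+ {n}" for n in added] + [f"- {n}" for n in removed]


# A renamed node shows up as one addition and one removal.
old_nodes = ["get_tains"]
new_nodes = ["strip_whitespace", "lowercase_columns", "get_trains"]
lines = diff_nodes(old_nodes, new_nodes)
print("\n".join(lines))
```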

Roadmap

1.0.0

  • commit parser
  • get pipeline.to_json() for __default__ for two different commits
  • get pipeline.to_json() for all pipelines for two different commits
  • --stat compares the number of nodes added or dropped in __default__
  • --stat compares the number of nodes added or dropped in all pipelines
  • --stat compares attribute changes (inputs, outputs, tags) in all pipelines
  • compare input names
  • compare output names
  • speed up getting repeat pipelines from the same commit (no need to reload a new session)
  • speed up getting repeat commits by checking commit hash (reuse existing json)
  • minimize untested code
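
The idea of reusing existing JSON for a commit that has already been seen could look roughly like this. It is only a sketch: `PipelineJsonCache` and the injected `loader` callable are hypothetical, and the real plugin may store its JSON differently (for example on disk).

```python
from typing import Callable, Dict, List


class PipelineJsonCache:
    """Remember pipeline JSON per commit hash so each commit is loaded once."""

    def __init__(self, loader: Callable[[str], dict]) -> None:
        self._loader = loader  # hypothetical: builds pipeline JSON for a commit
        self._by_hash: Dict[str, dict] = {}

    def get(self, commit_hash: str) -> dict:
        if commit_hash not in self._by_hash:
            self._by_hash[commit_hash] = self._loader(commit_hash)
        return self._by_hash[commit_hash]


calls: List[str] = []


def fake_loader(commit_hash: str) -> dict:
    # Stand-in for the expensive step of loading a kedro session at a commit.
    calls.append(commit_hash)
    return {"commit": commit_hash, "pipelines": {}}


cache = PipelineJsonCache(fake_loader)
cache.get("abc123")
cache.get("abc123")  # served from the cache, the loader is not called again
cache.get("def456")
print(calls)
```

Keying on the commit hash rather than the branch name means that two names pointing at the same commit share one entry.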

2.0.0

super-size pipeline.to_json()

  • compare all attributes on a node (not just inputs, outputs, tags)
  • allow users to specify custom to_json method
  • function names
  • function hashes
  • catalog _filepath
  • catalog _sql
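
One possible way to approach the "function hashes" item is to hash a function's compiled bytecode together with its constants, so that a change to a node's function body is detected even when its name stays the same. This is an assumption about how such a feature might work, not a description of the plugin; `function_hash` is a hypothetical helper, and the approach is only stable for simple functions without nested functions or lambdas.

```python
import hashlib
from typing import Callable


def function_hash(func: Callable) -> str:
    """Hash a function's bytecode and constants (sketch, simple functions only)."""
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode()
    return hashlib.sha256(payload).hexdigest()


def add_one(x):
    return x + 1


def add_one_copy(x):
    return x + 1


def add_two(x):
    return x + 2


print(function_hash(add_one) == function_hash(add_one_copy))
print(function_hash(add_one) == function_hash(add_two))
```

Bytecode differs between Python versions, so hashes computed this way are only comparable when both commits are evaluated with the same interpreter.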

Testing

This project strives for 100% test coverage where it makes sense. Other kedro plugins I have created have suffered in development speed because of the complexity of testing against a full kedro project. There are so many pieces to get into place that it becomes difficult to test across multiple versions of kedro or to keep the tests working as kedro changes. For that reason, as little functionality as possible will be placed in modules that require a full kedro project to work.
