Kedro Great makes integrating Great Expectations with Kedro easy!

Kedro Great

As Seen on DataEngineerOne
Watch the Video: Kedro Great: Use Great Expectations with Ease!

Kedro Great is an easy-to-use plugin for Kedro that makes integration with Great Expectations fast and simple.

Hold yourself accountable to Great Expectations.
Never have fear of data silently changing ever again.

Quick Start

Install

Kedro Great is available on PyPI and plugs into Kedro through hooks.

pip install kedro-great

Setup

Once installed, kedro great becomes available as a kedro command.

You can use kedro great init to initialize a Great Expectations project, and then automatically generate its project context.

Furthermore, by using kedro great init, you also generate Great Expectations Datasources and Suites to use with your catalog.yml DataSets.

By default, each expectation suite is named after its catalog.yml dataset name, and a basic.json suite file is generated for each one.

kedro great init
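As a rough illustration of the default naming, the sketch below lists the suite files one might expect to find after initialization. It assumes the usual Great Expectations layout, in which a suite named a.b is stored at great_expectations/expectations/a/b.json. The helper default_suite_paths and its parameters are hypothetical and are not part of kedro-great.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def default_suite_paths(
    dataset_names: Iterable[str],
    suite_type: str = "basic",
    root: str = "great_expectations/expectations",  # assumed default store location
) -> List[str]:
    """Return the suite file path expected for each catalog dataset (hypothetical helper)."""
    paths = []
    for name in dataset_names:
        # Suite "<dataset>.<type>" is assumed to live at <root>/<dataset>/<type>.json
        paths.append(str(PurePosixPath(root) / name / f"{suite_type}.json"))
    return paths


for path in default_suite_paths(["pandas_iris_data", "spark_iris_data"]):
    print(path)
```

Comparing this list with the files actually present on disk is a quick way to check that every catalog entry received a suite.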

Use

After the Great Expectations project has been set up and configured, you can use the KedroGreat hook to run all your data validations every time the pipeline runs.

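# run.py
# KedroContext import path assumes a Kedro 0.16-style project
from kedro.framework.context import KedroContext
from kedro_great import KedroGreat

class ProjectContext(KedroContext):
    # Register the hook so the suites run with every pipeline run
    hooks = (
        KedroGreat(),
    )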

Then just run the kedro pipeline to run the suites.

kedro run

Results

Finally, you can use great_expectations itself to generate documentation and view the results of your pipeline.

Love seeing those green ticks!

great_expectations docs build

Hook Options

The KedroGreat hook currently supports a few options.

expectations_map: Dict[str, Union[str, List[str]]]

If you want to run multiple expectation suites for a dataset, or your suites are not named after the catalog dataset, specify the mapping in the expectations_map argument of KedroGreat.

Default: The catalog name is used as the expectation suite name.

Note: Specifying a suite type in the name, such as .basic, overrides all other suite types for that suite.

KedroGreat(expectations_map={
    'pandas_iris_data': 'pandas_iris_data',
    'spark_iris_data': ['spark_iris_data',
                        'other_expectation',
                        'another_expectation.basic'],
})
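The lookup described above can be sketched in a few lines. This is only an illustration of the documented behavior (a string or a list per dataset, falling back to the catalog name), not the plugin's actual code. The function resolve_suites is a hypothetical name.

```python
from typing import Dict, List, Optional, Union

ExpectationsMap = Dict[str, Union[str, List[str]]]


def resolve_suites(
    dataset_name: str,
    expectations_map: Optional[ExpectationsMap] = None,
) -> List[str]:
    """Return the suite names to run for a dataset (hypothetical helper)."""
    # Fall back to the catalog name when the dataset has no explicit mapping
    mapped = (expectations_map or {}).get(dataset_name, dataset_name)
    # Normalize a single suite name to a one-element list
    return [mapped] if isinstance(mapped, str) else list(mapped)


mapping = {
    "pandas_iris_data": "pandas_iris_data",
    "spark_iris_data": ["spark_iris_data", "other_expectation"],
}
print(resolve_suites("spark_iris_data", mapping))
print(resolve_suites("unmapped_data", mapping))
```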

suite_types: List[Optional[str]]

If your suites have multiple types, you can choose exactly which types to run.

A None entry means the suite is run under its bare name, with no type appended.

Default: KedroGreat.DEFAULT_SUITE_TYPES.

Note: If a suite type is already specified in the expectations_map, that will override this list.

KedroGreat(suite_types=[
    'warning',
    'basic',
    None
])
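To make the naming rule concrete, here is a minimal sketch of how suite types might be combined with a suite name. It assumes that a dot in the name signals a type already given in expectations_map. That heuristic, and the function expand_suite_names, are illustrative assumptions rather than the plugin's implementation.

```python
from typing import List, Optional, Sequence


def expand_suite_names(suite: str, suite_types: Sequence[Optional[str]]) -> List[str]:
    """Return the full suite names to look up for one suite (hypothetical helper)."""
    # Assumption: a dot means the type was already fixed in expectations_map,
    # so the suite_types list is ignored for this suite
    if "." in suite:
        return [suite]
    names = []
    for suite_type in suite_types:
        # None keeps the bare name; any other value is appended as ".<type>"
        names.append(suite if suite_type is None else f"{suite}.{suite_type}")
    return names


print(expand_suite_names("spark_iris_data", ["warning", "basic", None]))
print(expand_suite_names("another_expectation.basic", ["warning", "basic", None]))
```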

run_before_node: bool, run_after_node: bool

You can decide when the suites run: before a node, after a node, or both.

Suites that run before a node validate its inputs; suites that run after a node validate its outputs.

Default: Only runs before a node runs.

KedroGreat(run_before_node=True, run_after_node=False)
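The effect of the two flags can be pictured with a small stand-in class. The method names mirror Kedro's before_node_run and after_node_run hook specifications, but the signatures are simplified and the class ValidationTimingSketch is hypothetical. It only records which datasets would be validated.

```python
from typing import Dict, List


class ValidationTimingSketch:
    """Hypothetical stand-in showing which datasets get validated, and when."""

    def __init__(self, run_before_node: bool = True, run_after_node: bool = False):
        self.run_before_node = run_before_node
        self.run_after_node = run_after_node
        self.validated: List[str] = []

    def before_node_run(self, inputs: Dict[str, object]) -> None:
        # Before a node runs, only its inputs exist, so those are validated
        if self.run_before_node:
            self.validated.extend(f"input:{name}" for name in inputs)

    def after_node_run(self, outputs: Dict[str, object]) -> None:
        # After a node runs, its freshly produced outputs can be validated
        if self.run_after_node:
            self.validated.extend(f"output:{name}" for name in outputs)


sketch = ValidationTimingSketch(run_before_node=True, run_after_node=True)
sketch.before_node_run({"raw_iris": object()})
sketch.after_node_run({"clean_iris": object()})
print(sketch.validated)
```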

fail_fast: bool, fail_after_pipeline_run: bool

You can also have KedroGreat throw a SuiteValidationFailure when a Great Expectations validation fails.

The exception can either be thrown immediately, or the failures can be aggregated over the whole pipeline run and thrown at the end.

This is useful when you want to run validation on your pipeline as part of a CI/CD workflow.

Default: Neither is set.

KedroGreat(fail_fast=True, fail_after_pipeline_run=True)
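The difference between the two failure modes can be sketched as follows. The SuiteValidationFailure class here is a local stand-in for the exception named above, not an import from kedro-great, and FailureCollector is a hypothetical name used only for illustration.

```python
from typing import List


class SuiteValidationFailure(Exception):
    """Local stand-in for the exception named above; not imported from kedro-great."""


class FailureCollector:
    """Hypothetical helper contrasting immediate and deferred failure."""

    def __init__(self, fail_fast: bool = False, fail_after_pipeline_run: bool = False):
        self.fail_fast = fail_fast
        self.fail_after_pipeline_run = fail_after_pipeline_run
        self.failed: List[str] = []

    def record(self, suite_name: str, success: bool) -> None:
        if success:
            return
        if self.fail_fast:
            # Stop the pipeline at the first failing suite
            raise SuiteValidationFailure(f"Suite failed: {suite_name}")
        # Otherwise remember the failure and let the pipeline continue
        self.failed.append(suite_name)

    def finish(self) -> None:
        # Called once the pipeline has finished running
        if self.fail_after_pipeline_run and self.failed:
            raise SuiteValidationFailure("Suites failed: " + ", ".join(self.failed))


collector = FailureCollector(fail_after_pipeline_run=True)
collector.record("pandas_iris_data.basic", success=False)
collector.record("spark_iris_data.basic", success=True)
try:
    collector.finish()
except SuiteValidationFailure as error:
    print(error)
```

In a CI job, the deferred mode reports every failing suite in one run, while the immediate mode saves time by stopping at the first failure.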
