A Kedro plugin to integrate Data Sentinel in Kedro projects.
Kedro-DataSentinel
kedro-datasentinel is a Kedro plugin that integrates Data Sentinel capabilities into Kedro projects. It follows Kedro principles so that data quality checks and validation are as production-ready as possible. Its core functionalities are:
- Data Validation: kedro-datasentinel enhances data quality for machine learning and data engineering pipelines. With minimal configuration, you can validate your datasets both online (during a kedro run) and offline (after the run has finished).
- Audit Logging: Track and monitor your pipeline executions with detailed audit logs. This feature provides visibility into your data processing workflows, making it easier to debug issues and ensure compliance.
- Notification System: Get alerted when data quality issues arise. Configure notifications to be sent through various channels when validation checks fail.
How do I install kedro-datasentinel?
You can install kedro-datasentinel with pip:
pip install kedro-datasentinel
For development installation:
pip install --upgrade git+https://github.com/SumzCol/kedro-datasentinel.git
We recommend creating a virtual environment with a package manager (such as conda) and reading the Kedro installation guide.
Getting started
To use kedro-datasentinel in your Kedro project:
- Install the package as described above
- Create a datasentinel.yml configuration file in your project's conf directory
- Configure your datasets with validation rules in your catalog
- Run your Kedro pipeline as usual
Features
Data Validation
kedro-datasentinel provides a flexible framework for validating your data:
- Online Validation: Validate data during pipeline execution
- Offline Validation: Validate data after pipeline execution with the command datasentinel validate -d <dataset_name>
- Custom Checks: Create your own validation checks
- Integration with Data Sentinel: Leverage all the capabilities of Data Sentinel
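To make the idea of a custom check concrete, here is a minimal sketch in plain Python. It does not use the Data Sentinel or kedro-datasentinel API: the names CheckResult, run_check and validate, as well as the sample orders data, are hypothetical and exist only for illustration. A check is modelled as a named predicate applied to every row, and the result records which rows failed.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one hypothetical validation check."""
    name: str
    passed: bool
    failed_rows: list


def run_check(name, rows, predicate):
    """Apply the predicate to every row and collect the rows that fail."""
    failed = [row for row in rows if not predicate(row)]
    return CheckResult(name=name, passed=not failed, failed_rows=failed)


def validate(rows, checks):
    """Run every named check against the same rows, in insertion order."""
    rows = list(rows)
    return [run_check(name, rows, predicate) for name, predicate in checks.items()]


# Hypothetical dataset and checks, for illustration only.
orders = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -3.0},
    {"order_id": None, "amount": 10.0},
]
checks = {
    "order_id_not_null": lambda row: row["order_id"] is not None,
    "amount_non_negative": lambda row: row["amount"] >= 0,
}

results = validate(orders, checks)
for result in results:
    status = "passed" if result.passed else f"failed on {len(result.failed_rows)} row(s)"
    print(result.name, status)
```

In the plugin itself, checks are declared through configuration and the catalog rather than written inline like this; the sketch only shows the shape of the idea.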
Audit Logging
Track the execution of your Kedro pipelines with detailed audit logs:
- Node Execution: Log when nodes start, complete, or fail
- Input/Output Tracking: Record which datasets were used as inputs and outputs
- Error Logging: Capture exceptions and error messages
- Multiple Storage Options: Store audit logs in databases, files, or custom stores
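The kind of record such an audit log might hold can be sketched in plain Python. This is not the plugin's implementation: InMemoryAuditStore, audited_run and the record fields are hypothetical names chosen for illustration. The sketch wraps a node function, records its input and output dataset names, and captures the error message if it fails.

```python
import json
from datetime import datetime, timezone


class InMemoryAuditStore:
    """Hypothetical store; a database or file store would expose the same append()."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)


def audited_run(store, node_name, func, inputs):
    """Run func(**inputs) and append one audit record describing the outcome."""
    record = {
        "node": node_name,
        "inputs": sorted(inputs),
        "outputs": [],
        "status": "started",
        "error": None,
        "started_at": datetime.now(timezone.utc).isoformat(),
    }
    try:
        outputs = func(**inputs)
        record["status"] = "completed"
        record["outputs"] = sorted(outputs)  # names of the produced datasets
        return outputs
    except Exception as exc:
        record["status"] = "failed"
        record["error"] = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        # Runs on success and on failure, so every execution is logged.
        record["finished_at"] = datetime.now(timezone.utc).isoformat()
        store.append(record)


def clean(raw):
    return {"cleaned": [value for value in raw if value is not None]}


def broken(raw):
    raise ValueError("unexpected schema")


store = InMemoryAuditStore()
audited_run(store, "clean_node", clean, {"raw": [1, None, 2]})
try:
    audited_run(store, "broken_node", broken, {"raw": []})
except ValueError:
    pass  # the failure is already captured in the audit store

print(json.dumps(store.records, indent=2))
```

Keeping the store behind a single append() method is one way to support several storage backends without changing the logging logic.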
Notification System
Get alerted when data quality issues arise:
- Email Notifications: Send emails when validation checks fail
- Custom Notifiers: Create your own notification channels
- Event-Based Triggers: Configure which events trigger notifications
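Event-based triggers can be pictured as a small dispatch step: each notifier declares the events it cares about, and only matching notifiers receive a message. The sketch below is plain Python and makes no claim about the plugin's actual classes or event names; ListNotifier, dispatch, "validation_failed" and "validation_passed" are all hypothetical. The notifier collects messages in a list instead of sending email, so the example runs without any mail server.

```python
from dataclasses import dataclass, field


@dataclass
class ListNotifier:
    """Hypothetical notifier that stores messages instead of sending them."""
    name: str
    triggers: set
    sent: list = field(default_factory=list)

    def notify(self, event, message):
        self.sent.append(f"[{event}] {message}")


def dispatch(notifiers, event, message):
    """Deliver the message to every notifier subscribed to the event.

    Returns the names of the notifiers that were triggered.
    """
    delivered = []
    for notifier in notifiers:
        if event in notifier.triggers:
            notifier.notify(event, message)
            delivered.append(notifier.name)
    return delivered


on_failure = ListNotifier("on_failure", {"validation_failed"})
on_all = ListNotifier("on_all", {"validation_failed", "validation_passed"})
notifiers = [on_failure, on_all]

first = dispatch(notifiers, "validation_passed", "orders: all checks passed")
second = dispatch(notifiers, "validation_failed", "orders: amount_non_negative failed")
print(first)
print(second)
```

A custom notification channel would then only need to provide its own notify() method with the same signature.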
Release and roadmap
The release history records package improvements over time.
Disclaimer
This package is still in active development. We use SemVer principles to version our releases.
Can I contribute?
We'd be happy to receive help to maintain and improve the package. Any PR will be considered, from typo fixes in the docs to core feature additions. Please check the contributing guidelines.
Main contributors
The following people actively maintain, enhance and discuss design to make this package as good as possible:
Built Distribution
File details
Details for the file kedro_datasentinel-0.0.1b1-py3-none-any.whl.
File metadata
- Download URL: kedro_datasentinel-0.0.1b1-py3-none-any.whl
- Upload date:
- Size: 19.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97d13899989a0e9f4fba19befa9c821d5fc47f05add0f26432971420dda3f15b |
| MD5 | 811ce92fe4ccb0661228bdaaf606d38f |
| BLAKE2b-256 | 7b9ae7995b5a6009fcfdde62dd943885fb17776618787ee404bafe6b0677edb4 |
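If you download the wheel manually, you can compare its SHA256 digest with the value listed above. The following is a minimal sketch using only the Python standard library; the helper names sha256_of and matches are illustrative, and the script assumes the wheel sits in the current working directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for kedro_datasentinel-0.0.1b1-py3-none-any.whl
EXPECTED_SHA256 = "97d13899989a0e9f4fba19befa9c821d5fc47f05add0f26432971420dda3f15b"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected):
    """True if the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the current working directory.
    wheel = Path("kedro_datasentinel-0.0.1b1-py3-none-any.whl")
    if wheel.exists():
        print("hash OK" if matches(wheel, EXPECTED_SHA256) else "hash MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```

pip can perform a similar check itself when a requirements file pins hashes and is installed with --require-hashes.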
Provenance
The following attestation bundles were made for kedro_datasentinel-0.0.1b1-py3-none-any.whl:
Publisher: create-release.yml on SumzCol/kedro-datasentinel
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: kedro_datasentinel-0.0.1b1-py3-none-any.whl
  - Subject digest: 97d13899989a0e9f4fba19befa9c821d5fc47f05add0f26432971420dda3f15b
- Sigstore transparency entry: 450728399
- Sigstore integration time:
- Permalink: SumzCol/kedro-datasentinel@1f978ca24b2418dfdffdad57226e4c03e1548a61
- Branch / Tag: refs/heads/main
- Owner: https://github.com/SumzCol
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: create-release.yml@1f978ca24b2418dfdffdad57226e4c03e1548a61
- Trigger Event: push
-
Statement type: