
A Kedro plugin to integrate Data Sentinel in Kedro projects.


Kedro-DataSentinel


kedro-datasentinel is a Kedro plugin for seamless integration of Data Sentinel capabilities into Kedro projects. It enforces Kedro principles to make data quality and validation as production-ready as possible. Its core functionalities are:

  • Data Validation: kedro-datasentinel enhances data quality for machine learning and data engineering pipelines. With minimal configuration, you can validate your datasets during a kedro run, both online (during pipeline execution) and offline (post-execution).

  • Audit Logging: Track and monitor your pipeline executions with detailed audit logs. This feature provides visibility into your data processing workflows, making it easier to debug issues and ensure compliance.

  • Notification System: Get alerted when data quality issues arise. Configure notifications to be sent through various channels when validation checks fail.

How do I install kedro-datasentinel?

You can install kedro-datasentinel with pip:

pip install kedro-datasentinel

To install the latest development version directly from GitHub:

pip install --upgrade git+https://github.com/SumzCol/kedro-datasentinel.git

We recommend creating a virtual environment with a package manager (such as conda) and reading the Kedro installation guide.

Getting started

To use kedro-datasentinel in your Kedro project:

  1. Install the package as described above
  2. Create a datasentinel.yml configuration file in your project's conf directory
  3. Configure your datasets with validation rules in your catalog
  4. Run your Kedro pipeline as usual

Features

Data Validation

kedro-datasentinel provides a flexible framework for validating your data:

  • Online Validation: Validate data during pipeline execution
  • Offline Validation: Validate data after pipeline execution with the command datasentinel validate -d <dataset_name>
  • Custom Checks: Create your own validation checks
  • Integration with Data Sentinel: Leverage all the capabilities of Data Sentinel
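
To make the idea of a custom check concrete, here is a minimal, library-independent sketch in plain Python. It only illustrates the general pattern of running named checks against a dataset and collecting pass/fail results. The names CheckResult, not_null_check, and validate are hypothetical and are not part of the kedro-datasentinel or Data Sentinel API; refer to the project's own documentation for the actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical types for illustration only; not the kedro-datasentinel API.
Row = dict[str, Any]


@dataclass
class CheckResult:
    name: str
    passed: bool
    failed_rows: list[int] = field(default_factory=list)


def not_null_check(column: str) -> Callable[[list[Row]], CheckResult]:
    """Build a check that fails for every row where `column` is missing or None."""

    def run(rows: list[Row]) -> CheckResult:
        bad = [i for i, row in enumerate(rows) if row.get(column) is None]
        return CheckResult(name=f"not_null({column})", passed=not bad, failed_rows=bad)

    return run


def validate(
    rows: list[Row], checks: list[Callable[[list[Row]], CheckResult]]
) -> list[CheckResult]:
    """Run every check against the dataset and collect the results."""
    return [check(rows) for check in checks]


rows = [{"id": 1, "price": 9.5}, {"id": 2, "price": None}]
results = validate(rows, [not_null_check("id"), not_null_check("price")])

for result in results:
    print(result.name, "passed" if result.passed else f"failed at rows {result.failed_rows}")
```

In this sketch the same validate function could be called while a pipeline runs (online) or against a stored dataset afterwards (offline); only the moment at which the rows are loaded differs.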

Audit Logging

Track the execution of your Kedro pipelines with detailed audit logs:

  • Node Execution: Log when nodes start, complete, or fail
  • Input/Output Tracking: Record which datasets were used as inputs and outputs
  • Error Logging: Capture exceptions and error messages
  • Multiple Storage Options: Store audit logs in databases, files, or custom stores
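
The kind of record such an audit log might contain can be sketched with the standard library alone. The example below wraps plain functions standing in for pipeline nodes and records start, completion, failure, and the declared input and output names. The audited decorator, the AUDIT_LOG list, and the record fields are assumptions made for illustration; they do not describe how kedro-datasentinel stores or structures its logs.

```python
import json
import time
from typing import Any, Callable

# Hypothetical in-memory store; a real setup would write to a database or file.
AUDIT_LOG: list[dict[str, Any]] = []


def audited(node_name: str, inputs: list[str], outputs: list[str]) -> Callable:
    """Wrap a function so each call appends one audit record to AUDIT_LOG."""

    def decorator(func: Callable) -> Callable:
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: dict[str, Any] = {
                "node": node_name,
                "inputs": inputs,
                "outputs": outputs,
                "started_at": time.time(),
            }
            try:
                result = func(*args, **kwargs)
                record["status"] = "completed"
                return result
            except Exception as exc:
                # Capture the error, then re-raise so the caller still sees it.
                record["status"] = "failed"
                record["error"] = f"{type(exc).__name__}: {exc}"
                raise
            finally:
                record["finished_at"] = time.time()
                AUDIT_LOG.append(record)

        return wrapper

    return decorator


@audited("clean_prices", inputs=["raw_prices"], outputs=["clean_prices"])
def clean_prices(prices: list[float]) -> list[float]:
    return [p for p in prices if p >= 0]


@audited("average_price", inputs=["clean_prices"], outputs=["avg_price"])
def average_price(prices: list[float]) -> float:
    return sum(prices) / len(prices)


clean_prices([3.0, -1.0, 5.0])
try:
    average_price([])  # raises ZeroDivisionError, which is recorded and re-raised
except ZeroDivisionError:
    pass

print(json.dumps(AUDIT_LOG, indent=2))
```

Writing the record in a finally block means one entry is stored per call whether the node succeeds or fails, which is the property an audit trail usually needs.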

Notification System

Get alerted when data quality issues arise:

  • Email Notifications: Send emails when validation checks fail
  • Custom Notifiers: Create your own notification channels
  • Event-Based Triggers: Configure which events trigger notifications
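
Event-based triggering with pluggable channels can be illustrated with a small publish/subscribe sketch. The NotificationHub class, the event names, and the notifier signature below are hypothetical and chosen only for this example; they are not the plugin's notifier interface, and a real channel would send an email or message instead of appending to a list.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical event names and notifier signature, for illustration only.
Notifier = Callable[[str, str], None]


class NotificationHub:
    """Route events to the notifiers subscribed to them."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Notifier]] = defaultdict(list)

    def subscribe(self, event: str, notifier: Notifier) -> None:
        self._subscribers[event].append(notifier)

    def emit(self, event: str, message: str) -> int:
        """Send `message` to every notifier for `event`; return how many ran."""
        notifiers = self._subscribers.get(event, [])
        for notify in notifiers:
            notify(event, message)
        return len(notifiers)


sent: list[str] = []


def list_notifier(event: str, message: str) -> None:
    # Stand-in for an email or chat channel: just remember the message.
    sent.append(f"[{event}] {message}")


hub = NotificationHub()
hub.subscribe("validation_failed", list_notifier)

hub.emit("validation_failed", "not_null(price) failed on dataset 'prices'")
hub.emit("validation_passed", "all checks passed")  # no subscribers, nothing sent

print(sent)
```

Only events with at least one subscriber produce a notification, which mirrors the idea of configuring which events should trigger alerts.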

Release and roadmap

The release history documents package improvements over time.

Disclaimer

This package is still in active development. We use SemVer principles to version our releases.

Can I contribute?

We'd be happy to receive help maintaining and improving the package. Any PR will be considered, from typo fixes in the docs to new core features. Please check the contributing guidelines.

Main contributors

The following people actively maintain, enhance and discuss design to make this package as good as possible:
