Sentri - a production-ready, configurable data quality validation framework
Sentri
Sentri is a production-ready, configurable data quality validation framework with 10 check types, multiple data connectors, and flexible output formats.
Features
- 10 Check Types: Completeness, Uniqueness, Range, Turnover, Value Spike, Frequency, Correlation, Statistical, Distribution, Drift
- Multiple Connectors: CSV, Oracle, Snowflake (extensible)
- Flexible Configuration: YAML with environment variable support
- Multiple Output Formats: JSON, HTML, CSV, DataFrame
- Threshold-Based Validation: Critical, Warning, Pass, Error states
- Comprehensive Logging: Text and JSON formats
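Sentri's own logger lives in `utils/`, and its interface is not documented on this page. The sketch below only illustrates the usual way to offer text and JSON log output with Python's standard `logging` module. `JsonFormatter` and `make_logger` are hypothetical names, not Sentri's API.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style args to the message
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name: str, fmt: str = "text") -> logging.Logger:
    """Hypothetical helper: return a logger emitting plain text or JSON lines."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()  # writes to stderr by default
    if fmt == "json":
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
    logger.handlers = [handler]  # replace any previously attached handlers
    logger.setLevel(logging.INFO)
    return logger


log = make_logger("dq.example", fmt="json")
log.info("completeness check finished")
```

Emitting one JSON object per line keeps the log easy to ship to log aggregators, while the text format stays readable in a terminal.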
Installation
```bash
pip install sentri

# Optional, from a clone of the repository when working on the project itself:
pip install -e ".[dev]"   # editable install with development extras
pip install -e ".[all]"   # editable install with all database connectors
```
Quick Start
Programmatic Usage
```python
from sentri import DataQualityFramework

# Individual checks and connectors are also importable for programmatic use
from sentri.checks import CompletenessCheck, TurnoverCheck
from sentri.connectors import OracleConnector, SnowflakeConnector

# Load source, checks, and output settings from a config file,
# then run the checks over January 2025
framework = DataQualityFramework(config_path="config.yaml")
results = framework.run_checks(start_date="2025-01-01", end_date="2025-01-31")
```
Configuration File Usage
```yaml
# config.yaml
source:
  type: csv
  csv:
    file_path: /data/sample.csv
    date_column: effective_date
metadata:
  dq_check_name: "Sample Check"
  date_column: effective_date
  id_column: entity_id
checks:
  completeness:
    value:
      thresholds:
        absolute_critical: 0.05
        absolute_warning: 0.02
      description: "Value completeness"
output:
  formats: [json, html, csv]
  destination: /output
```
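The feature list mentions environment variable support in the YAML config, but the exact syntax is not shown here. A minimal sketch, assuming `$VAR` / `${VAR}` references are expanded in every string value after the YAML has been parsed (for example with `yaml.safe_load`); `expand_env` and the `DQ_DATA_DIR` variable are hypothetical.

```python
import os
from typing import Any


def expand_env(node: Any) -> Any:
    """Recursively expand $VAR / ${VAR} references in every string of a parsed config."""
    if isinstance(node, dict):
        return {key: expand_env(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_env(item) for item in node]
    if isinstance(node, str):
        # os.path.expandvars leaves references to unset variables untouched
        return os.path.expandvars(node)
    return node  # numbers, booleans, None pass through unchanged


# Hypothetical: the CSV directory comes from an environment variable.
os.environ["DQ_DATA_DIR"] = "/data"
raw = {
    "source": {"type": "csv", "csv": {"file_path": "${DQ_DATA_DIR}/sample.csv"}},
    "output": {"formats": ["json", "html", "csv"], "destination": "/output"},
}
config = expand_env(raw)
print(config["source"]["csv"]["file_path"])  # /data/sample.csv
```

Expanding after parsing, rather than on the raw YAML text, keeps a variable's value from accidentally changing the YAML structure.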
Check Types
| Check Type | Description |
|---|---|
| Completeness | Monitor null/missing values |
| Uniqueness | Detect duplicate values |
| Range | Validate value bounds |
| Turnover | Track ID additions/removals |
| Value Spike | Detect abnormal value changes |
| Frequency | Monitor category distributions |
| Statistical | Track mean, std, median, etc. |
| Correlation | Validate temporal/cross-column correlation |
| Distribution | Detect distribution shifts (Kolmogorov-Smirnov test) |
| Drift | Identify gradual drift (Population Stability Index, PSI) |
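The table names each check but not how its metric is computed. As a rough illustration, here is a pure-Python sketch of three of the metrics using common textbook definitions: completeness as the fraction of missing values, turnover as IDs added and removed between two snapshots, and PSI over pre-binned proportions. These helper names are hypothetical, and Sentri's own formulas may differ.

```python
import math
from typing import Iterable, Optional, Sequence


def null_rate(values: Sequence[Optional[float]]) -> float:
    """Completeness metric: fraction of missing (None) values."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def turnover(previous_ids: Iterable[str], current_ids: Iterable[str]) -> dict:
    """Turnover metric: number of IDs added and removed between two snapshots."""
    prev, curr = set(previous_ids), set(current_ids)
    return {"added": len(curr - prev), "removed": len(prev - curr)}


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    Both inputs are per-bin proportions over the same bins (each summing to 1).
    eps floors empty bins so the logarithm stays finite.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


print(null_rate([1.0, None, 3.0, None]))           # 0.5
print(turnover(["a", "b", "c"], ["b", "c", "d"]))  # {'added': 1, 'removed': 1}
print(round(psi([0.5, 0.5], [0.5, 0.5]), 6))       # 0.0
```

A returned metric such as the null rate is what the `absolute_*` thresholds in the config would then be compared against.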
Project Structure
```text
dq_framework/
├── src/data_quality/
│   ├── checks/         # Check implementations
│   ├── connectors/     # Data connectors
│   ├── core/           # Exceptions, config, framework
│   ├── formatters/     # Output formatters
│   ├── managers/       # Check manager
│   └── utils/          # Logger, constants
├── tests/              # Unit tests
├── examples/           # Sample configs and scripts
└── pyproject.toml
```
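Connectors are described as extensible, but the base class they share is not shown here. A minimal sketch of what a pluggable connector could look like, assuming each connector returns the rows whose date column falls inside the requested window. `Connector`, `CsvConnector`, and `load()` are hypothetical names, not Sentri's API. Dates are assumed to be ISO `YYYY-MM-DD` strings, matching the `date_column` key in the sample config.

```python
import csv
import os
import tempfile
from abc import ABC, abstractmethod
from datetime import date
from typing import Dict, List


class Connector(ABC):
    """Hypothetical connector interface: return rows within a date window."""

    @abstractmethod
    def load(self, start_date: date, end_date: date) -> List[Dict[str, str]]:
        ...


class CsvConnector(Connector):
    """Reads a CSV file and keeps rows whose date column lies in [start, end]."""

    def __init__(self, file_path: str, date_column: str) -> None:
        self.file_path = file_path
        self.date_column = date_column

    def load(self, start_date: date, end_date: date) -> List[Dict[str, str]]:
        with open(self.file_path, newline="") as handle:
            return [
                row
                for row in csv.DictReader(handle)
                if start_date <= date.fromisoformat(row[self.date_column]) <= end_date
            ]


# Usage: write a tiny CSV to a temporary file and query January 2025.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("entity_id,effective_date,value\n")
    tmp.write("A,2025-01-15,1.0\n")
    tmp.write("B,2025-02-03,2.0\n")

rows = CsvConnector(tmp.name, "effective_date").load(date(2025, 1, 1), date(2025, 1, 31))
print([r["entity_id"] for r in rows])  # ['A']
os.remove(tmp.name)
```

Under this design, an Oracle or Snowflake connector would subclass `Connector` and push the date filter into its SQL query instead of filtering in Python.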
Running Tests
```bash
pytest tests/unit/ -v --cov
```
Thresholds
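Each check supports the following thresholds:
- absolute_critical: the check fails (Critical) if the metric exceeds this value
- absolute_warning: the check raises a Warning if the metric exceeds this value
- delta_critical / delta_warning: the same two levels, applied to the change in the metric rather than to its absolute value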
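A minimal sketch of how these thresholds could map a metric to the Critical, Warning, Pass, and Error states. It rests on several assumptions: "exceeded" means strictly greater than, the delta is the absolute difference from a previous value, Critical takes precedence over Warning, and a metric that could not be computed yields Error. `evaluate_status` is a hypothetical helper, not Sentri's implementation.

```python
from typing import Optional


def evaluate_status(
    metric: Optional[float],
    thresholds: dict,
    previous: Optional[float] = None,
) -> str:
    """Map a metric to CRITICAL / WARNING / PASS / ERROR (hypothetical rules).

    Absolute thresholds compare the metric itself; delta thresholds compare
    the absolute change from `previous`, when a previous value is supplied.
    """
    if metric is None:
        return "ERROR"  # the metric could not be computed
    candidates = [(metric, "absolute")]
    if previous is not None:
        candidates.append((abs(metric - previous), "delta"))
    status = "PASS"
    for value, kind in candidates:
        critical = thresholds.get(f"{kind}_critical")
        warning = thresholds.get(f"{kind}_warning")
        if critical is not None and value > critical:
            return "CRITICAL"  # any critical breach wins immediately
        if warning is not None and value > warning:
            status = "WARNING"
    return status


# Thresholds from the sample config.yaml above
limits = {"absolute_critical": 0.05, "absolute_warning": 0.02}
print(evaluate_status(0.01, limits))  # PASS
print(evaluate_status(0.03, limits))  # WARNING
print(evaluate_status(0.07, limits))  # CRITICAL
```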
Output
Results include:
- Summary: Total, passed, warnings, failed, pass rate
- Details: Check type, column, date, metric value, status
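The summary fields can be derived directly from the detail rows. The sketch below shows one way to do that and serialize the result as JSON. It assumes that Critical and Error both count as "failed" and that pass rate is passed divided by total. `CheckResult` and `summarize` are hypothetical names, and the detail rows are made-up sample data.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class CheckResult:
    check_type: str
    column: str
    date: str
    metric_value: float
    status: str  # "PASS", "WARNING", "CRITICAL" or "ERROR"


def summarize(results: List[CheckResult]) -> dict:
    """Aggregate detail rows into total / passed / warnings / failed / pass rate."""
    total = len(results)
    passed = sum(r.status == "PASS" for r in results)
    warnings = sum(r.status == "WARNING" for r in results)
    failed = sum(r.status in ("CRITICAL", "ERROR") for r in results)
    return {
        "total": total,
        "passed": passed,
        "warnings": warnings,
        "failed": failed,
        "pass_rate": passed / total if total else 0.0,
    }


details = [
    CheckResult("completeness", "value", "2025-01-01", 0.01, "PASS"),
    CheckResult("completeness", "value", "2025-01-02", 0.03, "WARNING"),
    CheckResult("completeness", "value", "2025-01-03", 0.07, "CRITICAL"),
    CheckResult("completeness", "value", "2025-01-04", 0.00, "PASS"),
]
report = {"summary": summarize(details), "details": [asdict(r) for r in details]}
print(json.dumps(report["summary"]))
# {"total": 4, "passed": 2, "warnings": 1, "failed": 1, "pass_rate": 0.5}
```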
License
MIT