Sentri - a production-ready, configurable data quality validation framework

Sentri

Sentri is a production-ready, configurable data quality validation framework with 10 check types, multiple data connectors, and flexible output formats.

Features

  • 10 Check Types: Completeness, Uniqueness, Range, Turnover, Value Spike, Frequency, Correlation, Statistical, Distribution, Drift
  • Multiple Connectors: CSV, Oracle, Snowflake (extensible)
  • Flexible Configuration: YAML with environment variable support
  • Multiple Output Formats: JSON, HTML, CSV, DataFrame
  • Threshold-Based Validation: Critical, Warning, Pass, Error states
  • Comprehensive Logging: Text and JSON formats

Installation

pip install sentri

# Optional: editable installs from a source checkout, for working on the project itself
pip install -e ".[dev]"  # Development install
pip install -e ".[all]"  # With all database connectors

Quick Start

Programmatic Usage

from sentri import DataQualityFramework
# Individual checks and connectors can also be imported directly
from sentri.checks import CompletenessCheck, TurnoverCheck
from sentri.connectors import OracleConnector, SnowflakeConnector

# Example: using DataQualityFramework with a config file
framework = DataQualityFramework(config_path="config.yaml")
# Run every configured check over the given date window
results = framework.run_checks(start_date="2025-01-01", end_date="2025-01-31")

Configuration File Usage

# config.yaml
source:
  type: csv
  csv:
    file_path: /data/sample.csv
    date_column: effective_date

metadata:
  dq_check_name: "Sample Check"
  date_column: effective_date
  id_column: entity_id

checks:
  completeness:
    value:
      thresholds:
        absolute_critical: 0.05
        absolute_warning: 0.02
      description: "Value completeness"

output:
  formats: [json, html, csv]
  destination: /output

Check Types

| Check Type   | Description                                |
|--------------|--------------------------------------------|
| Completeness | Monitor null/missing values                |
| Uniqueness   | Detect duplicate values                    |
| Range        | Validate value bounds                      |
| Turnover     | Track ID additions/removals                |
| Value Spike  | Detect abnormal value changes              |
| Frequency    | Monitor category distributions             |
| Statistical  | Track mean, std, median, etc.              |
| Correlation  | Validate temporal/cross-column correlation |
| Distribution | Detect distribution shifts (KS test)       |
| Drift        | Identify gradual drift (PSI)               |
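
To make the Drift entry concrete, here is a minimal sketch of the Population Stability Index (PSI) computed over two histograms that share the same bins. It illustrates the general PSI formula only and is not Sentri's implementation; the function name `population_stability_index` and the `epsilon` floor for empty bins are assumptions introduced for this example.

```python
import math
from typing import Sequence


def population_stability_index(
    expected_counts: Sequence[float],
    actual_counts: Sequence[float],
    epsilon: float = 1e-6,
) -> float:
    """PSI between two histograms that use identical bins.

    Hypothetical helper for illustration; not part of the sentri API.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both histograms must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    if expected_total <= 0 or actual_total <= 0:
        raise ValueError("each histogram needs at least one observation")

    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        # Floor each proportion at epsilon so an empty bin never hits log(0).
        p = max(expected / expected_total, epsilon)
        q = max(actual / actual_total, epsilon)
        psi += (q - p) * math.log(q / p)
    return psi


baseline = [50, 30, 20]  # bin counts for the reference period
current = [20, 30, 50]   # bin counts for the period under test
identical_psi = population_stability_index(baseline, baseline)
shifted_psi = population_stability_index(baseline, current)
print(f"identical: {identical_psi:.4f}, shifted: {shifted_psi:.4f}")
```

Every term of the sum is non-negative, so PSI is zero only when the two binned distributions match and grows as mass moves between bins.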

Project Structure

dq_framework/
├── src/data_quality/
│   ├── checks/           # Check implementations
│   ├── connectors/       # Data connectors
│   ├── core/             # Exceptions, config, framework
│   ├── formatters/       # Output formatters
│   ├── managers/         # Check manager
│   └── utils/            # Logger, constants
├── tests/                # Unit tests
├── examples/             # Sample configs and scripts
└── pyproject.toml

Running Tests

pytest tests/unit/ -v --cov

Thresholds

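Each check supports the following threshold keys:

  • absolute_critical: The check is marked Critical when the metric exceeds this value
  • absolute_warning: The check is marked Warning when the metric exceeds this value
  • delta_critical / delta_warning: Critical and Warning levels for change-based (delta) thresholds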
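
The absolute thresholds can be sketched as a simple two-level comparison. The example below is an assumption-laden illustration, not Sentri's code: the helper names `null_fraction` and `classify`, the status strings, the use of the null fraction as the completeness metric, and the strict "greater than" comparison are all hypothetical. The threshold values are taken from the sample config.yaml.

```python
from typing import Optional, Sequence


def null_fraction(values: Sequence[object]) -> float:
    """Share of entries that are None (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(value is None for value in values) / len(values)


def classify(
    metric: float,
    absolute_warning: Optional[float] = None,
    absolute_critical: Optional[float] = None,
) -> str:
    """Map a metric to a status string.

    Hypothetical helper: the critical threshold is tested first so that a
    metric above both thresholds reports the more severe state.
    """
    if absolute_critical is not None and metric > absolute_critical:
        return "CRITICAL"
    if absolute_warning is not None and metric > absolute_warning:
        return "WARNING"
    return "PASS"


# One missing value out of 25 rows gives a null fraction of 0.04.
column = [None] + [1.0] * 24
metric = null_fraction(column)
status = classify(metric, absolute_warning=0.02, absolute_critical=0.05)
print(f"{metric:.2%} null -> {status}")
```

With the sample thresholds, a 4% null fraction exceeds the warning level (0.02) but not the critical level (0.05), so this sketch reports a warning.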

Output

Results include:

  • Summary: Total, passed, warnings, failed, pass rate
  • Details: Check type, column, date, metric value, status
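
As a rough illustration of how the summary fields relate to the detail rows, the sketch below aggregates a list of per-check records. The record keys, the status strings, and the choice to count both Critical and Error results as failed are assumptions made for this example; the actual result structure returned by Sentri may differ.

```python
from collections import Counter
from typing import Dict, List, Union


def summarize(details: List[Dict[str, object]]) -> Dict[str, Union[int, float]]:
    """Aggregate per-check detail rows into summary counts.

    Hypothetical helper; key names and status strings are illustrative.
    """
    counts = Counter(row["status"] for row in details)
    total = len(details)
    passed = counts["PASS"]
    return {
        "total": total,
        "passed": passed,
        "warnings": counts["WARNING"],
        # Assumption: Critical and Error results both count as failed.
        "failed": counts["CRITICAL"] + counts["ERROR"],
        "pass_rate": passed / total if total else 0.0,
    }


# Made-up rows, shaped after the Details fields listed above.
details = [
    {"check_type": "completeness", "column": "value", "date": "2025-01-01",
     "metric_value": 0.00, "status": "PASS"},
    {"check_type": "completeness", "column": "value", "date": "2025-01-02",
     "metric_value": 0.01, "status": "PASS"},
    {"check_type": "completeness", "column": "value", "date": "2025-01-03",
     "metric_value": 0.04, "status": "WARNING"},
    {"check_type": "completeness", "column": "value", "date": "2025-01-04",
     "metric_value": 0.09, "status": "CRITICAL"},
]
summary = summarize(details)
print(summary)
```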

License

MIT
