Framework-agnostic validation engine for Data Quality Language (DQL)

dql-core

Documentation | PyPI | GitHub

dql-core provides the abstract validation, cleaner, and executor framework that can be implemented for any Python framework (Django, Flask, FastAPI, SQLAlchemy, Pandas, etc.). It handles the business logic of DQL validation without being tied to any specific data access layer.

Installation

pip install dql-core

Quick Start

Creating a Custom Executor

from dql_core import ValidationExecutor
from dql_parser import parse_dql

class MyExecutor(ValidationExecutor):
    """Custom executor for your framework."""

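    def get_records(self, model_name: str):
        # Return all records for the model from your data source.
        # `my_database` is a placeholder for your own data access layer.
        return my_database.query(model_name).all()

    def filter_records(self, records, condition):
        # Keep only the records that satisfy the condition.
        # `evaluate_condition` is not defined in this example.
        return [r for r in records if self.evaluate_condition(r, condition)]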

    def count_records(self, records):
        return len(list(records))

    def get_field_value(self, record, field_name: str):
        # Get field value from your record type
        return getattr(record, field_name)

# Parse DQL and execute validation
dql_text = """
from Customer
expect column("email") to_not_be_null
expect column("age") to_be_between(18, 100)
"""

ast = parse_dql(dql_text)
executor = MyExecutor()
result = executor.execute(ast)

print(f"Validation passed: {result.overall_passed}")
print(f"Expectations: {result.total_expectations}")
print(f"Failed: {result.failed_expectations}")

Features

Abstract Validation Framework

  • 6 Built-in Validators: to_be_null, to_not_be_null, to_match_pattern, to_be_between, to_be_in, to_be_unique
  • Custom Validators: Extend Validator base class
  • Validator Registry: Register validators for operators
  • Framework-Agnostic: Works with any data source (Django ORM, SQLAlchemy, Pandas, raw dicts)
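
The registry idea can be illustrated without dql-core installed. The following is a minimal standalone sketch, not dql-core's actual API: `VALIDATORS`, `register`, and `run` are hypothetical names chosen for this example, and the two validators only approximate the operators listed above.

```python
import re
from typing import Any, Callable, Dict

# Hypothetical registry mapping operator names to validator callables.
VALIDATORS: Dict[str, Callable[..., bool]] = {}


def register(operator: str):
    """Decorator that stores a validator function under an operator name."""
    def decorator(func):
        VALIDATORS[operator] = func
        return func
    return decorator


@register("to_not_be_null")
def to_not_be_null(value: Any) -> bool:
    return value is not None


@register("to_match_pattern")
def to_match_pattern(value: Any, pattern: str) -> bool:
    return value is not None and re.search(pattern, str(value)) is not None


def run(operator: str, value: Any, *args: Any) -> bool:
    """Look up the validator for an operator and apply it to one value."""
    return VALIDATORS[operator](value, *args)


print(run("to_not_be_null", None))  # False
print(run("to_match_pattern", "555-123-4567", r"^\d{3}-\d{3}-\d{4}$"))  # True
```

Keeping the lookup in a plain dictionary means a new operator can be added without touching the code that executes expectations.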

Abstract Executor

  • Template Method Pattern: Implement 4 abstract methods (get_records, filter_records, count_records, get_field_value) and inherit the full validation logic
  • Multi-Model Support: Validate multiple models in one DQL file
  • Rich Results: Detailed validation results with failure info
  • Severity Levels: Support for critical, warning, info
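
For readers unfamiliar with the template method pattern, here is a minimal standalone sketch of the idea. It does not use dql-core: `SketchExecutor`, `DictExecutor`, and `check_not_null` are hypothetical names, and only two hooks are shown instead of the four that `ValidationExecutor` requires.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, List


class SketchExecutor(ABC):
    """Hypothetical base class: subclasses supply data access, the base supplies logic."""

    @abstractmethod
    def get_records(self, model_name: str) -> Iterable[Any]: ...

    @abstractmethod
    def get_field_value(self, record: Any, field_name: str) -> Any: ...

    def check_not_null(self, model_name: str, field_name: str) -> dict:
        """Template method: a fixed algorithm built on the abstract hooks."""
        records = list(self.get_records(model_name))
        failures = [r for r in records if self.get_field_value(r, field_name) is None]
        return {"passed": not failures, "total": len(records), "failed": len(failures)}


class DictExecutor(SketchExecutor):
    """Concrete executor over an in-memory mapping of model name to list of dicts."""

    def __init__(self, tables: dict):
        self.tables = tables

    def get_records(self, model_name: str) -> List[dict]:
        return self.tables[model_name]

    def get_field_value(self, record: dict, field_name: str) -> Any:
        return record.get(field_name)


executor = DictExecutor({"Customer": [{"email": "a@example.com"}, {"email": None}]})
print(executor.check_not_null("Customer", "email"))
# {'passed': False, 'total': 2, 'failed': 1}
```

The base class never touches the storage directly, which is what lets the same checking logic run against an ORM, a DataFrame, or raw dicts.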

Cleaner Framework (Stories 2.4-2.8)

Cleaners automatically remediate data quality issues when expectations fail.

  • 8 Built-in Cleaners: String normalization, phone/date formatting, NULL handling
  • Custom Cleaners: Build your own with @cleaner decorator
  • Cleaner Chains: Execute multiple cleaners sequentially
  • Transaction Safety: Automatic rollback on failure
  • Audit Logging: Track all modifications
  • Dry-Run Mode: Preview changes before applying
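
To make the chain, audit, and dry-run ideas concrete, here is a minimal standalone sketch. It is not dql-core's implementation: `trim`, `lowercase`, and `run_chain` are hypothetical helpers, and a cleaner is assumed to be a function that mutates a dict in place.

```python
import copy
from typing import Callable, List, Tuple

Cleaner = Callable[[dict], None]  # hypothetical: a cleaner mutates a record in place


def trim(field: str) -> Cleaner:
    def clean(record: dict) -> None:
        record[field] = record[field].strip()
    return clean


def lowercase(field: str) -> Cleaner:
    def clean(record: dict) -> None:
        record[field] = record[field].lower()
    return clean


def run_chain(record: dict, cleaners: List[Cleaner], dry_run: bool = False):
    """Apply cleaners in order; return (resulting record, audit log of changes)."""
    # In dry-run mode, work on a copy so the caller's record stays untouched.
    target = copy.deepcopy(record) if dry_run else record
    audit: List[Tuple[str, object, object]] = []
    for cleaner in cleaners:
        before = copy.deepcopy(target)
        cleaner(target)
        for key in target:
            if before.get(key) != target[key]:
                audit.append((key, before.get(key), target[key]))
    return target, audit


record = {"name": "  ADA Lovelace "}
preview, audit = run_chain(record, [trim("name"), lowercase("name")], dry_run=True)
print(preview)  # {'name': 'ada lovelace'}
print(record)   # {'name': '  ADA Lovelace '}
print(audit)
```

Each audit entry records the field, the old value, and the new value, so a preview can be reviewed before the same chain is run with `dry_run=False`.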

Quick Example:

from dql_core import (
    CleanerChain,
    DictTransactionManager,
    SafeCleanerExecutor,
    normalize_email,
)

# Single cleaner (the address is a placeholder value)
record = {'email': '  User@Example.com  '}
cleaner = normalize_email('email')
result = cleaner(record, {})
print(record['email'])  # the normalized address

# Cleaner chain
chain = (CleanerChain()
    .add('trim_whitespace', 'email')
    .add('lowercase', 'email'))
result = chain.execute(record, {})

# Transaction safety: automatic rollback if any cleaner fails
cleaners = [normalize_email('email')]  # assumed: a list of cleaner callables
manager = DictTransactionManager()
executor = SafeCleanerExecutor(manager)
result = executor.execute_cleaners(cleaners, record, {})
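
The rollback behaviour can be pictured as a snapshot-and-restore. The sketch below is standalone and is not how `SafeCleanerExecutor` or `DictTransactionManager` are necessarily implemented: `apply_with_rollback`, `strip_name`, and `parse_age` are hypothetical names used only for illustration.

```python
import copy
from typing import Callable, List


def apply_with_rollback(record: dict, cleaners: List[Callable[[dict], None]]) -> bool:
    """Apply all cleaners or none: restore the snapshot if any cleaner raises."""
    snapshot = copy.deepcopy(record)
    try:
        for cleaner in cleaners:
            cleaner(record)
    except Exception:
        # Undo every partial change by restoring the original contents in place.
        record.clear()
        record.update(snapshot)
        return False
    return True


def strip_name(record: dict) -> None:
    record["name"] = record["name"].strip()


def parse_age(record: dict) -> None:
    record["age"] = int(record["age"])  # raises ValueError on bad input


record = {"name": " Ada ", "age": "unknown"}
ok = apply_with_rollback(record, [strip_name, parse_age])
print(ok)      # False
print(record)  # {'name': ' Ada ', 'age': 'unknown'}
```

Although `strip_name` succeeded, its change is undone because a later cleaner in the same batch failed.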

External API Adapters

  • Adapter Pattern: Create adapters for external APIs
  • Rate Limiting: Built-in rate limiter
  • Retry Logic: Exponential backoff retry utilities
  • Factory Pattern: APIAdapterFactory for creating adapters
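
As an illustration of exponential backoff, here is a minimal standalone sketch. It does not use dql-core's retry utilities, whose names and signatures are not shown in this README: `retry_with_backoff` and `flaky` are hypothetical, and the `sleep` parameter exists only so the example can record delays instead of waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func; after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    """Fails twice, then succeeds, to simulate a temporarily unavailable API."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays: List[float] = []
print(retry_with_backoff(flaky, sleep=delays.append))  # ok
print(delays)  # [0.1, 0.2]
```

Doubling the delay after each failure gives a struggling API progressively more time to recover.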

Built-in Validators

to_be_null / to_not_be_null

expect column("optional_field") to_be_null
expect column("required_field") to_not_be_null

to_match_pattern

expect column("email") to_match_pattern("^[\\w\\.-]+@[\\w\\.-]+\\.\\w+$")
expect column("phone") to_match_pattern("^\\d{3}-\\d{3}-\\d{4}$")
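
The two patterns above can be tried directly with Python's `re` module. This sketch assumes that the doubled backslashes in the DQL string literal denote single backslashes in the regular expression, and it uses `search` as a stand-in. How dql-core applies the pattern internally is not stated here.

```python
import re

# The same patterns as in the DQL examples, written as Python raw strings.
PHONE = re.compile(r"^\d{3}-\d{3}-\d{4}$")
EMAIL = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")

print(bool(PHONE.search("555-123-4567")))      # True
print(bool(PHONE.search("5551234567")))        # False
print(bool(EMAIL.search("user@example.com")))  # True
print(bool(EMAIL.search("not-an-email")))      # False
```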

to_be_between

expect column("age") to_be_between(18, 120)
expect column("price") to_be_between(0.0, 9999.99)

to_be_in

expect column("status") to_be_in("active", "inactive", "pending")
expect column("category") to_be_in("A", "B", "C")

to_be_unique

expect column("email") to_be_unique
expect column("username") to_be_unique
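
Conceptually, a uniqueness check passes when no value occurs more than once in the column. The standalone sketch below shows one way to find the offending values. `duplicate_values` is a hypothetical helper, not part of dql-core, and treating repeated values as the only failure condition is an assumption.

```python
from collections import Counter
from typing import Any, Iterable, List


def duplicate_values(values: Iterable[Any]) -> List[Any]:
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value, n in counts.items() if n > 1]


emails = ["a@example.com", "b@example.com", "a@example.com"]
dupes = duplicate_values(emails)
print(dupes)      # ['a@example.com']
print(not dupes)  # False
```

An empty result means the column is unique, and a non-empty one lists the values to report as failures.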

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black dql_core tests

License

MIT License - see LICENSE file for details.

Documentation

Full documentation: https://yourusername.github.io/dql-core/

Related Packages

Package Selection

Not sure which package to use? See the Package Selection Guide
