DBQT (DataBase Quality Tool) 🎯

DBQT is a lightweight, Python-first data quality testing framework that helps data teams maintain high-quality data through automated checks and suggestions. It uses the Qwen2 0.5B small language model to classify columns and recommend checks.

License: MIT

🚀 Key Features

  • No Backend Required: Run locally with SQLite, or scale up as needed
  • AI-Powered: Automatic column classification and check suggestions using Qwen2 0.5B
  • Python-First: Native Python API for seamless integration with existing data pipelines
  • Extensive Check Library: 20+ built-in data quality checks
  • Easy to Deploy: Simple pip install, no Docker needed
  • Customizable: Extend with your own custom checks

🛠️ Installation

pip install dbqt

🏃 Quick Start

from dbqt import DBQT
import polars as pl

# Initialize DBQT
dbqt = DBQT()

# Load your data
df = pl.read_csv("your_data.csv")

# Get automatic check suggestions
suggested_checks = dbqt.suggest_checks(df)

# Run checks
results = dbqt.run_checks(df, suggested_checks)

# View results
print(results.summary())

📊 Available Checks

Completeness

  • not_null: Check for null/missing values

Uniqueness

  • unique: Single column uniqueness
  • unique_combination: Multi-column uniqueness
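
To make the completeness and uniqueness checks concrete, here is a minimal sketch in plain Python. It is not DBQT's implementation: the helper names (`not_null_failures`, `duplicate_values`, `duplicate_combinations`) are hypothetical, and rows are modelled as plain dictionaries rather than a polars DataFrame to keep the example dependency-free.

```python
from collections import Counter

def not_null_failures(values):
    """Return the indexes of entries that are None (hypothetical helper)."""
    return [i for i, v in enumerate(values) if v is None]

def duplicate_values(values):
    """Return the set of non-null values that occur more than once."""
    counts = Counter(v for v in values if v is not None)
    return {v for v, n in counts.items() if n > 1}

def duplicate_combinations(rows, columns):
    """Return the column-value tuples that appear in more than one row."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return {combo for combo, n in counts.items() if n > 1}

# Illustrative data only
rows = [
    {"user_id": "USER_1", "email": "a@example.com"},
    {"user_id": "USER_2", "email": None},
    {"user_id": "USER_1", "email": "a@example.com"},
]

null_rows = not_null_failures([r["email"] for r in rows])
dup_ids = duplicate_values([r["user_id"] for r in rows])
dup_pairs = duplicate_combinations(rows, ["user_id", "email"])
```

Treating nulls as exempt from the single-column uniqueness check is a design choice made here for illustration; a real check may count them differently.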

Format Validation

  • regex_match: Pattern matching
  • date_format: Valid date format
  • timestamp_format: Valid timestamp format
  • email_format: Valid email format
  • phone_format: Valid phone number format
  • numeric_format: Valid number format
  • json_format: Valid JSON structure
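
As a rough illustration of what format validation involves, the sketch below implements a few of these checks with the standard library. The function names are hypothetical and the email pattern is deliberately loose; DBQT's own rules may be stricter or different.

```python
import json
import re
from datetime import datetime

# Deliberately simple pattern: something@something.something, no whitespace
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def matches_regex(value, pattern):
    """True if the whole string matches the pattern."""
    return re.fullmatch(pattern, value) is not None

def is_valid_email(value):
    """Loose structural check, not full RFC validation."""
    return EMAIL_RE.fullmatch(value) is not None

def is_valid_date(value, fmt="%Y-%m-%d"):
    """True if the string parses as a real calendar date in the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def is_valid_json(value):
    """True if the string parses as JSON."""
    try:
        json.loads(value)
        return True
    except (ValueError, TypeError):
        return False
```

Parsing the value, rather than only pattern-matching it, is what lets the date check reject impossible dates such as 30 February.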

Range/Boundary

  • min_value: Minimum value check
  • max_value: Maximum value check
  • value_between: Value range validation
  • no_future_dates: Date not in future
  • min_length: Minimum string length
  • max_length: Maximum string length
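
The range and boundary checks reduce to simple comparisons. The sketch below assumes inclusive bounds, which this README does not specify, and uses hypothetical function names rather than DBQT's API.

```python
from datetime import date

def value_between(value, minimum, maximum):
    """True if minimum <= value <= maximum (inclusive bounds are an assumption)."""
    return minimum <= value <= maximum

def no_future_date(value, today=None):
    """True if the date is not later than today; `today` is injectable for testing."""
    today = today or date.today()
    return value <= today

def length_between(text, min_length=0, max_length=None):
    """True if len(text) is within the given limits; max_length=None means no upper limit."""
    n = len(text)
    return n >= min_length and (max_length is None or n <= max_length)
```

Passing `today` explicitly keeps the future-date check deterministic in tests and scheduled runs.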

Value Validation

  • in_domain: Value from allowed set
  • ref_integrity: Referential integrity
  • positive_only: Positive numbers only
  • consistent_casing: Uniform text casing
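
The following sketch shows one plausible reading of three of these checks. The names and the exact semantics (for example, ignoring nulls, or classifying casing as upper, lower, or mixed) are assumptions made for illustration, not DBQT's documented behaviour.

```python
def not_in_domain(values, allowed):
    """Return non-null values that are not in the allowed set."""
    allowed = set(allowed)
    return [v for v in values if v is not None and v not in allowed]

def orphaned_keys(child_keys, parent_keys):
    """Return child keys with no matching parent key (referential integrity)."""
    parents = set(parent_keys)
    return sorted({k for k in child_keys if k is not None and k not in parents})

def has_consistent_casing(values):
    """True if all strings share one casing style: upper, lower, or mixed."""
    styles = set()
    for v in values:
        if v.isupper():
            styles.add("upper")
        elif v.islower():
            styles.add("lower")
        else:
            styles.add("mixed")
    return len(styles) <= 1
```

Converting the parent keys to a set keeps the referential-integrity lookup fast even when the parent table is large.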

Statistical

  • stat_outliers: Statistical outlier detection
  • value_distribution: Distribution analysis
  • trend_check: Trend monitoring
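
This README does not say which method `stat_outliers` uses. As one common technique, the sketch below flags values outside the Tukey fences (1.5 times the interquartile range beyond the quartiles) using only the standard library; the function name and the factor `k` are illustrative assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Illustrative data: one obviously extreme value
ages = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(ages)
```

Quartile-based fences are less sensitive to the outliers themselves than a mean and standard deviation rule, which is why they are a frequent default.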

Dependency Checks

  • dependent_column_check: Validate dependencies between columns
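
A dependency check can be thought of as "whenever a condition holds on a row, a requirement must hold too". The sketch below expresses that with two predicates; the function name, the column names, and the example rule are hypothetical.

```python
def dependency_violations(rows, condition, requirement):
    """Return indexes of rows where the condition holds but the requirement does not."""
    return [
        i for i, row in enumerate(rows)
        if condition(row) and not requirement(row)
    ]

# Illustrative rule: a shipped order must have a shipping timestamp
orders = [
    {"status": "shipped", "shipped_at": "2024-01-05"},
    {"status": "shipped", "shipped_at": None},
    {"status": "pending", "shipped_at": None},
]

violations = dependency_violations(
    orders,
    condition=lambda row: row["status"] == "shipped",
    requirement=lambda row: row["shipped_at"] is not None,
)
```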

📝 Example Configuration

checks = {
    "user_id": [
        {"check": "not_null"},
        {"check": "unique"},
        {"check": "regex_match", "pattern": r"^USER_\d+$"}
    ],
    "email": [
        {"check": "not_null"},
        {"check": "email_format"},
        {"check": "unique"}
    ],
    "age": [
        {"check": "numeric_format"},
        {"check": "value_between", "min": 0, "max": 120},
        {"check": "stat_outliers"}
    ]
}
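
To show how a configuration of this shape could be interpreted, here is a minimal, hypothetical runner that supports three of the checks on plain dictionaries. It is a sketch under stated assumptions, not how DBQT executes checks: `VALUE_CHECKS` and `run_config` are invented names, unsupported checks are simply skipped, and nulls are left to `not_null` alone.

```python
import re

# Hypothetical per-value predicates keyed by check name.
# Each takes the cell value and the check's spec dict.
VALUE_CHECKS = {
    "not_null": lambda value, spec: value is not None,
    "regex_match": lambda value, spec: (
        value is None or re.fullmatch(spec["pattern"], str(value)) is not None
    ),
    "value_between": lambda value, spec: (
        value is None or spec["min"] <= value <= spec["max"]
    ),
}

def run_config(rows, checks):
    """Return {(column, check_name): [failing row indexes]} for supported checks."""
    failures = {}
    for column, specs in checks.items():
        for spec in specs:
            predicate = VALUE_CHECKS.get(spec["check"])
            if predicate is None:
                continue  # check not covered by this sketch
            failures[(column, spec["check"])] = [
                i for i, row in enumerate(rows)
                if not predicate(row.get(column), spec)
            ]
    return failures

# Illustrative data and configuration
sample_rows = [
    {"user_id": "USER_1", "age": 34},
    {"user_id": None, "age": 150},
    {"user_id": "abc", "age": 20},
]
sample_checks = {
    "user_id": [
        {"check": "not_null"},
        {"check": "unique"},  # skipped: not a per-value check in this sketch
        {"check": "regex_match", "pattern": r"^USER_\d+$"},
    ],
    "age": [{"check": "value_between", "min": 0, "max": 120}],
}

report = run_config(sample_rows, sample_checks)
```

Keeping each check as a small predicate in a lookup table means new checks can be added without touching the loop.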

🔍 AI-Powered Suggestions

DBQT uses Qwen2 0.5B to:

  • Analyze column names and sample data
  • Classify column types and purposes
  • Suggest appropriate quality checks
  • Recommend validation rules
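
The README does not show how the model is prompted or how its output is used. Purely as an illustration of the steps listed above, the sketch below builds a text prompt from a column name and sample values, then filters a model reply down to known check names. No model is called, and `AVAILABLE_CHECKS`, `build_prompt`, and `parse_suggestions` are hypothetical names, not part of DBQT.

```python
# Hypothetical subset of check names the model may choose from
AVAILABLE_CHECKS = ["not_null", "unique", "email_format", "date_format", "value_between"]

def build_prompt(column_name, sample_values, available_checks=AVAILABLE_CHECKS):
    """Assemble a plain-text prompt describing one column."""
    samples = ", ".join(repr(v) for v in sample_values[:5])  # cap the sample size
    return (
        f"Column name: {column_name}\n"
        f"Sample values: {samples}\n"
        f"Choose suitable checks from: {', '.join(available_checks)}\n"
        "Answer with a comma-separated list of check names."
    )

def parse_suggestions(model_reply, available_checks=AVAILABLE_CHECKS):
    """Keep only names that are real checks, discarding anything else in the reply."""
    names = [part.strip().lower() for part in model_reply.split(",")]
    return [name for name in names if name in available_checks]

prompt = build_prompt("email", ["a@example.com", None])
suggested = parse_suggestions("not_null, email_format, made_up_check")
```

Filtering the reply against a fixed list guards against a small model returning check names that do not exist.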

📈 Scaling Up

DBQT works well with SQLite for smaller datasets. For larger workloads, it can be scaled up by:

  • Using a production database backend
  • Implementing parallel check execution
  • Setting up scheduled runs
  • Integrating with your data pipeline
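
As a sketch of the parallel execution idea, independent checks can be submitted to a thread pool from the standard library. This assumes each check is a zero-argument callable returning a boolean, which is an illustrative convention and not DBQT's API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_checks_in_parallel(checks, max_workers=4):
    """Run independent checks concurrently and return {name: result}.

    `checks` maps a check name to a zero-argument callable (hypothetical convention).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        # .result() blocks until that check finishes and re-raises its exception, if any
        return {name: future.result() for name, future in futures.items()}

# Illustrative data and checks
ages = [34, 51, 27]
emails = ["a@example.com", None]

results = run_checks_in_parallel({
    "age_in_range": lambda: all(0 <= a <= 120 for a in ages),
    "email_not_null": lambda: all(e is not None for e in emails),
})
```

Threads help most when checks spend their time waiting on I/O such as database queries; CPU-bound checks in pure Python may benefit more from a process pool.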

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

Inspired by MobyDQ, but reimagined as a lightweight, Python-first solution with AI capabilities.
