DataBase Quality Tool
DBQT (DataBase Quality Tool) 🎯
DBQT is a lightweight, Python-first data quality testing framework that helps data teams maintain high-quality data through automated checks and intelligent suggestions. It uses the Qwen2 0.5B small language model for column classification and check recommendations.
🚀 Key Features
- No Backend Required: Run locally with SQLite, or scale up as needed
- AI-Powered: Automatic column classification and check suggestions using Qwen2 0.5B
- Python-First: Native Python API for seamless integration with existing data pipelines
- Extensive Check Library: 20+ built-in data quality checks
- Easy to Deploy: Simple pip install, no Docker needed
- Customizable: Extend with your own custom checks
🛠️ Installation
```shell
pip install dbqt
```
🏃 Quick Start
```python
from dbqt import DBQT
import polars as pl

# Initialize DBQT
dbqt = DBQT()

# Load your data
df = pl.read_csv("your_data.csv")

# Get automatic check suggestions
suggested_checks = dbqt.suggest_checks(df)

# Run checks
results = dbqt.run_checks(df, suggested_checks)

# View results
print(results.summary())
```
📊 Available Checks
Completeness
- not_null: Check for null/missing values
Uniqueness
- unique: Single column uniqueness
- unique_combination: Multi-column uniqueness
Format Validation
- regex_match: Pattern matching
- date_format: Valid date format
- timestamp_format: Valid timestamp format
- email_format: Valid email format
- phone_format: Valid phone number format
- numeric_format: Valid number format
- json_format: Valid JSON structure
Range/Boundary
- min_value: Minimum value check
- max_value: Maximum value check
- value_between: Value range validation
- no_future_dates: Date not in future
- min_length: Minimum string length
- max_length: Maximum string length
Value Validation
- in_domain: Value from allowed set
- ref_integrity: Referential integrity
- positive_only: Positive numbers only
- consistent_casing: Uniform text casing
Statistical
- stat_outliers: Statistical outlier detection
- value_distribution: Distribution analysis
- trend_check: Trend monitoring
Dependency Checks
- dependent_column_check: Validate dependencies between columns
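To make the intent of a few of these checks concrete, here is a minimal plain-Python sketch of what `not_null`, `unique`, `value_between`, and `regex_match` might compute over a single column. It is not DBQT's implementation: the function signatures, the boolean return values, and the choice to skip nulls in every check except `not_null` are assumptions made for illustration.

```python
import re
from typing import Any, Iterable


def not_null(values: Iterable[Any]) -> bool:
    """Pass only if no value is missing (None)."""
    return all(v is not None for v in values)


def unique(values: Iterable[Any]) -> bool:
    """Pass only if no non-null value appears more than once (assumption: nulls are skipped)."""
    seen = set()
    for v in values:
        if v is None:
            continue
        if v in seen:
            return False
        seen.add(v)
    return True


def value_between(values: Iterable[Any], min_value: float, max_value: float) -> bool:
    """Pass only if every non-null value lies in the inclusive range [min_value, max_value]."""
    return all(min_value <= v <= max_value for v in values if v is not None)


def regex_match(values: Iterable[Any], pattern: str) -> bool:
    """Pass only if every non-null value matches the whole pattern."""
    compiled = re.compile(pattern)
    return all(compiled.fullmatch(v) is not None for v in values if v is not None)


ages = [34, 27, None, 131]
print(not_null(ages))                                       # False: one value is missing
print(value_between(ages, 0, 120))                          # False: 131 is out of range
print(unique(["a@x.com", "b@x.com"]))                       # True
print(regex_match(["USER_1", "USER_42"], r"^USER_\d+$"))    # True
```

A real check would usually also report which rows failed; the sketch returns only a pass/fail flag to keep the semantics easy to read.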
📝 Example Configuration
```python
checks = {
    "user_id": [
        {"check": "not_null"},
        {"check": "unique"},
        {"check": "regex_match", "pattern": r"^USER_\d+$"}
    ],
    "email": [
        {"check": "not_null"},
        {"check": "email_format"},
        {"check": "unique"}
    ],
    "age": [
        {"check": "numeric_format"},
        {"check": "value_between", "min": 0, "max": 120},
        {"check": "stat_outliers"}
    ]
}
```
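A configuration of this shape maps each column to a list of check specifications. The sketch below shows one way such a dictionary could be interpreted, using a small registry that maps check names to functions. The registry `CHECKS`, the helper `run_config`, and the `"pass"`/`"fail"`/`"skipped"` statuses are hypothetical names for illustration, not part of DBQT's documented API.

```python
import re

# Hypothetical registry: check name -> function(values, spec) -> bool.
# Only a few of the documented check names are implemented here.
CHECKS = {
    "not_null": lambda values, spec: all(v is not None for v in values),
    "unique": lambda values, spec: (
        len([v for v in values if v is not None])
        == len({v for v in values if v is not None})
    ),
    "regex_match": lambda values, spec: all(
        re.fullmatch(spec["pattern"], v) is not None
        for v in values if v is not None
    ),
    "value_between": lambda values, spec: all(
        spec["min"] <= v <= spec["max"] for v in values if v is not None
    ),
}


def run_config(columns, config):
    """Apply every configured check to its column and collect (column, check, status)."""
    results = []
    for column, specs in config.items():
        for spec in specs:
            check_fn = CHECKS.get(spec["check"])
            if check_fn is None:
                status = "skipped"  # check name not implemented in this sketch
            else:
                status = "pass" if check_fn(columns[column], spec) else "fail"
            results.append((column, spec["check"], status))
    return results


columns = {
    "user_id": ["USER_1", "USER_2", "USER_2"],
    "age": [25, 40, 150],
}
config = {
    "user_id": [
        {"check": "not_null"},
        {"check": "unique"},
        {"check": "regex_match", "pattern": r"^USER_\d+$"},
    ],
    "age": [
        {"check": "value_between", "min": 0, "max": 120},
        {"check": "stat_outliers"},
    ],
}

for row in run_config(columns, config):
    print(row)
```

Keeping the extra parameters (`pattern`, `min`, `max`) inside each specification means a new check only needs a new registry entry, with no change to the runner.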
🔍 AI-Powered Suggestions
DBQT uses Qwen2 0.5B to:
- Analyze column names and sample data
- Classify column types and purposes
- Suggest appropriate quality checks
- Recommend validation rules
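The steps above can be pictured as a prompt built from a column name and a few sample values, followed by filtering the model's reply against the list of known checks. The sketch below covers only those two steps and does not call any model. The prompt wording, the helper names `build_prompt` and `parse_suggestions`, and the comma-separated reply format are all assumptions; DBQT's actual prompt and parsing are not described in this README.

```python
# Subset of the check names listed in this README, used to constrain suggestions.
KNOWN_CHECKS = ["not_null", "unique", "email_format", "value_between", "regex_match"]


def build_prompt(column_name, sample_values, known_checks=KNOWN_CHECKS):
    """Build a text prompt from the column name and up to five sample values."""
    samples = ", ".join(repr(v) for v in sample_values[:5])
    return (
        f"Column name: {column_name}\n"
        f"Sample values: {samples}\n"
        f"Choose suitable checks from: {', '.join(known_checks)}\n"
        "Answer with a comma-separated list of check names."
    )


def parse_suggestions(reply, known_checks=KNOWN_CHECKS):
    """Keep only names that are real checks, so invented names are dropped."""
    names = [part.strip().lower() for part in reply.split(",")]
    return [name for name in names if name in known_checks]


prompt = build_prompt("email", ["ann@example.com", "bob@example.com", None])
print(prompt)

# A hypothetical model reply, including one name that is not a known check.
reply = "not_null, email_format, made_up_check"
print(parse_suggestions(reply))  # ['not_null', 'email_format']
```

Filtering the reply against a fixed list is a cheap guard: a small model can produce check names that do not exist, and those should never reach the check runner.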
📈 Scaling Up
While DBQT works great with SQLite for smaller datasets, it can be scaled up by:
- Using a production database backend
- Implementing parallel check execution
- Setting up scheduled runs
- Integrating with your data pipeline
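As one illustration of parallel check execution, independent (column, check) pairs can be submitted to a thread pool from Python's standard `concurrent.futures` module. This is a sketch under assumptions, not a DBQT feature: `run_parallel` and the two toy checks are hypothetical. Threads mainly help when each check waits on I/O such as a database query; for CPU-bound pure-Python checks a process pool is usually the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def not_null(values):
    """Pass only if no value is missing (None)."""
    return all(v is not None for v in values)


def unique(values):
    """Pass only if no non-null value repeats (assumption: nulls are skipped)."""
    present = [v for v in values if v is not None]
    return len(present) == len(set(present))


def run_parallel(columns, checks, max_workers=4):
    """Run every check on every column concurrently; return {(column, check): bool}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            (column, name): pool.submit(check_fn, values)
            for column, values in columns.items()
            for name, check_fn in checks.items()
        }
        # result() blocks until the corresponding check has finished.
        return {key: future.result() for key, future in futures.items()}


columns = {
    "user_id": [1, 2, 2],
    "email": ["a@x.com", None],
}
checks = {"not_null": not_null, "unique": unique}

for key, passed in run_parallel(columns, checks).items():
    print(key, passed)
```

Because each pair is independent, the same structure works with `ProcessPoolExecutor` as long as the check functions and column data can be pickled.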
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
This project is licensed under the MIT License.
🙏 Acknowledgments
Inspired by MobyDQ, but reimagined as a lightweight, Python-first solution with AI capabilities.
File details
Details for the file dbqt-0.1.2.tar.gz.
File metadata
- Download URL: dbqt-0.1.2.tar.gz
- Upload date:
- Size: 14.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.8.20 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0a1d6ee1dba9f3af05a07e2dc32981371affeb89e586e1c4406b135669c44b3 |
| MD5 | e28110d8178c2d37726b80788e3d8106 |
| BLAKE2b-256 | 66d14fb58f91d1e67caa816de0b046de1840e71b9c16dbdffef8cb9a384b6919 |
File details
Details for the file dbqt-0.1.2-py3-none-any.whl.
File metadata
- Download URL: dbqt-0.1.2-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.8.20 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8e01135b2cc116f0961ba7bb24be0c7e217b6e1fa244c76ca3018383da5b0c1 |
| MD5 | 3ab9391e271dff4fa2485740354d04c1 |
| BLAKE2b-256 | 1738e9311068e995450b6eea1128bd4bbe82fc9620aac041dac01e403079297a |