DataBase Quality Tool
DBQT (DataBase Quality Tool) 🎯
DBQT is a lightweight, Python-first data quality testing framework that helps data teams maintain high-quality data through automated checks and intelligent suggestions.
🛠️ Current Tools
Column Comparison Tool (dbqt compare)
Compare database schemas between source and target databases:
- Table-level comparison
- Column-level comparison with data type compatibility checks
- Generates a detailed Excel report with:
  - Table differences
  - Column differences
  - Data type mismatches
  - Formatted worksheets for easy analysis
Usage:
dbqt compare source_schema.csv target_schema.csv
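To make the table-level and column-level comparison concrete, here is a minimal sketch of the idea. It is not DBQT's implementation: the function names are hypothetical, the row keys (`name`, `col_name`, `data_type`) are assumed to mirror the column aliases of the schema query in this README, and data types are compared by exact string equality rather than by a real compatibility check.

```python
from collections import defaultdict


def index_columns(rows):
    """Map table name -> {column name -> data type} from schema rows."""
    tables = defaultdict(dict)
    for row in rows:
        tables[row["name"]][row["col_name"]] = row["data_type"]
    return dict(tables)


def compare_schemas(source_rows, target_rows):
    """Return table-level and column-level differences between two schemas."""
    source = index_columns(source_rows)
    target = index_columns(target_rows)

    report = {
        "tables_only_in_source": sorted(source.keys() - target.keys()),
        "tables_only_in_target": sorted(target.keys() - source.keys()),
        "columns_only_in_source": [],
        "columns_only_in_target": [],
        "type_mismatches": [],
    }

    # Column-level checks only apply to tables present on both sides.
    for table in sorted(source.keys() & target.keys()):
        src_cols, tgt_cols = source[table], target[table]
        for col in sorted(src_cols.keys() - tgt_cols.keys()):
            report["columns_only_in_source"].append((table, col))
        for col in sorted(tgt_cols.keys() - src_cols.keys()):
            report["columns_only_in_target"].append((table, col))
        for col in sorted(src_cols.keys() & tgt_cols.keys()):
            # Assumption: exact string equality stands in for compatibility.
            if src_cols[col] != tgt_cols[col]:
                report["type_mismatches"].append(
                    (table, col, src_cols[col], tgt_cols[col])
                )
    return report
```

The three result groups correspond to the table differences, column differences, and data type mismatches listed for the Excel report.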
To generate the required CSV schema files from your database, run this query:
SELECT
    UPPER(table_schema) AS sch,
    UPPER(table_name) AS name,
    UPPER(column_name) AS col_name,
    UPPER(data_type) AS data_type,
    ordinal_position
FROM information_schema.columns
WHERE table_schema = 'YOUR_SCHEMA'
ORDER BY table_name, ordinal_position;
Export the results to CSV format to use with the compare tool.
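If your SQL client has no CSV export, the result rows can be written with Python's standard `csv` module. This is a rough sketch: `write_schema_csv` is a hypothetical helper, and the header names simply mirror the aliases in the query above. Whether `dbqt compare` requires exactly these header names is an assumption.

```python
import csv

# Assumption: header names follow the aliases used in the schema query.
SCHEMA_COLUMNS = ["sch", "name", "col_name", "data_type", "ordinal_position"]


def write_schema_csv(rows, path):
    """Write (sch, name, col_name, data_type, ordinal_position) rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(SCHEMA_COLUMNS)
        writer.writerows(rows)
```

With any DB-API 2.0 cursor, the rows would typically come from `cursor.execute(query)` followed by `cursor.fetchall()`, passed to `write_schema_csv(rows, "source_schema.csv")`.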
Database Statistics Tool (dbqt dbstats)
Collect and analyze database statistics:
- Table row counts
- Writes the collected statistics back to the tables CSV file
- Configurable through YAML
Usage:
dbqt dbstats config.yaml
Example config.yaml:
# Database connection configuration
connection:
  type: mysql  # mysql, snowflake, duckdb, csv, parquet, s3parquet
  host: localhost
  user: myuser
  password: mypassword
  database: mydb
  # Optional AWS configs for s3parquet
  # aws_profile: default
  # aws_region: us-west-2
  # bucket: my-bucket
  # Snowflake-specific configs
  # type: snowflake
  # account: your_account.region
  # warehouse: YOUR_WAREHOUSE
  # database: YOUR_DB
  # schema: YOUR_SCHEMA
  # role: YOUR_ROLE
  # authenticator: externalbrowser  # Optional: use SSO authentication
  # user: your_username
  # password: your_password  # Not needed if using externalbrowser auth

# Path to CSV file containing table names to analyze
tables_file: tables.csv
The tables.csv file must contain at least a table_name column. The tool adds a row_count column with the results, or updates it if it already exists.
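The read, count, and write-back cycle described above can be sketched as follows. This is not DBQT's code: `update_row_counts` is a hypothetical name, and the standard-library `sqlite3` module stands in for the MySQL, Snowflake, or DuckDB connections the tool is configured with, purely so the example runs without extra dependencies.

```python
import csv
import re
import sqlite3

# Table names are interpolated into SQL, so only plain identifiers are accepted.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def update_row_counts(conn: sqlite3.Connection, tables_csv: str) -> None:
    """Add or update the row_count column of tables_csv using COUNT(*)."""
    with open(tables_csv, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    if "table_name" not in fieldnames:
        raise ValueError("tables file needs a table_name column")
    if "row_count" not in fieldnames:
        fieldnames.append("row_count")

    for row in rows:
        table = row["table_name"]
        if not _IDENTIFIER.fullmatch(table):
            raise ValueError(f"unsafe table name: {table!r}")
        query = f'SELECT COUNT(*) FROM "{table}"'
        row["row_count"] = conn.execute(query).fetchone()[0]

    # Rewrite the file in place, keeping any other columns untouched.
    with open(tables_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `update_row_counts(sqlite3.connect("example.db"), "tables.csv")` would rewrite `tables.csv` in place, where `example.db` is a placeholder database file.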
🚀 Future Plans
Core DBQT Features (Coming Soon)
- AI-powered column classification using Qwen2 0.5B
- Automatic check suggestions
- 20+ built-in data quality checks
- Python-first API
- No backend required
- Customizable check framework
Planned Checks
- Completeness checks (null values)
- Uniqueness validation
- Format validation (regex, dates, emails)
- Range/boundary checks
- Value validation
- Statistical analysis
- Dependency checks
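Since these checks are still planned, the following is only an illustration of what the first few categories compute over a single column. It is not DBQT's API: every function name here is hypothetical, and null values are assumed to be represented as `None`.

```python
import re


def completeness(values):
    """Fraction of values that are not None (1.0 means no nulls)."""
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)


def is_unique(values):
    """True when no non-null value appears more than once."""
    present = [v for v in values if v is not None]
    return len(present) == len(set(present))


def matches_format(values, pattern):
    """True when every non-null value fully matches the regex pattern."""
    compiled = re.compile(pattern)
    return all(compiled.fullmatch(v) for v in values if v is not None)


def in_range(values, low, high):
    """True when every non-null value lies in the closed interval [low, high]."""
    return all(low <= v <= high for v in values if v is not None)
```

Each function ignores nulls except `completeness`, so that a missing value is reported once by the completeness check rather than failing every other check as well.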
Integration Plans
- Data pipeline integration
- Scheduled runs
- Parallel check execution
- Multiple database backend support
📄 License
This project is licensed under the MIT License.
File details
Details for the file dbqt-0.1.3.tar.gz.
File metadata
- Download URL: dbqt-0.1.3.tar.gz
- Upload date:
- Size: 10.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.8.20 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f50db3ee0990202924a3f713e5843ac8b3b14441aba726c38f7292e58f76886d |
| MD5 | bab10041ee8cae8bb342ea6766e989d0 |
| BLAKE2b-256 | c015540eb1dd9f5d5826f24fb01af8e6f21745879b2c0264407d9672e88f9aa8 |
File details
Details for the file dbqt-0.1.3-py3-none-any.whl.
File metadata
- Download URL: dbqt-0.1.3-py3-none-any.whl
- Upload date:
- Size: 10.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.8.20 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 67cce34bbef8cb8c68a57816ea0784f982c5e64faf5f4bb8965b52a034c9ad6d |
| MD5 | b469498ab340d1238974cacaf57c6094 |
| BLAKE2b-256 | 79d8ee69e8f90848b0432ee098402d19f97e94550d681518d6c07fe7393e76a7 |