DataBase Quality Tool

DBQT (DataBase Quality Tool) 🎯

DBQT is a lightweight, Python-first data quality testing framework for data teams. It currently ships two command-line tools, one for schema comparison and one for table statistics; automated checks and check suggestions are planned.

🛠️ Current Tools

Column Comparison Tool (dbqt compare)

Compare database schemas between source and target databases:

  • Table-level comparison
  • Column-level comparison with data type compatibility checks
  • Generates a detailed Excel report with:
    • Table differences
    • Column differences
    • Data type mismatches
    • Formatted worksheets for easy analysis
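To make the table-level and column-level comparison concrete, here is a minimal sketch of the idea in plain Python. It is not DBQT's implementation: the function names `load_schema` and `compare_schemas` are hypothetical, the CSV column names are assumed to match the aliases in the schema query in this README, and data types are compared by strict equality, which is simpler than the compatibility checks the tool advertises.

```python
import csv
import io


def load_schema(csv_text):
    """Parse schema CSV text into {table name: {column name: data type}}."""
    tables = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        tables.setdefault(row["name"], {})[row["col_name"]] = row["data_type"]
    return tables


def compare_schemas(source, target):
    """Return table, column, and data type differences between two schemas."""
    report = {
        "tables_only_in_source": sorted(source.keys() - target.keys()),
        "tables_only_in_target": sorted(target.keys() - source.keys()),
        "columns_only_in_source": [],
        "columns_only_in_target": [],
        "type_mismatches": [],
    }
    # Column-level checks only make sense for tables present on both sides.
    for table in sorted(source.keys() & target.keys()):
        src_cols, tgt_cols = source[table], target[table]
        for col in sorted(src_cols.keys() - tgt_cols.keys()):
            report["columns_only_in_source"].append((table, col))
        for col in sorted(tgt_cols.keys() - src_cols.keys()):
            report["columns_only_in_target"].append((table, col))
        for col in sorted(src_cols.keys() & tgt_cols.keys()):
            # Assumption: strict equality stands in for a compatibility check.
            if src_cols[col] != tgt_cols[col]:
                report["type_mismatches"].append(
                    (table, col, src_cols[col], tgt_cols[col])
                )
    return report
```

Each key of the returned dictionary corresponds roughly to one kind of difference listed above; writing the result to formatted Excel worksheets is left out of the sketch.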

Usage:

dbqt compare source_schema.csv target_schema.csv

To generate the required CSV schema files from your database, run this query:

SELECT
    UPPER(table_schema) AS sch,
    UPPER(table_name) AS name,
    UPPER(column_name) AS col_name,
    UPPER(data_type) AS data_type,
    ordinal_position
FROM information_schema.columns
WHERE table_schema = 'YOUR_SCHEMA'
ORDER BY table_name, ordinal_position;

Export the results to CSV format to use with the compare tool.
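If you prefer to script the export rather than use a SQL client, the following sketch writes query results to CSV with Python's standard `csv` module. The helper `write_schema_csv` is hypothetical and not part of DBQT. It assumes the header names should match the aliases in the query above, and that `rows` is any iterable of 5-tuples, for example the result of `cursor.fetchall()` from a DB-API driver.

```python
import csv
import io

# Assumption: the header mirrors the column aliases used in the schema query.
SCHEMA_HEADER = ["sch", "name", "col_name", "data_type", "ordinal_position"]


def write_schema_csv(rows, out):
    """Write schema rows to a text file object as CSV; return the row count."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(SCHEMA_HEADER)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Example with in-memory data standing in for real query results.
buffer = io.StringIO()
written = write_schema_csv([("PUBLIC", "ORDERS", "ID", "INT", 1)], buffer)
```

With a real database you would pass an open file instead of the `StringIO` buffer, opened with `newline=""` as the `csv` module documentation recommends.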

Database Statistics Tool (dbqt dbstats)

Collect and analyze database statistics:

  • Table row counts
  • Writes the results back to the tables CSV file
  • Configurable through YAML

Usage:

dbqt dbstats config.yaml

Example config.yaml:

# Database connection configuration
connection:
  type: mysql  # mysql, snowflake, duckdb, csv, parquet, s3parquet
  host: localhost
  user: myuser
  password: mypassword
  database: mydb
  # Optional AWS configs for s3parquet
  # aws_profile: default
  # aws_region: us-west-2
  # bucket: my-bucket

  # Snowflake-specific configs
  # type: snowflake
  # account: your_account.region
  # warehouse: YOUR_WAREHOUSE
  # database: YOUR_DB
  # schema: YOUR_SCHEMA
  # role: YOUR_ROLE
  # authenticator: externalbrowser  # Optional: use SSO authentication
  # user: your_username
  # password: your_password  # Not needed if using externalbrowser auth

# Path to CSV file containing table names to analyze
tables_file: tables.csv

The tables.csv file must contain at least a table_name column. The tool adds a row_count column with the results, or updates it if it already exists.
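The add/update behaviour for row_count can be pictured with the sketch below. It is an illustration only, not DBQT's code: `update_row_counts` and `count_rows` are hypothetical names, and an in-memory SQLite database stands in for the supported connection types so that the example runs with the standard library alone.

```python
import csv
import io
import sqlite3


def update_row_counts(csv_text, count_rows):
    """Return CSV text with a row_count column added or refreshed.

    count_rows is a callable that maps a table name to its row count.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if "table_name" not in fieldnames:
        raise ValueError("tables file needs a table_name column")
    if "row_count" not in fieldnames:
        fieldnames.append("row_count")  # add the column on the first run
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["row_count"] = count_rows(row["table_name"])  # overwrite old value
        writer.writerow(row)
    return out.getvalue()


# Stand-in database: SQLite is used here only to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])


def count_rows(table_name):
    # Identifiers cannot be bound as parameters, so quote the name instead.
    quoted = '"' + table_name.replace('"', '""') + '"'
    return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]


updated = update_row_counts("table_name\norders\n", count_rows)
```

Passing the counting step in as a callable keeps the CSV handling independent of the database driver, which is one way a tool could support several backends with the same file format.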

🚀 Future Plans

Core DBQT Features (Coming Soon)

  • AI-powered column classification using Qwen2 0.5B
  • Automatic check suggestions
  • 20+ built-in data quality checks
  • Python-first API
  • No backend required
  • Customizable check framework

Planned Checks

  • Completeness checks (null values)
  • Uniqueness validation
  • Format validation (regex, dates, emails)
  • Range/boundary checks
  • Value validation
  • Statistical analysis
  • Dependency checks
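As a rough idea of what the first four check types mean in practice, here is a small plain-Python sketch. These checks are not yet part of DBQT, so every function name below is hypothetical and the planned API may look quite different.

```python
import re


def completeness(values):
    """Fraction of values that are not null (None); 1.0 for an empty column."""
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)


def is_unique(values):
    """True if no non-null value appears more than once."""
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))


def format_violations(values, pattern):
    """Non-null string values that do not fully match the regex pattern."""
    regex = re.compile(pattern)
    return [v for v in values if v is not None and not regex.fullmatch(v)]


def out_of_range(values, low, high):
    """Non-null values that fall outside the inclusive range [low, high]."""
    return [v for v in values if v is not None and not (low <= v <= high)]
```

For example, `completeness([1, None, 3, 4])` evaluates to `0.75`, and `out_of_range([5, 11, -1], 0, 10)` returns `[11, -1]`.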

Integration Plans

  • Data pipeline integration
  • Scheduled runs
  • Parallel check execution
  • Multiple database backend support

📄 License

This project is licensed under the MIT License.
