AI-Native Data Governance: TypeScript for Databases

GoQuality CLI

GoQuality brings type safety to your data. Define types once, validate everywhere. Let AI generate the types while you govern the rules.

┌─────────────────────────────────────────────────────────────────┐
│  Database  →  AI Inference  →  YAML Types  →  Validation  →  ✓ │
│                                                                 │
│  "email"      Email           pattern: ^...   99.8% valid      │
│  "amount"     USD             min: 0          100% valid       │
│  "status"     OrderStatus     enum: [...]     98.2% valid      │
└─────────────────────────────────────────────────────────────────┘

Installation

# Basic installation (includes PostgreSQL and DuckDB)
pip install goquality

# Extras are quoted so that shells such as zsh do not expand the brackets as a glob

# Cloud Data Warehouses
pip install "goquality[snowflake]"     # Snowflake
pip install "goquality[bigquery]"      # Google BigQuery
pip install "goquality[databricks]"    # Databricks SQL & Unity Catalog

# Traditional Databases
pip install "goquality[mysql]"         # MySQL / MariaDB
pip install "goquality[mssql]"         # Microsoft SQL Server / Azure SQL

# All database connectors
pip install "goquality[all-connectors]"

# Development installation
pip install "goquality[dev]"

Quick Start

# 1. Initialize a new project
goquality init

# 2. Generate types from your database using AI
goquality generate --source postgres://user:pass@localhost/mydb

# 3. Review and edit the generated goquality.yaml

# 4. Run validation checks
goquality check --source postgres://user:pass@localhost/mydb

# 5. Diagnose any issues
goquality doctor --source postgres://user:pass@localhost/mydb

Commands

goquality init

Initialize a new GoQuality configuration file.

goquality init [OPTIONS]

Options:

Option     Short   Description
--source   -s      Database connection string to test
--path     -p      Path for configuration file (default: goquality.yaml)

Examples:

# Create default config
goquality init

# Create config and test database connection
goquality init --source postgres://localhost/mydb

# Create config at custom path
goquality init --path config/goquality.yaml

goquality generate

Generate type mappings using AI inference. Profiles your database schema and uses an LLM to suggest appropriate types for each column.

goquality generate [OPTIONS]

Options:

Option       Short   Description
--source     -s      Database connection string (required)
--output     -o      Output file path (default: goquality.yaml)
--schema             Database schema to profile
--provider           LLM provider: openai, anthropic, ollama (default: openai)

Environment Variables:

  • OPENAI_API_KEY - Required for OpenAI provider
  • ANTHROPIC_API_KEY - Required for Anthropic provider
  • OLLAMA_HOST - Ollama server URL (default: http://localhost:11434)

Examples:

# Generate using OpenAI (default)
goquality generate --source postgres://localhost/mydb

# Generate using Anthropic Claude
goquality generate --source postgres://localhost/mydb --provider anthropic

# Generate for specific schema
goquality generate --source postgres://localhost/mydb --schema public

# Generate using local Ollama
OLLAMA_HOST=http://localhost:11434 goquality generate \
  --source postgres://localhost/mydb \
  --provider ollama

goquality check

Run validation checks against your database. This is the core command that validates your data against the defined types.

goquality check [OPTIONS]

Options:

Option                               Short   Description
--config                             -c      Configuration file path (default: goquality.yaml)
--source                             -s      Database connection string
--table                              -t      Only check this specific table
--output                             -o      Output format: table, json, yaml, csv, markdown, junit
--fail-threshold                             Percentage of failures allowed (0-100)
--fail-on-error/--no-fail-on-error           Exit with error code on failures (default: true)
--quiet                              -q      Only show errors and summary
--skip-references                            Skip reference (FK) validation
--skip-contracts                             Skip contract validation
--skip-freshness                             Skip freshness validation
--skip-volume                                Skip volume (row count) validation
--only                                       Only run specific validation: types, references, contracts, freshness, volume
--parallel                                   Run table validations in parallel for faster execution
--workers                                    Number of parallel workers (default: 4, only used with --parallel)
--sample-size                                Validate only N random rows per table (for large tables)
--sample-percent                             Validate only X% of rows per table (0.0-100.0)
--notify/--no-notify                         Send notifications configured in goquality.yaml (default: true)
--webhook                                    Send results to this webhook URL
--slack-webhook                              Send Slack notification to this webhook URL
--metrics-file                               Save run metrics to this JSON file
--profile                                    Enable performance profiling
--env                                        Use a named environment from goquality.toml

Exit Codes:

  • 0 - All checks passed (or within threshold)
  • 1 - Validation failures detected (above threshold)

Examples:

# Basic check
goquality check --source postgres://localhost/mydb

# Check specific table
goquality check --source postgres://localhost/mydb --table users

# Output as JSON (for CI/CD pipelines)
goquality check --source postgres://localhost/mydb --output json

# Output as JUnit XML (for CI/CD test reporting)
goquality check --source postgres://localhost/mydb --output junit > results.xml

# Allow up to 5% failures
goquality check --source postgres://localhost/mydb --fail-threshold 5

# Generate markdown report
goquality check --source postgres://localhost/mydb --output markdown > report.md

# Send results to a webhook
goquality check --source postgres://localhost/mydb --webhook https://your-api.com/results

# Send Slack notification
goquality check --source postgres://localhost/mydb --slack-webhook https://hooks.slack.com/services/xxx

# Quiet mode for scripts
goquality check --source postgres://localhost/mydb --quiet

# Don't fail on errors (always exit 0)
goquality check --source postgres://localhost/mydb --no-fail-on-error

# Run in parallel for faster validation of many tables
goquality check --source postgres://localhost/mydb --parallel --workers 8

# Only run type validation (skip all other validations)
goquality check --source postgres://localhost/mydb --only types

# Only run reference/FK validation
goquality check --source postgres://localhost/mydb --only references

# Only run freshness checks
goquality check --source postgres://localhost/mydb --only freshness

# Only run volume checks
goquality check --source postgres://localhost/mydb --only volume

# Use sampling for large tables (10,000 rows per table)
goquality check --source postgres://localhost/mydb --sample-size 10000

# Sample 1% of each table for quick validation
goquality check --source postgres://localhost/mydb --sample-percent 1

# Save metrics to JSON file
goquality check --source postgres://localhost/mydb --metrics-file metrics.json

# Enable performance profiling
goquality check --source postgres://localhost/mydb --profile

# Both metrics and profiling
goquality check --source postgres://localhost/mydb --metrics-file metrics.json --profile

Output Formats:

Table (default):

┏━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Column   ┃ Type   ┃ Rows   ┃ Valid % ┃ Status   ┃ Details       ┃
┡━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ email    │ Email  │ 10,000 │ 99.8%   │ ✓ PASS   │               │
│ status   │ Status │ 10,000 │ 98.2%   │ ✗ FAIL   │ 180 invalid   │
└──────────┴────────┴────────┴─────────┴──────────┴───────────────┘

JSON:

{
  "summary": {
    "total_checks": 5,
    "passed": 4,
    "failed": 1,
    "failure_rate": 20.0,
    "threshold": 0.0,
    "threshold_passed": false
  },
  "tables": [...]
}
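
For CI scripts that consume the JSON output, the summary can be read with a few lines of Python. The sketch below uses only the field names shown above; the sample payload is the one above with "tables" emptied, the function name is hypothetical, and the rule that a run passes when the failure rate does not exceed the threshold is an assumption inferred from the sample, not documented behavior.

```python
import json

# Sample payload copied from the JSON output above ("tables" left empty here).
SAMPLE_REPORT = """
{
  "summary": {
    "total_checks": 5,
    "passed": 4,
    "failed": 1,
    "failure_rate": 20.0,
    "threshold": 0.0,
    "threshold_passed": false
  },
  "tables": []
}
"""


def exit_code_from_report(report_text: str) -> int:
    """Return 0 if the run stayed within the threshold, else 1."""
    summary = json.loads(report_text)["summary"]
    total = summary["total_checks"]
    if total == 0:
        # Nothing was checked, so nothing failed.
        return 0
    # Recompute the rate from the counts rather than trusting the stored value.
    rate = 100.0 * summary["failed"] / total
    # Assumption: a run passes when the rate does not exceed the threshold.
    return 0 if rate <= summary["threshold"] else 1


if __name__ == "__main__":
    print(exit_code_from_report(SAMPLE_REPORT))
```

Recomputing the rate from the raw counts keeps the script independent of how the stored failure_rate is rounded.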

JUnit XML:

<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="GoQuality Data Validation" tests="5" failures="1">
  <testsuite name="public.users" tests="3" failures="0">
    <testcase name="email (Email)" classname="public.users"/>
    <testcase name="id (UUID)" classname="public.users"/>
    <!-- third test case omitted for brevity -->
  </testsuite>
  <testsuite name="public.orders" tests="2" failures="1">
    <testcase name="total (USD)" classname="public.orders">
      <failure message="3 rows failed validation" type="ValidationError">
        Column: total
        Type: USD
        Invalid rows: 3 (0.03%)
      </failure>
    </testcase>
    <!-- second test case omitted for brevity -->
  </testsuite>
</testsuites>
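
If you want to post-process the JUnit report yourself, Python's standard library is enough. The following is a minimal sketch, assuming the element and attribute names shown above; the sample is a trimmed copy of that report (XML declaration dropped, one passing and one failing test case kept) and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the JUnit report above.
SAMPLE_JUNIT = """<testsuites name="GoQuality Data Validation" tests="2" failures="1">
  <testsuite name="public.users" tests="1" failures="0">
    <testcase name="email (Email)" classname="public.users"/>
  </testsuite>
  <testsuite name="public.orders" tests="1" failures="1">
    <testcase name="total (USD)" classname="public.orders">
      <failure message="3 rows failed validation" type="ValidationError">
        Column: total
      </failure>
    </testcase>
  </testsuite>
</testsuites>"""


def failed_cases(junit_xml: str) -> list:
    """Return (suite name, test case name, failure message) for each failure."""
    root = ET.fromstring(junit_xml)
    failures = []
    for suite in root.iter("testsuite"):
        for case in suite.iter("testcase"):
            failure = case.find("failure")
            if failure is not None:
                failures.append(
                    (suite.get("name"), case.get("name"), failure.get("message"))
                )
    return failures


if __name__ == "__main__":
    for suite_name, case_name, message in failed_cases(SAMPLE_JUNIT):
        print(f"{suite_name}: {case_name}: {message}")
```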

goquality validate

Validate configuration file syntax without connecting to a database.

goquality validate [OPTIONS]

Options:

Option     Short   Description
--config   -c      Configuration file to validate (default: goquality.yaml)

Examples:

# Validate default config
goquality validate

# Validate specific config
goquality validate --config staging.yaml

goquality types

List and search available types in the standard library.

goquality types [OPTIONS]

Options:

Option     Short   Description
--search   -s      Search types by name or description
--tag      -t      Filter by tag
--show             Show details for a specific type

Examples:

# List all types
goquality types

# Search for email types
goquality types --search email

# Filter by tag
goquality types --tag finance
goquality types --tag healthcare
goquality types --tag regional

# Show type details
goquality types --show Email
goquality types --show CreditCardNumber

Available Tags:

  • core - Basic string/number types
  • finance - Currency, banking, payments
  • healthcare - Medical codes, identifiers
  • ecommerce - Products, orders, shipping
  • saas - API keys, tokens, SaaS identifiers
  • regional - Country-specific formats
  • analytics - Metrics, percentages, scores
  • iot - Sensors, devices, protocols
  • pii - Personally identifiable information

goquality doctor

Diagnose your GoQuality environment and configuration.

goquality doctor [OPTIONS]

Options:

Option      Short   Description
--config    -c      Configuration file to check
--source    -s      Database connection to test
--verbose   -v      Show detailed information

Checks Performed:

  • Python version compatibility
  • Core dependencies installed
  • Database drivers available
  • LLM providers configured
  • Type library loading
  • Configuration file validity
  • Database connectivity
  • Environment variables

Examples:

# Basic diagnostics
goquality doctor

# Check with database connection
goquality doctor --source postgres://localhost/mydb

# Verbose output
goquality doctor --verbose

goquality stats

Show statistics about the type library and configuration.

goquality stats [OPTIONS]

Options:

Option     Short   Description
--config   -c      Configuration file path

Examples:

goquality stats

goquality version

Show version information.

goquality version

goquality connections

Manage database connections configured in goquality.toml.

# List configured connections
goquality connections list

# Test a specific connection
goquality connections test dev

# Test all connections
goquality connections test-all

# Show connection details (credentials masked)
goquality connections show dev

goquality config

Manage project configuration (goquality.toml).

# Create a new goquality.toml
goquality config init

# Show current configuration
goquality config show

# Validate configuration
goquality config validate

# Show config file path
goquality config path

Configuration File

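GoQuality reads type definitions, model mappings, and validation rules from a YAML file; the default is goquality.yaml. Connections, AI providers, and environments live separately in goquality.toml (see Project Configuration).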

Full Example

# GoQuality Configuration
# https://goquality.dev/docs

# Custom type definitions (extend or override stdlib)
types:
  # Simple type with pattern
  - name: EmployeeId
    description: "Internal employee identifier"
    base: String
    pattern: "^EMP-[0-9]{6}$"
    min_length: 10
    max_length: 10

  # Type extending stdlib
  - name: CorporateEmail
    description: "Company email address"
    base: String
    extends: Email
    pattern: "^[a-z.]+@acme\\.com$"

  # Numeric type with range
  - name: DiscountPercent
    description: "Discount percentage"
    base: Decimal
    min: 0
    max: 100
    precision: 2

  # Enum type
  - name: Department
    description: "Company department"
    base: String
    enum: ["engineering", "sales", "marketing", "hr", "finance"]

  # Type with uniqueness constraint
  - name: ProductSKU
    description: "Unique product SKU"
    base: String
    pattern: "^[A-Z]{2}-[0-9]{6}$"
    unique: true

# Model mappings (table → column types)
models:
  - table: public.users
    columns:
      - name: id
        type: UUID
      - name: email
        type: CorporateEmail
      - name: employee_id
        type: EmployeeId
      - name: department
        type: Department
      - name: created_at
        type: Timestamp
    # Volume check: ensure table isn't empty
    volume:
      min_rows: 1

  - table: public.orders
    columns:
      - name: id
        type: UUID
      - name: user_id
        type: UUID
      - name: total_amount
        type: USD
      - name: discount
        type: DiscountPercent
        allow_null: true  # Override type's nullability
      - name: status
        type: OrderStatus
    # Freshness check: ensure recent data
    freshness:
      column: created_at
      warn_after:
        hours: 1
      error_after:
        hours: 6
    # Volume check: bounded row count
    volume:
      min_rows: 1000
      max_rows: 50000000

  # Pattern matching - apply to all audit tables
  - table: "*_audit"
    columns:
      - name: "*_at"       # Match created_at, updated_at, deleted_at
        type: Timestamp
      - name: "*_by"       # Match created_by, updated_by
        type: UUID

# Explicit relationships (FK validation)
relationships:
  - from: orders.user_id
    to: users.id

  - from: orders.shipping_address_id
    to: addresses.id
    name: "Order Shipping Address"
    nullable: true

  - from: order_items.order_id
    to: orders.id

# Ad-hoc checks (quick SQL rules)
# The "on" key is quoted because YAML 1.1 parsers read a bare on as boolean true
checks:
  - "on": orders
    name: "Order integrity"
    rules:
      - "total_amount >= 0"
      - "created_at <= NOW()"
      - "status IS NOT NULL"

  - "on": users
    name: "User constraints"
    rules:
      - "email IS NOT NULL"
      - "created_at <= NOW()"

# SQL contracts (complex cross-table validation)
contracts:
  - name: order_items_sum_matches_total
    description: "Order items should sum to order total"
    sql: |
      SELECT o.id, o.total_amount, SUM(oi.quantity * oi.unit_price) as items_sum
      FROM orders o
      JOIN order_items oi ON o.id = oi.order_id
      GROUP BY o.id, o.total_amount
      HAVING ABS(o.total_amount - SUM(oi.quantity * oi.unit_price)) > 0.01
    expect: empty
    severity: error

  - name: recent_orders_exist
    description: "Should have orders in the last 24 hours"
    sql: |
      SELECT 1 FROM orders
      WHERE created_at > NOW() - INTERVAL '24 hours'
      LIMIT 1
    expect: not_empty
    severity: warning

# Notifications (optional)
notifications:
  # Slack notification on failures
  - type: slack
    url: ${SLACK_WEBHOOK_URL}
    trigger: failure
    mention_on_failure:
      - U12345678  # Slack user ID

  # Webhook for custom integrations
  - type: webhook
    url: https://your-api.com/goquality-results
    trigger: always
    headers:
      Authorization: "Bearer ${API_TOKEN}"
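
To make the freshness thresholds in the public.orders example concrete, here is a small Python sketch of how a two-level check of this kind can be evaluated. It is an illustration only: the function name is hypothetical, and it assumes age is measured from the newest value in the freshness column and that a threshold is crossed only when the age is strictly greater. GoQuality's actual evaluation may differ.

```python
from datetime import datetime, timedelta, timezone

# Thresholds copied from the public.orders example above.
WARN_AFTER = timedelta(hours=1)
ERROR_AFTER = timedelta(hours=6)


def freshness_status(latest_row: datetime, now: datetime) -> str:
    """Classify a table as "ok", "warn", or "error" by the age of its newest row."""
    age = now - latest_row
    if age > ERROR_AFTER:
        return "error"
    if age > WARN_AFTER:
        return "warn"
    return "ok"


if __name__ == "__main__":
    now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
    for hours in (0.5, 2, 7):
        newest = now - timedelta(hours=hours)
        print(hours, freshness_status(newest, now))
```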

Notifications Configuration

Configure notifications to alert your team when validation runs complete.

Field                Type     Description
type                 string   Notification type: webhook, slack
url                  string   Webhook URL (use ${ENV_VAR} for secrets)
trigger              string   When to notify: always, failure, success, threshold_exceeded
headers              object   Custom HTTP headers (optional)
auth_token           string   Bearer token for Authorization header (optional)
timeout_seconds      float    Request timeout (default: 30)
retry_count          int      Number of retries on failure (default: 3)
include_samples      bool     Include sample failure values (default: true)
max_failures_shown   int      Maximum failures to show (default: 10)
channel              string   Slack channel override (Slack only)
mention_on_failure   array    Slack user IDs to mention on failure (Slack only)
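
The ${ENV_VAR} placeholders used for secrets follow the familiar shell-style substitution pattern. The sketch below shows one way such interpolation can work; it is not GoQuality's implementation. The function name is hypothetical, and raising an error for an unset variable is an assumption (a tool could equally leave the placeholder untouched or substitute an empty string).

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value: str, env=None) -> str:
    """Replace every ${NAME} in value with the matching environment variable."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly instead of sending a literal "${...}".
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(replace, value)


if __name__ == "__main__":
    fake_env = {"API_TOKEN": "abc123"}
    print(interpolate("Bearer ${API_TOKEN}", fake_env))
```

Failing on a missing variable keeps a misconfigured secret from reaching the webhook as literal placeholder text.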

Type Definition Fields

Field         Type     Description
name          string   PascalCase type name (required)
description   string   Human-readable description (required)
base          string   Base type: String, Integer, Decimal, Boolean, Date, Timestamp
extends       string   Parent type to inherit from
pattern       string   Regex pattern (String types)
min_length    int      Minimum string length
max_length    int      Maximum string length
not_empty     bool     Reject empty/whitespace strings
min           number   Minimum value (numeric types)
max           number   Maximum value (numeric types)
precision     int      Decimal places (Decimal type)
enum          array    Allowed values
allow_null    bool     Whether NULL is permitted (default: false)
unique        bool     Values must be unique
references    string   FK reference as table.column (for cross-table validation)
where         string   SQL WHERE clause filter on referenced table
tags          array    Searchable tags
examples      array    Example valid values
deprecated    bool     Mark as deprecated
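
To see how several of these fields combine for a single value, consider the Python sketch below. It mirrors the pattern, min_length, max_length, enum, and allow_null fields for String types. The class and function are hypothetical and exist only for illustration; GoQuality validates inside the database, and its regex dialect may differ from Python's.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StringType:
    """Hypothetical in-memory mirror of a String type definition."""
    name: str
    pattern: Optional[str] = None
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    enum: Optional[List[str]] = None
    allow_null: bool = False


def is_valid(type_def: StringType, value: Optional[str]) -> bool:
    """Apply each configured constraint; a value must satisfy all of them."""
    if value is None:
        return type_def.allow_null
    if type_def.min_length is not None and len(value) < type_def.min_length:
        return False
    if type_def.max_length is not None and len(value) > type_def.max_length:
        return False
    # The pattern carries its own ^ and $ anchors, so search is sufficient.
    if type_def.pattern is not None and re.search(type_def.pattern, value) is None:
        return False
    if type_def.enum is not None and value not in type_def.enum:
        return False
    return True


# Definitions taken from the Full Example above.
EMPLOYEE_ID = StringType(
    "EmployeeId", pattern="^EMP-[0-9]{6}$", min_length=10, max_length=10
)
DEPARTMENT = StringType(
    "Department", enum=["engineering", "sales", "marketing", "hr", "finance"]
)

if __name__ == "__main__":
    print(is_valid(EMPLOYEE_ID, "EMP-123456"))
    print(is_valid(DEPARTMENT, "legal"))
```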

Connection Strings

GoQuality supports multiple database backends via connection strings.

PostgreSQL

# Full format
postgres://user:password@host:port/database

# Examples
postgres://postgres:secret@localhost:5432/mydb
postgresql://user:pass@db.example.com/production
postgres://localhost/mydb  # Local with defaults
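
These connection strings are ordinary URLs, so their parts can be inspected with Python's standard library before handing them to GoQuality. A minimal sketch follows; the function name is hypothetical, and how GoQuality itself parses the string or fills in omitted parts is not shown here.

```python
from urllib.parse import urlsplit


def describe(connection_string: str) -> dict:
    """Split a URL-style connection string into its components."""
    parts = urlsplit(connection_string)
    return {
        "scheme": parts.scheme,
        "user": parts.username,    # None when omitted
        "host": parts.hostname,
        "port": parts.port,        # None when omitted
        "database": parts.path.lstrip("/"),
        # Report only whether a password is present, never its value.
        "has_password": parts.password is not None,
    }


if __name__ == "__main__":
    print(describe("postgres://postgres:secret@localhost:5432/mydb"))
    print(describe("postgres://localhost/mydb"))
```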

DuckDB

# In-memory database
duckdb://:memory:

# File database
duckdb:///path/to/database.db

# CSV/Parquet files (auto-detected)
/path/to/data.csv
/path/to/data.parquet
./relative/path/data.csv

Snowflake

# Full format
snowflake://user@account/database/schema?warehouse=WAREHOUSE

# Examples
snowflake://john@xy12345/analytics/public?warehouse=COMPUTE_WH
snowflake://user@account/db/schema?warehouse=WH&role=ANALYST

Environment Variables:

export SNOWFLAKE_ACCOUNT=xy12345
export SNOWFLAKE_USER=john
export SNOWFLAKE_PASSWORD=secret
export SNOWFLAKE_DATABASE=analytics
export SNOWFLAKE_SCHEMA=public
export SNOWFLAKE_WAREHOUSE=COMPUTE_WH

BigQuery

# Format
bigquery://project-id/dataset

# Examples
bigquery://my-project/analytics
bigquery://prod-data-warehouse/sales

Environment Variables:

export GOOGLE_CLOUD_PROJECT=my-project
export BIGQUERY_DATASET=analytics
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json

MySQL / MariaDB

# Full format
mysql://user:password@host:port/database

# Examples
mysql://root:secret@localhost:3306/mydb
mysql://user:pass@db.example.com/production
mariadb://user:pass@localhost/mydb  # MariaDB compatible

# Cloud providers
mysql://admin:pass@mydb.cluster-xxxxx.us-east-1.rds.amazonaws.com:3306/mydb  # AWS RDS
mysql://user:pass@34.xxx.xxx.xxx:3306/mydb                                    # Cloud SQL

Environment Variables:

export MYSQL_HOST=localhost
export MYSQL_PORT=3306
export MYSQL_USER=myuser
export MYSQL_PASSWORD=secret
export MYSQL_DATABASE=mydb

Microsoft SQL Server

# Full format
mssql://user:password@host:port/database

# Examples
mssql://sa:Password123@localhost:1433/mydb
sqlserver://user:pass@server.database.windows.net/mydb  # Azure SQL

# With schema
mssql://user:pass@localhost/mydb?schema=dbo

# Windows Authentication
mssql://localhost/mydb?trusted_connection=true

Environment Variables:

export MSSQL_HOST=localhost
export MSSQL_PORT=1433
export MSSQL_USER=sa
export MSSQL_PASSWORD=secret
export MSSQL_DATABASE=mydb
export MSSQL_SCHEMA=dbo

Databricks

# Full format
databricks://hostname/schema?http_path=/sql/...&catalog=main&access_token=xxx

# Examples
databricks://my-workspace.cloud.databricks.com/default?http_path=/sql/1.0/warehouses/abc123&access_token=dapiXXX

# Azure Databricks
databricks://adb-123456789.7.azuredatabricks.net/default?http_path=/sql/1.0/warehouses/abc&access_token=dapiXXX

# With Unity Catalog
databricks://hostname/myschema?http_path=/sql/1.0/warehouses/abc&catalog=production&access_token=dapiXXX

Environment Variables:

export DATABRICKS_HOST=my-workspace.cloud.databricks.com
export DATABRICKS_TOKEN=dapiXXXXXXXXXX
export DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/abc123
export DATABRICKS_CATALOG=main
export DATABRICKS_SCHEMA=default

Project Configuration (goquality.toml)

Store database connections, AI provider settings, and environments in a TOML configuration file.

File Location

GoQuality looks for project config in:

  1. ./goquality.toml (project root)
  2. ./.goquality/config.toml
  3. ~/.config/goquality/config.toml (user-level defaults)

Quick Start

# Create a new goquality.toml
goquality config init

# List configured connections
goquality connections list

# Test a connection
goquality connections test dev

# Validate configuration
goquality config validate

Full Example

# goquality.toml - Project configuration

[project]
name = "My Data Project"

#──────────────────────────────────────────────────────────────────────────────
# DATABASE CONNECTIONS
#──────────────────────────────────────────────────────────────────────────────

[connections]
default = "dev"  # Default connection when --source not specified

[connections.local]
connection_string = "duckdb://:memory:"
description = "Local testing with DuckDB"

[connections.dev]
dialect = "postgres"
host = "localhost"
port = 5432
database = "myapp_dev"
user = "${DB_USER}"           # Environment variable interpolation
password = "${DB_PASSWORD}"
description = "Development database"

[connections.staging]
dialect = "postgres"
host = "${STAGING_DB_HOST}"
database = "myapp_staging"
user = "${STAGING_DB_USER}"
password = "${STAGING_DB_PASSWORD}"
description = "Staging environment"

[connections.prod]
connection_string = "postgres://${PROD_USER}:${PROD_PASS}@prod.example.com/myapp"
description = "Production database (read-only)"

[connections.warehouse]
dialect = "snowflake"
host = "xy12345.snowflakecomputing.com"
database = "analytics"
schema = "public"
user = "${SNOWFLAKE_USER}"
password = "${SNOWFLAKE_PASSWORD}"
description = "Snowflake data warehouse"

[connections.warehouse.options]
warehouse = "COMPUTE_WH"
role = "ANALYST"

#──────────────────────────────────────────────────────────────────────────────
# AI / LLM CONFIGURATION (for `goquality generate`)
#──────────────────────────────────────────────────────────────────────────────

[ai]
default = "openai"  # Default AI provider

[ai.openai]
api_key = "${OPENAI_API_KEY}"
model = "gpt-4o"  # Optional: override default model

[ai.anthropic]
api_key = "${ANTHROPIC_API_KEY}"
model = "claude-sonnet-4-20250514"

[ai.ollama]
host = "http://localhost:11434"
model = "llama3"

#──────────────────────────────────────────────────────────────────────────────
# ENVIRONMENTS (bundle connection + AI + settings per environment)
#──────────────────────────────────────────────────────────────────────────────

[environments]
default = "dev"

[environments.dev]
connection = "dev"
ai = "ollama"              # Use local LLM in dev
fail_threshold = 10        # More lenient in development

[environments.staging]
connection = "staging"
ai = "openai"
fail_threshold = 5

[environments.prod]
connection = "prod"
ai = "openai"
fail_threshold = 0         # Zero tolerance in production

#──────────────────────────────────────────────────────────────────────────────
# DEFAULT CLI OPTIONS
#──────────────────────────────────────────────────────────────────────────────

[defaults]
parallel = true
workers = 4
output = "table"
notify = true

Using Named Connections

# Use default connection (from goquality.toml)
goquality check

# Use named connection
goquality check --source dev
goquality check --source staging
goquality check --source warehouse

# Explicit connection string still works
goquality check --source postgres://user:pass@localhost/mydb

Using Environments

# Use default environment
goquality check

# Use named environment (bundles connection + AI + settings)
goquality check --env prod
goquality generate --env dev

# Environment via environment variable
export GOQUALITY_ENV=staging
goquality check

Connection Management Commands

# List all configured connections
goquality connections list
goquality connections list --verbose

# Test a specific connection
goquality connections test dev
goquality connections test  # Tests default connection

# Test all connections
goquality connections test-all

# Show connection details (credentials masked)
goquality connections show dev

Config Management Commands

# Create new config file
goquality config init
goquality config init --name "My Project"

# Show current configuration
goquality config show
goquality config show --verbose

# Validate configuration
goquality config validate

# Show config file path
goquality config path

CI/CD Integration

GitHub Actions

name: Data Quality

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * *'  # Daily at 06:00 UTC (GitHub Actions schedules run in UTC)

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install GoQuality
        run: pip install goquality[postgres]

      - name: Validate Configuration
        run: goquality validate

      - name: Run Data Quality Checks
        run: |
          goquality check \
            --source ${{ secrets.DATABASE_URL }} \
            --output junit \
            --fail-threshold 1 \
            > results.xml

      - name: Upload Test Results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: quality-report
          path: results.xml

      - name: Publish Test Results
        uses: dorny/test-reporter@v1
        if: always()
        with:
          name: GoQuality Results
          path: results.xml
          reporter: java-junit

GitHub Actions with Slack Notifications

name: Data Quality with Notifications

on:
  schedule:
    - cron: '0 6 * * *'  # Daily at 06:00 UTC (GitHub Actions schedules run in UTC)

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install GoQuality
        run: pip install goquality[postgres]

      - name: Run Data Quality Checks
        run: |
          goquality check \
            --source ${{ secrets.DATABASE_URL }} \
            --output junit \
            --slack-webhook ${{ secrets.SLACK_WEBHOOK_URL }} \
            > results.xml
        env:
          GOQUALITY_SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}

GitLab CI

data-quality:
  image: python:3.11
  stage: test
  script:
    - pip install goquality[postgres]
    - goquality validate
    - goquality check --source $DATABASE_URL --output junit > report.xml
  artifacts:
    reports:
      junit: report.xml
    paths:
      - report.xml
    expire_in: 1 week

Pre-commit Hook

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: goquality-validate
        name: Validate GoQuality Config
        entry: goquality validate
        language: system
        files: goquality\.yaml$
        pass_filenames: false

Standard Library Types

GoQuality includes 320+ pre-defined types organized by category.

Core Types

Type            Base     Description
Email           String   Email address
EmailNullable   String   Optional email
UUID            String   UUID v4
URL             String   HTTP/HTTPS URL
PhoneNumber     String   International phone
Hostname        String   DNS hostname

Finance Types

Type               Base      Description
USD                Decimal   US Dollar amount
EUR                Decimal   Euro amount
CreditCardNumber   String    Credit card (Luhn)
IBAN               String    International bank account
BIC                String    Bank identifier code
ABARoutingNumber   String    US routing number

Healthcare Types

Type    Base     Description
ICD10   String   ICD-10 diagnosis code
CPT     String   CPT procedure code
NPI     String   National Provider ID
NDC     String   National Drug Code
LOINC   String   Lab test code

E-commerce Types

Type     Base     Description
SKU      String   Stock keeping unit
UPC      String   UPC-A barcode
EAN13    String   EAN-13 barcode
ASIN     String   Amazon product ID
ISBN13   String   Book ISBN-13

Regional Types

Type              Base     Description
SSN               String   US Social Security
USZipCode         String   US ZIP code
USState           String   US state code
GermanVATNumber   String   German VAT
UKPostcode        String   UK postcode
IndianPAN         String   Indian tax ID

Analytics Types

Type         Base      Description
Percentage   Decimal   0-100 percentage
Rate         Decimal   0-1 rate
Score        Decimal   0-100 score
MRR          Decimal   Monthly recurring revenue
NPSScore     Integer   Net promoter score

Browse all types:

goquality types
goquality types --tag finance
goquality types --search email

Custom Validators (Plugins)

GoQuality supports custom validation logic via Python plugins.

Creating a Validator

# .goquality/plugins/my_validators.py

from goquality.plugins import register_validator

@register_validator("is_palindrome", description="Check if string is palindrome")
def is_palindrome(value: str) -> bool:
    clean = value.lower().replace(" ", "")
    return clean == clean[::-1]

# Registered name matches what the function checks: divisibility by 3
@register_validator("divisible_by_three", description="Check divisibility by 3")
def divisible_by_three(value: int) -> bool:
    return value % 3 == 0

Built-in Advanced Validators

Validator      Description
luhn           Luhn checksum (credit cards)
iban           IBAN checksum
isbn10         ISBN-10 checksum
isbn13         ISBN-13 checksum
ean13          EAN-13 barcode checksum
upc            UPC-A barcode checksum
email_format   Email format validation
ipv4           IPv4 address format
ipv6           IPv6 address format
mac_address    MAC address format
json           Valid JSON string
base64         Valid Base64 encoding
future_date    Date in the future
past_date      Date in the past
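
As an illustration of what a checksum validator such as luhn does, here is the standard Luhn algorithm in Python. This is a generic sketch, not GoQuality's code; the function name is hypothetical, and ignoring spaces and dashes is an assumption about input handling.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    # Keep ASCII digits only, so "4111 1111 1111 1111" is accepted.
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))
    print(luhn_valid("4111 1111 1111 1112"))
```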

Security & Observability

Security Features

GoQuality protects production databases in two ways: it validates contract SQL before running it, and it enforces query timeouts.

SQL Injection Prevention

All contract SQL is validated before execution:

  • Only SELECT statements are allowed
  • Dangerous keywords are blocked (DROP, DELETE, UPDATE, etc.)
  • SQL is parsed and validated using AST analysis
  • Invalid SQL is rejected at config load time

contracts:
  - name: safe_contract
    sql: SELECT * FROM users WHERE active = true  # ✅ Valid
    expect: not_empty

  # This would be rejected:
  # sql: DROP TABLE users  # ❌ Rejected

Query Timeout

Configure query timeouts to prevent hanging validations:

# Via environment variable
export GOQUALITY_QUERY_TIMEOUT_SECONDS=600

# Or in .env file
GOQUALITY_QUERY_TIMEOUT_SECONDS=600

Default timeout: 300 seconds (5 minutes)

Observability Features

Structured Logging

GoQuality supports structured logging to files:

# Log to file
export GOQUALITY_LOG_FILE=goquality.log

# JSON format for log aggregation
export GOQUALITY_LOG_JSON=true

# Set log level
export GOQUALITY_LOG_LEVEL=DEBUG

Log files automatically rotate (10MB max, 5 backups).
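
For readers who want the same rotation policy in their own tooling, Python's standard library expresses it directly. The sketch below is an analogy, not GoQuality's logging code: the function name and log format are hypothetical, and only the 10 MB and 5-backup figures come from the text above.

```python
import logging
from logging.handlers import RotatingFileHandler

MAX_BYTES = 10 * 1024 * 1024  # 10 MB, i.e. 10485760 bytes
BACKUP_COUNT = 5


def build_rotating_handler(path: str) -> RotatingFileHandler:
    """Create a handler that rotates at MAX_BYTES and keeps BACKUP_COUNT old files."""
    handler = RotatingFileHandler(
        path,
        maxBytes=MAX_BYTES,
        backupCount=BACKUP_COUNT,
        delay=True,  # do not create the file until the first record is written
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    return handler


if __name__ == "__main__":
    handler = build_rotating_handler("goquality.log")
    print(handler.maxBytes, handler.backupCount)
    handler.close()
```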

Metrics Collection

Collect validation metrics for analysis:

# Save metrics to JSON
goquality check --source postgres://... --metrics-file metrics.json

Metrics include:

  • Overall statistics (tables, columns, checks, pass rates)
  • Type validation metrics
  • Reference validation metrics
  • Contract validation metrics
  • Performance metrics (query times, durations)

Example metrics output:

{
  "run_id": "abc123",
  "timestamp": "2024-01-15T10:30:00Z",
  "total_tables": 10,
  "total_columns": 45,
  "total_checks": 45,
  "passed_checks": 42,
  "failed_checks": 3,
  "duration_seconds": 12.5,
  "pass_rate": 0.933,
  "query_count": 45,
  "avg_query_time_seconds": 0.278
}
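
The fields in a metrics file can be cross-checked against each other. The Python sketch below recomputes the pass rate and estimates total query time from the sample above; it assumes only the field names shown there, and the function name is hypothetical.

```python
import json

# Sample metrics copied from the example output above.
SAMPLE_METRICS = """
{
  "run_id": "abc123",
  "timestamp": "2024-01-15T10:30:00Z",
  "total_tables": 10,
  "total_columns": 45,
  "total_checks": 45,
  "passed_checks": 42,
  "failed_checks": 3,
  "duration_seconds": 12.5,
  "pass_rate": 0.933,
  "query_count": 45,
  "avg_query_time_seconds": 0.278
}
"""


def summarize(metrics_text: str) -> dict:
    """Recompute derived values from the raw counts in a metrics file."""
    metrics = json.loads(metrics_text)
    pass_rate = metrics["passed_checks"] / metrics["total_checks"]
    query_seconds = metrics["query_count"] * metrics["avg_query_time_seconds"]
    return {
        "pass_rate": round(pass_rate, 3),
        "counts_consistent": (
            metrics["passed_checks"] + metrics["failed_checks"]
            == metrics["total_checks"]
        ),
        "estimated_query_seconds": round(query_seconds, 2),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_METRICS))
```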

Performance Profiling

Enable performance profiling to identify bottlenecks:

goquality check --source postgres://... --profile

Profiling shows:

  • Query execution times
  • Table validation durations
  • Slowest tables
  • Overall performance summary

Example output:

Performance Summary:
  Total duration: 12.50s
  Tables validated: 10
  Total queries: 45
  Total rows: 1,234,567
  Avg query time: 0.278s
  Slowest table: orders (3.45s)

Environment Variables

Variable                          Description                               Default
GOQUALITY_QUERY_TIMEOUT_SECONDS   Query timeout in seconds                  300
GOQUALITY_LOG_FILE                Path to log file                          None (stderr)
GOQUALITY_LOG_JSON                Output logs as JSON                       false
GOQUALITY_LOG_LEVEL               Log level (DEBUG, INFO, WARNING, ERROR)   INFO
GOQUALITY_LOG_FILE_MAX_BYTES      Max log file size before rotation         10485760 (10 MB)
GOQUALITY_LOG_FILE_BACKUP_COUNT   Number of backup log files                5

Troubleshooting

Common Issues

"Config file not found"

# Create a config file
goquality init

# Or specify path
goquality check --config path/to/config.yaml

"Unknown type: X"

# List available types
goquality types --search X

# Check if custom type is defined in config
goquality validate

"Connection failed"

# Run diagnostics
goquality doctor --source YOUR_CONNECTION_STRING

# Check if driver is installed (quote the extra so zsh does not expand the brackets)
pip install "goquality[postgres]"  # or "goquality[snowflake]", "goquality[bigquery]"

"LLM API error"

# Check API key is set
echo $OPENAI_API_KEY

# Try different provider
goquality generate --source ... --provider anthropic
goquality generate --source ... --provider ollama

Debug Mode

# Enable verbose logging
GOQUALITY_DEBUG=1 goquality check --source ...

# Or use log level
GOQUALITY_LOG_LEVEL=DEBUG goquality check --source ...

Logging to File

# Log to file
export GOQUALITY_LOG_FILE=goquality.log
goquality check --source postgres://localhost/mydb

# JSON format for log aggregation
export GOQUALITY_LOG_JSON=true
export GOQUALITY_LOG_FILE=goquality.log
goquality check --source postgres://localhost/mydb

Notification Environment Variables

# Webhook URL for notifications (alternative to --webhook flag)
export GOQUALITY_WEBHOOK_URL=https://your-api.com/goquality-results

# Slack webhook URL for notifications (alternative to --slack-webhook flag)
export GOQUALITY_SLACK_WEBHOOK=https://hooks.slack.com/services/xxx/yyy/zzz

Getting Help

# General help
goquality --help

# Command-specific help
goquality check --help
goquality generate --help

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.
