
AI-Native Data Governance: TypeScript for Databases


GoQuality CLI


GoQuality brings type safety to your data: define types once, validate everywhere. Let AI generate the types while you govern the rules.

┌─────────────────────────────────────────────────────────────────┐
│  Database  →  AI Inference  →  YAML Types  →  Validation  →  ✓ │
│                                                                 │
│  "email"      Email           pattern: ^...   99.8% valid      │
│  "amount"     USD             min: 0          100% valid       │
│  "status"     OrderStatus     enum: [...]     98.2% valid      │
└─────────────────────────────────────────────────────────────────┘

Installation

# Basic installation
pip install goquality

# With PostgreSQL support (quote the extras so shells such as zsh
# don't treat the brackets as a glob pattern)
pip install "goquality[postgres]"

# With Snowflake support
pip install "goquality[snowflake]"

# With BigQuery support
pip install "goquality[bigquery]"

# With all database drivers
pip install "goquality[all]"

# Development installation
pip install "goquality[dev]"

Quick Start

# 1. Initialize a new project
goquality init

# 2. Generate types from your database using AI
goquality generate --source postgres://user:pass@localhost/mydb

# 3. Review and edit the generated goquality.yaml

# 4. Run validation checks
goquality check --source postgres://user:pass@localhost/mydb

# 5. Diagnose any issues
goquality doctor --source postgres://user:pass@localhost/mydb

Commands

goquality init

Initialize a new GoQuality configuration file.

goquality init [OPTIONS]

Options:

Option    Short  Description
--source  -s     Database connection string to test
--path    -p     Path for configuration file (default: goquality.yaml)

Examples:

# Create default config
goquality init

# Create config and test database connection
goquality init --source postgres://localhost/mydb

# Create config at custom path
goquality init --path config/goquality.yaml

goquality generate

Generate type mappings using AI inference. The command profiles your database schema and uses an LLM to suggest an appropriate type for each column.

goquality generate [OPTIONS]

Options:

Option      Short  Description
--source    -s     Database connection string (required)
--output    -o     Output file path (default: goquality.yaml)
--schema           Database schema to profile
--provider         LLM provider: openai, anthropic, ollama (default: openai)

Environment Variables:

  • OPENAI_API_KEY - Required for OpenAI provider
  • ANTHROPIC_API_KEY - Required for Anthropic provider
  • OLLAMA_HOST - Ollama server URL (default: http://localhost:11434)

Examples:

# Generate using OpenAI (default)
goquality generate --source postgres://localhost/mydb

# Generate using Anthropic Claude
goquality generate --source postgres://localhost/mydb --provider anthropic

# Generate for specific schema
goquality generate --source postgres://localhost/mydb --schema public

# Generate using local Ollama
OLLAMA_HOST=http://localhost:11434 goquality generate \
  --source postgres://localhost/mydb \
  --provider ollama
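The profiling step behind `generate` isn't documented in detail. As a hypothetical illustration of the kind of per-column summary a profiler could hand to an LLM, the sketch below computes the row count, null rate, distinct count, and a few sample values, then formats them into a prompt. `ColumnProfile`, `profile_column`, `build_prompt`, and the prompt wording are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Sequence


@dataclass
class ColumnProfile:
    """Hypothetical per-column summary; not GoQuality's internal format."""
    name: str
    row_count: int
    null_count: int
    distinct_count: int
    samples: List[Any] = field(default_factory=list)

    @property
    def null_rate(self) -> float:
        return self.null_count / self.row_count if self.row_count else 0.0


def profile_column(name: str, values: Sequence[Optional[Any]], max_samples: int = 5) -> ColumnProfile:
    non_null = [v for v in values if v is not None]
    distinct = list(dict.fromkeys(non_null))  # de-duplicate, keeping first-seen order
    return ColumnProfile(
        name=name,
        row_count=len(values),
        null_count=len(values) - len(non_null),
        distinct_count=len(distinct),
        samples=distinct[:max_samples],
    )


def build_prompt(table: str, profiles: List[ColumnProfile]) -> str:
    lines = [f"Suggest a GoQuality type for each column of {table}."]
    for p in profiles:
        lines.append(
            f"- {p.name}: {p.row_count} rows, null rate {p.null_rate:.1%}, "
            f"{p.distinct_count} distinct, samples {p.samples!r}"
        )
    return "\n".join(lines)


status = profile_column("status", ["paid", "shipped", None, "paid"])
print(build_prompt("public.orders", [status]))
```

Sending distinct samples rather than raw rows keeps the prompt short for large tables, and it is also the natural place to mask sensitive values before anything leaves your network.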

goquality check

Run validation checks against your database. This is the core command that validates your data against the defined types.

goquality check [OPTIONS]

Options:

Option                              Short  Description
--config                            -c     Configuration file path (default: goquality.yaml)
--source                            -s     Database connection string
--table                             -t     Only check this specific table
--output                            -o     Output format: table, json, yaml, csv, markdown
--fail-threshold                           Percentage of failures allowed (0-100)
--fail-on-error/--no-fail-on-error         Exit with error code on failures (default: true)
--quiet                             -q     Only show errors and summary

Exit Codes:

  • 0 - All checks passed (or within threshold)
  • 1 - Validation failures detected (above threshold)

Examples:

# Basic check
goquality check --source postgres://localhost/mydb

# Check specific table
goquality check --source postgres://localhost/mydb --table users

# Output as JSON (for CI/CD pipelines)
goquality check --source postgres://localhost/mydb --output json

# Allow up to 5% failures
goquality check --source postgres://localhost/mydb --fail-threshold 5

# Generate markdown report
goquality check --source postgres://localhost/mydb --output markdown > report.md

# Quiet mode for scripts
goquality check --source postgres://localhost/mydb --quiet

# Don't fail on errors (always exit 0)
goquality check --source postgres://localhost/mydb --no-fail-on-error

Output Formats:

Table (default):

┏━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Column   ┃ Type   ┃ Rows   ┃ Valid % ┃ Status   ┃ Details       ┃
┡━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ email    │ Email  │ 10,000 │ 99.8%   │ ✓ PASS   │               │
│ status   │ Status │ 10,000 │ 98.2%   │ ✗ FAIL   │ 180 invalid   │
└──────────┴────────┴────────┴─────────┴──────────┴───────────────┘

JSON:

{
  "summary": {
    "total_checks": 5,
    "passed": 4,
    "failed": 1,
    "failure_rate": 20.0,
    "threshold": 0.0,
    "threshold_passed": false
  },
  "tables": [...]
}
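In CI it can help to re-derive the pass/fail decision from the JSON summary. The sample above is consistent with `failure_rate` being the percentage of failed checks (1 of 5 gives 20.0) and `threshold_passed` meaning `failure_rate <= threshold`. The sketch below assumes exactly that and relies only on the `summary` keys shown; `evaluate_summary` is a hypothetical helper, not part of GoQuality.

```python
import json
import sys


def evaluate_summary(report: dict) -> int:
    """Return the exit code implied by a `goquality check --output json` summary.

    Assumes failure_rate = failed / total_checks * 100 and that the run
    passes when failure_rate <= threshold (this matches the sample output).
    """
    s = report["summary"]
    total = s["total_checks"]
    rate = 100.0 * s["failed"] / total if total else 0.0
    passed = rate <= s["threshold"]
    # Cross-check the assumed formula against what GoQuality itself reported.
    if abs(rate - s["failure_rate"]) > 1e-6 or passed != s["threshold_passed"]:
        print("warning: summary disagrees with the assumed formula", file=sys.stderr)
    return 0 if passed else 1


sample = json.loads("""
{"summary": {"total_checks": 5, "passed": 4, "failed": 1,
             "failure_rate": 20.0, "threshold": 0.0, "threshold_passed": false},
 "tables": []}
""")
print(evaluate_summary(sample))  # 1: 20% of checks failed and the threshold is 0%
```

In a pipeline you would load `results.json` with `json.load` and pass the return value to `sys.exit`.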

goquality validate

Validate configuration file syntax without connecting to a database.

goquality validate [OPTIONS]

Options:

Option    Short  Description
--config  -c     Configuration file to validate (default: goquality.yaml)

Examples:

# Validate default config
goquality validate

# Validate specific config
goquality validate --config staging.yaml
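Because `validate` never touches a database, it can only check the file itself. One plausible check, and a likely source of the "Unknown type: X" error listed under Troubleshooting, is that every `extends:` parent and every model column `type:` names a known type. The sketch below works on the already-parsed YAML (a plain dict, so no YAML library is needed). `KNOWN_STDLIB` is a small hypothetical subset, not GoQuality's real catalog, and `unknown_type_errors` is an invented name.

```python
from typing import Iterable, List

# Hypothetical stand-in for the standard library's type names (the real one has 300+).
KNOWN_STDLIB = {"Email", "UUID", "URL", "Timestamp", "USD", "OrderStatus"}


def unknown_type_errors(config: dict, stdlib: Iterable[str] = KNOWN_STDLIB) -> List[str]:
    """Report `extends` parents and model column types that resolve to no known type."""
    custom = config.get("types", [])
    known = set(stdlib) | {t["name"] for t in custom}
    errors: List[str] = []
    for t in custom:
        parent = t.get("extends")
        if parent is not None and parent not in known:
            errors.append(f"{t['name']}: unknown parent type {parent}")
    for model in config.get("models", []):
        for col in model.get("columns", []):
            if col["type"] not in known:
                errors.append(f"{model['table']}.{col['name']}: Unknown type: {col['type']}")
    return errors


config = {
    "types": [{"name": "CorporateEmail", "base": "String", "extends": "Email"}],
    "models": [{"table": "public.users", "columns": [
        {"name": "id", "type": "UUID"},
        {"name": "email", "type": "CorporateEmail"},
        {"name": "team", "type": "Team"},  # defined nowhere
    ]}],
}
print(unknown_type_errors(config))  # ['public.users.team: Unknown type: Team']
```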

goquality types

List and search available types in the standard library.

goquality types [OPTIONS]

Options:

Option    Short  Description
--search  -s     Search types by name or description
--tag     -t     Filter by tag
--show           Show details for a specific type

Examples:

# List all types
goquality types

# Search for email types
goquality types --search email

# Filter by tag
goquality types --tag finance
goquality types --tag healthcare
goquality types --tag regional

# Show type details
goquality types --show Email
goquality types --show CreditCardNumber

Available Tags:

  • core - Basic string/number types
  • finance - Currency, banking, payments
  • healthcare - Medical codes, identifiers
  • ecommerce - Products, orders, shipping
  • saas - API keys, tokens, SaaS identifiers
  • regional - Country-specific formats
  • analytics - Metrics, percentages, scores
  • iot - Sensors, devices, protocols
  • pii - Personally identifiable information

goquality doctor

Diagnose your GoQuality environment and configuration.

goquality doctor [OPTIONS]

Options:

Option     Short  Description
--config   -c     Configuration file to check
--source   -s     Database connection to test
--verbose  -v     Show detailed information

Checks Performed:

  • Python version compatibility
  • Core dependencies installed
  • Database drivers available
  • LLM providers configured
  • Type library loading
  • Configuration file validity
  • Database connectivity
  • Environment variables

Examples:

# Basic diagnostics
goquality doctor

# Check with database connection
goquality doctor --source postgres://localhost/mydb

# Verbose output
goquality doctor --verbose
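Several of the checks listed above map directly onto the Python standard library. The sketch below shows hypothetical versions of three of them: interpreter version, driver availability, and LLM API keys. The minimum Python version and the driver module names (`psycopg2`, `snowflake.connector`, `google.cloud.bigquery`, `duckdb`) are assumptions for illustration, not taken from GoQuality's packaging.

```python
import importlib.util
import os
import sys
from typing import Dict, Mapping, Optional

# Assumed values for illustration only.
MIN_PYTHON = (3, 9)
DRIVER_MODULES = {
    "postgres": "psycopg2",
    "snowflake": "snowflake.connector",
    "bigquery": "google.cloud.bigquery",
    "duckdb": "duckdb",
}
LLM_KEYS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def module_available(name: str) -> bool:
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # For dotted names find_spec imports the parent package and raises if it is missing.
        return False


def diagnose(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    env = os.environ if env is None else env
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    for dialect, module in DRIVER_MODULES.items():
        report[f"driver:{dialect}"] = module_available(module)
    for provider, var in LLM_KEYS.items():
        # Only record whether the key is set; never print its value.
        report[f"llm:{provider}"] = bool(env.get(var))
    return report


for check, ok in diagnose().items():
    print(f"{'PASS' if ok else 'FAIL'} {check}")
```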

goquality stats

Show statistics about the type library and configuration.

goquality stats [OPTIONS]

Options:

Option    Short  Description
--config  -c     Configuration file path

Examples:

goquality stats

goquality version

Show version information.

goquality version

Configuration File

GoQuality uses YAML configuration files. The default file is goquality.yaml.

Full Example

# GoQuality Configuration
# https://goquality.dev/docs

# Custom type definitions (extend or override stdlib)
types:
  # Simple type with pattern
  - name: EmployeeId
    description: "Internal employee identifier"
    base: String
    pattern: "^EMP-[0-9]{6}$"
    min_length: 10
    max_length: 10

  # Type extending stdlib
  - name: CorporateEmail
    description: "Company email address"
    base: String
    extends: Email
    pattern: "^[a-z.]+@acme\\.com$"

  # Numeric type with range
  - name: DiscountPercent
    description: "Discount percentage"
    base: Decimal
    min: 0
    max: 100
    precision: 2

  # Enum type
  - name: Department
    description: "Company department"
    base: String
    enum: ["engineering", "sales", "marketing", "hr", "finance"]

  # Type with uniqueness constraint
  - name: ProductSKU
    description: "Unique product SKU"
    base: String
    pattern: "^[A-Z]{2}-[0-9]{6}$"
    unique: true

# Model mappings (table → column types)
models:
  - table: public.users
    columns:
      - name: id
        type: UUID
      - name: email
        type: CorporateEmail
      - name: employee_id
        type: EmployeeId
      - name: department
        type: Department
      - name: created_at
        type: Timestamp

  - table: public.orders
    columns:
      - name: id
        type: UUID
      - name: user_id
        type: UUID
      - name: total_amount
        type: USD
      - name: discount
        type: DiscountPercent
        allow_null: true  # Override type's nullability
      - name: status
        type: OrderStatus

# Ad-hoc checks (quick SQL rules)
checks:
  - "on": orders
    name: "Order integrity"
    rules:
      - "total_amount >= 0"
      - "created_at <= NOW()"
      - "status IS NOT NULL"

  - "on": users
    name: "User constraints"
    rules:
      - "email IS NOT NULL"
      - "created_at <= NOW()"
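Each entry under `rules` reads as a SQL boolean expression that every row should satisfy. One natural way to evaluate such a rule is to count the rows where the expression is not true. The sketch below does that with the standard-library `sqlite3` module; it is a hypothetical stand-in, not GoQuality's query layer. Because of SQL's three-valued logic, a plain `WHERE NOT (total_amount >= 0)` would skip rows where `total_amount` is NULL, so the sketch wraps each rule in `COALESCE(..., 0)` and counts an unknown result as a violation (a design choice, not documented behaviour). `NOW()` is PostgreSQL syntax and is left out because SQLite does not provide it.

```python
import sqlite3
from typing import Dict, List


def count_violations(conn: sqlite3.Connection, table: str, rules: List[str]) -> Dict[str, int]:
    """Count rows that do not satisfy each rule.

    Rules come from a trusted config file, so they are interpolated directly
    into the SQL. A NULL (unknown) rule result is counted as a violation.
    """
    results: Dict[str, int] = {}
    for rule in rules:
        sql = f"SELECT COUNT(*) FROM {table} WHERE COALESCE(({rule}), 0) = 0"
        (count,) = conn.execute(sql).fetchone()
        results[rule] = count
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total_amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 19.99, "paid"), (2, -5.00, "paid"), (3, None, None)],
)
print(count_violations(conn, "orders", ["total_amount >= 0", "status IS NOT NULL"]))
# {'total_amount >= 0': 2, 'status IS NOT NULL': 1}
```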

Type Definition Fields

Field        Type    Description
name         string  PascalCase type name (required)
description  string  Human-readable description (required)
base         string  Base type: String, Integer, Decimal, Boolean, Date, Timestamp
extends      string  Parent type to inherit from
pattern      string  Regex pattern (String types)
min_length   int     Minimum string length
max_length   int     Maximum string length
not_empty    bool    Reject empty or whitespace-only strings
min          number  Minimum value (numeric types)
max          number  Maximum value (numeric types)
precision    int     Decimal places (Decimal type)
enum         array   Allowed values
allow_null   bool    Whether NULL is permitted (default: false)
unique       bool    Values must be unique
foreign_key  string  Referenced table.column for foreign-key validation
tags         array   Searchable tags
examples     array   Example valid values
deprecated   bool    Mark as deprecated
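To make the field semantics concrete, here is a hypothetical sketch of checking a single value against a type definition. It assumes several things the table does not confirm: `extends` means the child inherits every constraint of its parent and adds its own (so a `CorporateEmail` must satisfy both patterns), `pattern` must match the whole string, `min`/`max` are inclusive, and `precision` is the maximum number of decimal places. The `Email` pattern is a deliberately simple stand-in, not the standard library's. `constraint_chain` and `violations` are invented names.

```python
import re
from decimal import Decimal
from typing import Any, Dict, List, Optional


def constraint_chain(name: str, types: Dict[str, dict]) -> List[dict]:
    """Return the type and its `extends` ancestors, root ancestor first."""
    chain: List[dict] = []
    seen = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in extends chain at {current}")
        seen.add(current)
        definition = types[current]
        chain.append(definition)
        current = definition.get("extends")
    return chain[::-1]


def violations(value: Any, name: str, types: Dict[str, dict],
               allow_null: Optional[bool] = None) -> List[str]:
    chain = constraint_chain(name, types)
    if value is None:
        # A column-level allow_null (as in the models example) wins; otherwise the
        # nearest definition that sets it; otherwise the documented default, false.
        if allow_null is None:
            allow_null = next((t["allow_null"] for t in reversed(chain) if "allow_null" in t), False)
        return [] if allow_null else ["NULL not allowed"]
    errors: List[str] = []
    for t in chain:  # every inherited constraint must hold
        label = t["name"]
        if isinstance(value, str):
            if "pattern" in t and re.fullmatch(t["pattern"], value) is None:
                errors.append(f"{label}: does not match {t['pattern']}")
            if len(value) < t.get("min_length", 0):
                errors.append(f"{label}: shorter than {t['min_length']}")
            if "max_length" in t and len(value) > t["max_length"]:
                errors.append(f"{label}: longer than {t['max_length']}")
            if t.get("not_empty") and not value.strip():
                errors.append(f"{label}: empty")
        if "enum" in t and value not in t["enum"]:
            errors.append(f"{label}: not one of {t['enum']}")
        if "min" in t and value < t["min"]:
            errors.append(f"{label}: below {t['min']}")
        if "max" in t and value > t["max"]:
            errors.append(f"{label}: above {t['max']}")
        if "precision" in t:
            exponent = Decimal(str(value)).as_tuple().exponent
            if isinstance(exponent, int) and -exponent > t["precision"]:
                errors.append(f"{label}: more than {t['precision']} decimal places")
    return errors


TYPES = {
    "Email": {"name": "Email", "base": "String", "pattern": r"[^@\s]+@[^@\s]+\.[a-z]+"},
    "CorporateEmail": {"name": "CorporateEmail", "base": "String", "extends": "Email",
                       "pattern": r"^[a-z.]+@acme\.com$"},
    "EmployeeId": {"name": "EmployeeId", "base": "String", "pattern": r"^EMP-[0-9]{6}$",
                   "min_length": 10, "max_length": 10},
    "DiscountPercent": {"name": "DiscountPercent", "base": "Decimal",
                        "min": 0, "max": 100, "precision": 2},
}
print(violations("EMP-001234", "EmployeeId", TYPES))          # []
print(violations("jane@gmail.com", "CorporateEmail", TYPES))  # one CorporateEmail pattern error
```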

Connection Strings

GoQuality supports multiple database backends via connection strings.

PostgreSQL

# Full format
postgres://user:password@host:port/database

# Examples
postgres://postgres:secret@localhost:5432/mydb
postgresql://user:pass@db.example.com/production
postgres://localhost/mydb  # Local with defaults

DuckDB

# In-memory database
duckdb://:memory:

# File database
duckdb:///path/to/database.db

# CSV/Parquet files (auto-detected)
/path/to/data.csv
/path/to/data.parquet
./relative/path/data.csv

Snowflake

# Full format
snowflake://user@account/database/schema?warehouse=WAREHOUSE

# Examples
snowflake://john@xy12345/analytics/public?warehouse=COMPUTE_WH
snowflake://user@account/db/schema?warehouse=WH&role=ANALYST

Environment Variables:

export SNOWFLAKE_ACCOUNT=xy12345
export SNOWFLAKE_USER=john
export SNOWFLAKE_PASSWORD=secret
export SNOWFLAKE_DATABASE=analytics
export SNOWFLAKE_SCHEMA=public
export SNOWFLAKE_WAREHOUSE=COMPUTE_WH

BigQuery

# Format
bigquery://project-id/dataset

# Examples
bigquery://my-project/analytics
bigquery://prod-data-warehouse/sales

Environment Variables:

export GOOGLE_CLOUD_PROJECT=my-project
export BIGQUERY_DATASET=analytics
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
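The formats above can be told apart by URL scheme, with bare `.csv`/`.parquet` paths treated as files. The following hypothetical sketch uses `urllib.parse` to split a `--source` value into its parts. The returned field names are invented for illustration and are not GoQuality's internal model; the fallback port 5432 is PostgreSQL's standard port.

```python
from pathlib import PurePath
from typing import Dict, Optional
from urllib.parse import parse_qs, unquote, urlsplit


def parse_source(source: str) -> Dict[str, Optional[str]]:
    """Split a --source value into its parts (hypothetical field names)."""
    if source.startswith("duckdb://"):
        # "duckdb://:memory:" -> ":memory:", "duckdb:///path/db" -> "/path/db"
        return {"dialect": "duckdb", "database": source[len("duckdb://"):]}
    suffix = PurePath(source).suffix.lower()
    if "://" not in source and suffix in (".csv", ".parquet"):
        return {"dialect": "file", "format": suffix[1:], "path": source}
    url = urlsplit(source)
    dialect = "postgres" if url.scheme == "postgresql" else url.scheme
    parts = [p for p in url.path.split("/") if p]
    user = unquote(url.username) if url.username else None
    info: Dict[str, Optional[str]] = {"dialect": dialect}
    if dialect == "postgres":
        info.update(
            host=url.hostname,
            port=str(url.port or 5432),  # PostgreSQL's standard port
            user=user,
            password=unquote(url.password) if url.password else None,
            database=parts[0] if parts else None,
        )
    elif dialect == "snowflake":
        query = {k: v[0] for k, v in parse_qs(url.query).items()}
        info.update(
            user=user,
            account=url.hostname,
            database=parts[0] if parts else None,
            schema=parts[1] if len(parts) > 1 else None,
            warehouse=query.get("warehouse"),
            role=query.get("role"),
        )
    elif dialect == "bigquery":
        info.update(project=url.netloc, dataset=parts[0] if parts else None)
    else:
        raise ValueError(f"unrecognized source: {source}")
    return info


print(parse_source("snowflake://john@xy12345/analytics/public?warehouse=COMPUTE_WH"))
```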

Connection Configuration File

Store multiple database connections in a YAML file for easy switching.

File Location

GoQuality looks for connection configs in:

  1. .goquality/connections.yaml
  2. goquality-connections.yaml
  3. ~/.config/goquality/connections.yaml
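A first-match search over that list is straightforward with `pathlib`. This sketch assumes the two relative paths are resolved against the current working directory and that the first existing file wins, which the numbered order suggests but the text does not spell out. `find_connections_file` is a hypothetical name.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_connections_file(cwd: Optional[Path] = None, home: Optional[Path] = None) -> Optional[Path]:
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home
    candidates = [
        cwd / ".goquality" / "connections.yaml",
        cwd / "goquality-connections.yaml",
        home / ".config" / "goquality" / "connections.yaml",
    ]
    # Return the first candidate that exists, in the documented order.
    return next((p for p in candidates if p.is_file()), None)


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "goquality-connections.yaml").write_text("default: dev\n")
    print(find_connections_file(cwd=root, home=root).name)  # goquality-connections.yaml
```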

Example

# .goquality/connections.yaml

# Default connection to use
default: dev

connections:
  local:
    connection_string: duckdb://:memory:
    description: Local testing with DuckDB

  dev:
    dialect: postgres
    host: localhost
    port: 5432
    database: myapp_dev
    user: developer
    password: devpass
    description: Development database

  staging:
    dialect: postgres
    host: ${STAGING_DB_HOST}
    database: myapp_staging
    user: ${STAGING_DB_USER}
    password: ${STAGING_DB_PASSWORD}
    description: Staging environment

  prod:
    connection_string: postgres://${PROD_USER}:${PROD_PASS}@prod.example.com/myapp
    description: Production database (read-only)

  warehouse:
    dialect: snowflake
    host: xy12345.snowflakecomputing.com
    database: analytics
    schema: public
    user: ${SNOWFLAKE_USER}
    password: ${SNOWFLAKE_PASSWORD}
    options:
      warehouse: COMPUTE_WH
      role: ANALYST
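The `${VAR}` placeholders keep secrets out of the file. Python's `string.Template` uses the same `${NAME}` syntax, so a hypothetical expansion step could look like the sketch below. Whether GoQuality fails or silently keeps the placeholder when a variable is unset is not documented; this sketch fails loudly. `expand` is an invented helper.

```python
import os
from string import Template
from typing import Any, Mapping, Optional


def expand(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively substitute ${VAR} in every string of a parsed connection entry."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # substitute() raises KeyError for an unset variable instead of leaving "${VAR}" behind.
        return Template(value).substitute(env)
    if isinstance(value, dict):
        return {k: expand(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand(v, env) for v in value]
    return value  # ints (e.g. port: 5432) and booleans pass through unchanged


prod = {"connection_string": "postgres://${PROD_USER}:${PROD_PASS}@prod.example.com/myapp"}
print(expand(prod, {"PROD_USER": "reader", "PROD_PASS": "s3cret"}))
# {'connection_string': 'postgres://reader:s3cret@prod.example.com/myapp'}
```

One side effect of `Template` is that any other `$` in the file is special: `$$` becomes a literal `$`, and a stray `$` followed by a name is treated as a placeholder.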

Using Named Connections

# Use default connection
goquality check

# Use named connection
goquality check --source dev
goquality check --source staging
goquality check --source warehouse
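Since `--source` accepts either a connection string or file path, or a connection name, the CLI has to decide which it was given. A plausible rule, assumed here rather than documented, is: anything containing `://` or ending in `.csv`/`.parquet` is used as-is, anything else is looked up by name, and an omitted `--source` falls back to `default`. `resolve_source` is a hypothetical helper, and the URL it assembles from discrete fields follows the postgres layout only.

```python
from typing import Any, Mapping, Optional
from urllib.parse import quote


def resolve_source(source: Optional[str], config: Mapping[str, Any]) -> str:
    """Turn a --source value (or None) into a connection string (hypothetical rules)."""
    if source is None:
        source = config.get("default")
        if source is None:
            raise ValueError("no --source given and no default connection configured")
    if "://" in source or source.lower().endswith((".csv", ".parquet")):
        return source  # already a connection string or a file path
    try:
        entry = config["connections"][source]
    except KeyError:
        raise ValueError(f"unknown connection name: {source}") from None
    if "connection_string" in entry:
        return entry["connection_string"]
    # Build a URL from discrete fields. This is the postgres layout; other
    # dialects (e.g. snowflake with options) would need their own formatting.
    auth = quote(entry.get("user", ""), safe="")
    if entry.get("password"):
        auth += ":" + quote(entry["password"], safe="")
    host = entry["host"] + (f":{entry['port']}" if "port" in entry else "")
    prefix = f"{auth}@" if auth else ""
    return f"{entry['dialect']}://{prefix}{host}/{entry['database']}"


connections = {
    "default": "dev",
    "connections": {
        "local": {"connection_string": "duckdb://:memory:"},
        "dev": {"dialect": "postgres", "host": "localhost", "port": 5432,
                "database": "myapp_dev", "user": "developer", "password": "devpass"},
    },
}
print(resolve_source(None, connections))     # postgres://developer:devpass@localhost:5432/myapp_dev
print(resolve_source("local", connections))  # duckdb://:memory:
```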

CI/CD Integration

GitHub Actions

name: Data Quality

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * *'  # Daily at 06:00 UTC (GitHub schedules use UTC)

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install GoQuality
        run: pip install goquality[postgres]

      - name: Validate Configuration
        run: goquality validate

      - name: Run Data Quality Checks
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
        run: |
          goquality check \
            --source "$DATABASE_URL" \
            --output json \
            --fail-threshold 1 \
            > results.json

      - name: Upload Results
        if: always()  # upload the report even when the check step fails
        uses: actions/upload-artifact@v4
        with:
          name: quality-report
          path: results.json

GitLab CI

data-quality:
  image: python:3.11
  stage: test
  script:
    - pip install goquality[postgres]
    - goquality validate
    - goquality check --source "$DATABASE_URL" --output markdown > report.md
  artifacts:
    when: always  # keep the report even if the check fails
    paths:
      - report.md
    expire_in: 1 week

Pre-commit Hook

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: goquality-validate
        name: Validate GoQuality Config
        entry: goquality validate
        language: system
        files: goquality\.yaml$
        pass_filenames: false

Standard Library Types

GoQuality includes 300+ pre-defined types organized by category.

Core Types

Type           Base    Description
Email          String  Email address
EmailNullable  String  Optional email
UUID           String  UUID v4
URL            String  HTTP/HTTPS URL
PhoneNumber    String  International phone
Hostname       String  DNS hostname

Finance Types

Type              Base     Description
USD               Decimal  US Dollar amount
EUR               Decimal  Euro amount
CreditCardNumber  String   Credit card (Luhn)
IBAN              String   International bank account
BIC               String   Bank identifier code
ABARoutingNumber  String   US routing number

Healthcare Types

Type   Base    Description
ICD10  String  ICD-10 diagnosis code
CPT    String  CPT procedure code
NPI    String  National Provider ID
NDC    String  National Drug Code
LOINC  String  Lab test code

E-commerce Types

Type    Base    Description
SKU     String  Stock keeping unit
UPC     String  UPC-A barcode
EAN13   String  EAN-13 barcode
ASIN    String  Amazon product ID
ISBN13  String  Book ISBN-13

Regional Types

Type             Base    Description
SSN              String  US Social Security number
USZipCode        String  US ZIP code
USState          String  US state code
GermanVATNumber  String  German VAT number
UKPostcode       String  UK postcode
IndianPAN        String  Indian tax ID

Analytics Types

Type        Base     Description
Percentage  Decimal  0-100 percentage
Rate        Decimal  0-1 rate
Score       Decimal  0-100 score
MRR         Decimal  Monthly recurring revenue
NPSScore    Integer  Net promoter score

Browse all types:

goquality types
goquality types --tag finance
goquality types --search email

Custom Validators (Plugins)

GoQuality supports custom validation logic via Python plugins.

Creating a Validator

# .goquality/plugins/my_validators.py

from goquality.plugins import register_validator

@register_validator("is_palindrome", description="Check if string is palindrome")
def is_palindrome(value: str) -> bool:
    clean = value.lower().replace(" ", "")
    return clean == clean[::-1]

@register_validator("divisible_by_three", description="Check divisibility by 3")
def divisible_by_three(value: int) -> bool:
    return value % 3 == 0
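`register_validator` is GoQuality's own API and its internals aren't shown here. For readers unfamiliar with the decorator-registry pattern, the sketch below builds a hypothetical stand-in with the same call shape (a name plus a `description` keyword) on top of a plain dict: the decorated function is stored under its registered name and returned unchanged. It is not GoQuality's code; `REGISTRY` and `run_validator` are invented names.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry: validator name -> (function, description).
REGISTRY: Dict[str, Tuple[Callable[[Any], bool], str]] = {}


def register_validator(name: str, description: str = "") -> Callable:
    def decorator(func: Callable[[Any], bool]) -> Callable[[Any], bool]:
        if name in REGISTRY:
            raise ValueError(f"validator {name!r} already registered")
        REGISTRY[name] = (func, description)
        return func  # the function stays usable as a normal callable
    return decorator


def run_validator(name: str, value: Any) -> bool:
    func, _ = REGISTRY[name]
    return func(value)


@register_validator("is_palindrome", description="Check if string is palindrome")
def is_palindrome(value: str) -> bool:
    clean = value.lower().replace(" ", "")
    return clean == clean[::-1]


print(run_validator("is_palindrome", "Never odd or even"))  # True
```

Rejecting duplicate names is one choice among several; a registry could just as well let later plugins override earlier ones.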

Built-in Advanced Validators

Validator     Description
luhn          Luhn checksum (credit cards)
iban          IBAN checksum
isbn10        ISBN-10 checksum
isbn13        ISBN-13 checksum
ean13         EAN-13 barcode checksum
upc           UPC-A barcode checksum
email_format  Email format validation
ipv4          IPv4 address format
ipv6          IPv6 address format
mac_address   MAC address format
json          Valid JSON string
base64        Valid Base64 encoding
future_date   Date in the future
past_date     Date in the past
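Several of these validators correspond to well-known, publicly specified checksums. For reference, here is a sketch of three of them in plain Python: Luhn (payment card numbers), the ISO 13616 mod-97 check used for IBANs, and the alternating 1/3 weighted sum shared by EAN-13 and ISBN-13. These are textbook versions of the algorithms, not GoQuality's source, and they verify only the checksum (not, for example, per-country IBAN lengths).

```python
def _compact(s: str) -> str:
    """Drop the spaces and hyphens commonly used to group digits."""
    return s.replace(" ", "").replace("-", "")


def _ascii_digits(s: str) -> bool:
    return s.isascii() and s.isdigit()


def luhn_valid(number: str) -> bool:
    """Luhn mod 10: double every second digit from the right, subtract 9 if > 9."""
    s = _compact(number)
    if len(s) < 2 or not _ascii_digits(s):
        return False
    total = 0
    for i, ch in enumerate(reversed(s)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """ISO 13616 mod 97: move the first 4 chars to the end, map A-Z to 10-35, expect remainder 1."""
    s = _compact(iban).upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)  # int("A", 36) == 10
    return int(numeric) % 97 == 1


def ean13_valid(code: str) -> bool:
    """EAN-13 / ISBN-13: weights 1,3,1,3,... over all 13 digits; sum must be divisible by 10."""
    s = _compact(code)
    if len(s) != 13 or not _ascii_digits(s):
        return False
    return sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(s)) % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))          # True (a well-known test card number)
print(iban_valid("GB82 WEST 1234 5698 7654 32"))  # True (the standard example IBAN)
print(ean13_valid("978-0-306-40615-7"))           # True
```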

Troubleshooting

Common Issues

"Config file not found"

# Create a config file
goquality init

# Or specify path
goquality check --config path/to/config.yaml

"Unknown type: X"

# List available types
goquality types --search X

# Check if custom type is defined in config
goquality validate

"Connection failed"

# Run diagnostics
goquality doctor --source YOUR_CONNECTION_STRING

# Check if driver is installed
pip install "goquality[postgres]"  # or [snowflake], [bigquery]

"LLM API error"

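# Check that the API key is set (without printing it)
[ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API_KEY is set" || echo "OPENAI_API_KEY is not set"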

# Try different provider
goquality generate --source ... --provider anthropic
goquality generate --source ... --provider ollama

Debug Mode

# Enable verbose logging
GOQUALITY_DEBUG=1 goquality check --source ...

Getting Help

# General help
goquality --help

# Command-specific help
goquality check --help
goquality generate --help

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.



Download files


Source Distribution

goquality-0.1.0.tar.gz (131.9 kB)

Uploaded Source

Built Distribution


goquality-0.1.0-py3-none-any.whl (116.6 kB)

Uploaded Python 3

File details

Details for the file goquality-0.1.0.tar.gz.

File metadata

  • Download URL: goquality-0.1.0.tar.gz
  • Size: 131.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for goquality-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       4ba1f261dc6c0b2053b65269f7ae7f228444b6ace8277ad522cc15f18315b3ae
MD5          3468836d6ae237b0aebcd0061bcfe673
BLAKE2b-256  4830896cd38f9f9a36ebe1594a07c53e1660e156976a3f913f3955e40873852e


File details

Details for the file goquality-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: goquality-0.1.0-py3-none-any.whl
  • Size: 116.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for goquality-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       3f2b5e5fd00ea518937f5393e8944af28f30d34a5af2e3758911a167b69e6096
MD5          0ca161db7cb2a790baeeba7cf1ec9e08
BLAKE2b-256  2b38f8ae1fa387eb9bdedce408a0e049813eb3b1759e8daba3e1171008bd9b20

