Skip to main content

Analyze PostgreSQL SQL logs for CockroachDB compatibility

Project description

PyPI version Python version License

Downloads

crdb-sql-audit

A powerful CLI tool to extract, deduplicate, and analyze SQL logs for CockroachDB compatibility using a flexible, rule-based engine.

🚀 Features

  • Works with any SQL dialect (PostgreSQL, MySQL, Oracle, etc.)
  • Extracts SQL and function calls using customizable search terms (e.g. execute, pg_)
  • Deduplicates repeated SQL statements from logs
  • Analyzes SQL using a YAML-based rule engine
  • Supports default compatibility rules (PostgreSQL ➜ CockroachDB)
  • Allows custom rule sets via --rules
  • Logs analysis output to both terminal and crdb_sql_audit.log
  • Automatically detects SQL statement types (e.g. SELECT, DELETE)
  • Friendly CLI with --help and --version
  • Export full reports in multiple formats:
    • .sql: Deduplicated queries
    • .csv: Raw compatibility issue list
    • .md: Developer-friendly Markdown report
    • .html: Interactive browser report with sorting/filtering
    • .png: Visual bar chart of issues

🖼 Sample Output

Report Type Preview
HTML HTML Report Screenshot
Chart Bar Chart
CSV CSV Snippet
SQL SQL Snippet
Markdown Markdown Snippet

📦 Installation

Option A: Local Dev Install

git clone https://github.com/your-org/crdb-sql-audit.git
cd crdb-sql-audit
python -m venv venv
source venv/bin/activate
pip install .

Option B: Build via pyproject.toml

python -m build
pip install dist/crdb_sql_audit-0.2.0-py3-none-any.whl

🧪 Usage

crdb-sql-audit \
  --dir /path/to/logs \
  --terms execute,pg_ \
  --out output/report

Additional Options

--rules    Path to YAML rules file (optional, default: built-in PostgreSQL rules)
--help     Show usage help
--version  Show current version

Custom Rules Example

crdb-sql-audit \
  --dir ./logs \
  --terms execute,pg_ \
  --rules ./rules/mysql_to_crdb.yaml \
  --out output/mysql_report

💡 This tool supports auditing any SQL dialect — just provide a rule set for your source database (e.g., PostgreSQL, MySQL, Oracle).

📁 Output

output/
├── report.sql          # Deduplicated SQL
├── report.csv          # Compatibility issues
├── report.md           # Markdown summary
├── report.html         # Interactive dashboard
├── report_chart.png    # Visual chart of issues
├── crdb_sql_audit.log  # Full run log

🧹 Preparing Your Log Files

To analyze SQL logs effectively, we recommend the following preprocessing steps:

1. Extract SQL-related Lines

grep "execute" app.log > sql_only.log
# or to include pg_ built-in function usage:
grep -E "execute|pg_" app.log > sql_only.log

2. Split Into Manageable Chunks (Optional but Recommended)

split -b 50M sql_only.log chunks/sql_chunk_

3. Run the Audit

crdb-sql-audit --dir chunks --terms execute,pg_ --out output/report

📚 Rule Engine Format

Rules are written in YAML and matched against each SQL line. Example:

- id: malformed_dml_statements
  match: '^(SELECT|INSERT|UPDATE|DELETE FROM)\s*$'
  message: "Possibly malformed or incomplete SQL statement"
  level: warning
  tags: [syntax]

- id: special_char_in_identifier
  match: '"[^"]+#NAU"'
  message: "Table name contains unsupported special character (#NAU)"
  level: error
  tags: [table, identifier]

- id: pg_builtins
  match: '^.*\bpg_\w+\s*\(.*$'
  message: "PostgreSQL pg_* function not supported in CockroachDB"
  level: error
  tags: [function]

- id: low_token_sql
  match: '^(\S+\s*){1,2}$'
  message: "Extremely short SQL likely malformed"
  level: warning
  tags: [syntax]

📦 Multiple rule sets can be created to target different SQL dialects (e.g., postgres_to_crdb.yaml, mysql_to_crdb.yaml, etc.)

🧪 Validate Your Regex Rules

🔍 Online (Recommended)

Use regex101.com to test your patterns:

  • Set the flavor to Python
  • Paste your rule into the regex field
  • Paste a sample SQL line into the test area

🐍 In Python

You can also test your rules directly:

import re
pattern = re.compile(r'^.*\bpg_\w+\s*\(.*$', re.IGNORECASE)
sql = "SELECT pg_backend_pid()"
print(bool(pattern.search(sql)))  # ✅ True

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crdb_sql_audit-0.2.2.tar.gz (79.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

crdb_sql_audit-0.2.2-py3-none-any.whl (77.4 kB view details)

Uploaded Python 3

File details

Details for the file crdb_sql_audit-0.2.2.tar.gz.

File metadata

  • Download URL: crdb_sql_audit-0.2.2.tar.gz
  • Upload date:
  • Size: 79.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for crdb_sql_audit-0.2.2.tar.gz
Algorithm Hash digest
SHA256 c0e01734d430fec4979b75ceb297a6cd62ee839bef08e65abb65592329b5939a
MD5 9fefee761349435c31b53ee1fe5f2ee3
BLAKE2b-256 78c13a5acf970ab4fa45e1b41db6032b1d055ef57b6fcf055dba289e1b2ffd81

See more details on using hashes here.

File details

Details for the file crdb_sql_audit-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: crdb_sql_audit-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 77.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for crdb_sql_audit-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6ad98a0a560644d4d5a14d00cafd6a4fefb02403e3f53cfc3a3065cf612a7626
MD5 cc906a138c75f5c35da9837071699ae6
BLAKE2b-256 c9799dc42ed97bcf1486ef4494e5fb0edea2c0af2027bc87d3ec58ac2e6b54b7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page