Skip to main content

Db2 dialect plugin for SQLGlot

Project description

Db2 SQLGlot Dialect Plugin

A Db2 dialect plugin for SQLGlot - a powerful SQL parser, transpiler, and optimizer.

Requirements

  • Python: 3.10 - 3.12
  • SQLGlot: 30.8.0 - 30.9.x (compatible with SQLMesh and other tools using SQLGlot 30.8.0)

Features

  • Full Db2 SQL syntax support
  • Cross-dialect transpilation (Db2 ↔ PostgreSQL, MySQL, Snowflake, etc.)
  • Type mapping (BOOLEAN → SMALLINT, NCHAR/NVARCHAR support, etc.)
  • Db2-specific functions (POSSTR, VARCHAR_FORMAT, DAYOFWEEK, DAYOFYEAR)
  • FETCH FIRST syntax support
  • NULL ordering support
  • Window functions: Full support for RANK, DENSE_RANK, PERCENT_RANK, ROW_NUMBER, PARTITION BY
  • SQLMesh compatibility: Automatic conversion of SQLMesh-specific functions (DATE_STR_TO_DATE, TIME_STR_TO_TIME, StrToTime) for SEED model support

✅ Test Results

All tests passing: 12 tests with 87% code coverage

$ python3 -m pytest tests/test_db2_dialect.py -v
============================= test session starts ==============================
tests/test_db2_dialect.py ............                                   [100%]
============================== 12 passed in 0.12s ==============================

Code Coverage:

Name                       Stmts   Miss  Cover
----------------------------------------------
db2_sqlglot/__init__.py        8      2    75%
db2_sqlglot/dialect.py        13      0   100%
db2_sqlglot/generator.py      57      9    84%
db2_sqlglot/parser.py          6      0   100%
----------------------------------------------
TOTAL                         84     11    87%

Test Coverage

The test suite validates:

  • Basic SQL: SELECT, INSERT, UPDATE, DELETE, CREATE/DROP TABLE
  • Type conversions: INTEGER→INT, NCHAR→GRAPHIC, NVARCHAR→VARGRAPHIC, DBCLOB→CLOB
  • Functions: POSSTR, VARCHAR_FORMAT, DAYOFWEEK, DAYOFYEAR, GREATEST, LEAST
  • Boolean handling: TRUE/FALSE → 1/0
  • Date/Time: CURRENT DATE, CURRENT TIMESTAMP, DATEDIFF→DAYS
  • FETCH FIRST: Pagination with FETCH FIRST n ROWS ONLY
  • OFFSET: OFFSET n ROWS syntax
  • NULL ordering: NULLS FIRST, NULLS LAST
  • Joins: INNER, LEFT, RIGHT joins
  • Aggregations: COUNT, SUM, AVG, MIN, MAX
  • Subqueries & CTEs: WITH clause support
  • CASE expressions: Simple and searched CASE
  • Operators: IN, BETWEEN, LIKE, IS NULL
  • Set operations: UNION, UNION ALL
  • Variable tokens: @var syntax
  • Typed division: Proper numeric division handling
  • SQLMesh compatibility: DATE_STR_TO_DATE, TIME_STR_TO_TIME, StrToTime function conversions

Installation

From Source (Development)

cd db2-sqlglot-dialect
pip install -e .

From PyPI

pip install db2-sqlglot-dialect

Usage

Once installed, the Db2 dialect is automatically discovered by SQLGlot:

from sqlglot import transpile

# Transpile from PostgreSQL to Db2 (LIMIT → FETCH FIRST)
result = transpile(
    "SELECT * FROM table1 LIMIT 10",
    read="postgres",
    write="db2"
)
print(result[0])
# Output: SELECT * FROM table1 FETCH FIRST 10 ROWS ONLY

# Transpile from Db2 to PostgreSQL
# Note: FETCH FIRST is preserved (PostgreSQL supports it natively - SQL standard)
result = transpile(
    "SELECT * FROM table1 FETCH FIRST 10 ROWS ONLY",
    read="db2",
    write="postgres"
)
print(result[0])
# Output: SELECT * FROM table1 FETCH FIRST 10 ROWS ONLY

# Both LIMIT and FETCH FIRST work in PostgreSQL
# If you need LIMIT specifically, use it in the source query
result = transpile(
    "SELECT * FROM table1 LIMIT 10",
    read="db2",
    write="postgres"
)
print(result[0])
# Output: SELECT * FROM table1 LIMIT 10

Supported Features

Data Types

Native Db2 Types (Perfect Roundtrip)

  • Standard types: INTEGER, BIGINT, SMALLINT, DECIMAL, VARCHAR, CHAR, DATE, TIMESTAMP, CLOB, BLOB, BOOLEAN
  • Db2-specific types: GRAPHIC, VARGRAPHIC, DBCLOB, DATALINK, ROWID, DECFLOAT, XML

Cross-Dialect Type Conversions

When transpiling from other databases to Db2:

  • TEXT → CLOB
  • BYTEA/BINARY → BLOB
  • TINYINT → SMALLINT
  • TIMESTAMPTZ → TIMESTAMP
  • ⚠️ SERIAL/BIGSERIAL → Preserved as-is (may need manual conversion)
  • ⚠️ LONGTEXT/MEDIUMTEXT → Preserved as-is (may need manual conversion)

Note: Unsupported or dialect-specific constructs may be preserved as-is during transpilation. This behavior prioritizes syntax preservation over potentially lossy or incorrect transformations. Users should review and adjust preserved types as needed for their specific Db2 version and requirements.

Functions

  • POSSTR(haystack, needle) - String position
  • VARCHAR_FORMAT(timestamp, format) - Time to string conversion
  • DAYOFWEEK(date) - Extract day of week
  • DAYOFYEAR(date) - Extract day of year
  • CURRENT DATE, CURRENT TIMESTAMP
  • Date arithmetic with + and -

SQL Features

  • FETCH FIRST n ROWS ONLY syntax
  • OFFSET n ROWS syntax
  • NULL ordering (NULLS FIRST, NULLS LAST)
  • Typed division
  • Variable tokens (@var)

Window Functions

Full support for SQL window functions commonly used in analytics and data transformation:

  • RANK() - Assigns a rank to each row within a partition
  • DENSE_RANK() - Like RANK() but without gaps in ranking
  • PERCENT_RANK() - Relative rank of a row within a partition
  • ROW_NUMBER() - Sequential number for each row
  • PARTITION BY - Divides result set into partitions
  • ORDER BY - Defines ordering within window

Example:

-- SQLMesh model with window functions
SELECT
    customer_id,
    total_revenue,
    RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank,
    DENSE_RANK() OVER (ORDER BY total_revenue DESC) AS dense_rank,
    PERCENT_RANK() OVER (ORDER BY total_revenue DESC) AS percent_rank
FROM customer_stats

All window functions work seamlessly with Db2 and are fully compatible with SQLMesh models.

SQLMesh Integration

This dialect includes full support for SQLMesh:

SEED Models:

  • DATE_STR_TO_DATE() → Automatically converted to Db2's DATE() function
  • TIME_STR_TO_TIME() → Automatically converted to Db2's TIME() function
  • StrToTime() → Automatically converted to Db2's TIMESTAMP() function

Analytics Models:

  • Full window function support (RANK, DENSE_RANK, PERCENT_RANK, ROW_NUMBER)
  • CTE (Common Table Expressions) support
  • Complex aggregations with GROUP BY

These features enable SQLMesh to work seamlessly with Db2 for both data loading (SEED models) and transformation (analytics models).

Example:

# SQLMesh generates this for SEED models:
SELECT DATE_STR_TO_DATE('2024-01-15') FROM table

# Automatically transpiled to Db2:
SELECT DATE('2024-01-15') FROM table

CI/CD

GitHub Actions Workflows

This project includes two automated CI/CD workflows:

  1. Unit Tests (.github/workflows/unit-tests.yml)

    • Triggers: Push to main, Pull Requests, Manual dispatch
    • Platforms: Ubuntu, macOS, Windows
    • Python versions: 3.10, 3.11, 3.12
    • Features:
      • Concurrency control (cancels outdated runs)
      • Unit tests with coverage reporting
      • Code quality checks (ruff, flake8)
      • Codecov integration
  2. Build and Publish Release (.github/workflows/build_release.yaml)

    • Triggers:
      • Version tag push (e.g., v1.0.1) - Publishes to Test PyPI only
      • Manual workflow dispatch (with choice: Test PyPI or PyPI, plus optional ref)
    • Features:
      • Authorized release gating (only ShubhamKapoor992 and amitkumar293)
      • Builds distribution packages (wheel + sdist)
      • Tests installation on multiple platforms before publish
      • Tag push: Automatically publishes to Test PyPI for testing
      • Production release: Requires manual workflow dispatch to publish to PyPI
      • Supports approval-gated publishing through environment protection rules

Creating a Release

Step 1: Update Version

# Edit pyproject.toml
version = "1.0.1"

# Commit and push
git commit -am "Bump version to 1.0.1"
git push origin main

Step 2: Trigger the Release Workflow

You can trigger the release workflow in either of these ways:

Option A: Push a Version Tag
git tag v1.0.1
git push origin v1.0.1

This triggers .github/workflows/build_release.yaml.

Option B: Run Manually from GitHub Actions
  1. Go to Actions tab → Build and Publish Release
  2. Click "Run workflow"
  3. Select:
    • Where to publish?test-pypi or pypi
    • Git ref → tag or branch (for example, v1.0.1 or main)
  4. Click "Run workflow"

Step 3: What Happens Next

When you push a tag (e.g., v1.0.1):

  1. Workflow checks authorization (only ShubhamKapoor992 and amitkumar293)
  2. Builds the package
  3. Validates with twine check
  4. Tests wheel installation on Ubuntu, macOS, Windows
  5. Publishes to Test PyPI automatically for testing

To publish to production PyPI:

  1. Go to Actions tab → Build and Publish Release
  2. Click "Run workflow"
  3. Select:
    • Where to publish?pypi
    • Git ref → tag (e.g., v1.0.1)
  4. Click "Run workflow"
  5. Approve if environment protection is configured

Step 4: Verify Installation

Test PyPI (after tag push):

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ db2-sqlglot-dialect

Production PyPI (after manual release):

pip install db2-sqlglot-dialect

Why This Approach Works Well

Authorized releases only: Release workflow checks allowed GitHub users ✅ Build verification first: Package is built and validated before publish ✅ Install verification: Wheel is tested before publish ✅ Approval support: GitHub environments can require approval before publish ✅ Flexible triggering: Supports both version tags and manual dispatch

Development

Running Tests

pytest tests/

Running Tests with Coverage

pytest tests/ -v --cov=db2_sqlglot --cov-report=term --cov-report=html

Code Quality

The project uses ruff for formatting and linting:

# Run formatting
ruff format .

# Run linting
ruff check .

# Run formatting and repository checks
pre-commit run --all-files

Contributors should always run pre-commit run --all-files before opening a pull request.

Project Structure

db2-sqlglot-dialect/
├── db2_sqlglot/
│   ├── __init__.py
│   ├── dialect.py      # Main dialect class
│   ├── generator.py    # SQL generation logic
│   └── parser.py       # SQL parsing logic
├── tests/
│   └── test_db2_dialect.py
├── pyproject.toml      # Modern Python packaging (PEP 517/518/621)
└── README.md

Note: This project uses modern Python packaging with pyproject.toml only (no setup.py needed). Entry points are defined in pyproject.toml and work seamlessly with SQLGlot's plugin discovery system.

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run pre-commit run --all-files
  6. Ensure all tests pass and code passes linting
  7. Submit a pull request

All pull requests will automatically run:

  • Linting checks (ruff, flake8)
  • Unit tests across multiple Python versions and platforms
  • Code quality validation

Support

For issues and questions:

Credits

Built on top of SQLGlot by Toby Mao.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

db2_sqlglot_dialect-1.0.0.tar.gz (23.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

db2_sqlglot_dialect-1.0.0-py3-none-any.whl (17.6 kB view details)

Uploaded Python 3

File details

Details for the file db2_sqlglot_dialect-1.0.0.tar.gz.

File metadata

  • Download URL: db2_sqlglot_dialect-1.0.0.tar.gz
  • Upload date:
  • Size: 23.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for db2_sqlglot_dialect-1.0.0.tar.gz
Algorithm Hash digest
SHA256 0a7a014114f23b6fc763342620aa3d351f5b4fd4f8232fa05c8e04049d151e94
MD5 1e2e19a3ed8026d84b4010d736071939
BLAKE2b-256 85e946cd8774e95c4718a74317c15069533154aabb33fe810026d0573e2cb3ab

See more details on using hashes here.

Provenance

The following attestation bundles were made for db2_sqlglot_dialect-1.0.0.tar.gz:

Publisher: build_release.yaml on IBM/db2-sqlglot-plugin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file db2_sqlglot_dialect-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for db2_sqlglot_dialect-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8d1ca15cea5a015f87032e4f1b0aa5f6663b4349bb6c31afdddceda30444c5cf
MD5 20fddcdbadb85e8c8e07ec2a8cbd1a7d
BLAKE2b-256 f716b8562734625fd3dec26a16a01dd3612be95ac11ce8afb5597bb0a424135d

See more details on using hashes here.

Provenance

The following attestation bundles were made for db2_sqlglot_dialect-1.0.0-py3-none-any.whl:

Publisher: build_release.yaml on IBM/db2-sqlglot-plugin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page