Skip to main content

YDB dialect plugin for sqlglot

Project description

ydb-sqlglot-plugin

YDB dialect plugin for sqlglot — bidirectional transpilation between YDB/YQL and any SQL dialect.

Installation

pip install ydb-sqlglot-plugin

Usage

After installing the package, the ydb dialect is available in sqlglot automatically — no extra imports needed:

import sqlglot

# Any dialect → YDB
result = sqlglot.transpile("SELECT * FROM users WHERE id = 1", read="mysql", write="ydb")[0]
# → SELECT * FROM `users` WHERE id = 1

# YDB → any dialect
result = sqlglot.transpile("$t = (SELECT id FROM users); SELECT * FROM $t AS t", read="ydb", write="postgres")[0]
# → WITH t AS (SELECT id FROM users) SELECT * FROM t AS t

What the plugin does

Any SQL → YDB

Table names

Database-qualified names are rewritten to the YDB path format and wrapped in backticks:

-- input
SELECT * FROM analytics.events

-- output
SELECT * FROM `analytics/events`

CTEs → YDB variables

-- input
WITH active AS (SELECT * FROM users WHERE status = 'active')
SELECT * FROM active

-- output
$active = (SELECT * FROM `users` WHERE status = 'active');

SELECT * FROM $active AS active

Subquery decorrelation

Correlated subqueries (which YQL does not support) are rewritten as JOINs:

-- input
SELECT id, (SELECT MAX(amount) FROM orders WHERE orders.user_id = users.id) AS max_order
FROM users

-- output
SELECT users.id AS id, _u_0._u_2 AS max_order
FROM `users`
LEFT JOIN (
    SELECT MAX(amount) AS _u_2, user_id AS _u_1
    FROM `orders`
    WHERE TRUE
    GROUP BY user_id AS _u_1
) AS _u_0 ON users.id = _u_0._u_1

The same rewriting applies to EXISTS, IN (subquery), and ANY/ALL subqueries.


YDB → any SQL

The plugin parses YDB/YQL back into sqlglot's AST, enabling round-trips, YDB-to-YDB transformations, and transpilation to other dialects.

Supported YQL constructs

Construct Example
$variable references SELECT * FROM $t AS t
Module::Function() DateTime::GetYear(ts)
DECLARE $p AS Type DECLARE $p AS Int32
FLATTEN [LIST|DICT] BY col FROM t FLATTEN LIST BY col
Optional<T> / T? CAST(x AS Optional<Utf8>)
Container types CAST(x AS List<Int32>), Dict<Utf8, Int64>, Set<Utf8>, Tuple<Int32, Utf8>
ASSUME ORDER BY SELECT * FROM t ASSUME ORDER BY id
Named expressions $t = (SELECT 1 AS x)
PRAGMA PRAGMA AnsiImplicitCrossJoin

Table names without backticks are accepted on input; the generator always produces backtick-quoted output.

CTEs reassembly

YDB-style named expressions are automatically reassembled into standard WITH CTEs when targeting other dialects:

ydb_sql = "$t = (SELECT 1 AS x); SELECT * FROM $t AS t"
parse_one(ydb_sql, dialect="ydb").sql(dialect="postgres")
# → WITH t AS (SELECT 1 AS x) SELECT * FROM t AS t

Column lineage

Because YDB SQL is fully parsed into sqlglot's AST, column-level lineage works out of the box:

from sqlglot.lineage import lineage

node = lineage("total", "$orders = (SELECT user_id, amount FROM orders); SELECT SUM(amount) AS total FROM $orders AS o", dialect="ydb")
for dep in node.walk():
    print(dep.name, "→", dep.source)

Function reference

Functions below are recognized by sqlglot as standard SQL expressions and translated to their YQL equivalents. Dialect-specific functions that sqlglot does not parse into typed AST nodes are passed through unchanged — see Limitations.

Date / time

Input YQL output
DATE_TRUNC('day', x) DATE(x)
DATE_TRUNC('week', x) DateTime::MakeDate(DateTime::StartOfWeek(x))
DATE_TRUNC('month', x) DateTime::MakeDate(DateTime::StartOfMonth(x))
DATE_TRUNC('quarter', x) DateTime::MakeDate(DateTime::StartOfQuarter(x))
DATE_TRUNC('year', x) DateTime::MakeDate(DateTime::StartOfYear(x))
EXTRACT(WEEK FROM x) DateTime::GetWeekOfYear(x)
EXTRACT(MONTH FROM x) DateTime::GetMonth(x)
EXTRACT(YEAR FROM x) DateTime::GetYear(x)
CURRENT_TIMESTAMP CurrentUtcTimestamp()
STR_TO_DATE(str, fmt) / TO_DATE(str, fmt) DateTime::MakeTimestamp(DateTime::Parse(fmt)(str))
DATE_ADD(x, INTERVAL n MONTH) DateTime::MakeDate(DateTime::ShiftMonths(x, n))
DATE_ADD(x, INTERVAL n YEAR) DateTime::MakeDate(DateTime::ShiftYears(x, n))
DATE_ADD(x, INTERVAL n DAY) x + DateTime::IntervalFromDays(n)
DATE_ADD(x, INTERVAL n HOUR) x + DateTime::IntervalFromHours(n)
DATE_ADD(x, INTERVAL n MINUTE) x + DateTime::IntervalFromMinutes(n)
DATE_ADD(x, INTERVAL n SECOND) x + DateTime::IntervalFromSeconds(n)
DATE_SUB(x, INTERVAL n ...) same as DATE_ADD with
INTERVAL n DAY (literal) DateTime::IntervalFromDays(n)
INTERVAL n HOUR (literal) DateTime::IntervalFromHours(n)
INTERVAL n MINUTE (literal) DateTime::IntervalFromMinutes(n)
INTERVAL n SECOND (literal) DateTime::IntervalFromSeconds(n)
dateDiff('minute', a, b) (CAST(b AS Int64) - CAST(a AS Int64)) / 60000000
dateDiff('hour', a, b) (CAST(b AS Int64) - CAST(a AS Int64)) / 3600000000
dateDiff('day', a, b) (CAST(b AS Int64) - CAST(a AS Int64)) / 86400000000
dateDiff('week', a, b) (CAST(b AS Int64) - CAST(a AS Int64)) / 604800000000

Note on dateDiff: YDB stores Timestamp as microseconds since epoch. The formula above gives exact integer units assuming both arguments are Timestamp. Results for Date-typed columns will differ.

Strings

Input YQL output
CONCAT(a, b, ...) a || b || ...
UPPER(x) Unicode::ToUpper(x)
LOWER(x) Unicode::ToLower(x)
LENGTH(x) / CHAR_LENGTH(x) Unicode::GetLength(x)
POSITION(sub IN x) / STRPOS(x, sub) Find(x, sub)
STRING_TO_ARRAY(x, delim) String::SplitToList(x, delim)
ARRAY_TO_STRING(arr, delim) String::JoinFromList(arr, delim)

Arrays / collections

Input YQL output
ARRAY(v1, v2, ...) AsList(v1, v2, ...)
ARRAY_LENGTH(x) / ARRAY_SIZE(x) ListLength(x)
ARRAY_FILTER(arr, x -> cond) ListFilter(arr, ($x) -> {RETURN cond})
ARRAY_ANY(arr, x -> cond) ListHasItems(ListFilter(arr, ($x) -> {RETURN cond}))
ARRAY_AGG(x) AGGREGATE_LIST(x)
UNNEST(x) FLATTEN BY x

Conditional / math

Input YQL output
NULLIF(x, y) IF(x = y, NULL, x)
ROUND(x, n) Math::Round(x, -n)
COUNT() (zero-argument form) COUNT(*)

JSON

Input YQL output
jsonb_col @> value (PostgreSQL) Yson::Contains(jsonb_col, value)

Type mapping

Standard SQL → YDB

SQL type YDB type
TINYINT Int8
SMALLINT Int16
INT / INTEGER Int32
BIGINT Int64
FLOAT Float
DOUBLE / DOUBLE PRECISION Double
DECIMAL(p, s) Decimal(p, s)
BOOLEAN / BIT Uint8
TIMESTAMP Timestamp
VARCHAR / NVARCHAR / CHAR / TEXT Utf8
BLOB / BINARY / VARBINARY String

YDB types → standard SQL

YDB type Standard SQL Postgres ClickHouse
Utf8 TEXT TEXT String
String BLOB BYTEA String
Int32 INT INT Int32
Int64 BIGINT BIGINT Int64
Optional<T> T (nullable) T Nullable(T)
List<T> LIST<T> LIST<T> Array(T)
Dict<K,V> MAP<K,V> MAP<K,V> Map(K,V)
Tuple<T1,T2> STRUCT<...> STRUCT<...> Tuple(T1,T2)

Limitations

Dialect-specific functions

Functions that sqlglot does not parse into typed AST nodes are passed through unchanged and must be replaced manually. Common examples from ClickHouse: now(), today(), parseDateTimeBestEffort(), toDate(), toFloat64(), toString(), countDistinct(), groupArray().

Correlated subqueries in DML

Correlated subqueries inside UPDATE or INSERT statements cannot be automatically decorrelated — YDB does not support them natively, and rewriting requires knowledge of the table's primary key. Rewrite manually using a $variable:

-- not supported (will raise an error)
UPDATE t SET col = (SELECT val FROM other WHERE other.id = t.id)

-- workaround
$vals = (SELECT id, val FROM other);
UPDATE t SET col = (SELECT val FROM $vals WHERE id = t.id)

Correlated subqueries inside SELECT are handled automatically via JOIN rewriting.

dateDiff with month granularity

dateDiff('month', a, b) has no exact equivalent in YDB because months have variable length. Use DateTime::ShiftMonths for date arithmetic instead.

YDB container types in other dialects

Uint8/Uint16/Uint32/Uint64 and YDB-specific container types (Struct<...>, Variant<...>, Enum<...>) do not have direct equivalents in standard SQL and are passed through as-is when targeting other dialects.


Development

git clone https://github.com/ydb-platform/ydb-sqlglot-plugin.git
cd ydb-sqlglot-plugin
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
python -m pytest tests/

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ydb_sqlglot_plugin-0.2.1.tar.gz (38.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ydb_sqlglot_plugin-0.2.1-py3-none-any.whl (35.4 kB view details)

Uploaded Python 3

File details

Details for the file ydb_sqlglot_plugin-0.2.1.tar.gz.

File metadata

  • Download URL: ydb_sqlglot_plugin-0.2.1.tar.gz
  • Upload date:
  • Size: 38.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ydb_sqlglot_plugin-0.2.1.tar.gz
Algorithm Hash digest
SHA256 2ce50c36b3dfa8d8cfe8d034d4dd73b42cc61308b328342b24738dc8e1fda06e
MD5 2e9ee701ce9bb8f55bcca1c562315e39
BLAKE2b-256 523c5706f22d969efbf3bef303af355173de5e17d4f937a0b501bb6782290717

See more details on using hashes here.

Provenance

The following attestation bundles were made for ydb_sqlglot_plugin-0.2.1.tar.gz:

Publisher: python-publish.yml on ydb-platform/ydb-sqlglot-plugin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ydb_sqlglot_plugin-0.2.1-py3-none-any.whl.

File metadata

File hashes

Hashes for ydb_sqlglot_plugin-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 881ea793d0fabef42a668f2596baeb4c891ff18fd308e187505e01940ff8e469
MD5 bc85e004874c5506750aa692307acace
BLAKE2b-256 fe60e580ba69a0aada09d50099ed004501619cc900169c46f57c1112059ad719

See more details on using hashes here.

Provenance

The following attestation bundles were made for ydb_sqlglot_plugin-0.2.1-py3-none-any.whl:

Publisher: python-publish.yml on ydb-platform/ydb-sqlglot-plugin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page