Skip to main content

YDB dialect plugin for sqlglot

Project description

ydb-sqlglot-plugin

YDB dialect plugin for sqlglot — bidirectional transpilation between YDB/YQL and any SQL dialect.

Installation

pip install ydb-sqlglot-plugin

Usage

After installing the package, the ydb dialect is available in sqlglot automatically — no extra imports needed:

import sqlglot

# Any dialect → YDB
result = sqlglot.transpile("SELECT * FROM users WHERE id = 1", read="mysql", write="ydb")[0]
# → SELECT * FROM `users` WHERE id = 1

# YDB → any dialect
result = sqlglot.transpile("$t = (SELECT id FROM users); SELECT * FROM $t AS t", read="ydb", write="postgres")[0]
# → WITH t AS (SELECT id FROM users) SELECT * FROM t AS t

What the plugin does

Any SQL → YDB

Table names

Database-qualified names are rewritten to the YDB path format and wrapped in backticks:

-- input
SELECT * FROM analytics.events

-- output
SELECT * FROM `analytics/events`

CTEs → YDB variables

-- input
WITH active AS (SELECT * FROM users WHERE status = 'active')
SELECT * FROM active

-- output
$active = (SELECT * FROM `users` WHERE status = 'active');

SELECT * FROM $active AS active

Subquery decorrelation

Correlated subqueries (which YQL does not support) are rewritten as JOINs:

-- input
SELECT id, (SELECT MAX(amount) FROM orders WHERE orders.user_id = users.id) AS max_order
FROM users

-- output
SELECT users.id AS id, _u_0._u_2 AS max_order
FROM `users`
LEFT JOIN (
    SELECT MAX(amount) AS _u_2, user_id AS _u_1
    FROM `orders`
    WHERE TRUE
    GROUP BY user_id AS _u_1
) AS _u_0 ON users.id = _u_0._u_1

The same rewriting applies to EXISTS, IN (subquery), and ANY/ALL subqueries.

GROUP BY aliases

YDB accepts aliases directly inside GROUP BY items. The generator uses this form for grouped columns so later clauses and decorrelated subqueries can refer to a stable grouping name:

-- input
SELECT user_id, COUNT(*) FROM events GROUP BY user_id

-- output
SELECT user_id, COUNT(*) FROM `events` GROUP BY user_id AS user_id

If a grouped column is selected under a generated alias, the GROUP BY item uses that alias as well:

SELECT user_id AS _u_1, COUNT(*) FROM `events` GROUP BY user_id AS _u_1

Positional GROUP BY references are expanded before generation. When a positional reference points to a constant expression, the grouping item is removed because YDB rejects grouping by constants.


YDB → any SQL

The plugin parses YDB/YQL back into sqlglot's AST, enabling round-trips, YDB-to-YDB transformations, and transpilation to other dialects.

Supported YQL constructs

Construct Example
$variable references SELECT * FROM $t AS t
Module::Function() DateTime::GetYear(ts)
DECLARE $p AS Type DECLARE $p AS Int32
FLATTEN [LIST|DICT|OPTIONAL] BY ... / FLATTEN COLUMNS FROM t FLATTEN LIST BY col AS item, FROM t FLATTEN BY (a, b), FROM t FLATTEN COLUMNS
Optional<T> / T? CAST(x AS Optional<Utf8>)
Container types CAST(x AS List<Int32>), Dict<Utf8, Int64>, Set<Utf8>, Tuple<Int32, Utf8>
ASSUME ORDER BY SELECT * FROM t ASSUME ORDER BY id
GROUP BY expr AS alias / GROUP COMPACT BY SELECT v, COUNT(*) FROM t GROUP BY v AS v
LEFT ONLY JOIN SELECT * FROM a LEFT ONLY JOIN b USING (id)
* WITHOUT (...) projections SELECT b.* WITHOUT (b.id) FROM t AS b
Named expressions $t = (SELECT 1 AS x)
Lambda expressions ($x, $y?) -> ($x + COALESCE($y, 0)), ($y) -> { $p = "x"; RETURN $p || $y }
YQL struct literals `AsList(<
IN COMPACT WHERE key IN COMPACT $values
PRAGMA PRAGMA AnsiImplicitCrossJoin
Table-valued functions SELECT * FROM AS_TABLE($Input) AS k
Table source options and index views FROM `t` WITH TabletId='...', FROM `t` VIEW PRIMARY KEY v
Function-valued expressions $grep(x), DateTime::Format("%Y-%m-%d")(ts), Interval("P7D")

Table names without backticks are accepted on input; the generator always produces backtick-quoted output.

The parser also tolerates case variants that appear in real YQL dumps, such as set<Utf8>, Tuple<Int32, Utf8>?, and lowercase return in lambda blocks.

CTEs reassembly

YDB-style named expressions are automatically reassembled into standard WITH CTEs when targeting other dialects:

ydb_sql = "$t = (SELECT 1 AS x); SELECT * FROM $t AS t"
parse_one(ydb_sql, dialect="ydb").sql(dialect="postgres")
# → WITH t AS (SELECT 1 AS x) SELECT * FROM t AS t

Column lineage

Because YDB SQL is fully parsed into sqlglot's AST, column-level lineage works out of the box:

from sqlglot.lineage import lineage

node = lineage("total", "$orders = (SELECT user_id, amount FROM orders); SELECT SUM(amount) AS total FROM $orders AS o", dialect="ydb")
for dep in node.walk():
    print(dep.name, "→", dep.source)

Function reference

Functions below are recognized by sqlglot as standard SQL expressions and translated to their YQL equivalents. Dialect-specific functions that sqlglot does not parse into typed AST nodes are passed through unchanged — see Limitations.

Date / time

Input YQL output
DATE_TRUNC('day', x) DATE(x)
DATE_TRUNC('week', x) DateTime::MakeDate(DateTime::StartOfWeek(x))
DATE_TRUNC('month', x) DateTime::MakeDate(DateTime::StartOfMonth(x))
DATE_TRUNC('quarter', x) DateTime::MakeDate(DateTime::StartOfQuarter(x))
DATE_TRUNC('year', x) DateTime::MakeDate(DateTime::StartOfYear(x))
EXTRACT(WEEK FROM x) DateTime::GetWeekOfYear(x)
EXTRACT(MONTH FROM x) DateTime::GetMonth(x)
EXTRACT(YEAR FROM x) DateTime::GetYear(x)
CURRENT_TIMESTAMP CurrentUtcTimestamp()
STR_TO_DATE(str, fmt) / TO_DATE(str, fmt) DateTime::MakeTimestamp(DateTime::Parse(fmt)(str))
DATE_ADD(x, INTERVAL n MONTH) DateTime::MakeDate(DateTime::ShiftMonths(x, n))
DATE_ADD(x, INTERVAL n YEAR) DateTime::MakeDate(DateTime::ShiftYears(x, n))
DATE_ADD(x, INTERVAL n DAY) x + DateTime::IntervalFromDays(n)
DATE_ADD(x, INTERVAL n HOUR) x + DateTime::IntervalFromHours(n)
DATE_ADD(x, INTERVAL n MINUTE) x + DateTime::IntervalFromMinutes(n)
DATE_ADD(x, INTERVAL n SECOND) x + DateTime::IntervalFromSeconds(n)
DATE_SUB(x, INTERVAL n ...) same as DATE_ADD with
INTERVAL n DAY (literal) DateTime::IntervalFromDays(n)
INTERVAL n HOUR (literal) DateTime::IntervalFromHours(n)
INTERVAL n MINUTE (literal) DateTime::IntervalFromMinutes(n)
INTERVAL n SECOND (literal) DateTime::IntervalFromSeconds(n)
Interval("P7D") (YQL input) passed through unchanged
dateDiff('minute', a, b) (CAST(b AS Int64) - CAST(a AS Int64)) / 60000000
dateDiff('hour', a, b) (CAST(b AS Int64) - CAST(a AS Int64)) / 3600000000
dateDiff('day', a, b) (CAST(b AS Int64) - CAST(a AS Int64)) / 86400000000
dateDiff('week', a, b) (CAST(b AS Int64) - CAST(a AS Int64)) / 604800000000

Note on dateDiff: YDB stores Timestamp as microseconds since epoch. The formula above gives exact integer units assuming both arguments are Timestamp. Results for Date-typed columns will differ.

Strings

Input YQL output
CONCAT(a, b, ...) a || b || ...
UPPER(x) Unicode::ToUpper(x)
LOWER(x) Unicode::ToLower(x)
LENGTH(x) / CHAR_LENGTH(x) Unicode::GetLength(x)
POSITION(sub IN x) / STRPOS(x, sub) Find(x, sub)
STRING_TO_ARRAY(x, delim) String::SplitToList(x, delim)
ARRAY_TO_STRING(arr, delim) String::JoinFromList(arr, delim)

Arrays / collections

Input YQL output
ARRAY(v1, v2, ...) AsList(v1, v2, ...)
ARRAY_LENGTH(x) / ARRAY_SIZE(x) ListLength(x)
ARRAY_FILTER(arr, x -> cond) ListFilter(arr, ($x) -> (cond))
ARRAY_ANY(arr, x -> cond) ListHasItems(ListFilter(arr, ($x) -> (cond)))
ARRAY_AGG(x) AGGREGATE_LIST(x)
UNNEST(x) FLATTEN BY x

Lambda expressions are represented with sqlglot's standard exp.Lambda AST node. When a source dialect parses lambdas, the YDB generator emits YQL lambda syntax:

-- DuckDB input
SELECT list_filter(arr, x -> x > 0) FROM t

-- YDB output
SELECT ListFilter(arr, ($x) -> ($x > 0)) FROM `t`

YDB input also supports documented YQL lambda forms, including optional arguments and block bodies with local named expressions:

($x, $y?) -> ($x + COALESCE($y, 0));
($y) -> { $prefix = "x"; RETURN $prefix || $y; };

ClickHouse ARRAY JOIN and simple arrayJoin(...) projections, and PostgreSQL LATERAL unnest(...), are converted to YDB FLATTEN BY when the operation is directly tied to the source table.

Conditional / math

Input YQL output
NULLIF(x, y) IF(x = y, NULL, x)
ROUND(x, n) Math::Round(x, -n)
COUNT() (zero-argument form) COUNT(*)

JSON

Input YQL output
jsonb_col @> value (PostgreSQL) Yson::Contains(jsonb_col, value)

YDB JSON functions are parsed and round-tripped, including PASSING, RETURNING, wrapper modes, and ON EMPTY / ON ERROR clauses:

JSON_VALUE(payload, '$.value + $delta' PASSING 1 AS delta RETURNING Int64 DEFAULT 0 ON EMPTY ERROR ON ERROR)
JSON_QUERY(payload, '$.items' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY ERROR ON ERROR)
JSON_EXISTS(payload, '$.items[$Index]' PASSING 0 AS "Index" FALSE ON ERROR)

JSON paths can contain quoted keys, for example JSON_EXISTS(item_result, "$.'P_008 device playback test'").


Type mapping

Standard SQL → YDB

SQL type YDB type
TINYINT Int8
SMALLINT Int16
INT / INTEGER Int32
BIGINT Int64
FLOAT Float
DOUBLE / DOUBLE PRECISION Double
DECIMAL(p, s) Decimal(p, s)
BOOLEAN / BIT Uint8
TIMESTAMP Timestamp
VARCHAR / NVARCHAR / CHAR / TEXT Utf8
BLOB / BINARY / VARBINARY String

YDB types → standard SQL

YDB type Standard SQL Postgres ClickHouse
Utf8 TEXT TEXT String
String BLOB BYTEA String
Int32 INT INT Int32
Int64 BIGINT BIGINT Int64
Optional<T> T (nullable) T Nullable(T)
List<T> LIST<T> LIST<T> Array(T)
Dict<K,V> MAP<K,V> MAP<K,V> Map(K,V)
Tuple<T1,T2> STRUCT<...> STRUCT<...> Tuple(T1,T2)

Limitations

Dialect-specific functions

Functions that sqlglot does not parse into typed AST nodes are passed through unchanged and must be replaced manually. Common examples from ClickHouse: now(), today(), parseDateTimeBestEffort(), toDate(), toFloat64(), toString(), countDistinct(), groupArray().

Correlated subqueries in DML

Correlated subqueries inside UPDATE or INSERT statements cannot be automatically decorrelated — YDB does not support them natively, and rewriting requires knowledge of the table's primary key. Rewrite manually using a $variable:

-- not supported (will raise an error)
UPDATE t SET col = (SELECT val FROM other WHERE other.id = t.id)

-- workaround
$vals = (SELECT id, val FROM other);
UPDATE t SET col = (SELECT val FROM $vals WHERE id = t.id)

Correlated subqueries inside SELECT are handled automatically via JOIN rewriting.

dateDiff with month granularity

dateDiff('month', a, b) has no exact equivalent in YDB because months have variable length. Use DateTime::ShiftMonths for date arithmetic instead.

YDB container types in other dialects

Uint8/Uint16/Uint32/Uint64 and YDB-specific container types (Struct<...>, Variant<...>, Enum<...>) do not have direct equivalents in standard SQL and are passed through as-is when targeting other dialects.


Development

git clone https://github.com/ydb-platform/ydb-sqlglot-plugin.git
cd ydb-sqlglot-plugin
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
python -m pytest tests/

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ydb_sqlglot_plugin-0.2.4.tar.gz (49.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ydb_sqlglot_plugin-0.2.4-py3-none-any.whl (45.2 kB view details)

Uploaded Python 3

File details

Details for the file ydb_sqlglot_plugin-0.2.4.tar.gz.

File metadata

  • Download URL: ydb_sqlglot_plugin-0.2.4.tar.gz
  • Upload date:
  • Size: 49.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ydb_sqlglot_plugin-0.2.4.tar.gz
Algorithm Hash digest
SHA256 c909be8ad3578533fc78d7e5c46d4165f67d78d4a941c97c9b357b55f067b183
MD5 06961719f3a7c69c99d2c503cff7d28d
BLAKE2b-256 73e5a89a5079f82868195f406d0efcb8741c590807ee3db56c1c8c3f5b49927c

See more details on using hashes here.

Provenance

The following attestation bundles were made for ydb_sqlglot_plugin-0.2.4.tar.gz:

Publisher: python-publish.yml on ydb-platform/ydb-sqlglot-plugin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ydb_sqlglot_plugin-0.2.4-py3-none-any.whl.

File metadata

File hashes

Hashes for ydb_sqlglot_plugin-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 7b015f1135353674fdcc17893872044e1bc87efbbb2aaa0558e02d4a7b332fd2
MD5 1c54a695a9298b7b90fb121ae8479bd1
BLAKE2b-256 62371620c14fe8132ea98242947f334771386149459e5e60bea436e54de45503

See more details on using hashes here.

Provenance

The following attestation bundles were made for ydb_sqlglot_plugin-0.2.4-py3-none-any.whl:

Publisher: python-publish.yml on ydb-platform/ydb-sqlglot-plugin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page