spark-plan-to-sql
Convert Apache Spark Catalyst LogicalPlan JSON dumps back into a logically
equivalent SQL statement.
The converter walks the pre-order JSON serialization Spark produces for its
LogicalPlan (the same shape exposed via df.queryExecution.logical.toJSON)
and emits readable SQL that, when re-executed, returns the same rows as the
original query.
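For example, such a dump can be captured from a live PySpark DataFrame and converted in one go. The snippet below is only a sketch: PySpark has no stable public accessor for the logical plan, so it reaches through the private _jdf handle to call queryExecution().logical().toJSON(), which may vary across Spark versions, and whether a given plan converts depends on the node coverage listed under "Supported plan nodes" below.
import json
from pyspark.sql import SparkSession
from spark_plan_to_sql import plan_to_sql
spark = SparkSession.builder.getOrCreate()
df = spark.range(10).filter("id % 2 = 0").selectExpr("id * 2 AS doubled")
# _jdf is a private JVM handle (assumption: recent Spark versions); toJSON() is
# Catalyst's TreeNode.toJSON, which emits the pre-order JSON array the converter expects.
plan_json = df._jdf.queryExecution().logical().toJSON()
print(plan_to_sql(json.loads(plan_json)))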
Install
pip install spark-plan-to-sql
Python API
import json
from spark_plan_to_sql import plan_to_sql, dict_to_sql
# 1) Plain JSON string
sql = plan_to_sql('[{"class": "...OneRowRelation", "num-children": 0}]')
# 2) Already-parsed Python list (Spark's native format)
with open("plan.json") as f:
sql = plan_to_sql(json.load(f))
# 3) A dict wrapper, e.g. {"plan": [...]} or {"logicalPlan": [...]}
sql = plan_to_sql({"plan": json.load(open("plan.json"))})
# 4) Strict dict-only helper
sql = dict_to_sql({"logicalPlan": [...]})
The function accepts three input shapes (interchangeable, as sketched below):
- str / bytes: JSON text
- list[dict]: Spark's native pre-order plan list
- dict: either a single leaf node or a wrapper such as {"plan": [...]}, {"logicalPlan": [...]}, or {"nodes": [...]}
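A quick way to check the shapes are treated alike (assuming plan.json holds a valid dump) is to feed the same plan through each one:
import json
from spark_plan_to_sql import plan_to_sql
with open("plan.json") as f:
    raw = f.read()
parsed = json.loads(raw)
# One plan, three accepted input shapes; all are expected to produce the same SQL text.
assert plan_to_sql(raw) == plan_to_sql(parsed) == plan_to_sql({"plan": parsed})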
CLI
# Convert one or more files (prints to stdout)
spark-plan-to-sql plan.json plan2.json
# Batch convert a directory of plans into ./restored_sql/
spark-plan-to-sql --dir test_json --out restored_sql
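The --dir mode expects a directory of JSON plan dumps. A minimal sketch of producing such a directory from PySpark (again via the private _jdf handle, so Spark-version dependent; the query names and file layout here are arbitrary examples):
import pathlib
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
out = pathlib.Path("test_json")
out.mkdir(exist_ok=True)
# Dump one pre-order plan JSON file per query, then batch-convert with:
#   spark-plan-to-sql --dir test_json --out restored_sql
queries = {
    "evens": "SELECT id FROM range(10) WHERE id % 2 = 0",
    "doubled": "SELECT id * 2 AS doubled FROM range(5)",
}
for name, sql in queries.items():
    # _jdf is private API (assumption: works on recent Spark versions).
    plan_json = spark.sql(sql)._jdf.queryExecution().logical().toJSON()
    (out / f"{name}.json").write_text(plan_json)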
Supported plan nodes
- DDL/DML: CreateNamespace, DropNamespace, SetCatalogAndNamespace, CreateTable*, DropTable*, DropView, CreateViewCommand, CacheTable, UncacheTable, InsertInto*, AppendData.
- Query: Project, Filter, Sort, GlobalLimit/LocalLimit, Distinct/Deduplicate, Aggregate (incl. ROLLUP/CUBE/GROUPING SETS through the Expand + spark_grouping_id pattern), Join (Inner / Left / Right / Full / LeftSemi / LeftAnti / Cross), Union/Intersect/Except, Window (with frame suppression for LAG/LEAD/RANK/ROW_NUMBER/...), Generate (LATERAL VIEW), WithCTE/CTERelationDef/CTERelationRef, SubqueryAlias, LogicalRelation, DataSourceV2Relation, LocalRelation, OneRowRelation.
- Expressions: literals (incl. interval/decimal/date/timestamp), Cast, Alias, AttributeReference, OuterReference, binary/unary operators, CaseWhen/If, Coalesce/IfNull/Nvl, aggregates (Count/Sum/Avg/...), datetime (Year/Month/AddMonths/DateAdd rewriting from ExtractANSIIntervalDays/...), windowing (WindowExpression/WindowSpecDefinition/SpecifiedWindowFrame), array/map/struct (GetStructField/GetArrayItem/ElementAt/ArrayTransform/LambdaFunction/CreateNamedStruct/MapFromArrays/...), JSON (JsonToStructs/GetJsonObject/StructsToJson).
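Since the goal is logical equivalence rather than byte-identical SQL, a practical sanity check is to re-execute the restored statement and compare row sets. The sketch below assumes the captured plan only contains nodes from the lists above, that anything it references is still resolvable in the current session, and again uses the private _jdf handle; sorting both sides makes the comparison order-insensitive.
from pyspark.sql import SparkSession
from spark_plan_to_sql import plan_to_sql
spark = SparkSession.builder.getOrCreate()
original = spark.sql(
    "SELECT id % 3 AS bucket, count(*) AS n FROM range(100) GROUP BY id % 3"
)
# Convert the captured plan back to SQL, re-run it, and compare the result rows.
restored_sql = plan_to_sql(original._jdf.queryExecution().logical().toJSON())
restored = spark.sql(restored_sql)
assert sorted(original.collect()) == sorted(restored.collect())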
License
MIT
Download files
Source distribution: spark_plan_to_sql-0.1.1.tar.gz
Built distribution: spark_plan_to_sql-0.1.1-py3-none-any.whl
File details
Details for the file spark_plan_to_sql-0.1.1.tar.gz.
File metadata
- Download URL: spark_plan_to_sql-0.1.1.tar.gz
- Upload date:
- Size: 24.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f2d5182883d4d36e313fc5497ac387484b48216d37963036b25f2836ddbe3a6f |
| MD5 | 9de736b210b8df3c951b74f820a7b50e |
| BLAKE2b-256 | b3ac1f14c33946c2875d32b1a0f99480415d7c87ea44ab6f88d53d4f5c0a2088 |
File details
Details for the file spark_plan_to_sql-0.1.1-py3-none-any.whl.
File metadata
- Download URL: spark_plan_to_sql-0.1.1-py3-none-any.whl
- Upload date:
- Size: 23.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 690eb0903130815e17d1339797a816e2f69218a5006bada6b7384e3240c5a3f2 |
| MD5 | a519e0d5ee22fb0ba7678007b218f002 |
| BLAKE2b-256 | 3951c2d9796ede9818fa2ff19eed2460461eda8aeacdb5383c57212bc05df2f9 |