SQL Parser ans Transpiler
Project description
Parser
Dora's parser and transpiler tool for some big data SQL dialects, based on Mozilla SQL Parser project.
Getting Started
An application that will translate source code from a given language (Impala, Spark, Hive, Presto and Athena) and produce equivalent code in another language that has a similar level of abstraction.
Installation
To install dora-parser if you has pandas and seaborn on your machine, use
pip install dora-parser
otherwise you can use
pip install dora-parser pandas seaborn
Usage
Translate Query
I. Import Modules
from dora_parser.parser import Parser
from dora_parser.transpiler import Transpiler
II. Generate the parse tree
from query
tree = Parser(query)
Where query
have the following value:
WITH t1 as (select now() att_day)
SELECT DECODE(DAYOFWEEK(att_day)
, 1 , 'Monday'
, 2 , 'Tuesday'
, 3 , 'Wednesday'
, 4 , 'Thursday'
, 5 , 'Friday'
, 6 , 'Saturday'
, 7 , 'Sunday'
, 'Unknown day') "Day of week"
, TRUNC(MONTHS_ADD(att_day,2),'SYEAR') as "Trunc SYEAR"
, TRUNC(MONTHS_ADD(att_day,2),'YEAR') as "Trunc YEAR"
FROM t1;
creates the tree object as shown below:
{
"select": [
{
"value": {"decode": [
{"dayofweek": "att_day"},1,
{"literal": "Monday"},2,
{"literal": "Tuesday"},3,
{"literal": "Wednesday"},4,
{"literal": "Thursday"},5,
{"literal": "Friday"},6,
{"literal": "Saturday"},7,
{"literal": "Sunday"},
{"literal": "Unknown day"}]},
"name": "Day of week"},
{
"value": {"trunc": [{"months_add": ["att_day",2]},{"literal": "SYEAR"}]},
"name": "Trunc SYEAR"
},
{
"value": {"trunc": [{"months_add": ["att_day",2]},{"literal": "YEAR"}]},
"name": "Trunc YEAR"
}
],
"from": "t1",
"with": {
"name": "t1",
"value": {
"select": {
"value": {"now": {}},
"name": "att_day"}}
}
}
III. Translate the tree
object from your original SQL Dialect (impala) to the new one (spark)
transpiler = Transpiler(from_dialect='impala', to_dialect='spark')
result, errors = transpiler.translate(tree)
the result
value will be like
WITH t1 AS (SELECT NOW() AS att_day)
SELECT CASE
WHEN DAYOFWEEK(att_day) = 1 THEN 'Monday'
WHEN DAYOFWEEK(att_day) = 2 THEN 'Tuesday'
WHEN DAYOFWEEK(att_day) = 3 THEN 'Wednesday'
WHEN DAYOFWEEK(att_day) = 4 THEN 'Thursday'
WHEN DAYOFWEEK(att_day) = 5 THEN 'Friday'
WHEN DAYOFWEEK(att_day) = 6 THEN 'Saturday'
WHEN DAYOFWEEK(att_day) = 7 THEN 'Sunday'
ELSE 'Unknown day'
END AS `Day of week`
, TRUNC(att_day + INTERVAL 2 MONTHS, 'SYEAR') AS `Trunc SYEAR`
, TRUNC(att_day + INTERVAL 2 MONTHS, 'YEAR') AS `Trunc YEAR`
FROM t1
You can also have access to a list with information about any errors
, as well as where they occur.
In this example, The TRUNC
function in Spark only works with a few data formats, so you can not use it with "SYEAR"
[
{
"trunc": "MEDIUM:20:[{'add': ['att_day', {'interval': [2,'MONTHS']}]}, {'literal': 'SYEAR'}]"
}
]
Information about the errors will also appear in the output log.
dora_parser 2021-08-12 17:24:24,650 WARNING _TRUNC_ Spark
data formats:['YEAR', 'YYYY', 'YY', 'QUARTER', 'MONTH', 'MM', 'MON', 'WEEK']
dora_parser 2021-08-12 17:24:24,650 WARNING resolve trunc NotImplemented:
--TRANSPILER:MEDIUM:LEVEL20:'trunc'
Translate Script
Are considered an script any type of string with multiple SQL statements
I. Import Module
from dora_parser.reader import Reader
II. Translate
script="""
INSERT INTO t.customer SELECT DCEIL(p_sale) FROM Customers;
COMPUTE STATS customer;
SELECT staff_id, staff_name, CHAR_LENGTH(staff_name) AS lengthofname, COUNT(*) order_count
FROM sales.orders
WHERE YEAR(order_date) = 2021
GROUP BY staff_id;
"""
reader = Reader(from_dialect='impala',to_dialect='athena')
result, errors, n_queries = reader.translate_script(script)
Give you as result
INSERT INTO t.customer SELECT CEIL(p_sale) FROM Customers;
/* STATEMENT ERRORS:COMPUTE STATS*/
COMPUTE STATS customer;
SELECT staff_id, staff_name, LENGTH(staff_name) AS lengthofname,
COUNT(*) AS order_count
FROM sales.orders
WHERE YEAR(order_date) = 2021
GROUP BY staff_id;
III. Generate a summary list (optional)
summary = reader.create_summary(errors, n_queries)
value for summary
variable:
[
{"N_queries": 3},
{"Success": 2},
{"Failed": {"HARD": 1}},
{"Er_types": ["compute stats"]}
]
Translate multiple Files
I. Import Module
from dora_parser.reader import Reader
II. Translate
dir_impala = 'scrpits/impala/'
dir_spark = 'scripts/spark'
reader = Reader(from_dialect='impala', to_dialect='spark', input_dir =dir_impala,output_dir=dir_spark)
reader.translate_files()
The translated files will be saved to folders in the output directory according to the result of the translation. If you don't specify the output directory, the resulting folders will be in the input directory.
III. Generate a migration report (optional)
You can also have access to a report in HTML with an overview of the result of migration process. To do this, set the migration_report argument equal to True".
reader = Reader(from_dialect='impala', to_dialect='spark', input_dir =dir_impala,output_dir=dir_spark, migration_report=True)
reader.translate_files()
IV. Generate a summary (optional)
If you want to access a summary dictionary of the migration process, set the summary_dict argument equal to True, as in the example below:
reader = Reader(from_dialect='impala', to_dialect='spark', input_dir =dir_impala,output_dir=dir_spark)
reader.translate_files(summary_dict=True)
{
"Input_dir": "/scripts/impala",
"From_dialect": "impala",
"To_dialect": "spark",
"Sucess_files": 3,
"Failed_files": 1,
"Files": {
"row_benchmark.sql": [
{"N_queries": 7},
{"Success": 7},
{"Failed": {}},
{"Er_types": []}],
"customer_facts.sql": [
{"N_queries": 12},
{"Success": 12},
{"Failed": {}},
{"Er_types": []}],
"document.sh": [
{"N_queries": 1},
{"Success": 1},
{"Failed": {}},
{"Er_types": []}],
"orches.sql": [
{"N_queries": 9},
{"Success": 5},
{"Failed": {"HARD": 3}},
{"Er_types": ["create role", "appx_median", "parser"]}]
}
}
Error Types
Parser
: Errors when generating the tree structure.Transpiler
: Unimplemented functions and their levels of complexity.Statement
: Commands that are not capable of being reproduced, since there are no equivalent in the target language.
Supported Languages
- Apache Impala 2.12
- Hive 3.12
- Spark 3.1
- Presto 0.217
Getting Help
We use GitHub issues for tracking bugs, questions and feature requests.
Contributing
Please read through this contributing document to get start and before submitting any issues or pull requests to ensure we have all the necessary information to effectively respond to your contribution.
Dora Project is a recent open-source project based on technology developed at Compasso UOL
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file dora-parser-0.1.3.tar.gz
.
File metadata
- Download URL: dora-parser-0.1.3.tar.gz
- Upload date:
- Size: 47.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.12 CPython/3.8.2 Linux/5.11.0-1022-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a2f1e52a312601d1950759bc18dd02416419e1205021795b959dec1a3dd68794 |
|
MD5 | 2e7d1ffbaabe9e6728c63a49de4807f1 |
|
BLAKE2b-256 | 9e511ea8a3b49a1a32d0e86afd84fbc29e621e10f8162e887abc9ce7cb18673b |
File details
Details for the file dora_parser-0.1.3-py3-none-any.whl
.
File metadata
- Download URL: dora_parser-0.1.3-py3-none-any.whl
- Upload date:
- Size: 53.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.12 CPython/3.8.2 Linux/5.11.0-1022-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f35ede207e65a891cfc4c30fb5b5330e57ecaa8f36daa7d60e82b985e5800ddc |
|
MD5 | 777310f399b3c0c2a501fed97ebe0d91 |
|
BLAKE2b-256 | 19aaf98f96bbb1fc1775340e7f5aad1ac1a83989c232729b4f2a0ba6b386e34d |