Skip to main content

SQL Parser ans Transpiler

Project description

Parser

Dora's parser and transpiler tool for some big data SQL dialects, based on Mozilla SQL Parser project.

PyPI PyPI - Python Version

Getting Started

An application that will translate source code from a given language (Impala, Spark, Hive, Presto and Athena) and produce equivalent code in another language that has a similar level of abstraction.

Installation

To install dora-parser if you has pandas and seaborn on your machine, use

pip install dora-parser

otherwise you can use

pip install dora-parser pandas seaborn 

Usage

Translate Query

I. Import Modules

from dora_parser.parser import Parser
from dora_parser.transpiler import Transpiler

II. Generate the parse tree from query

tree = Parser(query)

Where query have the following value:

WITH t1 as (select now() att_day)
SELECT DECODE(DAYOFWEEK(att_day)
            , 1 , 'Monday'
            , 2 , 'Tuesday'
            , 3 , 'Wednesday'
            , 4 , 'Thursday'
            , 5 , 'Friday'
            , 6 , 'Saturday'
            , 7 , 'Sunday'
            , 'Unknown day') "Day of week"
  , TRUNC(MONTHS_ADD(att_day,2),'SYEAR') as "Trunc SYEAR"
  , TRUNC(MONTHS_ADD(att_day,2),'YEAR') as  "Trunc YEAR"
  FROM t1;

creates the tree object as shown below:

{
    "select": [
        {
          "value": {"decode": [
            {"dayofweek": "att_day"},1,
              {"literal": "Monday"},2,
              {"literal": "Tuesday"},3,
              {"literal": "Wednesday"},4,
              {"literal": "Thursday"},5,
              {"literal": "Friday"},6,
              {"literal": "Saturday"},7,
              {"literal": "Sunday"},
              {"literal": "Unknown day"}]},
          "name": "Day of week"},
        {
          "value": {"trunc": [{"months_add": ["att_day",2]},{"literal": "SYEAR"}]},
          "name": "Trunc SYEAR"
        },
        {
          "value": {"trunc": [{"months_add": ["att_day",2]},{"literal": "YEAR"}]},
          "name": "Trunc YEAR"
        }
    ],
    "from": "t1",
    "with": {
        "name": "t1",
        "value": {
          "select": {
            "value": {"now": {}},
            "name": "att_day"}}
    }
}

III. Translate the tree object from your original SQL Dialect (impala) to the new one (spark)

transpiler = Transpiler(from_dialect='impala', to_dialect='spark')
result, errors = transpiler.translate(tree)

the result value will be like

WITH t1 AS (SELECT NOW() AS att_day) 
SELECT CASE
        WHEN DAYOFWEEK(att_day) = 1 THEN 'Monday' 
        WHEN DAYOFWEEK(att_day) = 2 THEN 'Tuesday' 
        WHEN DAYOFWEEK(att_day) = 3 THEN 'Wednesday'
        WHEN DAYOFWEEK(att_day) = 4 THEN 'Thursday' 
        WHEN DAYOFWEEK(att_day) = 5 THEN 'Friday' 
        WHEN DAYOFWEEK(att_day) = 6 THEN 'Saturday' 
        WHEN DAYOFWEEK(att_day) = 7 THEN 'Sunday' 
        ELSE 'Unknown day' 
      END AS `Day of week`
  , TRUNC(att_day + INTERVAL 2 MONTHS, 'SYEAR') AS `Trunc SYEAR`
  , TRUNC(att_day + INTERVAL 2 MONTHS, 'YEAR') AS `Trunc YEAR`
FROM t1

You can also have access to a list with information about any errors, as well as where they occur. In this example, The TRUNC function in Spark only works with a few data formats, so you can not use it with "SYEAR"

[
  {
    "trunc": "MEDIUM:20:[{'add': ['att_day', {'interval': [2,'MONTHS']}]}, {'literal': 'SYEAR'}]"
  }
] 

Information about the errors will also appear in the output log.

dora_parser 2021-08-12 17:24:24,650 WARNING _TRUNC_ Spark 
data formats:['YEAR', 'YYYY', 'YY', 'QUARTER', 'MONTH', 'MM', 'MON', 'WEEK']
dora_parser 2021-08-12 17:24:24,650 WARNING resolve trunc NotImplemented: 
--TRANSPILER:MEDIUM:LEVEL20:'trunc'

Translate Script

Are considered an script any type of string with multiple SQL statements

I. Import Module

from dora_parser.reader import Reader

II. Translate

script="""
INSERT INTO t.customer SELECT DCEIL(p_sale) FROM Customers;

COMPUTE STATS customer;

SELECT staff_id, staff_name, CHAR_LENGTH(staff_name) AS lengthofname, COUNT(*) order_count  
FROM sales.orders 
WHERE YEAR(order_date) = 2021 
GROUP BY staff_id;
"""
reader = Reader(from_dialect='impala',to_dialect='athena')
result, errors, n_queries = reader.translate_script(script)

Give you as result

INSERT INTO t.customer SELECT CEIL(p_sale) FROM Customers;

/* STATEMENT ERRORS:COMPUTE STATS*/
COMPUTE STATS customer;

SELECT staff_id, staff_name, LENGTH(staff_name) AS lengthofname, 
COUNT(*) AS order_count 
FROM sales.orders 
WHERE YEAR(order_date) = 2021 
GROUP BY staff_id;

III. Generate a summary list (optional)

summary = reader.create_summary(errors, n_queries)

value for summary variable:

[
  {"N_queries": 3}, 
  {"Success": 2}, 
  {"Failed": {"HARD": 1}}, 
  {"Er_types": ["compute stats"]}
]

Translate multiple Files

I. Import Module

from dora_parser.reader import Reader

II. Translate

dir_impala = 'scrpits/impala/'
dir_spark = 'scripts/spark'
reader = Reader(from_dialect='impala', to_dialect='spark', input_dir =dir_impala,output_dir=dir_spark)
reader.translate_files()

The translated files will be saved to folders in the output directory according to the result of the translation. If you don't specify the output directory, the resulting folders will be in the input directory.

III. Generate a migration report (optional)

You can also have access to a report in HTML with an overview of the result of migration process. To do this, set the migration_report argument equal to True".

reader = Reader(from_dialect='impala', to_dialect='spark', input_dir =dir_impala,output_dir=dir_spark, migration_report=True)
reader.translate_files()

IV. Generate a summary (optional)

If you want to access a summary dictionary of the migration process, set the summary_dict argument equal to True, as in the example below:

reader = Reader(from_dialect='impala', to_dialect='spark', input_dir =dir_impala,output_dir=dir_spark)
reader.translate_files(summary_dict=True)
{
    "Input_dir": "/scripts/impala",
    "From_dialect": "impala",
    "To_dialect": "spark",
    "Sucess_files": 3,
    "Failed_files": 1,
    "Files": {
        "row_benchmark.sql": [
            {"N_queries": 7},
            {"Success": 7},
            {"Failed": {}},
            {"Er_types": []}],
        "customer_facts.sql": [
            {"N_queries": 12},
            {"Success": 12},
            {"Failed": {}},
            {"Er_types": []}],
        "document.sh": [
            {"N_queries": 1},
            {"Success": 1},
            {"Failed": {}},
            {"Er_types": []}],
        "orches.sql": [
            {"N_queries": 9}, 
            {"Success": 5}, 
            {"Failed": {"HARD": 3}}, 
            {"Er_types": ["create role", "appx_median", "parser"]}]
    }
}

Error Types

  • Parser: Errors when generating the tree structure.
  • Transpiler: Unimplemented functions and their levels of complexity.
  • Statement: Commands that are not capable of being reproduced, since there are no equivalent in the target language.

Supported Languages

  • Apache Impala 2.12
  • Hive 3.12
  • Spark 3.1
  • Presto 0.217

Getting Help

We use GitHub issues for tracking bugs, questions and feature requests.

Contributing

Please read through this contributing document to get start and before submitting any issues or pull requests to ensure we have all the necessary information to effectively respond to your contribution.


Dora Project is a recent open-source project based on technology developed at Compasso UOL

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dora-parser-0.1.3.tar.gz (47.3 kB view details)

Uploaded Source

Built Distribution

dora_parser-0.1.3-py3-none-any.whl (53.9 kB view details)

Uploaded Python 3

File details

Details for the file dora-parser-0.1.3.tar.gz.

File metadata

  • Download URL: dora-parser-0.1.3.tar.gz
  • Upload date:
  • Size: 47.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.8.2 Linux/5.11.0-1022-azure

File hashes

Hashes for dora-parser-0.1.3.tar.gz
Algorithm Hash digest
SHA256 a2f1e52a312601d1950759bc18dd02416419e1205021795b959dec1a3dd68794
MD5 2e7d1ffbaabe9e6728c63a49de4807f1
BLAKE2b-256 9e511ea8a3b49a1a32d0e86afd84fbc29e621e10f8162e887abc9ce7cb18673b

See more details on using hashes here.

File details

Details for the file dora_parser-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: dora_parser-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 53.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.8.2 Linux/5.11.0-1022-azure

File hashes

Hashes for dora_parser-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 f35ede207e65a891cfc4c30fb5b5330e57ecaa8f36daa7d60e82b985e5800ddc
MD5 777310f399b3c0c2a501fed97ebe0d91
BLAKE2b-256 19aaf98f96bbb1fc1775340e7f5aad1ac1a83989c232729b4f2a0ba6b386e34d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page