SQL Parser ans Transpiler

These details have not been verified by PyPI

Project links

Project description

Parser

Dora's parser and transpiler tool for some big data SQL dialects, based on Mozilla SQL Parser project.

PyPI - Python Version

Getting Started

An application that will translate source code from a given language (Impala, Spark, Hive, Presto and Athena) and produce equivalent code in another language that has a similar level of abstraction.

Installation

pip install dora-parser

Usage

Translate Query

I. Import Modules

from dora_parser.parser import Parser
from dora_parser.transpiler import Transpiler

II. Generate the parse tree from query

tree = Parser(query)

Where query have the following value:

WITH t1 as (select now() att_day)
SELECT DECODE(DAYOFWEEK(att_day)
            , 1 , 'Monday'
            , 2 , 'Tuesday'
            , 3 , 'Wednesday'
            , 4 , 'Thursday'
            , 5 , 'Friday'
            , 6 , 'Saturday'
            , 7 , 'Sunday'
            , 'Unknown day') "Day of week"
  , TRUNC(MONTHS_ADD(att_day,2),'SYEAR') as "Trunc SYEAR"
  , TRUNC(MONTHS_ADD(att_day,2),'YEAR') as  "Trunc YEAR"
  FROM t1;

creates the tree object as shown below:

{
    "select": [
        {
          "value": {"decode": [
            {"dayofweek": "att_day"},1,
              {"literal": "Monday"},2,
              {"literal": "Tuesday"},3,
              {"literal": "Wednesday"},4,
              {"literal": "Thursday"},5,
              {"literal": "Friday"},6,
              {"literal": "Saturday"},7,
              {"literal": "Sunday"},
              {"literal": "Unknown day"}]},
          "name": "Day of week"},
        {
          "value": {"trunc": [{"months_add": ["att_day",2]},{"literal": "SYEAR"}]},
          "name": "Trunc SYEAR"
        },
        {
          "value": {"trunc": [{"months_add": ["att_day",2]},{"literal": "YEAR"}]},
          "name": "Trunc YEAR"
        }
    ],
    "from": "t1",
    "with": {
        "name": "t1",
        "value": {
          "select": {
            "value": {"now": {}},
            "name": "att_day"}}
    }
}

III. Translate the tree object from your original SQL Dialect (impala) to the new one (spark)

transpiler = Transpiler(from_dialect='impala', to_dialect='spark')
result, errors = transpiler.translate(tree)

the result value will be like

WITH t1 AS (SELECT NOW() AS att_day) 
SELECT CASE
        WHEN DAYOFWEEK(att_day) = 1 THEN 'Monday' 
        WHEN DAYOFWEEK(att_day) = 2 THEN 'Tuesday' 
        WHEN DAYOFWEEK(att_day) = 3 THEN 'Wednesday'
        WHEN DAYOFWEEK(att_day) = 4 THEN 'Thursday' 
        WHEN DAYOFWEEK(att_day) = 5 THEN 'Friday' 
        WHEN DAYOFWEEK(att_day) = 6 THEN 'Saturday' 
        WHEN DAYOFWEEK(att_day) = 7 THEN 'Sunday' 
        ELSE 'Unknown day' 
      END AS `Day of week`
  , TRUNC(att_day + INTERVAL 2 MONTHS, 'SYEAR') AS `Trunc SYEAR`
  , TRUNC(att_day + INTERVAL 2 MONTHS, 'YEAR') AS `Trunc YEAR`
FROM t1

You can also have access to a list with information about any errors, as well as where they occur. In this example, The TRUNC function in Spark only works with a few data formats, so you can not use it with "SYEAR"

[
  {
    "trunc": "MEDIUM:20:[{'add': ['att_day', {'interval': [2,'MONTHS']}]}, {'literal': 'SYEAR'}]"
  }
]

Information about the errors will also appear in the output log.

dora_parser 2021-08-12 17:24:24,650 WARNING _TRUNC_ Spark 
data formats:['YEAR', 'YYYY', 'YY', 'QUARTER', 'MONTH', 'MM', 'MON', 'WEEK']
dora_parser 2021-08-12 17:24:24,650 WARNING resolve trunc NotImplemented: 
--TRANSPILER:MEDIUM:LEVEL20:'trunc'

Translate Script

Are considered an script any type of string with multiple SQL statements

I. Import Module

from dora_parser.reader import Reader

II. Translate

script="""
INSERT INTO t.customer SELECT DCEIL(p_sale) FROM Customers;

COMPUTE STATS customer;

SELECT staff_id, staff_name, CHAR_LENGTH(staff_name) AS lengthofname, COUNT(*) order_count  
FROM sales.orders 
WHERE YEAR(order_date) = 2021 
GROUP BY staff_id;
"""
reader = Reader(from_dialect='impala',to_dialect='presto')
result, errors, n_queries = reader.translate_script(script)

Give you as result

INSERT INTO t.customer SELECT CEIL(p_sale) FROM Customers;

/* STATEMENT ERRORS:COMPUTE STATS*/
COMPUTE STATS customer;

SELECT staff_id, staff_name, LENGTH(staff_name) AS lengthofname, 
COUNT(*) AS order_count 
FROM sales.orders 
WHERE YEAR(order_date) = 2021 
GROUP BY staff_id;

III. Generate a summary list (optional)

summary = reader.create_summary(errors, n_queries)

value for summary variable:

[
  {"N_queries": 3}, 
  {"Success": 2}, 
  {"Failed": {"HARD": 1}}, 
  {"Er_types": ["compute stats"]}
]

Translate multiple Files

I. Import Module

from dora_parser.reader import Reader

II. Translate

dir_impala = 'scrpits/impala/'
dir_spark = 'scripts/spark'
reader = Reader(from_dialect='impala', to_dialect='spark', input_dir =dir_impala,output_dir=dir_spark)
reader.translate_files()

The translated files will be saved to folders in the output directory according to the result of the translation. If you don't specify the output directory, the resulting folders will be in the input directory.

III. Generate a migration report (optional)

You can also have access to a report in HTML with an overview of the result of migration process. To do this, set the migration_report argument equal to True".

reader = Reader(from_dialect='impala', to_dialect='spark', input_dir =dir_impala,output_dir=dir_spark, migration_report=True)
reader.translate_files()

IV. Generate a summary (optional)

If you want to access a summary dictionary of the migration process, set the summary_dict argument equal to True, as in the example below:

reader = Reader(from_dialect='impala', to_dialect='spark', input_dir =dir_impala,output_dir=dir_spark)
reader.translate_files(summary_dict=True)

{
    "Input_dir": "/scripts/impala",
    "From_dialect": "impala",
    "To_dialect": "spark",
    "Sucess_files": 3,
    "Failed_files": 1,
    "Files": {
        "row_benchmark.sql": [
            {"N_queries": 7},
            {"Success": 7},
            {"Failed": {}},
            {"Er_types": []}],
        "customer_facts.sql": [
            {"N_queries": 12},
            {"Success": 12},
            {"Failed": {}},
            {"Er_types": []}],
        "document.sh": [
            {"N_queries": 1},
            {"Success": 1},
            {"Failed": {}},
            {"Er_types": []}],
        "orches.sql": [
            {"N_queries": 9}, 
            {"Success": 5}, 
            {"Failed": {"HARD": 3}}, 
            {"Er_types": ["create role", "appx_median", "parser"]}]
    }
}

Error Types

Parser: Errors when generating the tree structure.
Transpiler: Unimplemented functions and their levels of complexity.
Statement: Commands that are not capable of being reproduced, since there are no equivalent in the target language.

Supported Languages

Apache Impala 2.12
Hive 3.12
Spark 3.1
Presto 0.217

Getting Help

We use GitHub issues for tracking bugs, questions and feature requests.

Contributing

Please read through this contributing document to get start and before submitting any issues or pull requests to ensure we have all the necessary information to effectively respond to your contribution.

Dora Project is a recent open-source project based on technology developed at Compasso UOL

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.3

Dec 13, 2021

0.1.2

Sep 13, 2021

This version

0.1.1.4

Sep 1, 2021

0.1.1.3

Sep 1, 2021

0.1.1.2

Sep 1, 2021

0.1.1.1

Sep 1, 2021

0.1.1

Aug 27, 2021

0.1.0

Aug 26, 2021

0.0.2

Aug 11, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dora-parser-0.1.1.4.tar.gz (46.7 kB view details)

Uploaded Sep 1, 2021 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

dora_parser-0.1.1.4-py3-none-any.whl (53.2 kB view details)

Uploaded Sep 1, 2021 Python 3

File details

Details for the file dora-parser-0.1.1.4.tar.gz.

File metadata

Download URL: dora-parser-0.1.1.4.tar.gz
Upload date: Sep 1, 2021
Size: 46.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.6 CPython/3.9.6 Linux/5.10.16.3-microsoft-standard-WSL2

File hashes

Hashes for dora-parser-0.1.1.4.tar.gz
Algorithm	Hash digest
SHA256	`ee076847e012174e365eb8b2308fc2f6b24f8f239b7cfadd3163b16f15ee7d11`
MD5	`7d4034fe858f47eb26e81ed2693553d4`
BLAKE2b-256	`8bfe40f3b7dba31c14709993fe05db42f2d3e3b4dbe4a26007eeba82571b5777`

See more details on using hashes here.

File details

Details for the file dora_parser-0.1.1.4-py3-none-any.whl.

File metadata

Download URL: dora_parser-0.1.1.4-py3-none-any.whl
Upload date: Sep 1, 2021
Size: 53.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.6 CPython/3.9.6 Linux/5.10.16.3-microsoft-standard-WSL2

File hashes

Hashes for dora_parser-0.1.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`88522aea82245a54006f223d07095ce96dd3f35dc9e02aa572500ef1455069ba`
MD5	`8903f4d85f0819f10c7325bb9ce174e4`
BLAKE2b-256	`61f780605fcae03ccc405a6a4ea04edfa275757d40a62a9d7427eea9241d92bc`

See more details on using hashes here.

dora-parser 0.1.1.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Parser

Getting Started

Installation

Usage

Translate Query

Translate Script

Translate multiple Files

Error Types

Supported Languages

Getting Help

Contributing

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes