Parsing DDL files of HIVE
hql_parser
A parser package that extracts fields, types, and comments from a Hive DDL. For instance, given the following ddl.sql file:
CREATE TABLE `school.student`(
`dni` varchar(100) COMMENT 'Identificator National Number',
`first_name` varchar(10) COMMENT 'First name',
`second_name` varchar(50) COMMENT 'Second name',
`age` int COMMENT 'How old is this student',
`nickname` varchar(30) COMMENT 'Nickname',
`flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni)
INTO 1 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://nnha/data/environment/datalake/school/student'
TBLPROPERTIES (
'last_modified_by'='root',
'last_modified_time'='1662590768',
'numFiles'='1600',
'totalSize'='1913197',
'transactional'='true',
'transient_lastDdlTime'='1666985788')
It can be parsed as follows:
from hql_parser import DDL_Handler
ddlh = DDL_Handler()
obj = ddlh.file_parser('ddl.sql')
print(obj)
The result is a three-item list:
- Position 0: schema name
- Position 1: table name
- Position 2: a list of table fields, each in the following format:
{'field': '', 'ttype': '', 'comment': ''}
This example prints the following output:
[
    'school',
    'student',
    [
        {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
        {'field': 'first_name', 'ttype': 'varchar(10)', 'comment': 'First name'},
        {'field': 'second_name', 'ttype': 'varchar(50)', 'comment': 'Second name'},
        {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
        {'field': 'nickname', 'ttype': 'varchar(30)', 'comment': 'Nickname'},
        {'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'}
    ]
]
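Since the result is a plain list, it can be unpacked positionally. The following is a minimal sketch of consuming it, assuming the three-item layout described above. The `summarize` helper is hypothetical and not part of hql_parser, and the `result` literal is a shortened copy of the example output, so the snippet runs without the package installed.

```python
# Shortened copy of the example output above (three of the six fields).
result = [
    'school',
    'student',
    [
        {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
        {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
        {'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'},
    ],
]


def summarize(parsed):
    """Hypothetical helper: unpack the three-item list into a qualified
    table name and a field-name -> type mapping."""
    schema, table, fields = parsed
    types_by_field = {f['field']: f['ttype'] for f in fields}
    return '{}.{}'.format(schema, table), types_by_field


full_name, types_by_field = summarize(result)
print(full_name)
for name, ttype in types_by_field.items():
    print('{}: {}'.format(name, ttype))
```

A dictionary keyed by field name is convenient for lookups such as `types_by_field['age']`, while the original list keeps the column order of the DDL.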
Alternatively, DDL content already held in a string variable can be parsed as follows:
obj = ddl_parser(ddl_content_str)
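To illustrate what parsing a DDL string into this shape involves, here is a rough, standalone sketch based on regular expressions. It is not hql_parser's implementation: `parse_ddl_sketch`, `TABLE_RE`, and `FIELD_RE` are hypothetical names introduced for illustration. It assumes every column is written as a backquoted name, a simple type, and a single-quoted COMMENT, as in the example DDL. Complex types, escaped quotes, and columns without a comment are not handled.

```python
import re

# Matches the qualified table name: CREATE TABLE `schema.table`
TABLE_RE = re.compile(r"CREATE\s+TABLE\s+`(\w+)\.(\w+)`", re.IGNORECASE)

# Matches one column definition: `name` type COMMENT 'text'
# The type may carry a size or precision, e.g. varchar(100) or decimal(10,2).
FIELD_RE = re.compile(
    r"`(\w+)`\s+(\w+(?:\(\d+(?:,\s*\d+)?\))?)\s+COMMENT\s+'([^']*)'",
    re.IGNORECASE,
)


def parse_ddl_sketch(ddl):
    """Hypothetical helper: return [schema, table, fields] from a DDL string."""
    match = TABLE_RE.search(ddl)
    if match is None:
        raise ValueError('no CREATE TABLE `schema.table` clause found')
    schema, table = match.groups()
    fields = [
        {'field': name, 'ttype': ttype, 'comment': comment}
        for name, ttype, comment in FIELD_RE.findall(ddl)
    ]
    return [schema, table, fields]


ddl_content_str = """
CREATE TABLE `school.student`(
`dni` varchar(100) COMMENT 'Identificator National Number',
`age` int COMMENT 'How old is this student')
CLUSTERED BY (dni)
INTO 1 BUCKETS
"""

parsed = parse_ddl_sketch(ddl_content_str)
print(parsed)
```

The sketch returns the same three-item layout as the documented result, so code written against one can be tried against the other.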