Parsing Hive DDL files
Project description
hql_parser
A parser package that extracts fields, types and comments from a Hive DDL. For instance, given the following ddl.sql file:
CREATE TABLE `school.student`(
  `dni` varchar(100) COMMENT 'Identificator National Number',
  `first_name` varchar(10) COMMENT 'First name',
  `second_name` varchar(50) COMMENT 'Second name',
  `age` int COMMENT 'How old is this student',
  `nickname` varchar(30) COMMENT 'Nickname',
  `flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni)
INTO 1 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://nnha/data/environment/datalake/school/student'
TBLPROPERTIES (
  'last_modified_by'='root',
  'last_modified_time'='1662590768',
  'numFiles'='1600',
  'totalSize'='1913197',
  'transactional'='true',
  'transient_lastDdlTime'='1666985788')
It can be parsed as follows:
from hql_parser import DDL_Handler
ddlh = DDL_Handler()
obj = ddlh.file_parser('ddl.sql')
print(obj)
The result is a three-item list:
- Position 0: schema name
- Position 1: table name
- Position 2: a list of table fields, each in the following format:
{'field': '', 'ttype': '', 'comment': ''}
This example prints the following output:
[
  'school',
  'student',
  [
    {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
    {'field': 'first_name', 'ttype': 'varchar(10)', 'comment': 'First name'},
    {'field': 'second_name', 'ttype': 'varchar(50)', 'comment': 'Second name'},
    {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
    {'field': 'nickname', 'ttype': 'varchar(30)', 'comment': 'Nickname'},
    {'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'}
  ]
]
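Since the result is a plain list, it can be unpacked positionally. The sketch below does not call hql_parser. It starts from a literal copy of the output printed above (the variable names `result` and `types_by_field` are only illustrative) and shows one way to consume the three positions:

```python
# Literal copy of the list shown in the example output; in real use this
# would be the value returned by the parser.
result = [
    'school',
    'student',
    [
        {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
        {'field': 'first_name', 'ttype': 'varchar(10)', 'comment': 'First name'},
        {'field': 'second_name', 'ttype': 'varchar(50)', 'comment': 'Second name'},
        {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
        {'field': 'nickname', 'ttype': 'varchar(30)', 'comment': 'Nickname'},
        {'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'},
    ],
]

# Position 0 is the schema, position 1 the table, position 2 the field list.
schema, table, fields = result

# Build a lookup from column name to column type.
types_by_field = {f['field']: f['ttype'] for f in fields}

print(f"{schema}.{table}")
for f in fields:
    print(f"{f['field']:<12} {f['ttype']:<13} {f['comment']}")
```

A dictionary keyed by field name is convenient when only the type of a particular column is needed, for example `types_by_field['age']`.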
Alternatively, a DDL that is already held in a string variable can be parsed as follows:
obj = ddl_parser(ddl_content_str)
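To give a feel for what parsing a DDL string involves, here is a minimal regex-based sketch. It is not the implementation used by hql_parser, and the names `sketch_parse`, `TABLE_RE` and `COLUMN_RE` are hypothetical. It assumes backtick-quoted column names, simple types such as `int` or `varchar(100)`, and a single-quoted comment on every column, as in the example DDL above.

```python
import re

# Hypothetical helper patterns, not part of hql_parser.
# Matches: CREATE TABLE `schema.table`
TABLE_RE = re.compile(
    r"CREATE\s+(?:EXTERNAL\s+)?TABLE\s+`?(\w+)\.(\w+)`?", re.IGNORECASE
)
# Matches: `name` type COMMENT 'text'  (type may carry a size, e.g. varchar(10))
COLUMN_RE = re.compile(
    r"`(\w+)`\s+(\w+(?:\([\d,\s]+\))?)\s+COMMENT\s+'([^']*)'", re.IGNORECASE
)


def sketch_parse(ddl):
    """Return [schema, table, fields] for a simple Hive CREATE TABLE string."""
    match = TABLE_RE.search(ddl)
    if match is None:
        raise ValueError("no CREATE TABLE `schema.table` clause found")
    schema, table = match.groups()
    fields = [
        {'field': name, 'ttype': ttype, 'comment': comment}
        for name, ttype, comment in COLUMN_RE.findall(ddl)
    ]
    return [schema, table, fields]


sample_ddl = """CREATE TABLE `school.student`(
`dni` varchar(100) COMMENT 'Identificator National Number',
`age` int COMMENT 'How old is this student',
`flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni)
INTO 1 BUCKETS"""

parsed = sketch_parse(sample_ddl)
print(parsed)
```

The sketch ignores columns without a comment, escaped quotes inside comments, and complex types such as `struct<...>`. A real DDL parser has to handle these cases.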