Parsing DDL files of HIVE
Project description
hql_parser
A parser package that extracts fields, types, and comments from a Hive DDL. For instance, given the following ddl.sql file:
CREATE TABLE `school.student`(
`dni` varchar(100) COMMENT 'Identificator National Number',
`first_name` varchar(10) COMMENT 'First name',
`second_name` varchar(50) COMMENT 'Second name',
`age` int COMMENT 'How old is this student',
`nickname` varchar(30) COMMENT 'Nickname',
`flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni)
INTO 1 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://nnha/data/environment/datalake/school/student'
TBLPROPERTIES (
'last_modified_by'='root',
'last_modified_time'='1662590768',
'numFiles'='1600',
'totalSize'='1913197',
'transactional'='true',
'transient_lastDdlTime'='1666985788')
It can be parsed as follows:
from hql_parser import DDL_Handler
ddlh = DDL_Handler()
obj = ddlh.file_parser('ddl.sql')
print(obj)
The result is a three-item list:
- Position 0: schema name
- Position 1: table name
- Position 2: a list of table fields, each in the following format:
{'field': '', 'ttype': '', 'comment': ''}
This example prints the following output:
[
'school',
'student',
[
{'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
{'field': 'first_name', 'ttype': 'varchar(10)', 'comment': 'First name'},
{'field': 'second_name', 'ttype': 'varchar(50)', 'comment': 'Second name'},
{'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
{'field': 'nickname', 'ttype': 'varchar(30)', 'comment': 'Nickname'},
{'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'}
]
]
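Because the result is a plain Python list, it can be unpacked and processed with ordinary code. The following is a minimal sketch that does not call hql_parser itself: it uses a shortened copy of the output shown above, and the names `result`, `schema`, `table`, `fields`, and `types_by_field` are illustrative, not part of the package.

```python
# Shortened copy of the example output above (two of the six fields).
result = [
    'school',
    'student',
    [
        {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
        {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
    ],
]

# Unpack the three positions: schema name, table name, list of fields.
schema, table, fields = result

# Build a field-name -> type lookup from the field dictionaries.
types_by_field = {f['field']: f['ttype'] for f in fields}

# Print the qualified table name followed by one line per field.
print(f"{schema}.{table}")
for f in fields:
    print(f"{f['field']:<12} {f['ttype']:<14} {f['comment']}")
```

The same unpacking should apply to the list returned by file_parser, assuming it has the shape documented above.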
Alternatively, DDL content held in a string variable can be parsed as follows:
obj = ddl_parser(ddl_content_str)