Parsing DDL files of HIVE
Project description
hql_parser
A parser package that extracts fields, types, and comments from a Hive DDL. For instance, given the following ddl.sql file:
CREATE TABLE `school.student`(
`dni` varchar(100) COMMENT 'Identificator National Number',
`first_name` varchar(10) COMMENT 'First name',
`second_name` varchar(50) COMMENT 'Second name',
`age` int COMMENT 'How old is this student',
`nickname` varchar(30) COMMENT 'Nickname',
`flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni)
INTO 1 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://nnha/data/environment/datalake/school/student'
TBLPROPERTIES (
'last_modified_by'='root',
'last_modified_time'='1662590768',
'numFiles'='1600',
'totalSize'='1913197',
'transactional'='true',
'transient_lastDdlTime'='1666985788')
It can be parsed as follows:
from hql_parser import DDL_Handler
ddlh = DDL_Handler()
obj = ddlh.file_parser('ddl.sql')
print(obj)
The result is a three-item list:
- Position 0: schema name
- Position 1: table name
- Position 2: a list of table fields, each in the following format:
{'field': '', 'ttype': '', 'comment': ''}
This example prints the following output:
[
'school',
'student',
[
{'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
{'field': 'first_name', 'ttype': 'varchar(10)', 'comment': 'First name'},
{'field': 'second_name', 'ttype': 'varchar(50)', 'comment': 'Second name'},
{'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
{'field': 'nickname', 'ttype': 'varchar(30)', 'comment': 'Nickname'},
{'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'}
]
]
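Because the result is a plain list, it can be unpacked with ordinary Python. The sketch below does not call hql_parser; it copies part of the output shown above into a literal so it runs on its own, and `types_by_field` is a hypothetical variable name used only for illustration.

```python
# Result in the documented shape: [schema, table, list of field dicts].
# The literal copies two entries from the output above.
obj = [
    'school',
    'student',
    [
        {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
        {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
    ],
]

# Unpack the three positions into named variables.
schema, table, fields = obj

# Build a field -> type lookup (illustrative only, not part of hql_parser).
types_by_field = {f['field']: f['ttype'] for f in fields}

# Print a simple aligned listing of the columns.
print(f"{schema}.{table}")
for f in fields:
    print(f"{f['field']:<12} {f['ttype']:<14} {f['comment']}")
```

Unpacking into `schema, table, fields` avoids positional indexing such as `obj[2]` in the rest of the code.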
Alternatively, DDL content already held in a string variable can be parsed as follows:
obj = ddl_parser(ddl_content_str)
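The package's internals are not shown here. As a rough illustration of the kind of extraction involved, the standalone sketch below pulls the schema, table, and column definitions out of a DDL string with regular expressions. It is not hql_parser's implementation: `TABLE_RE`, `COLUMN_RE`, and `extract_columns` are hypothetical names, and the patterns assume backquoted identifiers, simple types such as `int` or `varchar(100)`, and comments without embedded single quotes.

```python
import re

# Assumed header form: CREATE TABLE `schema.table`
TABLE_RE = re.compile(
    r"CREATE\s+TABLE\s+`(?P<schema>\w+)\.(?P<table>\w+)`",
    re.IGNORECASE,
)

# Assumed column form: `name` type COMMENT 'text'
COLUMN_RE = re.compile(
    r"`(?P<field>\w+)`\s+"                      # backquoted column name
    r"(?P<ttype>\w+(?:\([^)]*\))?)"             # type, optionally with (args)
    r"(?:\s+COMMENT\s+'(?P<comment>[^']*)')?",  # optional comment
    re.IGNORECASE,
)


def extract_columns(ddl):
    """Return [schema, table, fields] in the shape documented above."""
    header = TABLE_RE.search(ddl)
    if header is None:
        raise ValueError("no CREATE TABLE `schema.table` header found")
    fields = [
        {
            'field': m.group('field'),
            'ttype': m.group('ttype'),
            'comment': m.group('comment') or '',
        }
        # Scan only the text after the table header.
        for m in COLUMN_RE.finditer(ddl, header.end())
    ]
    return [header.group('schema'), header.group('table'), fields]


ddl_content_str = """CREATE TABLE `school.student`(
`dni` varchar(100) COMMENT 'Identificator National Number',
`age` int COMMENT 'How old is this student')
CLUSTERED BY (dni)
INTO 1 BUCKETS"""

result = extract_columns(ddl_content_str)
print(result)
```

Complex Hive types such as `array<string>` or `struct<...>` are outside what this sketch handles; a real parser would need a proper grammar for them.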
File details
Details for the file hql_parser-0.0.1.tar.gz.
File metadata
- Download URL: hql_parser-0.0.1.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 00b0845082cf5ebb85e02891e9b862764fd1bc1bbc32832bd556787ad1d3eead
MD5 | 453297b26c4365c436672e99f0be9f44
BLAKE2b-256 | 7326cd93547804506bdeb32802ebd7b1b4fb130e16028903a0b42c5c9f1172c4
File details
Details for the file hql_parser-0.0.1-py2.py3-none-any.whl.
File metadata
- Download URL: hql_parser-0.0.1-py2.py3-none-any.whl
- Upload date:
- Size: 3.3 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | f04346724968bbe7a4310e4dee63310d2f79c782c816131185184fa6de173d5e
MD5 | 0541f6ff05af551231d24d606af1258c
BLAKE2b-256 | 9e4f5aeafb0b97660b69a708ca71b2f512005ab4717387e61f0dcb79271fa854