Parsing DDL files of HIVE
hql_parser
A parser package that extracts field names, types, and comments from a Hive DDL. For instance, given the following ddl.sql file:
CREATE TABLE `school.student`(
`dni` varchar(100) COMMENT 'Identificator National Number',
`first_name` varchar(10) COMMENT 'First name',
`second_name` varchar(50) COMMENT 'Second name',
`age` int COMMENT 'How old is this student',
`nickname` varchar(30) COMMENT 'Nickname',
`flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni)
INTO 1 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://nnha/data/environment/datalake/school/student'
TBLPROPERTIES (
'last_modified_by'='root',
'last_modified_time'='1662590768',
'numFiles'='1600',
'totalSize'='1913197',
'transactional'='true',
'transient_lastDdlTime'='1666985788')
It can be parsed as follows:
from hql_parser import DDL_Handler
ddlh = DDL_Handler()
obj = ddlh.file_parser('ddl.sql')
print(obj)
The result is a three-item list:
- Position 0: schema name
- Position 1: table name
- Position 2: a list of table fields, each in the following format:
{'field': '', 'ttype': '', 'comment': ''}
This example prints the following output:
[
'school',
'student',
[
{'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
{'field': 'first_name', 'ttype': 'varchar(10)', 'comment': 'First name'},
{'field': 'second_name', 'ttype': 'varchar(50)', 'comment': 'Second name'},
{'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
{'field': 'nickname', 'ttype': 'varchar(30)', 'comment': 'Nickname'},
{'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'}
]
]
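Since the result is a plain list, it can be unpacked directly. The sketch below is only an illustration of working with that shape: it uses a literal copied from the output above (shortened to two fields) instead of calling the package, and `summarize` is a hypothetical helper, not part of hql_parser.

```python
# A result in the documented shape: [schema, table, [field dicts]].
# Literal copied from the example output (two fields kept for brevity).
result = [
    'school',
    'student',
    [
        {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
        {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
    ],
]


def summarize(parsed):
    """Hypothetical helper: return (qualified table name, {field name: type})."""
    schema, table, fields = parsed  # positions 0, 1 and 2 of the result list
    column_types = {f['field']: f['ttype'] for f in fields}
    return f"{schema}.{table}", column_types


qualified_name, column_types = summarize(result)
print(qualified_name)
for name, ttype in column_types.items():
    print(f"{name}: {ttype}")
```

Building a name-to-type dictionary this way assumes field names are unique within a table, which holds for a valid Hive table definition.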
Alternatively, DDL content held in a string variable can be parsed as follows:
obj = ddl_parser(ddl_content_str)
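The package's internals are not shown here. As a rough, package-independent illustration of the kind of extraction described, the sketch below pulls the schema, table, and columns out of a DDL string with regular expressions. `parse_ddl_sketch` and both patterns are hypothetical and are not hql_parser's implementation. They assume a DDL laid out like the example: backquoted names, one column per line, and single-quoted comments without embedded quotes.

```python
import re

# Hypothetical patterns, NOT the hql_parser implementation.
# Table name: CREATE TABLE `schema.table` (the schema part is optional).
TABLE_RE = re.compile(r"CREATE\s+TABLE\s+`(?:(\w+)\.)?(\w+)`", re.IGNORECASE)
# Column line: `name` type COMMENT 'text' (columns without a COMMENT are skipped).
COLUMN_RE = re.compile(
    r"^\s*`(\w+)`\s+(.+?)\s+COMMENT\s+'([^']*)'",
    re.IGNORECASE | re.MULTILINE,
)


def parse_ddl_sketch(ddl):
    """Return [schema, table, fields] in the same shape as the documented result."""
    match = TABLE_RE.search(ddl)
    if match is None:
        raise ValueError("no CREATE TABLE statement found")
    schema, table = match.group(1), match.group(2)
    fields = [
        {'field': name, 'ttype': ttype, 'comment': comment}
        for name, ttype, comment in COLUMN_RE.findall(ddl)
    ]
    return [schema, table, fields]


ddl_content_str = """CREATE TABLE `school.student`(
`dni` varchar(100) COMMENT 'Identificator National Number',
`age` int COMMENT 'How old is this student')
CLUSTERED BY (dni)
INTO 1 BUCKETS"""

print(parse_ddl_sketch(ddl_content_str))
```

A regex approach like this is fragile for anything beyond simple layouts (nested types such as struct<...>, escaped quotes, or multi-line column definitions), so it should be read as a sketch of the idea, not a substitute for the package.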