Parsing Hive DDL files

Project description

hql_parser

A parser package that extracts field names, types, and comments from a Hive DDL. For instance, given the following ddl.sql file:

CREATE TABLE `school.student`(
  `dni` varchar(100) COMMENT 'Identificator National Number', 
  `first_name` varchar(10) COMMENT 'First name', 
  `second_name` varchar(50) COMMENT 'Second name', 
  `age` int COMMENT 'How old is this student', 
  `nickname` varchar(30) COMMENT 'Nickname', 
  `flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni) 
INTO 1 BUCKETS
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://nnha/data/environment/datalake/school/student'
TBLPROPERTIES (
  'last_modified_by'='root', 
  'last_modified_time'='1662590768', 
  'numFiles'='1600', 
  'totalSize'='1913197', 
  'transactional'='true', 
  'transient_lastDdlTime'='1666985788')

It can be parsed as follows:

from hql_parser import DDL_Handler

# Parse the DDL file and extract the schema name, table name, and field metadata
ddlh = DDL_Handler()
obj = ddlh.file_parser('ddl.sql')
print(obj)

The result is a three-item list:

  • Position 0: schema name

  • Position 1: table name

  • Position 2: a list of table fields, each in the format {'field': '', 'ttype': '', 'comment': ''}

This example prints the following output:

[
  'school', 
  'student', 
  [
    {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'}, 
    {'field': 'first_name', 'ttype': 'varchar(10)', 'comment': 'First name'}, 
    {'field': 'second_name', 'ttype': 'varchar(50)', 'comment': 'Second name'}, 
    {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'}, 
    {'field': 'nickname', 'ttype': 'varchar(30)', 'comment': 'Nickname'}, 
    {'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'}
  ]
]
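
Once the list is available, it can be unpacked by position and the field dictionaries processed like any other Python data. The following is a minimal sketch of that step. It does not call hql_parser: the `parsed` value is a shortened literal copy of the output above, and `column_types` is a hypothetical helper name introduced only for illustration.

```python
# Shortened literal copy of the parsed result shown above
# (assumption: no call to hql_parser is made in this sketch).
parsed = [
    "school",
    "student",
    [
        {"field": "dni", "ttype": "varchar(100)", "comment": "Identificator National Number"},
        {"field": "age", "ttype": "int", "comment": "How old is this student"},
        {"field": "flg_estado", "ttype": "smallint", "comment": "Flag (1 - Active, 0 - No Active)"},
    ],
]

# Unpack the three positions: schema name, table name, field list.
schema, table, fields = parsed


def column_types(field_list):
    """Hypothetical helper: map each field name to its declared type."""
    return {f["field"]: f["ttype"] for f in field_list}


types = column_types(fields)
qualified_name = "{}.{}".format(schema, table)

print(qualified_name)
for f in fields:
    # One line per column: name, type, comment.
    print("{:<12} {:<14} {}".format(f["field"], f["ttype"], f["comment"]))
```

Unpacking by position keeps the code independent of how the list was produced, so the same lines should work on the value returned by `file_parser`.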

Alternatively, a DDL already held in a string variable can be parsed as follows:

obj = ddl_parser(ddl_content_str)
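
To illustrate what parsing a DDL string into the three-item shape involves, here is a simplified standalone sketch built on the standard `re` module. It is not hql_parser's implementation, and `sketch_parse_ddl` and its helpers are hypothetical names. It assumes the DDL looks like the ddl.sql example: a backtick-quoted `schema.table` name, backtick-quoted column names, simple types such as `int` or `varchar(100)`, and optional single-quoted comments. Complex types such as `array<string>` are not handled.

```python
import re

# Hypothetical helpers, NOT part of hql_parser. They only cover DDLs shaped
# like the ddl.sql example in this document.
_TABLE_RE = re.compile(
    r"CREATE\s+(?:EXTERNAL\s+)?TABLE\s+`?(\w+)\.(\w+)`?\s*\(", re.IGNORECASE
)
_COLUMN_RE = re.compile(
    r"`(\w+)`\s+"                                 # backtick-quoted column name
    r"(\w+(?:\(\s*\d+(?:\s*,\s*\d+)?\s*\))?)"     # type, e.g. int or varchar(100)
    r"(?:\s+COMMENT\s+'((?:[^'\\]|\\.)*)')?",     # optional single-quoted comment
    re.IGNORECASE,
)


def _column_section(text, start):
    """Return text from `start` up to the ')' that closes the column list.

    Parentheses inside single-quoted comments are ignored.
    """
    depth = 1
    in_quote = False
    i = start
    while i < len(text):
        ch = text[i]
        if in_quote:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == "'":
                in_quote = False
        elif ch == "'":
            in_quote = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return text[start:i]
        i += 1
    raise ValueError("unbalanced parentheses in column list")


def sketch_parse_ddl(ddl_text):
    """Return [schema, table, fields] in the same shape as the output above."""
    table_match = _TABLE_RE.search(ddl_text)
    if table_match is None:
        raise ValueError("no CREATE TABLE schema.table definition found")
    schema, table = table_match.group(1), table_match.group(2)
    columns = _column_section(ddl_text, table_match.end())
    fields = [
        {"field": m.group(1), "ttype": m.group(2), "comment": m.group(3) or ""}
        for m in _COLUMN_RE.finditer(columns)
    ]
    return [schema, table, fields]


ddl_content_str = """CREATE TABLE `school.student`(
  `dni` varchar(100) COMMENT 'Identificator National Number',
  `age` int COMMENT 'How old is this student',
  `flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni)
INTO 1 BUCKETS"""

result = sketch_parse_ddl(ddl_content_str)
print(result)
```

The column list is isolated first so that later clauses such as `CLUSTERED BY` or `TBLPROPERTIES` cannot be mistaken for columns. For real workloads the package's own parser is the intended tool; this sketch only makes the input-to-output mapping concrete.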


Download files

Source Distribution

hql_parser-0.0.4.tar.gz (3.5 kB)

Built Distribution

hql_parser-0.0.4-py2.py3-none-any.whl (3.4 kB), for Python 2 and Python 3
