
Parsing Hive DDL files

Project description

hql_parser

A parser package that extracts fields, types and comments from a Hive DDL. For instance, given the following ddl.sql file:

CREATE TABLE `school.student`(
  `dni` varchar(100) COMMENT 'Identificator National Number', 
  `first_name` varchar(10) COMMENT 'First name', 
  `second_name` varchar(50) COMMENT 'Second name', 
  `age` int COMMENT 'How old is this student', 
  `nickname` varchar(30) COMMENT 'Nickname', 
  `flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni) 
INTO 1 BUCKETS
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://nnha/data/environment/datalake/school/student'
TBLPROPERTIES (
  'last_modified_by'='root', 
  'last_modified_time'='1662590768', 
  'numFiles'='1600', 
  'totalSize'='1913197', 
  'transactional'='true', 
  'transient_lastDdlTime'='1666985788')

It can be parsed as follows:

from hql_parser import DDL_Handler
ddlh = DDL_Handler()
obj = ddlh.file_parser('ddl.sql')
print(obj)

The result is a three-item list:

  • Position 0: schema name

  • Position 1: table name

  • Position 2: a list of table fields, each in the format {'field': '', 'ttype': '', 'comment': ''}

This example prints the following output:

[
  'school', 
  'student', 
  [
    {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'}, 
    {'field': 'first_name', 'ttype': 'varchar(10)', 'comment': 'First name'}, 
    {'field': 'second_name', 'ttype': 'varchar(50)', 'comment': 'Second name'}, 
    {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'}, 
    {'field': 'nickname', 'ttype': 'varchar(30)', 'comment': 'Nickname'}, 
    {'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'}
  ]
]
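
Because the result is a plain list, it can be unpacked directly into its three positions. The sketch below hard-codes a shortened copy of the output above instead of calling the parser, so it runs without the package installed; the variable names (schema, table, fields, types_by_field) are illustrative and not part of hql_parser.

```python
# Shortened copy of the output shown above; in real use this would be
# the value returned by ddlh.file_parser('ddl.sql').
obj = [
    'school',
    'student',
    [
        {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
        {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
    ],
]

# Unpack the three positions: schema name, table name, list of field dicts.
schema, table, fields = obj

# Build a field-name -> type lookup from the list of field dicts.
types_by_field = {f['field']: f['ttype'] for f in fields}

# Print a small data dictionary, one line per field.
print(schema + '.' + table)
for f in fields:
    print('{field:<12} {ttype:<14} {comment}'.format(**f))
```

A dictionary keyed by field name is handy when only a few columns need to be looked up; the original list keeps the column order of the DDL.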

Alternatively, DDL content held in a string variable can be parsed as follows:

obj = ddl_parser(ddl_content_str)
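
To illustrate what parsing a DDL string into this three-item shape involves, here is a small regex-based stand-in. It is only a sketch: sketch_parse is a hypothetical helper, not part of hql_parser, and it makes no claim about how hql_parser works internally. It assumes backtick-quoted names and simple column types such as int or varchar(100).

```python
import re


def sketch_parse(ddl):
    """Illustrative stand-in, NOT hql_parser's code: returns [schema, table, fields]."""
    # Schema and table come from the header: CREATE TABLE `schema.table`(
    head = re.search(
        r"CREATE\s+(?:EXTERNAL\s+)?TABLE\s+`?(\w+)\.(\w+)`?\s*\(",
        ddl,
        re.IGNORECASE,
    )
    if head is None:
        raise ValueError("no CREATE TABLE `schema.table`( header found")

    # Column definitions look like: `name` type COMMENT 'text'
    # The COMMENT clause is optional; a missing comment becomes ''.
    col = re.compile(
        r"`(\w+)`\s+(\w+(?:\([\d,\s]*\))?)(?:\s+COMMENT\s+'((?:[^'\\]|\\.)*)')?",
        re.IGNORECASE,
    )
    fields = [
        {'field': m.group(1), 'ttype': m.group(2), 'comment': m.group(3) or ''}
        for m in col.finditer(ddl, head.end())  # scan only after the header
    ]
    return [head.group(1), head.group(2), fields]


# Hypothetical input: a shortened version of the DDL shown earlier.
ddl_content_str = """CREATE TABLE `school.student`(
  `dni` varchar(100) COMMENT 'Identificator National Number',
  `age` int COMMENT 'How old is this student')
STORED AS ORC"""

print(sketch_parse(ddl_content_str))
```

The sketch scans everything after the table header, so backtick-quoted columns in a later clause (for example PARTITIONED BY) would also be picked up; complex types such as struct or map are not handled.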

Download files

Source Distribution

hql_parser-0.0.5.tar.gz (3.8 kB)

Uploaded Source

Built Distribution

hql_parser-0.0.5-py2.py3-none-any.whl (3.6 kB)

Uploaded Python 2 Python 3

File details

Details for the file hql_parser-0.0.5.tar.gz.

File metadata

  • Download URL: hql_parser-0.0.5.tar.gz
  • Upload date:
  • Size: 3.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for hql_parser-0.0.5.tar.gz
Algorithm    Hash digest
SHA256       3387a002a4410c8a8f7ea7e2d7cb2099ad86ea66fdaaa43e76e6480c5708596d
MD5          aeae85580b827d79ce9f9038e05c3884
BLAKE2b-256  d9c46847e0d365f46c08d6f928f6bceb1d9d9dd94c62c0756562be8f98c391b7
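
A downloaded archive can be checked against the SHA256 digest listed above using Python's standard hashlib module. This is a minimal sketch: sha256_of is a hypothetical helper name, and the commented-out call assumes the archive was saved in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed for hql_parser-0.0.5.tar.gz.
EXPECTED = '3387a002a4410c8a8f7ea7e2d7cb2099ad86ea66fdaaa43e76e6480c5708596d'

# Assumption: the archive is in the current directory. Uncomment to check it.
# print(sha256_of('hql_parser-0.0.5.tar.gz') == EXPECTED)
```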

File details

Details for the file hql_parser-0.0.5-py2.py3-none-any.whl.

File metadata

  • Download URL: hql_parser-0.0.5-py2.py3-none-any.whl
  • Upload date:
  • Size: 3.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for hql_parser-0.0.5-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       b466611a2012d8cd85ecc6fb4913b9ee17e52c21968f1cae477a3b816fd650df
MD5          4fbf7be59bf9fd26593622a321947eeb
BLAKE2b-256  b078c1540d690fce1da9e9276e58067d490a33c9cdd1beb25c131131e7f6379f
