Parsing DDL files of HIVE

Project description

hql_parser

A parser package that extracts field names, types, and comments from a Hive DDL statement. For instance, given the following ddl.sql file:

CREATE TABLE `school.student`(
  `dni` varchar(100) COMMENT 'Identificator National Number', 
  `first_name` varchar(10) COMMENT 'First name', 
  `second_name` varchar(50) COMMENT 'Second name', 
  `age` int COMMENT 'How old is this student', 
  `nickname` varchar(30) COMMENT 'Nickname', 
  `flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni) 
INTO 1 BUCKETS
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://nnha/data/environment/datalake/school/student'
TBLPROPERTIES (
  'last_modified_by'='root', 
  'last_modified_time'='1662590768', 
  'numFiles'='1600', 
  'totalSize'='1913197', 
  'transactional'='true', 
  'transient_lastDdlTime'='1666985788')

It can be parsed as follows:

from hql_parser import DDL_Handler
ddlh = DDL_Handler()
obj = ddlh.file_parser('ddl.sql')
print(obj)

The result is a three-item list:

  • Position 0: schema name

  • Position 1: table name

  • Position 2: a list of table fields, each a dictionary in the format {'field': '', 'ttype': '', 'comment': ''}

This example prints the following output:

[
  'school', 
  'student', 
  [
    {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'}, 
    {'field': 'first_name', 'ttype': 'varchar(10)', 'comment': 'First name'}, 
    {'field': 'second_name', 'ttype': 'varchar(50)', 'comment': 'Second name'}, 
    {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'}, 
    {'field': 'nickname', 'ttype': 'varchar(30)', 'comment': 'Nickname'}, 
    {'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'}
  ]
]
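
A result of this shape can be unpacked positionally. The following is a minimal sketch that hard-codes a shortened copy of the output above instead of calling the parser, so it runs without the package installed. The helper name `describe_table` is hypothetical and not part of hql_parser.

```python
# Shortened copy of the parser output shown above (three of the six fields).
obj = [
    'school',
    'student',
    [
        {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'},
        {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'},
        {'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'},
    ],
]


def describe_table(result):
    """Return one line per column: 'schema.table.field: type (comment)'.

    Hypothetical helper for illustration; not part of hql_parser.
    """
    schema, table, fields = result  # positions 0, 1 and 2 of the list
    return [
        '{}.{}.{}: {} ({})'.format(schema, table, f['field'], f['ttype'], f['comment'])
        for f in fields
    ]


for line in describe_table(obj):
    print(line)
```

Unpacking by position keeps the sketch close to the documented list layout; the dictionary keys `field`, `ttype` and `comment` are the ones listed above.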

Alternatively, DDL content already held in a string variable can be parsed as follows:

obj = ddl_parser(ddl_content_str)
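
To illustrate the kind of extraction involved when working from a string, here is a rough regex-based sketch that produces the same three-item shape. It is not hql_parser's implementation: `sketch_parse` and both patterns are assumptions made for this example, and they only handle back-quoted identifiers, simple types with an optional parenthesised size, and single-quoted comments without embedded quotes.

```python
import re

# Header such as:  CREATE TABLE `school.student`(
TABLE_RE = re.compile(
    r"CREATE\s+TABLE\s+`(?P<schema>\w+)\.(?P<table>\w+)`",
    re.IGNORECASE,
)

# Column such as:  `age` int COMMENT 'How old is this student'
COLUMN_RE = re.compile(
    r"`(?P<field>\w+)`\s+(?P<ttype>\w+(?:\([^)]*\))?)\s+COMMENT\s+'(?P<comment>[^']*)'",
    re.IGNORECASE,
)


def sketch_parse(ddl):
    """Return [schema, table, fields] for a simple Hive CREATE TABLE string.

    Illustrative only; not the parser shipped with hql_parser.
    """
    header = TABLE_RE.search(ddl)
    if header is None:
        raise ValueError('no CREATE TABLE `schema.table` header found')
    # Each match yields a dict with the keys 'field', 'ttype' and 'comment'.
    fields = [m.groupdict() for m in COLUMN_RE.finditer(ddl)]
    return [header.group('schema'), header.group('table'), fields]


ddl_content_str = """CREATE TABLE `school.student`(
  `dni` varchar(100) COMMENT 'Identificator National Number',
  `age` int COMMENT 'How old is this student')
CLUSTERED BY (dni)
INTO 1 BUCKETS"""

print(sketch_parse(ddl_content_str))
```

Columns without a COMMENT clause, nested types, and comments containing escaped quotes are outside what this sketch covers.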
