
Parsing Hive DDL files

Project description

hql_parser

A parser package that extracts fields, types and comments from a Hive DDL. For instance, given the following ddl.sql file:

CREATE TABLE `school.student`(
  `dni` varchar(100) COMMENT 'Identificator National Number', 
  `first_name` varchar(10) COMMENT 'First name', 
  `second_name` varchar(50) COMMENT 'Second name', 
  `age` int COMMENT 'How old is this student', 
  `nickname` varchar(30) COMMENT 'Nickname', 
  `flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni) 
INTO 1 BUCKETS
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://nnha/data/environment/datalake/school/student'
TBLPROPERTIES (
  'last_modified_by'='root', 
  'last_modified_time'='1662590768', 
  'numFiles'='1600', 
  'totalSize'='1913197', 
  'transactional'='true', 
  'transient_lastDdlTime'='1666985788')

It can be parsed as follows:

from hql_parser import DDL_Handler
ddlh = DDL_Handler()
obj = ddlh.file_parser('ddl.sql')
print(obj)

The result is a three-item list:

  • Position 0: schema name

  • Position 1: table name

  • Position 2: a list of table fields, each with the format {'field': '', 'ttype': '', 'comment': ''}

This example prints the following output:

[
  'school', 
  'student', 
  [
    {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'}, 
    {'field': 'first_name', 'ttype': 'varchar(10)', 'comment': 'First name'}, 
    {'field': 'second_name', 'ttype': 'varchar(50)', 'comment': 'Second name'}, 
    {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'}, 
    {'field': 'nickname', 'ttype': 'varchar(30)', 'comment': 'Nickname'}, 
    {'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'}
  ]
]
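
Since the field list is a plain list of dictionaries, it is easy to post-process. The sketch below (the variable names are illustrative, not part of the package) unpacks the three-item result and builds a column-to-comment lookup:

from hql_parser import DDL_Handler

ddlh = DDL_Handler()
# file_parser returns [schema, table, fields]
schema, table, fields = ddlh.file_parser('ddl.sql')

# Build a quick lookup from column name to (type, comment)
data_dictionary = {f['field']: (f['ttype'], f['comment']) for f in fields}

print(f"{schema}.{table} has {len(fields)} columns")
print(data_dictionary['age'])   # ('int', 'How old is this student')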

Alternatively, a DDL that is already held in a string variable can be parsed as follows:

obj = ddl_parser(ddl_content_str)
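
For instance, the DDL text can first be read from disk and then handed to the parser. The sketch below assumes ddl_parser is exposed on the same DDL_Handler instance used earlier (that placement is an assumption; the one-liner above shows the call itself):

from hql_parser import DDL_Handler

ddlh = DDL_Handler()

# Load the DDL into a string and parse the string directly
# (assumes ddl_parser is a method of DDL_Handler, mirroring file_parser)
with open('ddl.sql') as f:
    ddl_content_str = f.read()

obj = ddlh.ddl_parser(ddl_content_str)
print(obj)  # same three-item list that file_parser returns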

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hql_parser-0.0.7.tar.gz (3.8 kB)

Built Distribution

hql_parser-0.0.7-py2.py3-none-any.whl (3.6 kB)

File details

Details for the file hql_parser-0.0.7.tar.gz.

File metadata

  • Download URL: hql_parser-0.0.7.tar.gz
  • Upload date:
  • Size: 3.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for hql_parser-0.0.7.tar.gz

  • SHA256: 92da001b694cbbf6676f79cd8168d72d8f91091a149feadff92c096b01fed780
  • MD5: 85390e6b6b3ee35e02b4440ac7e6f804
  • BLAKE2b-256: e14a602d2a6326b4ebaee96ab40eec71031e4017e837a6ee19815fcaaa5a6f22

See more details on using hashes here.
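
As a quick local check, a downloaded archive can be compared against the SHA256 digest listed above using Python's standard hashlib (a minimal sketch; adjust the filename to wherever the archive was saved):

import hashlib

expected = '92da001b694cbbf6676f79cd8168d72d8f91091a149feadff92c096b01fed780'

# Hash the downloaded archive and compare it with the digest listed above
with open('hql_parser-0.0.7.tar.gz', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print('OK' if digest == expected else 'MISMATCH')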

File details

Details for the file hql_parser-0.0.7-py2.py3-none-any.whl.

File metadata

  • Download URL: hql_parser-0.0.7-py2.py3-none-any.whl
  • Upload date:
  • Size: 3.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for hql_parser-0.0.7-py2.py3-none-any.whl

  • SHA256: 61ffe3aab3774a26c3b8b8dd109930578380231773ca0c1582d2f736ca36c9c9
  • MD5: 034aa3db32a2dfd8c961ce2180ea68d9
  • BLAKE2b-256: 0e25a45a36589f4b3e581a91251f8ca1e6aed113958b669670a5585d0a5a4f25

See more details on using hashes here.
