
Parsing Hive DDL files

Project description

hql_parser

A parser package which extracts fields, types and comments from a Hive DDL. For instance, given the following ddl.sql file:

CREATE TABLE `school.student`(
  `dni` varchar(100) COMMENT 'Identificator National Number', 
  `first_name` varchar(10) COMMENT 'First name', 
  `second_name` varchar(50) COMMENT 'Second name', 
  `age` int COMMENT 'How old is this student', 
  `nickname` varchar(30) COMMENT 'Nickname', 
  `flg_estado` smallint COMMENT 'Flag (1 - Active, 0 - No Active)')
CLUSTERED BY (dni) 
INTO 1 BUCKETS
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://nnha/data/environment/datalake/school/student'
TBLPROPERTIES (
  'last_modified_by'='root', 
  'last_modified_time'='1662590768', 
  'numFiles'='1600', 
  'totalSize'='1913197', 
  'transactional'='true', 
  'transient_lastDdlTime'='1666985788')

It can be parsed as follows:

from hql_parser import DDL_Handler
ddlh = DDL_Handler()
obj = ddlh.file_parser('ddl.sql')
print(obj)

The result is a three-item list:

  • Position 0: schema name

  • Position 1: table name

  • Position 2: a list of table fields, each following the format {'field': '', 'ttype': '', 'comment': ''}

This example prints the following output:

[
  'school', 
  'student', 
  [
    {'field': 'dni', 'ttype': 'varchar(100)', 'comment': 'Identificator National Number'}, 
    {'field': 'first_name', 'ttype': 'varchar(10)', 'comment': 'First name'}, 
    {'field': 'second_name', 'ttype': 'varchar(50)', 'comment': 'Second name'}, 
    {'field': 'age', 'ttype': 'int', 'comment': 'How old is this student'}, 
    {'field': 'nickname', 'ttype': 'varchar(30)', 'comment': 'Nickname'}, 
    {'field': 'flg_estado', 'ttype': 'smallint', 'comment': 'Flag (1 - Active, 0 - No Active)'}
  ]
]
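
Since the returned value is a plain Python list, the parsed fields can be post-processed directly. The sketch below is not part of the package API; it only relies on the documented output format and prints a short column summary:

from hql_parser import DDL_Handler

ddlh = DDL_Handler()
# file_parser returns [schema, table, fields] as documented above
schema, table, fields = ddlh.file_parser('ddl.sql')

print(f'Columns of {schema}.{table}:')
for f in fields:
    # each entry has the keys 'field', 'ttype' and 'comment'
    print(f"  {f['field']} ({f['ttype']}): {f['comment']}")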

Alternatively, a DDL string already held in a variable can be parsed like this:

obj = ddlh.ddl_parser(ddl_content_str)
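
For example, assuming ddl_parser is a method of the same DDL_Handler instance (mirroring file_parser above; this naming is an assumption, not confirmed elsewhere in the description), the DDL text could be read from any source and parsed as a string:

from hql_parser import DDL_Handler

# read the DDL text from any source; here, the same ddl.sql file
with open('ddl.sql', 'r', encoding='utf-8') as fh:
    ddl_content_str = fh.read()

ddlh = DDL_Handler()
# assumption: ddl_parser accepts the DDL content as a string
obj = ddlh.ddl_parser(ddl_content_str)
print(obj)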


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hql_parser-0.0.2.tar.gz (3.5 kB)

Uploaded: Source

Built Distribution

hql_parser-0.0.2-py2.py3-none-any.whl (3.4 kB)

Uploaded: Python 2, Python 3

File details

Details for the file hql_parser-0.0.2.tar.gz.

File metadata

  • Download URL: hql_parser-0.0.2.tar.gz
  • Upload date:
  • Size: 3.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for hql_parser-0.0.2.tar.gz

  • SHA256: eba19197f646e8c614b93479d2242f3cc2886e3983953b9dd75b9e4cfa68f042
  • MD5: 8374c61666955fa739f548d9bd320994
  • BLAKE2b-256: 96e2c5343ee4b8738d86438144a95b810a922ed877da58b6e598985c36d7021d

See more details on using hashes here.
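
The published digests can be re-checked locally after downloading. The snippet below is a generic, standard-library sketch (not related to hql_parser itself), using the SHA256 value listed above:

import hashlib

expected = 'eba19197f646e8c614b93479d2242f3cc2886e3983953b9dd75b9e4cfa68f042'

# hash the downloaded archive and compare against the published digest
with open('hql_parser-0.0.2.tar.gz', 'rb') as fh:
    digest = hashlib.sha256(fh.read()).hexdigest()

print('OK' if digest == expected else 'MISMATCH')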

File details

Details for the file hql_parser-0.0.2-py2.py3-none-any.whl.

File metadata

  • Download URL: hql_parser-0.0.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 3.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for hql_parser-0.0.2-py2.py3-none-any.whl

  • SHA256: 03d7aa614532a9f994e6bd5648c2a4149871ba7f3e08dc58b04c961241f73e69
  • MD5: ab1abfb17aa4fb24575c6cf9029cf419
  • BLAKE2b-256: 06146f1e16a6bb8080ec399adb594f321130477fd9c463499095ae062a77165d

See more details on using hashes here.
