
Simple DDL Parser parses SQL & HQL DDL files to JSON/Python dicts with full information about columns: types, defaults, primary keys, etc.


Simple DDL Parser


The parser is tested on DDL for PostgreSQL & Hive.

If you have samples that cause an error, please open an issue and I will be glad to fix it.

This parser takes SQL DDL statements or files as input, for example:

create table prod.super_table
(
    data_sync_id bigint not null,
    sync_count bigint not null,
    sync_mark timestamp  not  null,
    sync_start timestamp  not null,
    sync_end timestamp  not null,
    message varchar(2000) null,
    primary key (data_sync_id, sync_start)
);

and produces output like this (with information about the table name, schema, columns, types, and properties):

[{
'columns': [
    {'name': 'data_sync_id', 'type': 'bigint', 'nullable': False, 'size': None, 'default': None},
    {'name': 'sync_count', 'type': 'bigint', 'nullable': False, 'size': None, 'default': None},
    {'name': 'sync_mark', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
    {'name': 'sync_start', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
    {'name': 'sync_end', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
    {'name': 'message', 'type': 'varchar', 'nullable': False, 'size': 2000, 'default': None}],
'table_name': 'super_table', 'schema': 'prod',
'primary_key': ['data_sync_id', 'sync_start']
}]

Or one more example:

CREATE TABLE "paths" (
  "id" int PRIMARY KEY,
  "title" varchar NOT NULL,
  "description" varchar(160),
  "created_at" timestamp,
  "updated_at" timestamp
);

and the result:

[{
'columns': [
    {'name': 'id', 'type': 'int', 'nullable': False, 'size': None, 'default': None},
    {'name': 'title', 'type': 'varchar', 'nullable': False, 'size': None, 'default': None},
    {'name': 'description', 'type': 'varchar', 'nullable': False, 'size': 160, 'default': None},
    {'name': 'created_at', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
    {'name': 'updated_at', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None}],
'primary_key': ['id'],
'table_name': 'paths', 'schema': ''
}]

If you pass a file or text block with more than one CREATE TABLE statement, the result will be a list of such dicts. For example:

Input:

CREATE TABLE "countries" (
  "id" int PRIMARY KEY,
  "code" varchar(4) NOT NULL,
  "name" varchar NOT NULL
);

CREATE TABLE "path_owners" (
  "user_id" int,
  "path_id" int,
  "type" int DEFAULT 1
);

Output:

[
    {'columns': [
        {'name': 'id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
        {'name': 'code', 'type': 'varchar', 'size': 4, 'nullable': False, 'default': None},
        {'name': 'name', 'type': 'varchar', 'size': None, 'nullable': False, 'default': None}],
     'primary_key': ['id'],
     'table_name': 'countries',
     'schema': None},
    {'columns': [
        {'name': 'user_id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
        {'name': 'path_id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
        {'name': 'type', 'type': 'int', 'size': None, 'nullable': False, 'default': 1}],
     'primary_key': [],
     'table_name': 'path_owners',
     'schema': None}
]
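Since the parser returns plain Python lists and dicts, results like the one above can be post-processed with ordinary Python. As a small illustration (this sketch reuses the literal output shown above as input, so it does not require simple_ddl_parser itself to run), here is one way to build a {table: {column: type}} mapping:

```python
# Sketch: post-processing a parse result like the one above.
# `parse_result` is the literal output shown above, written by hand.
parse_result = [
    {'columns': [
        {'name': 'id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
        {'name': 'code', 'type': 'varchar', 'size': 4, 'nullable': False, 'default': None},
        {'name': 'name', 'type': 'varchar', 'size': None, 'nullable': False, 'default': None}],
     'primary_key': ['id'],
     'table_name': 'countries',
     'schema': None},
    {'columns': [
        {'name': 'user_id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
        {'name': 'path_id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
        {'name': 'type', 'type': 'int', 'size': None, 'nullable': False, 'default': 1}],
     'primary_key': [],
     'table_name': 'path_owners',
     'schema': None},
]

# Build a {table_name: {column_name: column_type}} mapping.
types_by_table = {
    table['table_name']: {col['name']: col['type'] for col in table['columns']}
    for table in parse_result
}

print(types_by_table['countries']['code'])    # varchar
print(types_by_table['path_owners']['type'])  # int
```

The same pattern works for whatever you need downstream, e.g. generating ORM model fields from the column list.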

How to use

From Python code

from simple_ddl_parser import DDLParser


parse_results = DDLParser("""create table dev.data_sync_history(
    data_sync_id bigint not null,
    sync_count bigint not null,
    sync_mark timestamp  not  null,
    sync_start timestamp  not null,
    sync_end timestamp  not null,
    message varchar(2000) null,
    primary key (data_sync_id, sync_start)
); """).run()

print(parse_results)

To parse from a file:

from simple_ddl_parser import parse_from_file

result = parse_from_file('tests/test_one_statement.sql')
print(result)

More examples & tests

You can find them in the tests/functional folder.

Dump result in json

To dump the result to JSON, use the argument .run(dump=True)

You can also provide a path where the schema dumps should be stored via a separate argument.
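Because the result consists of plain dicts and lists, you can also serialize it yourself with the standard json module instead of relying on dump=True. A minimal sketch (the result dict here is written by hand, mirroring the structure shown earlier, so simple_ddl_parser is not required to run it):

```python
import json

# A hand-written parse result mirroring the structure the parser produces.
parse_result = [{
    'columns': [
        {'name': 'id', 'type': 'int', 'nullable': False, 'size': None, 'default': None}],
    'primary_key': ['id'],
    'table_name': 'paths',
    'schema': '',
}]

# Serialize to a JSON string; write it to a file with json.dump() if needed.
as_json = json.dumps(parse_result, indent=2)
print(as_json)

# The structure round-trips cleanly: loading the JSON gives back equal data.
assert json.loads(as_json) == parse_result
```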

TODO in next Releases

  1. Support for references (FOREIGN KEY) in column definitions

  2. Support for separate ALTER TABLE statements for foreign keys, like:

ALTER TABLE "material_attachments" ADD FOREIGN KEY ("material_id") REFERENCES "materials" ("id");

  3. Support for parsing CREATE INDEX statements

  4. Add a command-line interface

  5. Support ARRAY types

Historical context

This library is parser code extracted from https://github.com/xnuinside/fakeme (a library for fake relational data generation that I used in several work projects, but never had time to turn into a proper open-source library).

For one of my work projects I needed to convert SQL DDL to Python ORM models automatically, and I tried to use https://github.com/andialbrecht/sqlparse, but it did not work well enough with DDL for my case (for example, if the DDL uses lower case, nothing works; primary keys inside the DDL are mapped as column names rather than reserved words; and so on). So I remembered the parser in Fakeme and simply extracted and improved it.

How to contribute

Please describe the issue that you want to solve and open a PR; I will review it as soon as possible.

Any questions? Ping me in Telegram: https://t.me/xnuinside
