Simple DDL Parser to parse SQL & HQL ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.
Project description
Simple DDL Parser
Parser tested on DDL for PostgreSQL & Hive.
If you have samples that cause an error - please open the issue, I will be glad to fix it.
This parser take as input SQL DDL statements or files, for example like this:
create table prod.super_table
(
data_sync_id bigint not null,
sync_count bigint not null,
sync_mark timestamp not null,
sync_start timestamp not null,
sync_end timestamp not null,
message varchar(2000) null,
primary key (data_sync_id, sync_start)
);
And produce output like this (information about table name, schema, columns, types and properties):
[{
'columns': [
{'name': 'data_sync_id', 'type': 'bigint', 'mode': False, 'size': None, 'default': None},
{'name': 'sync_count', 'type': 'bigint', 'mode': False, 'size': None, 'default': None},
{'name': 'sync_mark', 'type': 'timestamp', 'mode': False, 'size': None, 'default': None},
{'name': 'sync_start', 'type': 'timestamp', 'mode': False, 'size': None, 'default': None},
{'name': 'sync_end', 'type': 'timestamp', 'mode': False, 'size': None, 'default': None},
{'name': 'message', 'type': 'varchar', 'mode': False, 'size': 2000, 'default': None}],
'table_name': 'super_table', 'schema': 'prod',
'primary_key': ['data_sync_id', 'sync_start']
}]
Or one more example
CREATE TABLE "paths" (
"id" int PRIMARY KEY,
"title" varchar NOT NULL,
"description" varchar(160),
"created_at" timestamp,
"updated_at" timestamp
);
and result
[{
'columns': [
{'name': 'id', 'type': 'int', 'nullable': False, 'size': None, 'default': None},
{'name': 'title', 'type': 'varchar', 'nullable': False, 'size': None, 'default': None},
{'name': 'description', 'type': 'varchar', 'nullable': False, 'size': 160, 'default': None},
{'name': 'created_at', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
{'name': 'updated_at', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None}],
'primary_key': ['id'],
'table_name': 'paths', 'schema': ''
}]
If you pass file or text block with more when 1 CREATE TABLE statement when result will be list of such dicts. For example:
Input:
CREATE TABLE "countries" (
"id" int PRIMARY KEY,
"code" varchar(4) NOT NULL,
"name" varchar NOT NULL
);
CREATE TABLE "path_owners" (
"user_id" int,
"path_id" int,
"type" int DEFAULT 1
);
Output:
[
{'columns': [
{'name': 'id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
{'name': 'code', 'type': 'varchar', 'size': 4, 'nullable': False, 'default': None},
{'name': 'name', 'type': 'varchar', 'size': None, 'nullable': False, 'default': None}],
'primary_key': ['id'],
'table_name': 'countries',
'schema': None},
{'columns': [
{'name': 'user_id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
{'name': 'path_id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
{'name': 'type', 'type': 'int', 'size': None, 'nullable': False, 'default': 1}],
'primary_key': [],
'table_name': 'path_owners',
'schema': None}
]
How to use
From python code
from simple_ddl_parser import DDLParser
parse_results = DDLParser("""create table dev.data_sync_history(
data_sync_id bigint not null,
sync_count bigint not null,
sync_mark timestamp not null,
sync_start timestamp not null,
sync_end timestamp not null,
message varchar(2000) null,
primary key (data_sync_id, sync_start)
); """).run()
print(parse_results)
To parse from file
from simple_ddl_parser import parse_from_file
result = parse_from_file('tests/test_one_statement.sql')
print(result)
More examples & tests
You can find in tests/functional folder.
Dump result in json
To dump result in json use argument .run(dump=True)
You also can provide a path where you want to have a dumps with schema with argument
TODO in next Releases
Support for references (Foreigein key) in column defenition
Support for separate ALTER TABLE statements for Foreigein keys like
ALTER TABLE "material_attachments" ADD FOREIGN KEY ("material_id") REFERENCES "materials" ("id");
Support for parse CREATE INDEX statements
Add command line
Support ARRAYs
Historical context
This library is an extracted parser code from https://github.com/xnuinside/fakeme (Library for fake relation data generation, that I used in several work projects, but did not have time to make from it normal open source library)
For one of the work projects I needed to convert SQL ddl to Python ORM models in auto way and I tried to use https://github.com/andialbrecht/sqlparse but it works not well enough with ddl for my case (for example, if in ddl used lower case - nothing works, primary keys inside ddl are mapped as column name not reserved word and etc.). So I remembered about Parser in Fakeme and just extracted it & improved.
How to contribute
Please describe issue that you want to solve and open the PR, I will review it as soon as possible.
Any questions? Ping me in Telegram: https://t.me/xnuinside
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file simple-ddl-parser-0.2.1a2.tar.gz.
File metadata
- Download URL: simple-ddl-parser-0.2.1a2.tar.gz
- Upload date:
- Size: 11.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.4 CPython/3.8.7 Darwin/19.6.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e14a14598a1bb6f99e40221852cf15ff09fc9baaf9700828d3bee35e17239995
|
|
| MD5 |
da30c1c6f1cda244394413ff8297f1e5
|
|
| BLAKE2b-256 |
002796a406ac388ccbc56d442f6099e9fe2a2eee49c6d377c117d9b4932e6571
|
File details
Details for the file simple_ddl_parser-0.2.1a2-py3-none-any.whl.
File metadata
- Download URL: simple_ddl_parser-0.2.1a2-py3-none-any.whl
- Upload date:
- Size: 9.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.4 CPython/3.8.7 Darwin/19.6.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2342b2b68d797b5f8b7a9fcd1a6a2040412d9971275f2d7da343e3dfd2779435
|
|
| MD5 |
91ed92718c66b1f40f6fc5ddac56fa04
|
|
| BLAKE2b-256 |
2689faaedea6ec7ff6b22cdb4c734bba57d120063f7868e266d3591465baa1c5
|