DBML parser using Lark.
Project description
Lark-DBML
A Python parser for Database Markup Language (DBML) built with the powerful LARK parsing toolkit, utilizing the Earley algorithm for robust and flexible parsing.
Features
- High-Performance:
lark-dbmlsupports both the Earley and LALR(1) algorithms. The LALR(1) algorithm inlarkhas the best performance compared to Parsimonious, PyParsing, and ANTLR, according to lark. - Standalone Mode: the package does not require
larkpackage by default. The whole parsing is packed in a single Python file generated from the EBNF grammar file. - Standalone Mode: fully support DBML latest specification - April 2025.
- Pydantic Validation: Ensures the parsed DBML data conforms to a well-defined structure using Pydantic 2.11, providing reliable data integrity.
- Structured Output: Generates Python objects representing your DBML diagram, making it easy to programmatically access and manipulate your database schema.
- Future-Proof: the parser accepts any properties or settings that are not defined in the DBML spec.
- Powerful Conversion & Tooling:
- DBML Round-Trip: The package supports full round-trip conversion, allowing to parse DBML, programmatically manipulate the Pydantic models, and then generate the DBML back out.
- SQL: convert Pydantic output model to SQL with sqlglot.
- Data Contract: Transform your DBML models into data contract specification.
Milestones
- DBML Parser - Earley
- SQL Converter
- DBML Converter
- Data Contract Converter
- Optimised DBML Parser - LALR(1)
- CLI - TBD
- Generate DBML from a database connection string - TBD
Installation
You can install lark-dbml using pip:
pip install lark-dbml
To use lark mode when standalone_mode is set as False in the load function
pip install lark-dbml
To use SQL converter
pip install "lark-dbml[sql]"
Usage
Output Structure
Diagram - a Pydantic model - defines the expected structure of the parsed DBML content, ensuring consistency and type safety.
class Diagram(BaseModel):
project: Project
enums: list[Enum] | None = []
table_groups: list[TableGroup] | None = []
sticky_notes: list[Note] | None = []
references: list[Reference] | None = []
tables: list[Table] | None = []
table_partials : list[TablePartial] | None = []
Parser
lark-dbml uses the same API as other parser packages in Python. The default option is standalone mode with LALR(1) algorithm. Beside default parameters, load and loads accept any options used by the Lark parser, which can be found at this link
Load DBML
from lark_dbml import load, loads
# 1. Read from a string
dbml = """
Project "My Database" {
database_type: 'PostgreSQL'
Note: "This is a sample database"
}
Table "users" {
id int [pk, increment]
username varchar [unique, not null]
email varchar [unique]
created_at timestamp [default: `now()`]
}
Table "posts" {
id int [pk, increment]
title varchar
content text
user_id int
}
Ref: posts.user_id > users.id
"""
# Default option
diagram = loads(dbml)
# Change to Lark mode
diagram = loads(dbml, standalone_mode=False)
# Switch to Earley algorithm
diagram = loads(dbml, parser="earley")
# 2. Read from a file
diagram = load('example.dbml')
The parser can read any settings or properties in DBML objects even if the spec doesn't define them.
diagram = loads("""
Table myTable [newkey: 'random_value'] {
id int [pk]
}
""")
>>> diagram.tables[0].settings
TableSettings(note=None, header_color=None, newkey='random_value')
Dump DBML
from lark_dbml import dump, dumps
from lark_dbml.converter.dbml import DBMLConverterSettings
from lark_dbml.schema import (
Column,
ColumnSettings,
DataType,
Diagram,
Table,
TableSettings
)
diagram = Diagram(
tables=[
Table(
name="body",
alias="full_table",
note="Incorporated with header and footer",
settings=TableSettings(
headercolor="#3498DB",
note="header note",
partitioned_by="id"
),
columns=[
Column(
name="id",
data_type=DataType(sql_type="int"),
settings=ColumnSettings(
is_primary_key=True, note="why is id behind name?"
),
),
Column(
name="audit_date",
data_type=DataType(sql_type="timestamp"),
settings=ColumnSettings(default="`getdate()`"),
),
],
)
]
)
# This converts the diagram to DBML,
# but partitioned_by will not be included
dumps(diagram)
# This includes partitioned_by in the output
dumps(diagram,
settings=DBMLConverterSettings(
allow_extra=True
)
)
# Write the DBML to file
dump(diagram, 'diagram.dbml')
Converters
SQL
SQL conversion is backed by sqlglot package. The underlying code converts the output Pydantic model to sqlglot's AST Expression. Using sqlglot helps transpilation to any SQL dialect easily.
NOTE THAT: the output SQL is not guaranteed to be perfect or completely functional due to differences between dialects. If you find any issue, please create a new issue in Github :)
from lark_dbml import load
from lark_dbml.converter import to_sql
from sqlglot import Dialects
# Load DBML diagram
diagram = load("diagram.dbml")
# Convert to SQL for PostgreSQL
sql = to_sql(diagram, Dialects.POSTGRES)
Data Contract
Convert DBML diagram to a Data Contract spec file. The basic usage just convert tables and columns to "model" and "definition" sections. However, lark-dbml supports settings to extract more information from a DBML - please expand Advanced Usage.
Basic example
from lark_dbml import load
from lark_dbml.converter import to_data_contract
# Load DBML diagram
diagram = load("diagram.dbml")
# Convert to SQL for PostgreSQL
data_contract = to_data_contract(diagram)
Advanced Usage
You can leverage Sticky Notes in DBML to store information about "terms" and "servers" in JSON or YAML format. Then, you can set note_as_fields in the settings to parse and include those information in the generated contract. Here is the example
import json
from lark_dbml import load
from lark_dbml.converter import to_data_contract
from lark_dbml.converter.datacontract import DataContractConverterSettings
# complex_datacontract.dbml inside the exmaples folder in this repo
diagram = load('examples/complex_datacontract.dbml')
# project_as_info: properties in Project are put into "info"
# note_as_description: note in Table Settings is treated as model description.
# note_as_fields: inline note in Table is parsed and extends the corresponding model's properties.
# deserialization_func is required once note_as_fields is set. This is the function to parse the content inside an inline note. In this example, it's JSON
data_contract = to_data_contract(diagram=diagram,
settings=DataContractConverterSettings(
project_as_info=True,
note_as_description=True,
note_as_fields=True,
deserialization_func=json.loads
))
Development
Contributions are welcome! Please feel free to open issues or submit pull requests.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file lark_dbml-0.6.0.tar.gz.
File metadata
- Download URL: lark_dbml-0.6.0.tar.gz
- Upload date:
- Size: 69.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
404b112b0d1ee8c2f46857b9831833895d9c1b3f6bf365844931a1ee09a490de
|
|
| MD5 |
0996526a993db6a0dff242f6ca124ed0
|
|
| BLAKE2b-256 |
962e70ec05ea7ff2e49865adfa9d2d3de0cef0ef4939e110b32c0b903651ab8f
|
Provenance
The following attestation bundles were made for lark_dbml-0.6.0.tar.gz:
Publisher:
publish.yml on daihuynh/lark-dbml
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
lark_dbml-0.6.0.tar.gz -
Subject digest:
404b112b0d1ee8c2f46857b9831833895d9c1b3f6bf365844931a1ee09a490de - Sigstore transparency entry: 367103703
- Sigstore integration time:
-
Permalink:
daihuynh/lark-dbml@9850724f191d8e97c72c637d6118ad90a057f36f -
Branch / Tag:
refs/tags/v0.6.0 - Owner: https://github.com/daihuynh
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@9850724f191d8e97c72c637d6118ad90a057f36f -
Trigger Event:
push
-
Statement type:
File details
Details for the file lark_dbml-0.6.0-py3-none-any.whl.
File metadata
- Download URL: lark_dbml-0.6.0-py3-none-any.whl
- Upload date:
- Size: 76.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8d36ba5ec27f12fb743f87b0b8296d807c7e3a606d8aa38f69464849fb03d336
|
|
| MD5 |
1798adcea07706a26380ffff238f4081
|
|
| BLAKE2b-256 |
6e30e8e3a1ca9aeb48238e6f8ddef4f1cfa4b8ca278d854aaf096f69707dc055
|
Provenance
The following attestation bundles were made for lark_dbml-0.6.0-py3-none-any.whl:
Publisher:
publish.yml on daihuynh/lark-dbml
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
lark_dbml-0.6.0-py3-none-any.whl -
Subject digest:
8d36ba5ec27f12fb743f87b0b8296d807c7e3a606d8aa38f69464849fb03d336 - Sigstore transparency entry: 367103709
- Sigstore integration time:
-
Permalink:
daihuynh/lark-dbml@9850724f191d8e97c72c637d6118ad90a057f36f -
Branch / Tag:
refs/tags/v0.6.0 - Owner: https://github.com/daihuynh
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@9850724f191d8e97c72c637d6118ad90a057f36f -
Trigger Event:
push
-
Statement type: