
DBML parser using Lark.


Lark-DBML

A Python parser for Database Markup Language (DBML) built with the powerful LARK parsing toolkit, utilizing the Earley algorithm for robust and flexible parsing.

Features

  • High-Performance: lark-dbml supports both the Earley and LALR(1) algorithms. According to Lark's own benchmarks, its LALR(1) parser outperforms Parsimonious, PyParsing, and ANTLR.
  • Standalone Mode: the package does not require the lark package by default; parsing is packed into a single Python file generated from the EBNF grammar file.
  • Spec-Compliant: fully supports the latest DBML specification (April 2025).
  • Pydantic Validation: Ensures the parsed DBML data conforms to a well-defined structure using Pydantic 2.11, providing reliable data integrity.
  • Structured Output: Generates Python objects representing your DBML diagram, making it easy to programmatically access and manipulate your database schema.
  • Future-Proof: the parser accepts properties and settings beyond those defined in the DBML spec.
  • Powerful Conversion & Tooling:
    • DBML Round-Trip: full round-trip conversion, allowing you to parse DBML, programmatically manipulate the Pydantic models, and then generate the DBML back out.
    • SQL: convert the output Pydantic model to SQL with sqlglot.
    • Data Contract: transform your DBML models into a data contract specification.

Milestones

  • DBML Parser - Earley
  • SQL Converter
  • DBML Converter
  • Data Contract Converter
  • Optimised DBML Parser - LALR(1)
  • CLI - TBD
  • Generate DBML from a database connection string - TBD

Installation

You can install lark-dbml using pip:

pip install lark-dbml

To use Lark mode (when standalone_mode is set to False in the load function):

pip install lark-dbml

To use the SQL converter:

pip install "lark-dbml[sql]"

Usage

Output Structure

Diagram - a Pydantic model - defines the expected structure of the parsed DBML content, ensuring consistency and type safety.

class Diagram(BaseModel):
    project: Project
    enums: list[Enum] | None = []
    table_groups: list[TableGroup] | None = []
    sticky_notes: list[Note] | None = []
    references: list[Reference] | None = []
    tables: list[Table] | None = []
    table_partials : list[TablePartial] | None = []
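Note that every collection on Diagram defaults to an empty list, so the parsed tree can be walked without None checks. As a rough stdlib-only illustration of that shape (plain dataclasses standing in for the library's actual Pydantic models, with made-up field subsets):

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the library's models, showing why
# empty-list defaults make the parsed tree safe to iterate.
@dataclass
class Column:
    name: str
    sql_type: str

@dataclass
class Table:
    name: str
    columns: list[Column] = field(default_factory=list)

@dataclass
class Diagram:
    tables: list[Table] = field(default_factory=list)
    enums: list[str] = field(default_factory=list)

diagram = Diagram(tables=[Table("users", [Column("id", "int")])])

# Walking the structure never requires None checks:
names = [(t.name, c.name) for t in diagram.tables for c in t.columns]
```

The same pattern applies to the real model: `diagram.enums`, `diagram.references`, and the other lists are iterable even when the DBML source omits them entirely.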

Parser

lark-dbml uses the same API as other parser packages in Python. The default option is standalone mode with the LALR(1) algorithm. Besides the default parameters, load and loads accept any option supported by the Lark parser; see the Lark documentation for the full list.

Load DBML

from lark_dbml import load, loads

# 1. Read from a string
dbml = """
Project "My Database" {
  database_type: 'PostgreSQL'
  Note: "This is a sample database"
}

Table "users" {
  id int [pk, increment]
  username varchar [unique, not null]
  email varchar [unique]
  created_at timestamp [default: `now()`]
}

Table "posts" {
  id int [pk, increment]
  title varchar
  content text
  user_id int
}

Ref: posts.user_id > users.id
"""

# Default option
diagram = loads(dbml)
# Change to Lark mode
diagram = loads(dbml, standalone_mode=False)
# Switch to Earley algorithm
diagram = loads(dbml, parser="earley")

# 2. Read from a file
diagram = load('example.dbml')

The parser can read any settings or properties in DBML objects even if the spec doesn't define them.

diagram = loads("""
Table myTable [newkey: 'random_value'] {
    id int [pk]
}
""")
>>> diagram.tables[0].settings
TableSettings(note=None, header_color=None, newkey='random_value')

Dump DBML

from lark_dbml import dump, dumps

from lark_dbml.converter.dbml import DBMLConverterSettings
from lark_dbml.schema import (
    Column,
    ColumnSettings,
    DataType,
    Diagram,
    Table,
    TableSettings
)

diagram = Diagram(
    tables=[
        Table(
            name="body",
            alias="full_table",
            note="Incorporated with header and footer",
            settings=TableSettings(
                headercolor="#3498DB",
                note="header note",
                partitioned_by="id"
            ),
            columns=[
                Column(
                    name="id",
                    data_type=DataType(sql_type="int"),
                    settings=ColumnSettings(
                        is_primary_key=True, note="why is id behind name?"
                    ),
                ),
                Column(
                    name="audit_date",
                    data_type=DataType(sql_type="timestamp"),
                    settings=ColumnSettings(default="`getdate()`"),
                ),
            ],
        )
    ]
)

# This converts the diagram to DBML,
# but partitioned_by will not be included
dumps(diagram)

# This includes partitioned_by in the output
dumps(diagram,
      settings=DBMLConverterSettings(
          allow_extra=True
      )
)

# Write the DBML to file
dump(diagram, 'diagram.dbml')

Converters

SQL

SQL conversion is backed by the sqlglot package. The underlying code converts the output Pydantic model to sqlglot AST Expressions, which makes transpiling to any SQL dialect straightforward.

Note: the output SQL is not guaranteed to be perfect or completely functional due to differences between dialects. If you find any issue, please open an issue on GitHub :)

from lark_dbml import load
from lark_dbml.converter import to_sql
from sqlglot import Dialects

# Load DBML diagram
diagram = load("diagram.dbml")

# Convert to SQL for PostgreSQL
sql = to_sql(diagram, Dialects.POSTGRES)

Data Contract

Convert a DBML diagram to a Data Contract spec file. Basic usage simply converts tables and columns to the "model" and "definition" sections. However, lark-dbml supports settings to extract more information from a DBML file; see Advanced Usage below.

Basic example

from lark_dbml import load
from lark_dbml.converter import to_data_contract

# Load DBML diagram
diagram = load("diagram.dbml")

# Convert to a data contract
data_contract = to_data_contract(diagram)

Advanced Usage

You can leverage Sticky Notes in DBML to store information about "terms" and "servers" in JSON or YAML format. Then set note_as_fields in the settings to parse that information and include it in the generated contract. Here is an example:

import json
from lark_dbml import load
from lark_dbml.converter import to_data_contract
from lark_dbml.converter.datacontract import DataContractConverterSettings

# complex_datacontract.dbml is inside the examples folder in this repo
diagram = load('examples/complex_datacontract.dbml')

# project_as_info: properties in Project are put into "info"
# note_as_description: note in Table Settings is treated as model description.
# note_as_fields: inline note in Table is parsed and extends the corresponding model's properties.
# deserialization_func is required once note_as_fields is set. It is the function
# that parses the content of an inline note; in this example, it's JSON.

data_contract = to_data_contract(
    diagram=diagram,
    settings=DataContractConverterSettings(
        project_as_info=True,
        note_as_description=True,
        note_as_fields=True,
        deserialization_func=json.loads,
    ),
)
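deserialization_func only needs to turn a note's text into a dict; any callable with that shape works. A minimal sketch using the standard library (the note body below is a made-up example of "servers" information stored in a sticky note):

```python
import json

# A sticky note whose body holds data-contract "servers" info as JSON.
# This is the kind of text note_as_fields hands to deserialization_func.
note_body = '{"servers": {"prod": {"type": "postgres", "host": "db.example.com"}}}'

# json.loads is a valid deserialization_func: text in, dict out.
fields = json.loads(note_body)
```

For YAML notes, a function such as yaml.safe_load could be passed instead, as long as it returns a dict.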

Development

Contributions are welcome! Please feel free to open issues or submit pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.
