Skip to main content

A fast LookML parser.

Project description

CircleCI Codecov

lkml

A blazing fast LookML parser implemented in pure Python.

Why should you use it?

  • Tested on over 160K lines of LookML from public repositories on GitHub
  • Parses a typical view or model file in < 15 ms
  • Written in pure, modern Python 3.7 with no external dependencies
  • A full unit test suite with excellent coverage

How do I install it?

lkml is a currently a WIP project. When it's finished, it will be available to install on pip via the following command:

pip install lkml

How do I run it?

You can run lkml from the command line or import it as a Python package.

From the command line, lkml accepts a single positional argument: the path to the LookML file you wish to parse. It returns the parsed result to the console as a JSON string.

Here's an example:

lkml path/to/file.view.lkml

If you would like to save the result to a file, use the following approach:

lkml path/to/file.view.lkml > path/to/result.json

When running from the command line, pass the debug flag (-d or --debug) to observe how the parser is attempting to navigate and parse the file.

lkml path/to/file.view.lkml --debug

The debug statements indicate how the parser is descending through the LookML, expecting certain grammar (e.g. [pair] = key value), and checking tokens against the expected grammar.

lkml.parser . Try to parse [pair] = key value
lkml.parser . . Try to parse [key] = literal ':'
lkml.parser . . . Check LiteralToken(type) == LiteralToken
lkml.parser . . . Check ValueToken() == ValueToken
lkml.parser . . Successfully parsed key.
lkml.parser . . Try to parse [value] = literal / quoted_literal / expression_block
lkml.parser . . . Check LiteralToken(full_outer) == QuotedLiteralToken or LiteralToken
lkml.parser . . Successfully parsed value.
lkml.parser . Successfully parsed pair.

As a Python package, lkml uses a similar interface as the json and yaml packages. The package has a single function, load, which accepts a file object and returns a dictionary with the parsed result.

Here's an example:

import lkml

with open('path/to/file.view.lkml', 'r') as file:
    result = lkml.load(file)

What does the parsed LookML look like?

From the command line, lkml returns the parsed result as a JSON string. As a Python package, lkml returns a dictionary with the parsed result.

A number of LookML parameters can be repeated, like dimension, include, or view. lkml collects these parameters into lists with a plural key (e.g. dimensions).

{
  "connection": "connection_name",
  "explores": [
    {
      "label": "Explore",
      "joins": [
        {
          "relationship": "many_to_one",
          "type": "inner",
          "sql_on": "${view_one.dimension} = ${view_two.dimension}",
          "name": "view_two"
        },
        {
          "relationship": "one_to_many",
          "type": "inner",
          "sql_on": "${view_one.dimension} = ${view_three.dimension}",
          "name": "view_three"
        }
      ],
      "name": "view_one"
    },
  ]
}

How does it work?

lkml is made up of two main components, a lexer and a parser. The parser is a recursive descent parser with backtracking.

The lexer scans through the input string character by character and generates a stream of relevant tokens. The lexer skips over whitespace when it's not relevant.

For example, the input string:

"sql: ${TABLE}.order_date ;;"

would be broken into the tuple:

(
    LiteralToken(sql),
    ValueToken(),
    ExpressionBlockToken(${TABLE}.order_date),
    ExpressionBlockEndToken()
)

The parser scans through the stream of tokens. It attempts to identify productions in the grammar, descending recursively through a production looking for a match.

If it doesn't find a match, it backtracks to a previously marked point in the stream and tries a different production. If the parser runs out of productions to try, it raises a syntax error.

TODO:

  • Implement models and explores
  • Add CI for mypy and pytest
  • Add repo badges
  • Test with code comments
  • Adjust dimensions, measures, etc. to be dicts instead of lists
  • Test with funky derived table syntax
  • Support HTML blocks
  • Support CLI invocation
  • Support value formats with escaped quotes
  • Support blank lists
  • Improve error handling to return the location of the syntax error
  • Test with more LookML, consider GitHub for a larger sample
  • Improve debug logging and add a command line flag
  • Performance benchmarking and profiling
  • Reach 100% coverage for parser.py and lexer.py
  • Implement checking for scanning literals to make sure they're valid
  • Handle plural keys outside of blocks
  • Add docstrings
  • Support parsing from a Unicode string instead of a file

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lkml-0.0.1.tar.gz (8.1 kB view hashes)

Uploaded Source

Built Distribution

lkml-0.0.1-py2.py3-none-any.whl (8.8 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page