Skip to main content

A to-SQL Compiler Prototype.

Project description

Flummi

A to-SQL Compiler Prototype.

Flummi is a research compiler prototype for to-SQL compilation, implementing methods we've researched and developed at the database chair of the University of Tübingen---see the list of related publications below.

Usage

The Flummi compiler in this repository can be used either as a python library or directly as a command line tool. Either way, to use our compiler, you will need to add it to your local environment. You can install it from PyPI or as a git dependency:

$ pip install flummi
$ pip install flummi@git+https://github.com/DBatUTuebingen/flummi

Library mode.

As a python library, we expose only the function compile and the module IR containing our intermediate representations. The compile function is straightforward; as the name implies it takes a program and compiles it, yielding the compiled query string. The programs can either be supplied in the form of a source string (you can find a description of the syntax down below) or an AST.

from flummi import compile

query = compile("""
  {
    DECLARE v : §int§;
    LET v = §0§[];
    EMIT v;
    LET v = §{0} + 1§[v];
    EMIT v;
    STOP
  }
""")

To map errors back to original source locations during exception handling, we track the provenance of every AST/IR node in a common attribute called location of type flummi.library.errors.Location | None. When present, we use the provenance information to render more helpful error messages, but we do not require it! To make the pretty error messages work when supplying compile with a custom AST, you need to add a proper code location to each AST node and also pass your source code using the source keyword argument.

from flummi import compile
from flummi.IR import AST
from flummi.library.errors import Location

source = "EMIT v"

ast = AST.Program(
    AST.Emit(
        AST.Variable("v", location=Location(line=1, column=6)),
        location=Location(line=1, column=1),
    ),
    location=Location(line=1, column=1),
)

query = compile(ast, source=source)
flummi.compiler.analysis.AnalysisError: AnalysisError:
Found read from uninitialised variable 'v'.
1 | EMIT v
---------^

Tool mode.

To use the Flummi compiler as a CLI tool, you will need to install it with the extra-dependencies cli, i.e., pip install flummi[cli]. When using the compiler via the CLI, you can pass it a Flummi source file which it will in turn compile and dump the compiled SQL query to stdout or an optional file.

-- contents of input.fl
{
  DECLARE v : §int§;
  LET v = §0§[];
  EMIT v;
  LET v = §{0} + 1§[v];
  EMIT v;
  STOP
}
$ flummi input.fl [output.sql]
WITH
  "start.1"("#️⃣", "⚙️") AS (
    SELECT 0 AS "#️⃣",
           NULL AS "⚙️"
  ),
...

For debugging purposes, or if you just a bit nosy, the compiler can also spit out a graphically renditions of the IR used during our compilation. To render these we rely on being able to find a working version of graphviz somewhere on your path—so make sure to have that installed if you want to use this feature.

The Flummi Language

This compiler prototype uses a "minimum viable language" as an input, as such it isn't really ergonomic to program---but that isn't our goal here! Some of the "unergonomic" things you will need to contend with are that we only have infinite loops with loop controls (BREAK/CONTINUE), our conditionals always need two branches, all variables are scoped globally, etc. The following is a complete EBNF-esque grammar of the Flummi language.

𝑠 := { 𝑠; …; 𝑠 }
   | DECLARE 𝑣 : 𝑡
   | LET 𝑣 = 𝑒
   | EMIT 𝑣
   | STOP
   | NOOP
   | IF 𝑣 THEN 𝑠 ELSE 𝑠
   | LOOP 𝑠
   | BREAK
   | CONTINUE
   | FORK 𝑣 = 𝑞
   | GATHER 𝑣 = 𝑎 [ BY 𝑣, …, 𝑣 ]
   | SYNC [ BY 𝑣, …, 𝑣 ]

𝑣 := <variable>
𝑡 := §<SQL type>§
𝑒 := §<scalar SQL expression>§[<free variables>]
𝑞 := §<non-scalar SQL expression>§[<free variables>]
𝑎 := §<SQL aggregate expression>§[<free variables>]

Embedded SQL

The grammar contains a few §...§[...] bits that we call "black boxes" in most of our work. Syntactically we represent these in two parts, the first §...§ being a bit of SQL code with free variables (free wrt. the code snippet itself) replaced by {𝑖} and the second being a list of the free variables in the order of 𝑖. So the expression §{0} + 1§[v] maps to the SQL expression that increments the variable v, i.e., v + 1. Note that for types we don't expect free variables, since that doesn't make much sense for our purposes.

Data-Driven Concurrency

Up until LOOP, BREAK, CONTINUE, the grammar above probably doesn't need much explaining. After that we get a triplet of statements that you may not be able to make much sense of. TL;DR together, FORK, GATHER, and SYNC represent a data-driven version of the ubiquitous fork-join model for concurrency enabled by our compilation method. The work on this has yet to be published, so you won't find much information on this feature elsewhere. As a brief starter, consider the following:

  • FORK … allows you to fork the current program state into multiple sibling program states at any given time for each row in the result of a table-valued SQL expression.
  • GATHER … [BY …] allows you to join (bad overlap in DB-lingo here, so we call it gather) sibling program states together by combining data from each of them using a SQL aggregate expression. Further you can group sibling program states by providing an additional grouping key.
  • SYNC [BY …] allows you to enforce synchronicity between sibling program states, again with the option to group sibling program states using a grouping key.

The latter is necessary due to our handling of loops during compilation; the SQL queries we compile to execute loop iterations of all siblings in lock step, which can lead to situations where siblings diverge wrt. their current progress through the program. Without the SYNC statement, such divergence would be unrecoverable, thus making it impossible to join these divergent sibling program states.

Related Publications

  • Tim Fischer and Denis Hirn. 2025. BIRNE: Mixed-paradigm Workload Execution in SQL Engines. In Proceedings of the 19th International Symposium on Database Programming Languages (DBPL '25). Association for Computing Machinery, New York, NY, USA, Article 4, 1–11. https://doi.org/10.1145/3735106.3736535
  • Tim Fischer, Denis Hirn, and Torsten Grust. 2024. SQL Engines Excel at the Execution of Imperative Programs. Proc. VLDB Endow. 17, 13 (September 2024), 4696–4708. https://doi.org/10.14778/3704965.3704976

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flummi-0.2.1.tar.gz (35.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

flummi-0.2.1-py3-none-any.whl (39.9 kB view details)

Uploaded Python 3

File details

Details for the file flummi-0.2.1.tar.gz.

File metadata

  • Download URL: flummi-0.2.1.tar.gz
  • Upload date:
  • Size: 35.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for flummi-0.2.1.tar.gz
Algorithm Hash digest
SHA256 9725434b0a3092c49b329ce97c668e563174a52ec68dda2a645e176e1d97f1bb
MD5 f00fc4732755e9fcf058f1df1d9e5d70
BLAKE2b-256 01f768fe9002648f25c06a792393b1822c263d0af95d556d41aeae6ac648b2fd

See more details on using hashes here.

File details

Details for the file flummi-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: flummi-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 39.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for flummi-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1355b245c527f723941bfcfa45aafa3b2afff0b167769caa4335edbb38790458
MD5 352a5be8ba936f9385e3863663aa4104
BLAKE2b-256 f7a3e8c9fd9c7b3d6cce505e766ee014d5f54a7a2668624a7763c8154e0a93cf

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page