qastle (Query AST Language Expressions)

This document describes a language intended to be used in ServiceX and func_adl for messages which represent abstract syntax trees (ASTs). The trees specify columnar selections of HEP data.

Introduction

Influences

  • ast module in Python
    • FuncADL natively uses ASTs as represented by Python's standard ast module, so it is convenient to base everything at least loosely on Python's ASTs to ease translation. Python's ASTs, however, are dense with information that matters for a full-featured general-purpose programming language but is not relevant or useful for forming selections of columns.
  • LINQ
    • FuncADL is roughly based on LINQtoROOT, which was based on using LINQ queries on data in ROOT format. LINQ is a query language native to C#. The query operators used in FuncADL (Select, SelectMany, Where, etc.) are those of LINQ, so many of the AST nodes will need to represent these operators.
  • Common Lisp
    • Lisp is a functional programming language with a minimalist syntax definition. We're aiming to use similar syntax because of how sparse it is, so that representations of AST nodes are very lean.

Guiding principles

  • I try not to deviate from the influences listed above without good reason, since they are all already well-established standards. However, with influences from three different languages, it's impossible to adhere to all of them anyway.
  • For simplification and clarity, anything in Python's ast that does not affect static translation into a columnar selection is removed.
  • Anything that would result in ambiguity when statically converting to a Python AST with LINQ queries or would prevent this conversion from being possible should be explicitly disallowed for easier debugging.
  • I'm trying to keep the syntax as simple and as uniform as possible while maintaining all necessary functionality. Simplicity and uniformity here refer to the definition of the language, which also results in the least complex parsing. Note that this does not result in the most compact AST text possible.

Language specification

Syntax

The syntax/grammar definition is discussed here. Like Lisp, the language consists solely of s-expressions. S-expressions here represent AST nodes and are either atoms--which include literals and identifiers--or composites of other s-expressions. Literals and names are nearly identical to those in Python. Composites are of the form:

(<composite node type> <s-expression 1> <s-expression 2> <s-expression 3> ...)

They look like bare lists from Lisp, with the first element describing the type of AST node, and the rest of the elements being the components of the node.
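To make the composite form concrete, here is a minimal sketch of reading such text into nested Python lists. It uses only the standard library and is not qastle's own parser; the names `TOKEN_RE` and `parse_sexpr` are hypothetical, and the tokenizer assumes single-quoted strings without escape sequences.

```python
import re

# Hypothetical tokenizer: a parenthesis, a single-quoted string, or a bare atom.
# Assumption: strings contain no escaped quotes; malformed quotes are not detected.
TOKEN_RE = re.compile(r"\(|\)|'[^']*'|[^\s()']+")


def parse_sexpr(text):
    """Parse one s-expression into nested Python lists of string tokens."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = []
            # Collect components until the matching closing parenthesis.
            while pos < len(tokens) and tokens[pos] != ")":
                node.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
            return node
        if token == ")":
            raise ValueError("unexpected closing parenthesis")
        return token  # an atom is kept as its source text

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after s-expression")
    return result


tree = parse_sexpr("(attr (attr Event 'Electrons') 'pt')")
```

In this sketch the first element of every inner list is the node type, and the remaining elements are its components, mirroring the template above.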

Semantics

All defined s-expressions are listed here, though this specification will be expanded in the future. The symbol * is used as a suffix here in its regex meaning (i.e., zero or more of the object that it follows are expected). Except where there is a restriction explicitly mentioned in the templates below, any type of s-expression can be used as an element of a composite s-expression.

  • Atomic s-expressions (atoms):

    • Numbers
    • Strings
    • Identifiers
      • Variable names
      • Reserved identifiers: True, False, and None
        • Cannot be used as variable names
  • Composite s-expressions:

    • Lists: (list <item>*)
    • Dictionaries: (dict <keys> <values>)
      • keys and values must each be a list
    • Attributes: (attr <object> <attribute>)
      • attribute must be a string literal
    • Subscripts: (subscript <object> <subscript>)
    • Function calls: (call <function> <argument>*)
    • Conditionals: (if <condition> <then> <else>)
    • Unary operators: (<operator> <operand>)
      • <operator> must be not or ~
    • Binary operators: (<operator> <operand> <operand>)
      • <operator> must be one of +, -, *, /, %, **, //, and, or, &, |, ^, <<, >>, ==, !=, <, <=, >, >=
    • Lambdas: (lambda <arguments> <expression>)
      • arguments must be a list containing only variable names
    • Where: (Where <source> <predicate>)
      • predicate must be a lambda with one argument
    • Select: (Select <source> <selector>)
      • selector must be a lambda with one argument
    • SelectMany: (SelectMany <source> <selector>)
      • selector must be a lambda with one argument
    • First: (First <source>)
    • Last: (Last <source>)
    • ElementAt: (ElementAt <source> <index>)
      • index must be an integer
    • Aggregate: (Aggregate <source> <seed> <func>)
      • func must be a lambda with two arguments
    • Count: (Count <source>)
    • Max: (Max <source>)
    • Min: (Min <source>)
    • Sum: (Sum <source>)
    • Zip: (Zip <source>)
    • OrderBy: (OrderBy <source> <key_selector>)
      • key_selector must be a lambda with one argument
    • OrderByDescending: (OrderByDescending <source> <key_selector>)
      • key_selector must be a lambda with one argument
    • Choose: (Choose <source> <n>)
      • n must be an integer
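The templates above fix how many components each composite node takes. As a rough illustration, the sketch below checks those counts on s-expressions that are already held as nested Python lists. The `ARITY` table and `check_arity` function are hypothetical names transcribed by hand from the list above, not part of qastle, and the sketch deliberately ignores the finer restrictions (for example, that a selector must be a one-argument lambda).

```python
# (minimum, maximum) number of components after the node type; None = unbounded.
ARITY = {
    "list": (0, None), "dict": (2, 2), "attr": (2, 2), "subscript": (2, 2),
    "call": (1, None), "if": (3, 3), "lambda": (2, 2),
    "Where": (2, 2), "Select": (2, 2), "SelectMany": (2, 2),
    "First": (1, 1), "Last": (1, 1), "ElementAt": (2, 2),
    "Aggregate": (3, 3), "Count": (1, 1), "Max": (1, 1), "Min": (1, 1),
    "Sum": (1, 1), "Zip": (1, 1), "OrderBy": (2, 2),
    "OrderByDescending": (2, 2), "Choose": (2, 2),
}
# Unary and binary operators, as listed in the templates.
ARITY.update({op: (1, 1) for op in ("not", "~")})
ARITY.update({op: (2, 2) for op in (
    "+", "-", "*", "/", "%", "**", "//", "and", "or", "&", "|", "^",
    "<<", ">>", "==", "!=", "<", "<=", ">", ">=")})


def check_arity(node):
    """Return a list of error messages for a nested-list s-expression."""
    if not isinstance(node, list):
        return []  # atoms have no components to count
    if not node or not isinstance(node[0], str):
        return ["composite must start with a node type"]
    head, components = node[0], node[1:]
    errors = []
    if head not in ARITY:
        errors.append(f"unknown node type: {head}")
    else:
        low, high = ARITY[head]
        count = len(components)
        if count < low or (high is not None and count > high):
            errors.append(f"{head}: unexpected number of components ({count})")
    # Check nested composites as well.
    for component in components:
        errors.extend(check_arity(component))
    return errors


ok = check_arity(["Select", "src", ["lambda", ["list", "e"], ["attr", "e", "'pt'"]]])
bad = check_arity(["First", "src", "extra"])
```

Here `ok` is an empty list, while `bad` holds one message because First takes exactly one component.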

Example

The following query for eight columns:

data_column_source.Select("""lambda Event: (Event.Electrons.pt(),
                                            Event.Electrons.eta(),
                                            Event.Electrons.phi(),
                                            Event.Electrons.e(),
                                            Event.Muons.pt(),
                                            Event.Muons.eta(),
                                            Event.Muons.phi(),
                                            Event.Muons.e())""")

becomes

(Select data_column_source
        (lambda (list Event)
                (list (call (attr (attr Event 'Electrons') 'pt'))
                      (call (attr (attr Event 'Electrons') 'eta'))
                      (call (attr (attr Event 'Electrons') 'phi'))
                      (call (attr (attr Event 'Electrons') 'e'))
                      (call (attr (attr Event 'Muons') 'pt'))
                      (call (attr (attr Event 'Muons') 'eta'))
                      (call (attr (attr Event 'Muons') 'phi'))
                      (call (attr (attr Event 'Muons') 'e')))))

See this Jupyter notebook for a more thorough example.

Nota bene

The mapping between Python and qastle expressions is not strictly one-to-one. There are some Python nodes with more specific functionality than needed in the textual AST representation. For example, all Python tuples are converted to (list)s by qastle.
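The many-to-one mapping can be illustrated with a small sketch built on Python's standard ast module. This is not qastle's translation code: `to_text` is a hypothetical helper that covers only names, constants, attributes, calls without keyword arguments, tuples, and lists, and it assumes Python 3.8 or later for `ast.Constant`.

```python
import ast


def to_text(node):
    """Render a small subset of Python expression nodes as s-expression text."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.Attribute):
        # The attribute name is emitted as a string literal.
        return f"(attr {to_text(node.value)} {node.attr!r})"
    if isinstance(node, ast.Call):
        parts = [to_text(node.func)] + [to_text(arg) for arg in node.args]
        return "(call " + " ".join(parts) + ")"
    if isinstance(node, (ast.Tuple, ast.List)):
        # Tuples and lists both collapse to the same (list ...) node.
        return "(" + " ".join(["list"] + [to_text(e) for e in node.elts]) + ")"
    raise NotImplementedError(type(node).__name__)


from_tuple = to_text(ast.parse("(Event.Muons.pt(), Event.Muons.eta())", mode="eval").body)
from_list = to_text(ast.parse("[Event.Muons.pt(), Event.Muons.eta()]", mode="eval").body)
```

Both `from_tuple` and `from_list` come out as the same text, so the original distinction between a tuple and a list cannot be recovered from the textual AST alone.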
