Skip to main content

PostgreSQL Languages AST and statements prettifier

Project description

Author:

Lele Gaifax

Contact:

lele@metapensiero.it

License:

GNU General Public License version 3 or later

Status:

Build status Documentation status

This is a Python 3 module that exposes the parse tree of a PostgreSQL statement (extracted by the almost standard PG parser repackaged as a standalone static library by libpg_query) as set of interconnected nodes, usually called an abstract syntax tree.

I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.

When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:

  • target only Python 3.4+

  • target PostgreSQL 10

  • use a more dynamic approach to represent the parse tree, with a twofold advantage:

    1. it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag

    2. the representation is version agnostic, it can be adapted to newer/older Elephants in a snap

  • allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation

  • avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources[*] is available without the hassle of guessing how a symbol has been mapped

  • use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree

Introduction

At the lower level the module exposes two libpg_query functions, parse_sql() and parse_plpgsql(), that take respectively an SQL statement and a PLpgSQL statement and return a parse tree as a hierarchy of Python dictionaries, lists and scalar values. In some cases these scalars correspond to some C typedef enums, that are automatically extracted from the PostgreSQL headers mentioned above and are available as pglast.enums.

At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.

Every node is identified by a tag, a string label that characterizes its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.

A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.

Finally, a Scalar carries a single value of some type, accessible through its value attribute.

On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the pgpp CLI tool, see below for an example.

Installation

As usual, the easiest way is with pip:

$ pip install pglast

Alternatively you can clone the repository:

$ git clone https://github.com/lelit/pglast.git --recursive

and install from there:

$ pip install ./pglast

Development

There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.

Examples of usage

  • Parse an SQL statement and get its AST root node:

    >>> from pglast import Node, parse_sql
    >>> root = Node(parse_sql('SELECT foo FROM bar'))
    >>> print(root)
    None=[1*{RawStmt}]
  • Recursively traverse the parse tree:

    >>> for node in root.traverse():
    ...   print(node)
    ...
    None[0]={RawStmt}
    stmt={SelectStmt}
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
    op=<0>
    targetList[0]={ResTarget}
    location=<7>
    val={ColumnRef}
    fields[0]={String}
    str=<'foo'>
    location=<7>

    As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind[] and <value> is a Scalar.

  • Get a particular node:

    >>> from_clause = root[0].stmt.fromClause
    >>> print(from_clause)
    fromClause=[1*{RangeVar}]
  • Obtain some information about a node:

    >>> range_var = from_clause[0]
    >>> print(range_var.node_tag)
    RangeVar
    >>> print(range_var.attribute_names)
    dict_keys(['relname', 'inh', 'relpersistence', 'location'])
    >>> print(range_var.parent_node)
    stmt={SelectStmt}
  • Iterate over nodes:

    >>> for a in from_clause:
    ...     print(a)
    ...     for b in a:
    ...         print(b)
    ...
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
  • Reformat a SQL statement[] from the command line:

    $ echo "select a,b,c from sometable" | pgpp
    SELECT a
         , b
         , c
    FROM sometable
    
    $ echo "select a,b,c from sometable" | pgpp -c
    SELECT a,
           b,
           c
    FROM sometable
    
    $ echo 'update "table" set value=123 where value is null' | pgpp
    UPDATE "table"
    SET value = 123
    WHERE value IS NULL
    
    $ echo "
    insert into t (id, description)
    values (1, 'this is short enough'),
           (2, 'this is too long, and will be splitted')" | pgpp -s 20
    INSERT INTO t (id, description)
    VALUES (1, 'this is short enough')
         , (2, 'this is too long, an'
               'd will be splitted')
  • Programmatically reformat a SQL statement:

    >>> from pglast import prettify
    >>> print(prettify('delete from sometable where value is null'))
    DELETE FROM sometable
    WHERE value IS NULL

Documentation

Latest documentation is hosted by Read the Docs at http://pglast.readthedocs.io/en/latest/

Changes

1.1 (2018-07-20)

  • No visible changes, but now PyPI carries binary wheels for Python 3.7.

1.0 (2018-06-16)

0.28 (2018-06-06)

  • Update libpg_query to 10-1.0.2

  • Support the ‘?’-style parameter placeholder variant allowed by libpg_query (details)

0.27 (2018-04-15)

  • Prettier JOINs representation, aligning them with the starting relation

0.26 (2018-04-03)

  • Fix cosmetic issue with ANY() and ALL()

0.25 (2018-03-31)

  • Fix issue in the safety belt check performed by pgpp (issue #4)

0.24 (2018-03-02)

  • Implement Null printer

0.23 (2017-12-28)

  • Implement some other DDL statements printers

  • New alternative style to print comma-separated-values lists, activated by a new --comma-at-eoln option on pgpp

0.22 (2017-12-03)

  • Implement TransactionStmt and almost all DROP xxx printers

0.21 (2017-11-22)

  • Implement NamedArgExpr printer

  • New alternative printers for a set of special functions, activated by a new --special-functions option on pgpp (issue #2)

0.20 (2017-11-21)

  • Handle special de-reference (A_Indirection) cases

0.19 (2017-11-16)

  • Fix serialization of column labels containing double quotes

  • Fix corner issues surfaced implementing some more DDL statement printers

0.18 (2017-11-14)

  • Fix endless loop due to sloppy conversion of command line option

  • Install the command line tool as pgpp

0.17 (2017-11-12)

  • Rename printers.sql to printers.dml (backward incompatibility)

  • List printer functions in the documentation, referencing the definition of related node type

  • Fix inconsistent spacing in JOIN condition inside a nested expression

  • Fix representation of unbound arrays

  • Fix representation of interval data type

  • Initial support for DDL statements

  • Fix representation of string literals containing single quotes

0.16 (2017-10-31)

  • Update libpg_query to 10-1.0.0

0.15 (2017-10-12)

  • Fix indentation of boolean expressions in SELECT’s targets (issue #3)

0.14 (2017-10-09)

  • Update to latest libpg_query’s 10-latest branch, targeting PostgreSQL 10.0 final

0.13 (2017-09-17)

  • Fix representation of subselects requiring surrounding parens

0.12 (2017-08-22)

  • New option --version on the command line tool

  • Better enums documentation

  • Release the GIL while calling libpg_query functions

0.11 (2017-08-11)

  • Nicer indentation for JOINs, making OUTER JOINs stand out

  • Minor tweaks to lists rendering, with less spurious whitespaces

  • New option --no-location on the command line tool

0.10 (2017-08-11)

  • Support Python 3.4 and Python 3.5 as well as Python 3.6

0.9 (2017-08-10)

  • Fix spacing before the $ character

  • Handle type modifiers

  • New option --plpgsql on the command line tool, just for fun

0.8 (2017-08-10)

  • Add enums subpackages to the documentation with references to their related headers

  • New compact_lists_margin option to produce a more compact representation when possible (see issue #1)

0.7 (2017-08-10)

  • Fix sdist including the Sphinx documentation

0.6 (2017-08-10)

  • New option --parse-tree on the command line tool to show just the parse tree

  • Sphinx documentation, available online

0.5 (2017-08-09)

  • Handle some more cases when a name must be double-quoted

  • Complete the serialization of the WindowDef node, handling its frame options

0.4 (2017-08-09)

  • Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function

  • New option safety_belt for the prettify() function, to protect the innocents

  • Handle serialization of CoalesceExpr and MinMaxExpr

0.3 (2017-08-07)

  • Handle serialization of ParamRef nodes

  • Expose a prettify() helper function

0.2 (2017-08-07)

  • Test coverage at 99%

  • First attempt at automatic wheel upload to PyPI, let’s see…

0.1 (2017-08-07)

  • First release (“Hi daddy!”, as my soul would tag it)

Project details


Release history Release notifications | RSS feed

This version

1.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pglast-1.1.tar.gz (1.5 MB view details)

Uploaded Source

Built Distributions

pglast-1.1-cp37-cp37m-manylinux1_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.7m

pglast-1.1-cp37-cp37m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.7m

pglast-1.1-cp36-cp36m-manylinux1_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.6m

pglast-1.1-cp36-cp36m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.6m

pglast-1.1-cp35-cp35m-manylinux1_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.5m

pglast-1.1-cp35-cp35m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.5m

pglast-1.1-cp34-cp34m-manylinux1_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.4m

pglast-1.1-cp34-cp34m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.4m

File details

Details for the file pglast-1.1.tar.gz.

File metadata

  • Download URL: pglast-1.1.tar.gz
  • Upload date:
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pglast-1.1.tar.gz
Algorithm Hash digest
SHA256 3149c8b80c84f54fcdc56685e9e12625a7774e02352469d90ad9a30edcca759c
MD5 5377bf3f1b244a356b4ab31d327a931a
BLAKE2b-256 7dd338e9f76d1e5404fc843d0a29696ed961367200d0f907cea1cdcbf4f38acb

See more details on using hashes here.

Provenance

File details

Details for the file pglast-1.1-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pglast-1.1-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 33b481f622c4bdca766385235e405930e75783d23599d27e329083f43e96e46f
MD5 9315b616ee291574b48709a876e7bc39
BLAKE2b-256 79b1b327a90ec5627fe131d25568613ec71f1ddf1dab5281caa150f998e0b468

See more details on using hashes here.

Provenance

File details

Details for the file pglast-1.1-cp37-cp37m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pglast-1.1-cp37-cp37m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 5e5491dd52bee01dc18f7c24195153130ddeec499fec32b1f2c63404de7ee8e8
MD5 1ff73e32f362023b70277d58fe2c8549
BLAKE2b-256 4b5d02e3478b938ba9213fc072f9614e1a00655a2c507b5a9e26a257a1710ec7

See more details on using hashes here.

Provenance

File details

Details for the file pglast-1.1-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pglast-1.1-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 7139e39a021607297f26cfaef7ed7b76fd48c242c21a77264b48a9824be62164
MD5 d9ec47dbaad0b56e20b5e7c67ede711c
BLAKE2b-256 56c6f7a6b8afda53845cf7aebbc20654f5a480d0550578d5a80f86f548ff8590

See more details on using hashes here.

Provenance

File details

Details for the file pglast-1.1-cp36-cp36m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pglast-1.1-cp36-cp36m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 58f16ce388e8963b91c407a3708496109beafa172711e003e757a454740cb1e2
MD5 503f1cd8daa7901c6663b4f37e1035fc
BLAKE2b-256 080f89e87e85504a9f6db4e7128dc4c6f40e591e5f59443f42491c3b01a86de6

See more details on using hashes here.

Provenance

File details

Details for the file pglast-1.1-cp35-cp35m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pglast-1.1-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 cf898496b18725e1e5e2e68de97a1259a5ee2f3c57c30149e79cefc6cafd03ac
MD5 d1dcfc0b5395a6b0e5302da4e3a80588
BLAKE2b-256 09de3efc6f3f6e52bc1705e4b3263d3356dddc5c7c00ce882e355d95da156c10

See more details on using hashes here.

Provenance

File details

Details for the file pglast-1.1-cp35-cp35m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pglast-1.1-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 95f17897d60644740c7c22ef4463c11e6d5eb483e6a0d4ab68e2f65876e22930
MD5 7d7748874d326e22fffdd78876b732e2
BLAKE2b-256 c799a080081d79626af6b59b18f833e2f7e5b5512433327617bcf8dbd8bb41cd

See more details on using hashes here.

Provenance

File details

Details for the file pglast-1.1-cp34-cp34m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pglast-1.1-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 8e8c263116addbf51f9302caf835fe03a86b29fa343df99d40ebfed196fd82b8
MD5 8bda25705921837b5af7eacbb973ba6f
BLAKE2b-256 f6f4c8d1fac1fbfd8396e5310c774318b644036e9fe07abaa730d6871cdcd780

See more details on using hashes here.

Provenance

File details

Details for the file pglast-1.1-cp34-cp34m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pglast-1.1-cp34-cp34m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 14681946e56c777c1e538dcfeddc8c6ab5a1744d5bc644c6d7a5d6684d01688d
MD5 0b468bfbf6c5315ffd4d616ce660017f
BLAKE2b-256 a810bf9f82533de31b1e73b64e54a4314b2d16c4976233bc9184900a2d622875

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page