Skip to main content

PostgreSQL Languages AST and statements prettifier

Project description


Lele Gaifax


GNU General Public License version 3 or later

Build status Documentation status

This is a Python 3 module that exposes the parse tree of a PostgreSQL statement (extracted by the almost standard PG parser repackaged as a standalone static library by libpg_query) as set of interconnected nodes, usually called an abstract syntax tree.


I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.

When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:

  • target only Python 3.4+

  • target PostgreSQL 10+

  • use a more dynamic approach to represent the parse tree, with a twofold advantage:

    1. it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag

    2. the representation is version agnostic, it can be adapted to newer/older Elephants in a snap

  • allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation

  • avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources is available without the hassle of guessing how a symbol has been mapped

  • use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree

Version 2

In late 2019, Ronan Dunklau opened PR #62 against libpg_query, that reimplemented the build machinery of the library to make it easier (read, semi-automatic) to support PostgreSQL v12, and PR #36 to bring pglast in line.

Since that version of PostgreSQL inevitably introduced some backward incompatibilities, I bumped the major version of pglast to better reflect the fact.

As I’m writing this, the fate of PR #62 is still unclear, so for the time being I switched the libpg_query submodule to Ronan’s fork.

I’m going to keep version 1 aligned to the original Lukas’ PG 10 branch. In all likelihood version 3 will target PostgreSQL 13 once his work on PG 13 branch reaches stability.


At the lower level the module exposes two libpg_query functions, parse_sql() and parse_plpgsql(), that take respectively an SQL statement and a PLpgSQL statement and return a parse tree as a hierarchy of Python dictionaries, lists and scalar values. In some cases these scalars correspond to some C typedef enums, that are automatically extracted from the PostgreSQL headers mentioned above and are available as pglast.enums.

At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.

Every node is identified by a tag, a string label that characterizes its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.

A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.

Finally, a Scalar carries a single value of some type, accessible through its value attribute.

On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the pgpp CLI tool, see below for an example.


As usual, the easiest way is with pip:

$ pip install pglast

Alternatively you can clone the repository:

$ git clone --recursive

and install from there:

$ pip install ./pglast


There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.

Examples of usage

  • Parse an SQL statement and get its AST root node:

    >>> from pglast import Node, parse_sql
    >>> root = Node(parse_sql('SELECT foo FROM bar'))
    >>> print(root)
  • Recursively traverse the parse tree:

    >>> for node in root.traverse():
    ...   print(node)

    As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind[*] and <value> is a Scalar.

  • Get a particular node:

    >>> from_clause = root[0].stmt.fromClause
    >>> print(from_clause)
  • Obtain some information about a node:

    >>> range_var = from_clause[0]
    >>> print(range_var.node_tag)
    >>> print(range_var.attribute_names)
    dict_keys(['relname', 'inh', 'relpersistence', 'location'])
    >>> print(range_var.parent_node)
  • Iterate over nodes:

    >>> for a in from_clause:
    ...     print(a)
    ...     for b in a:
    ...         print(b)
  • Reformat a SQL statement[] from the command line:

    $ echo "select a,b,c from sometable" | pgpp
    SELECT a
         , b
         , c
    FROM sometable
    $ echo "select a,b,c from sometable" | pgpp -c
    SELECT a,
    FROM sometable
    $ echo "select a, case when a=1 then 'singular' else 'plural' end from test" > /tmp/q.sql
    $ pgpp /tmp/q.sql
    SELECT a
         , CASE
             WHEN (a = 1)
               THEN 'singular'
             ELSE 'plural'
    FROM test
    $ echo 'update "table" set value=123 where value is null' | pgpp
    UPDATE "table"
    SET value = 123
    WHERE value IS NULL
    $ echo "
    insert into t (id, description)
    values (1, 'this is short enough'),
           (2, 'this is too long, and will be splitted')" | pgpp -s 20
    INSERT INTO t (id, description)
    VALUES (1, 'this is short enough')
         , (2, 'this is too long, an'
               'd will be splitted')
  • Programmatically reformat a SQL statement:

    >>> from pglast import prettify
    >>> print(prettify('delete from sometable where value is null'))
    DELETE FROM sometable
    WHERE value IS NULL


Latest documentation is hosted by Read the Docs at


2.0.dev3 (2021-02-20)

  • Handle INCLUDE clause in IndexStmt (PR #67), thanks to Ronan Dunklau

2.0.dev2 (2020-10-24)

  • Merge new fingerprint functionality from v1 (i.e. master) branch

2.0.dev1 (2020-09-26)

  • Require Python 3.6 or greater

  • Handle ALTER TYPE .. RENAME VALUE in AlterEnumStmt (PR #52), thanks to Ronan Dunklau

  • Add support for Create / Alter / Drop PROCEDURE (PR #48), thanks to Ronan Dunklau

  • Use Ronan’s fork of libpg_query, targeting PostgreSQL 12.1 (PR #36)

  • Change get_postgresql_version() to return a (major, minor) tuple (issue #38)


  • Handle PG12 materialized CTEs (issue #57)

  • Support column numbers in ALTER INDEX (PR #58), thanks to Ronan Dunklau

  • Handle SET LOGGED and SET UNLOGGED in ALTER TABLE (PR #59), thanks to Ronan Dunklau

  • Handle ALTER TYPE ... RENAME (PR #62), , thanks to Ronan Dunklau

1.17 (2021-02-20)

  • Fix the generic case in the RenameStmt printer

1.16 (2021-02-20)

  • Promote to the stable state

  • Move the job of building and uploading binary wheels from TravisCI to GitHub Actions

1.15 (2021-02-19)

  • Fix IF EXISTS variant of RenameStmt printer (PR #70), thanks to Jonathan Mortensen

  • Update libpg_query to 10-1.0.5

1.14 (2020-10-24)

  • Produce Python 3.9 wheels, thanks to cibuildwheel 1.6.3

  • Expose the libpg_query’s fingerprint functionality (PR #64), thanks to Yiming Wang

1.13 (2020-09-26)

  • Handle SELECT FROM foo

1.12 (2020-06-08)

  • Double quote column names in the TYPE_FUNC_NAME_KEYWORDS set (issue #55)

  • Possibly wrap SELECT in UNION/INTERSECT between parens, when needed (issue #55)

1.11 (2020-05-08)

  • Fix A_Expr printer, when lexpr is missing (PR #54), thanks to Aiham

  • Support DISABLE ROW LEVEL SECURITY in AlterTableCmd (PR #49), thanks to Ronan Dunklau

  • Implement CreateOpClassStmt printer (PR #47), thanks to Ronan Dunklau

1.10 (2020-01-25)

  • Fix collation name printer (PR #44), thanks to Ronan Dunklau

  • Implement CreatePLangStmt printer (PR #42), thanks to Bennie Swart

  • Fix privileges printer (PR #41), thanks to Bennie Swart

  • Handle TRUNCATE event in CreateTrigStmt printer (PR #40), thanks to Bennie Swart

  • Fix function body dollar quoting (PR #39), thanks to Bennie Swart

1.9 (2019-12-20)

  • Prettier INSERT representation

1.8 (2019-12-07)

  • Prettier CASE representation

  • New option to emit a semicolon after the last statement (issue #24)

1.7 (2019-12-01)

  • Implement NotifyStmt printer

  • Implement RuleStmt printer, thanks to Gavin M. Roy for his PR #28

  • Fix RenameStmt, properly handling object name

  • Produce Python 3.8 wheels, thanks to cibuildwheel 1.0.0

  • Support ALTER TABLE RENAME CONSTRAINT (PR #35), thanks to Ronan Dunklau

1.6 (2019-09-04)

  • Fix issue with boolean expressions precedence (issue #29)

  • Implement BitString printer

  • Support LEAKPROOF option (PR #31), thanks to Ronan Dunklau

  • Support DEFERRABLE INITIALLY DEFERRED option (PR #32), thanks to Ronan Dunklau

1.5 (2019-06-04)

  • Fix issue with RETURNS SETOF functions, a more general solution than the one proposed by Ronan Dunklau (PR #22)

  • Allow more than one empty line between statements (PR #26), thanks to apnewberry

1.4 (2019-04-06)

  • Fix wrap of trigger’s WHEN expression (issue #18)

  • Support for variadic functions (PR #19), thanks to Ronan Dunklau

  • Support ORDER / LIMIT / OFFSET for set operations (PR #20), thanks to Ronan Dunklau

  • Implement ConstraintsSetStmt and improve VariableSetStmt printers

1.3 (2019-03-28)

  • Support CROSS JOIN and timezone modifiers on time and timestamp datatypes (PR #15), thanks to Ronan Dunklau

  • Many new printers and several enhancements (PR #14), thanks to Ronan Dunklau

  • Expose the package version as pglast.__version__ (issue #12)

1.2 (2019-02-13)

  • Implement new split() function (see PR #10)

  • Implement BooleanTest printer (issue #11)

1.1 (2018-07-20)

  • No visible changes, but now PyPI carries binary wheels for Python 3.7.

1.0 (2018-06-16)

0.28 (2018-06-06)

  • Update libpg_query to 10-1.0.2

  • Support the ‘?’-style parameter placeholder variant allowed by libpg_query (details)

0.27 (2018-04-15)

  • Prettier JOINs representation, aligning them with the starting relation

0.26 (2018-04-03)

  • Fix cosmetic issue with ANY() and ALL()

0.25 (2018-03-31)

  • Fix issue in the safety belt check performed by pgpp (issue #4)

0.24 (2018-03-02)

  • Implement Null printer

0.23 (2017-12-28)

  • Implement some other DDL statements printers

  • New alternative style to print comma-separated-values lists, activated by a new --comma-at-eoln option on pgpp

0.22 (2017-12-03)

  • Implement TransactionStmt and almost all DROP xxx printers

0.21 (2017-11-22)

  • Implement NamedArgExpr printer

  • New alternative printers for a set of special functions, activated by a new --special-functions option on pgpp (issue #2)

0.20 (2017-11-21)

  • Handle special de-reference (A_Indirection) cases

0.19 (2017-11-16)

  • Fix serialization of column labels containing double quotes

  • Fix corner issues surfaced implementing some more DDL statement printers

0.18 (2017-11-14)

  • Fix endless loop due to sloppy conversion of command line option

  • Install the command line tool as pgpp

0.17 (2017-11-12)

  • Rename printers.sql to printers.dml (backward incompatibility)

  • List printer functions in the documentation, referencing the definition of related node type

  • Fix inconsistent spacing in JOIN condition inside a nested expression

  • Fix representation of unbound arrays

  • Fix representation of interval data type

  • Initial support for DDL statements

  • Fix representation of string literals containing single quotes

0.16 (2017-10-31)

  • Update libpg_query to 10-1.0.0

0.15 (2017-10-12)

  • Fix indentation of boolean expressions in SELECT’s targets (issue #3)

0.14 (2017-10-09)

  • Update to latest libpg_query’s 10-latest branch, targeting PostgreSQL 10.0 final

0.13 (2017-09-17)

  • Fix representation of subselects requiring surrounding parens

0.12 (2017-08-22)

  • New option --version on the command line tool

  • Better enums documentation

  • Release the GIL while calling libpg_query functions

0.11 (2017-08-11)

  • Nicer indentation for JOINs, making OUTER JOINs stand out

  • Minor tweaks to lists rendering, with less spurious whitespaces

  • New option --no-location on the command line tool

0.10 (2017-08-11)

  • Support Python 3.4 and Python 3.5 as well as Python 3.6

0.9 (2017-08-10)

  • Fix spacing before the $ character

  • Handle type modifiers

  • New option --plpgsql on the command line tool, just for fun

0.8 (2017-08-10)

  • Add enums subpackages to the documentation with references to their related headers

  • New compact_lists_margin option to produce a more compact representation when possible (see issue #1)

0.7 (2017-08-10)

  • Fix sdist including the Sphinx documentation

0.6 (2017-08-10)

  • New option --parse-tree on the command line tool to show just the parse tree

  • Sphinx documentation, available online

0.5 (2017-08-09)

  • Handle some more cases when a name must be double-quoted

  • Complete the serialization of the WindowDef node, handling its frame options

0.4 (2017-08-09)

  • Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function

  • New option safety_belt for the prettify() function, to protect the innocents

  • Handle serialization of CoalesceExpr and MinMaxExpr

0.3 (2017-08-07)

  • Handle serialization of ParamRef nodes

  • Expose a prettify() helper function

0.2 (2017-08-07)

  • Test coverage at 99%

  • First attempt at automatic wheel upload to PyPI, let’s see…

0.1 (2017-08-07)

  • First release (“Hi daddy!”, as my soul would tag it)

Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pglast-2.0.dev3.tar.gz (209.1 kB view hashes)

Uploaded source

Built Distributions

pglast-2.0.dev3-cp39-cp39-manylinux1_i686.whl (415.0 kB view hashes)

Uploaded cp39

pglast-2.0.dev3-cp38-cp38-manylinux1_i686.whl (415.1 kB view hashes)

Uploaded cp38

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page