PostgreSQL Languages AST and statements prettifier
Project description
- Contact:
- lele@metapensiero.it
- License:
- Status:
- Version:
- 2
This is a Python 3 module that exposes the parse tree of a PostgreSQL statement (extracted by the almost standard PG parser repackaged as a standalone static library by libpg_query) as set of interconnected nodes, usually called an abstract syntax tree.
Foreword
I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.
When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:
target only Python 3.4+
target PostgreSQL 10+
use a more dynamic approach to represent the parse tree, with a twofold advantage:
it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag
the representation is version agnostic, it can be adapted to newer/older Elephants in a snap
allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation
avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources is available without the hassle of guessing how a symbol has been mapped
use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree
Version 2
In late 2019, Ronan Dunklau opened PR #62 against libpg_query, that reimplemented the build machinery of the library to make it easier (read, semi-automatic) to support PostgreSQL v12, and PR #36 to bring pglast in line.
Since latest version of PostgreSQL inevitably introduced some backward incompatibilities, I bumped the major version of pglast to better reflect the fact.
As I’m writing this, the fate of PR #62 is still unclear, so for the time being I switched the libpg_query submodule to Ronan’s fork.
Introduction
At the lower level the module exposes two libpg_query functions, parse_sql() and parse_plpgsql(), that take respectively an SQL statement and a PLpgSQL statement and return a parse tree as a hierarchy of Python dictionaries, lists and scalar values. In some cases these scalars correspond to some C typedef enums, that are automatically extracted from the PostgreSQL headers mentioned above and are available as pglast.enums.
At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.
Every node is identified by a tag, a string label that characterizes its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.
A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.
Finally, a Scalar carries a single value of some type, accessible through its value attribute.
On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the pgpp CLI tool, see below for an example.
Installation
As usual, the easiest way is with pip:
$ pip install pglast
Alternatively you can clone the repository:
$ git clone https://github.com/lelit/pglast.git --recursive
and install from there:
$ pip install ./pglast
Development
There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.
Examples of usage
Parse an SQL statement and get its AST root node:
>>> from pglast import Node, parse_sql >>> root = Node(parse_sql('SELECT foo FROM bar')) >>> print(root) None=[1*{RawStmt}]
Recursively traverse the parse tree:
>>> for node in root.traverse(): ... print(node) ... None[0]={RawStmt} stmt={SelectStmt} fromClause[0]={RangeVar} inh=<True> location=<16> relname=<'bar'> relpersistence=<'p'> op=<0> targetList[0]={ResTarget} location=<7> val={ColumnRef} fields[0]={String} str=<'foo'> location=<7>
As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind[*] and <value> is a Scalar.
Get a particular node:
>>> from_clause = root[0].stmt.fromClause >>> print(from_clause) fromClause=[1*{RangeVar}]
Obtain some information about a node:
>>> range_var = from_clause[0] >>> print(range_var.node_tag) RangeVar >>> print(range_var.attribute_names) dict_keys(['relname', 'inh', 'relpersistence', 'location']) >>> print(range_var.parent_node) stmt={SelectStmt}
Iterate over nodes:
>>> for a in from_clause: ... print(a) ... for b in a: ... print(b) ... fromClause[0]={RangeVar} inh=<True> location=<16> relname=<'bar'> relpersistence=<'p'>
Reformat a SQL statement[†] from the command line:
$ echo "select a,b,c from sometable" | pgpp SELECT a , b , c FROM sometable $ echo "select a,b,c from sometable" | pgpp -c SELECT a, b, c FROM sometable $ echo "select a, case when a=1 then 'singular' else 'plural' end from test" > /tmp/q.sql $ pgpp /tmp/q.sql SELECT a , CASE WHEN (a = 1) THEN 'singular' ELSE 'plural' END FROM test $ echo 'update "table" set value=123 where value is null' | pgpp UPDATE "table" SET value = 123 WHERE value IS NULL $ echo " insert into t (id, description) values (1, 'this is short enough'), (2, 'this is too long, and will be splitted')" | pgpp -s 20 INSERT INTO t (id, description) VALUES (1, 'this is short enough') , (2, 'this is too long, an' 'd will be splitted')
Programmatically reformat a SQL statement:
>>> from pglast import prettify >>> print(prettify('delete from sometable where value is null')) DELETE FROM sometable WHERE value IS NULL
Documentation
Latest documentation is hosted by Read the Docs at http://pglast.readthedocs.io/en/latest/
Changes
2.0.dev1 (2020-09-26)
Require Python 3.6 or greater
Handle ALTER TYPE .. RENAME VALUE in AlterEnumStmt (PR #52), thanks to Ronan Dunklau
Add support for Create / Alter / Drop PROCEDURE (PR #48), thanks to Ronan Dunklau
Use Ronan’s fork of libpg_query, targeting PostgreSQL 12.1 (PR #36)
Change get_postgresql_version() to return a (major, minor) tuple (issue #38)
Handle ALTER TABLE ... ALTER COLUMN ... SET STORAGE ...
Handle PG12 materialized CTEs (issue #57)
Support column numbers in ALTER INDEX (PR #58), thanks to Ronan Dunklau
Handle SET LOGGED and SET UNLOGGED in ALTER TABLE (PR #59), thanks to Ronan Dunklau
Handle ALTER TYPE ... RENAME (PR #62), , thanks to Ronan Dunklau
1.13 (2020-09-26)
Handle SELECT FROM foo
1.12 (2020-06-08)
1.11 (2020-05-08)
1.10 (2020-01-25)
Fix collation name printer (PR #44), thanks to Ronan Dunklau
Implement CreatePLangStmt printer (PR #42), thanks to Bennie Swart
Fix privileges printer (PR #41), thanks to Bennie Swart
Handle TRUNCATE event in CreateTrigStmt printer (PR #40), thanks to Bennie Swart
Fix function body dollar quoting (PR #39), thanks to Bennie Swart
1.9 (2019-12-20)
Prettier INSERT representation
1.8 (2019-12-07)
Prettier CASE representation
New option to emit a semicolon after the last statement (issue #24)
1.7 (2019-12-01)
Implement NotifyStmt printer
Implement RuleStmt printer, thanks to Gavin M. Roy for his PR #28
Fix RenameStmt, properly handling object name
Produce Python 3.8 wheels, thanks to cibuildwheel 1.0.0
Support ALTER TABLE RENAME CONSTRAINT (PR #35), thanks to Ronan Dunklau
1.6 (2019-09-04)
1.5 (2019-06-04)
1.4 (2019-04-06)
1.3 (2019-03-28)
1.2 (2019-02-13)
1.1 (2018-07-20)
No visible changes, but now PyPI carries binary wheels for Python 3.7.
1.0 (2018-06-16)
Important
The name of the package has been changed from pg_query to pglast, to satisfy the request made by the author of libpg_query in issue #9.
This affects both the main repository on GitHub, that from now on is https://github.com/lelit/pglast, and the ReadTheDocs project that hosts the documentation, http://pglast.readthedocs.io/en/latest/.
I’m sorry for any inconvenience this may cause.
0.28 (2018-06-06)
Update libpg_query to 10-1.0.2
Support the ‘?’-style parameter placeholder variant allowed by libpg_query (details)
0.27 (2018-04-15)
Prettier JOINs representation, aligning them with the starting relation
0.26 (2018-04-03)
Fix cosmetic issue with ANY() and ALL()
0.25 (2018-03-31)
Fix issue in the safety belt check performed by pgpp (issue #4)
0.24 (2018-03-02)
Implement Null printer
0.23 (2017-12-28)
Implement some other DDL statements printers
New alternative style to print comma-separated-values lists, activated by a new --comma-at-eoln option on pgpp
0.22 (2017-12-03)
Implement TransactionStmt and almost all DROP xxx printers
0.21 (2017-11-22)
Implement NamedArgExpr printer
New alternative printers for a set of special functions, activated by a new --special-functions option on pgpp (issue #2)
0.20 (2017-11-21)
Handle special de-reference (A_Indirection) cases
0.19 (2017-11-16)
Fix serialization of column labels containing double quotes
Fix corner issues surfaced implementing some more DDL statement printers
0.18 (2017-11-14)
Fix endless loop due to sloppy conversion of command line option
Install the command line tool as pgpp
0.17 (2017-11-12)
Rename printers.sql to printers.dml (backward incompatibility)
List printer functions in the documentation, referencing the definition of related node type
Fix inconsistent spacing in JOIN condition inside a nested expression
Fix representation of unbound arrays
Fix representation of interval data type
Initial support for DDL statements
Fix representation of string literals containing single quotes
0.16 (2017-10-31)
Update libpg_query to 10-1.0.0
0.15 (2017-10-12)
Fix indentation of boolean expressions in SELECT’s targets (issue #3)
0.14 (2017-10-09)
Update to latest libpg_query’s 10-latest branch, targeting PostgreSQL 10.0 final
0.13 (2017-09-17)
Fix representation of subselects requiring surrounding parens
0.12 (2017-08-22)
New option --version on the command line tool
Better enums documentation
Release the GIL while calling libpg_query functions
0.11 (2017-08-11)
Nicer indentation for JOINs, making OUTER JOINs stand out
Minor tweaks to lists rendering, with less spurious whitespaces
New option --no-location on the command line tool
0.10 (2017-08-11)
Support Python 3.4 and Python 3.5 as well as Python 3.6
0.9 (2017-08-10)
Fix spacing before the $ character
Handle type modifiers
New option --plpgsql on the command line tool, just for fun
0.8 (2017-08-10)
Add enums subpackages to the documentation with references to their related headers
New compact_lists_margin option to produce a more compact representation when possible (see issue #1)
0.7 (2017-08-10)
Fix sdist including the Sphinx documentation
0.6 (2017-08-10)
New option --parse-tree on the command line tool to show just the parse tree
Sphinx documentation, available online
0.5 (2017-08-09)
Handle some more cases when a name must be double-quoted
Complete the serialization of the WindowDef node, handling its frame options
0.4 (2017-08-09)
Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function
New option safety_belt for the prettify() function, to protect the innocents
Handle serialization of CoalesceExpr and MinMaxExpr
0.3 (2017-08-07)
Handle serialization of ParamRef nodes
Expose a prettify() helper function
0.2 (2017-08-07)
Test coverage at 99%
First attempt at automatic wheel upload to PyPI, let’s see…
0.1 (2017-08-07)
First release (“Hi daddy!”, as my soul would tag it)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distributions
Hashes for pglast-2.0.dev1-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1fc93a22916ac3cca5ad4d6c8356c1a4bc23d7b9a6f9bf0c87f3fdb22d3383ec |
|
MD5 | de69c45d57bb130365902c7b92cdee97 |
|
BLAKE2b-256 | eabc81a48533e5c3c2292c5aa8c9c1e61fa9458641bcc377eb0cdcf803867484 |
Hashes for pglast-2.0.dev1-cp38-cp38-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | cfaf85c7d598d81e80b55d5cede683b083a94b92fa7907effbb4f9b2b595c43f |
|
MD5 | 3f001aeef29f94230989ce1e93cd93b9 |
|
BLAKE2b-256 | 088f17cfe814d89e680574a56f662172b5ff9929d7e49a10d7267d9d805d9a49 |
Hashes for pglast-2.0.dev1-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 35fcf1efbbc6f1efdae4223cfb040daf9a9b132c356083237e7b39aecd74e70c |
|
MD5 | ef28891e4bfa822e3716e3f93a7e4949 |
|
BLAKE2b-256 | c9475d31639dff8ec893d8930620e86663d439d255425856a075d38b85530aa4 |
Hashes for pglast-2.0.dev1-cp38-cp38-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5240f0077a7f7ddf42c9b1beab42566490eed82cb405aa54abd1167d63b5400b |
|
MD5 | ce50e6f0673e645d1407840f6b555f2d |
|
BLAKE2b-256 | a4a316ed0e36b4e41a5c402297cc8533bc09a3f017b03899c5d0d51188dfef26 |
Hashes for pglast-2.0.dev1-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | e61dda81f5455550698aaa2d26bee3cf7a8f7210d82d2533d1edb5b3059e1fb6 |
|
MD5 | fa82ecbb8ac421caff301addbb371657 |
|
BLAKE2b-256 | 3c505bdca2ec5ce1769bef9ff784b1606d4a954104ba4c170f51db2c997dd687 |
Hashes for pglast-2.0.dev1-cp37-cp37m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | e0b48609976302a8c75f7b5e93452665e05085a7b3c22c2787e5eff9ce8d4fc0 |
|
MD5 | 5adebbe112750289f9cceb3e84b936d4 |
|
BLAKE2b-256 | 77b0ea80688802318f713c765a935f7cbf5a35eba6cc8f68487f55c435903132 |
Hashes for pglast-2.0.dev1-cp37-cp37m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | b5fafbf408b6025d993aad068a5f8935fab21d3737f0707c07421ac2eda665ce |
|
MD5 | 479bf5566d1c97ef3d4b0847f0c9561f |
|
BLAKE2b-256 | 2dc1cbb1803ee968f7ff0ed0a2fd47b8ae4cc6330d5909df0258722b4f79b49b |
Hashes for pglast-2.0.dev1-cp37-cp37m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 480826b8d2e95a7eeb855ed76be548c4622338d52bf9b0dc118aaffcc1acac5e |
|
MD5 | 9dab93f0e54d9cb8a7481264369b5670 |
|
BLAKE2b-256 | 9c8ae25d275472febdca94d2d93962060da58f12edfe28a7b7a5971c547bcc66 |
Hashes for pglast-2.0.dev1-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6f00d086913776eddc92d97183d1db7ef3cacbecb808600131f028159c38d5ab |
|
MD5 | 6c6fc723816078b742b45ab745dd7775 |
|
BLAKE2b-256 | 49a2f93857e7c6723fcb60d95b0c320a100c1aef0b29c04b5fd129de7dcc5725 |
Hashes for pglast-2.0.dev1-cp36-cp36m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | f44a87688e9abfd0b0d2a0205847e8dcd11d3a7dd86737d9a4c4a55d3b05e232 |
|
MD5 | d6bae075f100d9f4c4a419c824b29dc8 |
|
BLAKE2b-256 | 33a99730f581a664a0fc52ef789a13eda83e8351bb3fd21752140cba4ae80ca2 |
Hashes for pglast-2.0.dev1-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | daa0259034aea73fab3136594dc84ee3689ec6ab92554adf1994241f14059737 |
|
MD5 | 8d14b67b28a4922dce77837e1e6767de |
|
BLAKE2b-256 | 75df119e83e519c7ed4875c0d41ae4c693f5ccb4bc6a5a6801f9b931da362b8c |
Hashes for pglast-2.0.dev1-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4b00cc1a68462618670221c201a453e78ce8784c25b1e6081e33ab846383ba27 |
|
MD5 | bb5fe94951795cf20d6062557300190c |
|
BLAKE2b-256 | 9353b836f05fcd0f452c0b98322e0592d8d533ead342eb765b084df564e6a8a1 |