PostgreSQL Languages AST and statements prettifier
Project description
- Contact:
- lele@metapensiero.it
- License:
- Status:
- Version:
- 3
This is a Python 3 module that exposes the parse tree of a PostgreSQL statement (extracted by the almost standard PG parser repackaged as a standalone static library by libpg_query) as set of interconnected nodes, usually called an abstract syntax tree.
Introduction
At the lower level the module exposes several libpg_query functions, parse_sql_json(), parse_sql_protobuf(), parse_plpgsql_json(), fingerprint(), scan, split() and deparse_protobuf(); the first two take an SQL statement and return the correspondent parse tree respectively as a JSON encoded value and a Protobuf encoded value; the third function takes a PLpgSQL statement and returns the parse tree as JSON, the fourth returns a sequence of tokens that compose a SQL statement, the fifth returns a signature of the given statement, the sixth returns a sequence of the single statements and the last one accepts a Protobuf-serialized statement and reproduce the original SQL statement.
One more function, parse_sql(), returns the syntax tree represented by a hierarchy of instances of the classes implemented in the pglast.ast module.
At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.
Every node is identified by a tag, a string label that characterizes its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.
A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.
Finally, a Scalar carries a single value of some type, accessible through its value attribute.
On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the pgpp CLI tool, see the Usage section in the documentation for an example.
Installation
As usual, the easiest way is with pip:
$ pip install pglast
Alternatively you can clone the repository:
$ git clone https://github.com/lelit/pglast.git --recursive
and install from there:
$ pip install ./pglast
Development
There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.
Documentation
Latest documentation is hosted by Read the Docs at http://pglast.readthedocs.io/en/latest/
History
Version 1
I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.
When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:
target only Python 3.4+
target PostgreSQL 10+
use a more dynamic approach to represent the parse tree, with a twofold advantage:
it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag
the representation is version agnostic, it can be adapted to newer/older Elephants in a snap
allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation
avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources is available without the hassle of guessing how a symbol has been mapped
use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree
Version 2
In late 2019, Ronan Dunklau opened PR #62 against libpg_query, that reimplemented the build machinery of the library to make it easier (read, semi-automatic) to support PostgreSQL 12, and PR #36 to bring pglast in line.
Since that version of PostgreSQL inevitably introduced some backward incompatibilities, I bumped the major version of pglast to better reflect the fact.
As I’m writing this, the fate of PR #62 is still unclear, so for the time being I switched the libpg_query submodule to Ronan’s fork.
I’m going to keep version 1 aligned to the original Lukas’ PG 10 branch.
Version 3
In early 2021, Lukas put a considerable effort into evolving his library to target PostgreSQL 13. He introduced a richer protobuf-based AST serialization protocol, rewriting the underlying machinery so that the same code is used to generate either a JSON or a protobuf stream.
The approach has obvious advantages, but unfortunately both formats come with different shortcomings, and I was not able to adapt pglast. The JSON serialization has changed in a way that it not anymore sufficient to rebuild the original AST because some attributes now carry an implicit structure, that requires additional information to understand the content (see issue #82). OTOH, the Protobuf format is clumsy, at least on the Python side: the Google’s compiler creates a huge and unreadable module, while other implementations (see pyrobuf, cprotobuf and betterproto) suffer of different issues (see issue #210).
After several attempts, I decided to follow a more rewarding way and implement a native Python wrapper layer on top of PG parser’s nodes.
Changes
V3
3.0.dev0 (2021-05-03)
Expose the new pg_query_scan() function as parser.scan()
Expose the pg_query_parse() function as parser.parse_sql_json()
Expose the new pg_query_parse_protobuf() function as parser.parse_sql_protobuf()
Expose the new pg_query_deparse_protobuf() function as parser.deparse_protobuf()
Honor the catalogname of a RangeVar if present (issue #71)
Cover almost all SQL statements, testing against the whole PostgreSQL regression suite (issue #68, PR #72 and PR #77), thanks to Ronan Dunklau and Hong Cheng
New rudimentary support for the preserve comments feature (issue #23)
Breaking changes
Target PostgreSQL 13
The pglast.parser module exposes all libpg_query entry points, even the new pg_query_deparse_protobuf() function that is basically equivalent to RawStream-based printer
The split() function is now based on the lower level pg_query_split_with_xxx() functions
The parse_sql() function returns native Python objects, not a JSON string as before: all PG nodes are now represented by subclasses of pglast.ast.Node, without exception, even Expr and Value are there. The latter impacts on pglast.node.Scalar: for example it now may contains a ast.Integer instance instead of a Python int
The pgpp --parse-tree output is a pprint represention of the AST, not a JSON string as before
The ParseError exception does not expose the location as an instance member anymore, although its still there, as the second argument (ie .args[1]); furthermore, its value now corresponds to the index in the original Unicode string, instead of the offset in the UTF-8 representation passed to the underlying C function
V2
2.0.dev3 (2021-02-20)
Handle INCLUDE clause in IndexStmt (PR #67), thanks to Ronan Dunklau
2.0.dev2 (2020-10-24)
Merge new fingerprint functionality from v1 (i.e. master) branch
2.0.dev1 (2020-09-26)
Require Python 3.6 or greater
Handle ALTER TYPE .. RENAME VALUE in AlterEnumStmt (PR #52), thanks to Ronan Dunklau
Add support for Create / Alter / Drop PROCEDURE (PR #48), thanks to Ronan Dunklau
Use Ronan’s fork of libpg_query, targeting PostgreSQL 12.1 (PR #36)
Change get_postgresql_version() to return a (major, minor) tuple (issue #38)
Handle ALTER TABLE ... ALTER COLUMN ... SET STORAGE ...
Handle PG12 materialized CTEs (issue #57)
Support column numbers in ALTER INDEX (PR #58), thanks to Ronan Dunklau
Handle SET LOGGED and SET UNLOGGED in ALTER TABLE (PR #59), thanks to Ronan Dunklau
Handle ALTER TYPE ... RENAME (PR #62), , thanks to Ronan Dunklau
V1
1.17 (2021-02-20)
Fix the generic case in the RenameStmt printer
1.16 (2021-02-20)
Promote to the stable state
Move the job of building and uploading binary wheels from TravisCI to GitHub Actions
1.15 (2021-02-19)
Fix IF EXISTS variant of RenameStmt printer (PR #70), thanks to Jonathan Mortensen
Update libpg_query to 10-1.0.5
1.14 (2020-10-24)
Produce Python 3.9 wheels, thanks to cibuildwheel 1.6.3
Expose the libpg_query’s fingerprint functionality (PR #64), thanks to Yiming Wang
1.13 (2020-09-26)
Handle SELECT FROM foo
1.12 (2020-06-08)
1.11 (2020-05-08)
1.10 (2020-01-25)
Fix collation name printer (PR #44), thanks to Ronan Dunklau
Implement CreatePLangStmt printer (PR #42), thanks to Bennie Swart
Fix privileges printer (PR #41), thanks to Bennie Swart
Handle TRUNCATE event in CreateTrigStmt printer (PR #40), thanks to Bennie Swart
Fix function body dollar quoting (PR #39), thanks to Bennie Swart
1.9 (2019-12-20)
Prettier INSERT representation
1.8 (2019-12-07)
Prettier CASE representation
New option to emit a semicolon after the last statement (issue #24)
1.7 (2019-12-01)
Implement NotifyStmt printer
Implement RuleStmt printer, thanks to Gavin M. Roy for his PR #28
Fix RenameStmt, properly handling object name
Produce Python 3.8 wheels, thanks to cibuildwheel 1.0.0
Support ALTER TABLE RENAME CONSTRAINT (PR #35), thanks to Ronan Dunklau
1.6 (2019-09-04)
1.5 (2019-06-04)
1.4 (2019-04-06)
1.3 (2019-03-28)
1.2 (2019-02-13)
1.1 (2018-07-20)
No visible changes, but now PyPI carries binary wheels for Python 3.7.
1.0 (2018-06-16)
0.28 (2018-06-06)
Update libpg_query to 10-1.0.2
Support the ‘?’-style parameter placeholder variant allowed by libpg_query (details)
0.27 (2018-04-15)
Prettier JOINs representation, aligning them with the starting relation
0.26 (2018-04-03)
Fix cosmetic issue with ANY() and ALL()
0.25 (2018-03-31)
Fix issue in the safety belt check performed by pgpp (issue #4)
0.24 (2018-03-02)
Implement Null printer
0.23 (2017-12-28)
Implement some other DDL statements printers
New alternative style to print comma-separated-values lists, activated by a new --comma-at-eoln option on pgpp
0.22 (2017-12-03)
Implement TransactionStmt and almost all DROP xxx printers
0.21 (2017-11-22)
Implement NamedArgExpr printer
New alternative printers for a set of special functions, activated by a new --special-functions option on pgpp (issue #2)
0.20 (2017-11-21)
Handle special de-reference (A_Indirection) cases
0.19 (2017-11-16)
Fix serialization of column labels containing double quotes
Fix corner issues surfaced implementing some more DDL statement printers
0.18 (2017-11-14)
Fix endless loop due to sloppy conversion of command line option
Install the command line tool as pgpp
0.17 (2017-11-12)
Rename printers.sql to printers.dml (backward incompatibility)
List printer functions in the documentation, referencing the definition of related node type
Fix inconsistent spacing in JOIN condition inside a nested expression
Fix representation of unbound arrays
Fix representation of interval data type
Initial support for DDL statements
Fix representation of string literals containing single quotes
0.16 (2017-10-31)
Update libpg_query to 10-1.0.0
0.15 (2017-10-12)
Fix indentation of boolean expressions in SELECT’s targets (issue #3)
0.14 (2017-10-09)
Update to latest libpg_query’s 10-latest branch, targeting PostgreSQL 10.0 final
0.13 (2017-09-17)
Fix representation of subselects requiring surrounding parens
0.12 (2017-08-22)
New option --version on the command line tool
Better enums documentation
Release the GIL while calling libpg_query functions
0.11 (2017-08-11)
Nicer indentation for JOINs, making OUTER JOINs stand out
Minor tweaks to lists rendering, with less spurious whitespaces
New option --no-location on the command line tool
0.10 (2017-08-11)
Support Python 3.4 and Python 3.5 as well as Python 3.6
0.9 (2017-08-10)
Fix spacing before the $ character
Handle type modifiers
New option --plpgsql on the command line tool, just for fun
0.8 (2017-08-10)
Add enums subpackages to the documentation with references to their related headers
New compact_lists_margin option to produce a more compact representation when possible (see issue #1)
0.7 (2017-08-10)
Fix sdist including the Sphinx documentation
0.6 (2017-08-10)
New option --parse-tree on the command line tool to show just the parse tree
Sphinx documentation, available online
0.5 (2017-08-09)
Handle some more cases when a name must be double-quoted
Complete the serialization of the WindowDef node, handling its frame options
0.4 (2017-08-09)
Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function
New option safety_belt for the prettify() function, to protect the innocents
Handle serialization of CoalesceExpr and MinMaxExpr
0.3 (2017-08-07)
Handle serialization of ParamRef nodes
Expose a prettify() helper function
0.2 (2017-08-07)
Test coverage at 99%
First attempt at automatic wheel upload to PyPI, let’s see…
0.1 (2017-08-07)
First release (“Hi daddy!”, as my soul would tag it)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for pglast-3.0.dev0-cp39-cp39-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 87638206089834b534658df1e0484ea7183c35785dd4b0a842dcea2c87f3def1 |
|
MD5 | 189e509a5f7a96ea5c90294822da06ee |
|
BLAKE2b-256 | 8760bbc519c6815e1613dd5549e1cc4461b466fae12c496cdd855affe3983336 |
Hashes for pglast-3.0.dev0-cp39-cp39-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5bfae412639a9856fa623fd26273827ecb00ed1186906f8f41eb38a676f4e3fc |
|
MD5 | c17884c9869dfeda5b76332dfb94008b |
|
BLAKE2b-256 | c3d723ad263489259d95a4b8fa8af0c48e7439212f3a70458f71a7656908f68b |
Hashes for pglast-3.0.dev0-cp39-cp39-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 08f06793704fc4db6b48d767584496d5b3240efcdcb53fe88ca17d4d2026aa99 |
|
MD5 | 0c6b53f53581a86b9d8e47210ea50d5f |
|
BLAKE2b-256 | f598e50dc25233ada907989751e210ff06c6394959622d2f7f4455ce2a00b1dc |
Hashes for pglast-3.0.dev0-cp39-cp39-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | a5efcf0b6a414e997a96ef20d60d7764b0f6becdf4784e05f5536f7d3aea8a56 |
|
MD5 | cbc935baa06b60c15a2fcef7f1b9644d |
|
BLAKE2b-256 | be5b795094d905a082464fd58aa98e5f163e54903adabec8dd04f77b31d9e484 |
Hashes for pglast-3.0.dev0-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | d7031688b447eeda55946e8e2efcc9c96f331a815a7ecb2a655ec093abb2fc7a |
|
MD5 | fca26c1d17e77764b0fa9baebe6909fa |
|
BLAKE2b-256 | b2c2650424fe25177504c06245ceb9c1cd0429aebf37c0bd8691aed1fa12a9fd |
Hashes for pglast-3.0.dev0-cp38-cp38-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2979a88eae0e352b218dc666b4c6e91b6327dc48559b0e985196cd422816750c |
|
MD5 | c822c7bcdd6def73e48940e0a0bc4507 |
|
BLAKE2b-256 | 2ab36ec6a9975cd505ec3bdd7b7fccb16c8ce777e99ecb3cf2432d83fe3b9d87 |
Hashes for pglast-3.0.dev0-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 901ae896c5e3bc0cc66cfdb66b08775951ff2446ed07bb7bdd534d92795b4a95 |
|
MD5 | ec8e7787a271c6aa55e2de833821e9f2 |
|
BLAKE2b-256 | d19413766a4d5c873bb9fe83cf13b730dc257fc6d2c4c23bb7d6a13bc776d0f6 |
Hashes for pglast-3.0.dev0-cp38-cp38-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | cc7e13543be63d7a09e7ebafe06f20e550425f0872c2ae00b8139a508cbd12fa |
|
MD5 | 7abfc6c8f09850da09425601ea291e0e |
|
BLAKE2b-256 | 9d94504f94900af2bf32867245a09c08b044ff4662cd84c75aeb3a46ca873455 |
Hashes for pglast-3.0.dev0-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | a1f7a85041165c490965f6ee56f00c708bd58f079278868f9337f07b60d525f8 |
|
MD5 | 56018de85ba347df7286127f5ce700e6 |
|
BLAKE2b-256 | 02cc297dd9a56f969b7c5f0c5ac4088127a0a9e295c0422ed4dd8f1d6a02a33f |
Hashes for pglast-3.0.dev0-cp37-cp37m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4ee01c23f91c0ea5783013eb9ccd2e7cbf311fa09420cb7baaebba0b97c9947e |
|
MD5 | 9518147604d661ffdffcabc5f8170e2b |
|
BLAKE2b-256 | 9163baa64b0b2f9175001c85193ddf17936325c99a103878bd8e5971bf017b35 |
Hashes for pglast-3.0.dev0-cp37-cp37m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0fa6fc91eaf0a66e729572d3d5938bf6ea30498034358322b32d266c33d363f2 |
|
MD5 | 7e184cdad47107c5ee4e166165dc5573 |
|
BLAKE2b-256 | 8dc2cedc35d795aebd9703b4e86e9e17006a8cec8ff060e31718ca2867533a1a |
Hashes for pglast-3.0.dev0-cp37-cp37m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9fe23d0cf4c91703c4a25f5d320e6807962310eae22f0a795e8ee5f4f4a88622 |
|
MD5 | 20aa3085460b50ac7ae4cadee4e7c7bd |
|
BLAKE2b-256 | fccfbc85b501f93912e17612025bcbdb28f53c2b217c3dd6cde8d421cdbdd37d |
Hashes for pglast-3.0.dev0-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | c054b3e4156d2c1577be2714929c257c15f6350063d6a0c70d01bfc9d5b10ccb |
|
MD5 | 01c66aa3e1b2dabb259c6e5a25ee3b46 |
|
BLAKE2b-256 | d4f8faa52ae034037c089dab1d9c0e3c247ab720d34f6c581ecd2f592ea08159 |
Hashes for pglast-3.0.dev0-cp36-cp36m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 558170154a45b5d798d82132110b8cc0b928257349c293b767e4b5185858d1fa |
|
MD5 | 284d05c30ab06e1b4b916495e81e939b |
|
BLAKE2b-256 | c8295e338c50783bb64636e5820ad283c929f471650392adbca4ff02d25bbaff |
Hashes for pglast-3.0.dev0-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | d9cf430e973c3766420fd1a75d372f24f09aba7ed0ea9d5751bd4b2e4132f7a8 |
|
MD5 | 36bd06e01d88110c7924a5bc73b1a28e |
|
BLAKE2b-256 | 1cdc68e9a61620e5c7af67537b3c76f3e73a496378ea2c29d4014d6c41dce813 |
Hashes for pglast-3.0.dev0-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7ed4ec6742a7fa7a2cce682b2158c15200fafa18b76e93b2fe0e87ea6f5a2feb |
|
MD5 | 716f5a4edf55246e32d72f0cf3f24526 |
|
BLAKE2b-256 | 933627af7106eeea5f503fe850d0f5a64461311a26f2b466fefb988dd4d44303 |