PostgreSQL Languages AST and statements prettifier
Project description
- Contact:
- lele@metapensiero.it
- License:
- Status:
- Version:
- 3
This is a Python 3 module that exposes the parse tree of a PostgreSQL statement (extracted by the almost standard PG parser repackaged as a standalone static library by libpg_query) as set of interconnected nodes, usually called an abstract syntax tree.
Introduction
At the lower level the module exposes several libpg_query functions, parse_sql_json(), parse_sql_protobuf(), parse_plpgsql_json(), fingerprint(), scan, split() and deparse_protobuf(); the first two take an SQL statement and return the correspondent parse tree respectively as a JSON encoded value and a Protobuf encoded value; the third function takes a PLpgSQL statement and returns the parse tree as JSON, the fourth returns a sequence of tokens that compose a SQL statement, the fifth returns a signature of the given statement, the sixth returns a sequence of the single statements and the last one accepts a Protobuf-serialized statement and reproduce the original SQL statement.
One more function, parse_sql(), returns the syntax tree represented by a hierarchy of instances of the classes implemented in the pglast.ast module.
At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.
Every node is identified by a tag, a string label that characterizes its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.
A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.
Finally, a Scalar carries a single value of some type, accessible through its value attribute.
On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the pgpp CLI tool, see the Usage section in the documentation for an example.
Installation
As usual, the easiest way is with pip:
$ pip install pglast
Alternatively you can clone the repository:
$ git clone https://github.com/lelit/pglast.git --recursive
and install from there:
$ pip install ./pglast
Development
There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.
Documentation
Latest documentation is hosted by Read the Docs at http://pglast.readthedocs.io/en/latest/
History
Version 1
I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.
When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:
target only Python 3.4+
target PostgreSQL 10+
use a more dynamic approach to represent the parse tree, with a twofold advantage:
it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag
the representation is version agnostic, it can be adapted to newer/older Elephants in a snap
allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation
avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources is available without the hassle of guessing how a symbol has been mapped
use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree
Version 2
In late 2019, Ronan Dunklau opened PR #62 against libpg_query, that reimplemented the build machinery of the library to make it easier (read, semi-automatic) to support PostgreSQL 12, and PR #36 to bring pglast in line.
Since that version of PostgreSQL inevitably introduced some backward incompatibilities, I bumped the major version of pglast to better reflect the fact.
As I’m writing this, the fate of PR #62 is still unclear, so for the time being I switched the libpg_query submodule to Ronan’s fork.
I’m going to keep version 1 aligned to the original Lukas’ PG 10 branch.
Version 3
In early 2021, Lukas put a considerable effort into evolving his library to target PostgreSQL 13. He introduced a richer protobuf-based AST serialization protocol, rewriting the underlying machinery so that the same code is used to generate either a JSON or a protobuf stream.
The approach has obvious advantages, but unfortunately both formats come with different shortcomings, and I was not able to adapt pglast. The JSON serialization has changed in a way that it not anymore sufficient to rebuild the original AST because some attributes now carry an implicit structure, that requires additional information to understand the content (see issue #82). OTOH, the Protobuf format is clumsy, at least on the Python side: the Google’s compiler creates a huge and unreadable module, while other implementations (see pyrobuf, cprotobuf and betterproto) suffer of different issues (see issue #210).
After several attempts, I decided to follow a more rewarding way and implement a native Python wrapper layer on top of PG parser’s nodes.
Changes
V3
3.0.dev1 (2021-05-16)
Fix AT_SetIdentity, AT_EnableReplicaTrig and AlterSubscriptionStmt printers
Improve AlterTSConfigType and IntoClause printers
New generic “visitor pattern” (issue #51) exemplified by a new referenced_relations() function (issue #66)
Refine printing of SQL comments
Implement AlterExtensionStmt printer
3.0.dev0 (2021-05-03)
Expose the new pg_query_scan() function as parser.scan()
Expose the pg_query_parse() function as parser.parse_sql_json()
Expose the new pg_query_parse_protobuf() function as parser.parse_sql_protobuf()
Expose the new pg_query_deparse_protobuf() function as parser.deparse_protobuf()
Honor the catalogname of a RangeVar if present (issue #71)
Cover almost all SQL statements, testing against the whole PostgreSQL regression suite (issue #68, PR #72 and PR #77), thanks to Ronan Dunklau and Hong Cheng
New rudimentary support for the preserve comments feature (issue #23)
Breaking changes
Target PostgreSQL 13
The pglast.parser module exposes all libpg_query entry points, even the new pg_query_deparse_protobuf() function that is basically equivalent to RawStream-based printer
The split() function is now based on the lower level pg_query_split_with_xxx() functions
The parse_sql() function returns native Python objects, not a JSON string as before: all PG nodes are now represented by subclasses of pglast.ast.Node, without exception, even Expr and Value are there. The latter impacts on pglast.node.Scalar: for example it now may contains a ast.Integer instance instead of a Python int
The pgpp --parse-tree output is a pprint represention of the AST, not a JSON string as before
The ParseError exception does not expose the location as an instance member anymore, although its still there, as the second argument (ie .args[1]); furthermore, its value now corresponds to the index in the original Unicode string, instead of the offset in the UTF-8 representation passed to the underlying C function
V2
2.0.dev3 (2021-02-20)
Handle INCLUDE clause in IndexStmt (PR #67), thanks to Ronan Dunklau
2.0.dev2 (2020-10-24)
Merge new fingerprint functionality from v1 (i.e. master) branch
2.0.dev1 (2020-09-26)
Require Python 3.6 or greater
Handle ALTER TYPE .. RENAME VALUE in AlterEnumStmt (PR #52), thanks to Ronan Dunklau
Add support for Create / Alter / Drop PROCEDURE (PR #48), thanks to Ronan Dunklau
Use Ronan’s fork of libpg_query, targeting PostgreSQL 12.1 (PR #36)
Change get_postgresql_version() to return a (major, minor) tuple (issue #38)
Handle ALTER TABLE ... ALTER COLUMN ... SET STORAGE ...
Handle PG12 materialized CTEs (issue #57)
Support column numbers in ALTER INDEX (PR #58), thanks to Ronan Dunklau
Handle SET LOGGED and SET UNLOGGED in ALTER TABLE (PR #59), thanks to Ronan Dunklau
Handle ALTER TYPE ... RENAME (PR #62), , thanks to Ronan Dunklau
V1
1.17 (2021-02-20)
Fix the generic case in the RenameStmt printer
1.16 (2021-02-20)
Promote to the stable state
Move the job of building and uploading binary wheels from TravisCI to GitHub Actions
1.15 (2021-02-19)
Fix IF EXISTS variant of RenameStmt printer (PR #70), thanks to Jonathan Mortensen
Update libpg_query to 10-1.0.5
1.14 (2020-10-24)
Produce Python 3.9 wheels, thanks to cibuildwheel 1.6.3
Expose the libpg_query’s fingerprint functionality (PR #64), thanks to Yiming Wang
1.13 (2020-09-26)
Handle SELECT FROM foo
1.12 (2020-06-08)
1.11 (2020-05-08)
1.10 (2020-01-25)
Fix collation name printer (PR #44), thanks to Ronan Dunklau
Implement CreatePLangStmt printer (PR #42), thanks to Bennie Swart
Fix privileges printer (PR #41), thanks to Bennie Swart
Handle TRUNCATE event in CreateTrigStmt printer (PR #40), thanks to Bennie Swart
Fix function body dollar quoting (PR #39), thanks to Bennie Swart
1.9 (2019-12-20)
Prettier INSERT representation
1.8 (2019-12-07)
Prettier CASE representation
New option to emit a semicolon after the last statement (issue #24)
1.7 (2019-12-01)
Implement NotifyStmt printer
Implement RuleStmt printer, thanks to Gavin M. Roy for his PR #28
Fix RenameStmt, properly handling object name
Produce Python 3.8 wheels, thanks to cibuildwheel 1.0.0
Support ALTER TABLE RENAME CONSTRAINT (PR #35), thanks to Ronan Dunklau
1.6 (2019-09-04)
1.5 (2019-06-04)
1.4 (2019-04-06)
1.3 (2019-03-28)
1.2 (2019-02-13)
1.1 (2018-07-20)
No visible changes, but now PyPI carries binary wheels for Python 3.7.
1.0 (2018-06-16)
0.28 (2018-06-06)
Update libpg_query to 10-1.0.2
Support the ‘?’-style parameter placeholder variant allowed by libpg_query (details)
0.27 (2018-04-15)
Prettier JOINs representation, aligning them with the starting relation
0.26 (2018-04-03)
Fix cosmetic issue with ANY() and ALL()
0.25 (2018-03-31)
Fix issue in the safety belt check performed by pgpp (issue #4)
0.24 (2018-03-02)
Implement Null printer
0.23 (2017-12-28)
Implement some other DDL statements printers
New alternative style to print comma-separated-values lists, activated by a new --comma-at-eoln option on pgpp
0.22 (2017-12-03)
Implement TransactionStmt and almost all DROP xxx printers
0.21 (2017-11-22)
Implement NamedArgExpr printer
New alternative printers for a set of special functions, activated by a new --special-functions option on pgpp (issue #2)
0.20 (2017-11-21)
Handle special de-reference (A_Indirection) cases
0.19 (2017-11-16)
Fix serialization of column labels containing double quotes
Fix corner issues surfaced implementing some more DDL statement printers
0.18 (2017-11-14)
Fix endless loop due to sloppy conversion of command line option
Install the command line tool as pgpp
0.17 (2017-11-12)
Rename printers.sql to printers.dml (backward incompatibility)
List printer functions in the documentation, referencing the definition of related node type
Fix inconsistent spacing in JOIN condition inside a nested expression
Fix representation of unbound arrays
Fix representation of interval data type
Initial support for DDL statements
Fix representation of string literals containing single quotes
0.16 (2017-10-31)
Update libpg_query to 10-1.0.0
0.15 (2017-10-12)
Fix indentation of boolean expressions in SELECT’s targets (issue #3)
0.14 (2017-10-09)
Update to latest libpg_query’s 10-latest branch, targeting PostgreSQL 10.0 final
0.13 (2017-09-17)
Fix representation of subselects requiring surrounding parens
0.12 (2017-08-22)
New option --version on the command line tool
Better enums documentation
Release the GIL while calling libpg_query functions
0.11 (2017-08-11)
Nicer indentation for JOINs, making OUTER JOINs stand out
Minor tweaks to lists rendering, with less spurious whitespaces
New option --no-location on the command line tool
0.10 (2017-08-11)
Support Python 3.4 and Python 3.5 as well as Python 3.6
0.9 (2017-08-10)
Fix spacing before the $ character
Handle type modifiers
New option --plpgsql on the command line tool, just for fun
0.8 (2017-08-10)
Add enums subpackages to the documentation with references to their related headers
New compact_lists_margin option to produce a more compact representation when possible (see issue #1)
0.7 (2017-08-10)
Fix sdist including the Sphinx documentation
0.6 (2017-08-10)
New option --parse-tree on the command line tool to show just the parse tree
Sphinx documentation, available online
0.5 (2017-08-09)
Handle some more cases when a name must be double-quoted
Complete the serialization of the WindowDef node, handling its frame options
0.4 (2017-08-09)
Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function
New option safety_belt for the prettify() function, to protect the innocents
Handle serialization of CoalesceExpr and MinMaxExpr
0.3 (2017-08-07)
Handle serialization of ParamRef nodes
Expose a prettify() helper function
0.2 (2017-08-07)
Test coverage at 99%
First attempt at automatic wheel upload to PyPI, let’s see…
0.1 (2017-08-07)
First release (“Hi daddy!”, as my soul would tag it)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for pglast-3.0.dev1-cp39-cp39-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | e36a9015e5a36d47f02d6ae6c507c91359eb815fbba76e343da187ab828b396f |
|
MD5 | 121571ef591b7a39b2fbfc16a845db65 |
|
BLAKE2b-256 | 05a1264b87310f3a191ef85b3bf28ce4646e45308aaa4c265035531f65c00899 |
Hashes for pglast-3.0.dev1-cp39-cp39-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 14df7d85b7e51a4de9798566365b6e8658ef2ef413c4cfd1c818ea3e0830e6b0 |
|
MD5 | f688f5e9f712991a486874aff3cbbb6b |
|
BLAKE2b-256 | 162bead236f62c0d0adf5a6c4505672838bab980d27e81a37920495e10320b3c |
Hashes for pglast-3.0.dev1-cp39-cp39-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2dc3e7cd53568d2d6205eabfe588d90dac981550d3e2a4d98890881c921f4a7f |
|
MD5 | c7c37fbf65f9435d01e53a0f9e9da053 |
|
BLAKE2b-256 | 3481a09232ca3c131cd483aaff8c42d3a798b43b4ba9a8d85f12e27084b78f73 |
Hashes for pglast-3.0.dev1-cp39-cp39-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5efbc9e69e321692dabb01ec6144f742144f8b4030bda0e5115d5f56cc8bee7a |
|
MD5 | 7c5c2d4c0c5470f08b85ffe9c489f85e |
|
BLAKE2b-256 | bae43f50588efd130a960c8ff2573befbc6e637e91cd5bc7bd920f04ee03b099 |
Hashes for pglast-3.0.dev1-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 728b4952b60df537bf5229db9fbeb21e0d8c077b019822095c93ba1251c9f2fe |
|
MD5 | 4b3b618fe19441702519a064839df15c |
|
BLAKE2b-256 | e7440493f4c9290260e81d77907648374a9862005f23317247c176be223b9ef3 |
Hashes for pglast-3.0.dev1-cp38-cp38-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 88c4742bd8821e2e589e625ad1448d0e0b81f90661abcf10ed998c30052e2cc0 |
|
MD5 | dc2e4da9da0ef124c166f927fce3cf6f |
|
BLAKE2b-256 | e332766e7edd6923e1a1b7bd924c78b5d095d78f218c045633fa48f50d2479b3 |
Hashes for pglast-3.0.dev1-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3d5d709416bdcef3eb275480889af2421095961c6500d299353acbb4529f517e |
|
MD5 | d65039cf1622818228af75e1323cd7f7 |
|
BLAKE2b-256 | 47c34607f170b26648822a23d6c4a8ac8d8008faa1e408b838419ed10b2caf6a |
Hashes for pglast-3.0.dev1-cp38-cp38-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1739198e8ee82ca7052b19c2dd1be14b9786858b57eb227921793260e0e499c5 |
|
MD5 | f9a1e09df8dd41f5bc6d3e4dd49e914b |
|
BLAKE2b-256 | 7ff15a4879860d3c15e36fb19ec4f311279f77e44dbc8c9192861938d1bcce53 |
Hashes for pglast-3.0.dev1-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | e8e07907a7c2577cfc43370eed39f6415bb0e6406abf7c4ce58c5701df68eee7 |
|
MD5 | 0dfeffcb3522661b8771f18332ebc6b9 |
|
BLAKE2b-256 | 368d2c0884d34a235f66b2b8fc82bc13c68f3c910d5553d27138ee5132b292b3 |
Hashes for pglast-3.0.dev1-cp37-cp37m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 68d9ca4d140d530e3a6429d9ff89e279880dd1fbb79a0a298defaa36b9002f20 |
|
MD5 | 930c4e5980471beddc7259485eee536b |
|
BLAKE2b-256 | e3e735ce8e976c3160e2554eb17a317b0a9134f10b818242265f6aa2c092c3a4 |
Hashes for pglast-3.0.dev1-cp37-cp37m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 570ec67d9565e754a29193b11c4a2f999cd74eaf1992b2611cf3bb3058287c5c |
|
MD5 | adc4f63cd6f68822ea9871fafc856e38 |
|
BLAKE2b-256 | 4d273f3f80c8b3edb5d6595194a615b10b5e879a4a9905a0668a8bcd676f2f97 |
Hashes for pglast-3.0.dev1-cp37-cp37m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5ff693fa6125e2f9d0709b0d8f94fe9e8a709dedd9650bbf9401a9cae4b0c3f5 |
|
MD5 | 18d822b57a913c8db67b3d61a394acc7 |
|
BLAKE2b-256 | dc73d0716c89d448b9c26f79c7224717515e08529cc1c4ec6369e6f93362849b |
Hashes for pglast-3.0.dev1-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 56df0ebcd0d15ef108f111a754172477af0d34db63cdef6c2a1f05048abe2243 |
|
MD5 | 60329ef5185278eced3bba80e5981e03 |
|
BLAKE2b-256 | 47b9e7bca1ee292bff8a7ae015a1cd9161da3991d2eb69dff67d5785537b0b1c |
Hashes for pglast-3.0.dev1-cp36-cp36m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1da0291f22cd413c73452b06ebf470cdd3747dbfb889ea23c3b6fb5952fa8ab9 |
|
MD5 | a49b3ad08a909b9b431818cb2db9b7e1 |
|
BLAKE2b-256 | 5faa877190d0a1a9047d0adc5a48f1699f41a73293cd302197d0f6c40a513149 |
Hashes for pglast-3.0.dev1-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4328317b2487e3e0b33c6b9ef3e45bd4f5674310a8034a849f150f5946cf7bb7 |
|
MD5 | f4963f7cf1f0d1fcc8d81f22742d5e67 |
|
BLAKE2b-256 | 8b8340e170f53a4216497135368c30af5b704f782f2e8d669d1bd0e20d77af2c |
Hashes for pglast-3.0.dev1-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6c7af17ada1b510828b4427a92482cac75237c961bb7ab3184d96e06314ebff2 |
|
MD5 | 01ecd0ce0d0fa5f92b93aec220d46f0c |
|
BLAKE2b-256 | ef8c8af426d97dbf88e8eb734b6d3c84ccee8533c4df2a85490f4491dc22484c |