Pythonic wrapper around libpg_query and SQL prettifier
Project description
- author:
Lele Gaifax
- contact:
- license:
GNU General Public License version 3 or later
This is a Python 3.6+ implementation of a wrapper to libpg_query, a C library that repackages the PostgreSQL languages parser as a standalone static library.
I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.
When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:
target only Python 3 (more precisely, only 3.6 or higher, mainly because I’m lazy and the enums extraction code uses the auto() helper of the standard library enum module)
target PostgreSQL 10 (in beta 2 as I’m writing this), taking advantage of a work-in-progress branch of the libpg_query library
use a more dynamic approach to represent the parse tree, with a twofold advantage:
it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag
the representation is version agnostic, it can be adapted to newer/older Elephants in a snap
allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation
avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources[*] is available without the hassle of guessing how a symbol has been mapped
use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree
Introduction
At the lower level the module exposes two libpg_query functions, parse_sql() and parse_plpgsql(), that take respectively an SQL statement and a PLpgSQL statement and return a parse tree as a hierarchy of Python dictionaries, lists and scalar values. In some cases these scalars correspond to some C typedef enums, that are automatically extracted from the PostgreSQL headers and are available as pg_query.enums.
At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.
Every node is identified by a tag, a string label that characterize its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.
A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.
Finally, a Scalar carries a single value of some type, accessible through its value attribute.
On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the __main__ entry point of the package, see below for an example.
Installation
As usual, the easiest way is with pip:
$ pip install pg_query
Alternatively you can clone the repository:
$ git clone https://github.com/lelit/pg_query.git --recursive
and install from there:
$ pip install ./pg_query
Development
There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.
Examples of usage
Parse an SQL statement and get its AST root node:
>>> from pg_query import Node, parse_sql >>> root = Node(parse_sql('SELECT foo FROM bar')) >>> print(root) None=[1*{RawStmt}]
Recursively traverse the parse tree:
>>> for node in root.traverse(): ... print(node) ... None[0]={RawStmt} stmt={SelectStmt} fromClause[0]={RangeVar} inh=<True> location=<16> relname=<'bar'> relpersistence=<'p'> op=<0> targetList[0]={ResTarget} location=<7> val={ColumnRef} fields[0]={String} str=<'foo'> location=<7>
As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind[†] and <value> is a Scalar.
Get a particular node:
>>> from_clause = root[0].stmt.fromClause >>> print(from_clause) fromClause=[1*{RangeVar}]
Obtain some information about a node:
>>> range_var = from_clause[0] >>> print(range_var.node_tag) RangeVar >>> print(range_var.attribute_names) dict_keys(['relname', 'inh', 'relpersistence', 'location']) >>> print(range_var.parent_node) stmt={SelectStmt}
Iterate over nodes:
>>> for a in from_clause: ... print(a) ... for b in a: ... print(b) ... fromClause[0]={RangeVar} inh=<True> location=<16> relname=<'bar'> relpersistence=<'p'>
Reformat a SQL statement[‡] from the command line:
$ echo "select a,b,c from sometable" | python -m pg_query SELECT a , b , c FROM sometable $ echo 'update "table" set value=123 where value is null' | python -m pg_query UPDATE "table" SET value = 123 WHERE value IS NULL
Programmatically reformat a SQL statement:
>>> from pg_query import prettify >>> print(prettify('delete from sometable where value is null')) DELETE FROM sometable WHERE value IS NULL
Documentation
Latest documentation is hosted by Read the Docs at http://pg-query.readthedocs.io/en/latest/
This is an approximation, because in principle a list could contain different kinds of nodes, or even sub-lists in some cases: the List representation arbitrarily shows the tag of the first object.
Currently this covers most DML statements such as SELECTs, INSERTs, DELETEs and UPDATEs, fulfilling my needs, but I’d like to extend it to handle also DDL statements and, why not, PLpgSQL instructions too.
Changes
0.8 (2017-08-10)
Add enums subpackages to the documentation with references to their related headers
New compact_lists_margin option to produce a more compact representation when possible (see issue #1)
0.7 (2017-08-10)
Fix sdist including the Sphinx documentation
0.6 (2017-08-10)
New option --parse-tree on the command line tool to show just the parse tree
Sphinx documentation, available online
0.5 (2017-08-09)
Handle some more cases when a name must be double-quoted
Complete the serialization of the WindowDef node, handling its frame options
0.4 (2017-08-09)
Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function
New option safety_belt for the prettify() function, to protect the innocents
Handle serialization of CoalesceExpr and MinMaxExpr
0.3 (2017-08-07)
Handle serialization of ParamRef nodes
Expose a prettify() helper function
0.2 (2017-08-07)
Test coverage at 99%
First attempt at automatic wheel upload to PyPI, let’s see…
0.1 (2017-08-07)
First release (“Hi daddy!”, as my soul would tag it)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
File details
Details for the file pg_query-0.8.tar.gz
.
File metadata
- Download URL: pg_query-0.8.tar.gz
- Upload date:
- Size: 1.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 45aec64716ff153b9b45a90652c741554b415ed8b7a24c088ae0fb70271cb75d |
|
MD5 | 6878d8817477d2aeff435cb55979cda8 |
|
BLAKE2b-256 | e0b4db0f7c0221c834aebace312cddf1a1340dbe4b9a47ae14d24734fad529a4 |
Provenance
File details
Details for the file pg_query-0.8-cp36-cp36m-manylinux1_x86_64.whl
.
File metadata
- Download URL: pg_query-0.8-cp36-cp36m-manylinux1_x86_64.whl
- Upload date:
- Size: 1.0 MB
- Tags: CPython 3.6m
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 27b2c1d3b18b5f21ba540807c2b16c1b9217ee575a226b04419eb7b1f21d7350 |
|
MD5 | cee86490bd3730341ceae331b951ca56 |
|
BLAKE2b-256 | 91e3a549ef2880efd21ec0fd59059e346b5b572bbbd22b0ce7581e7aaf7c27f5 |
Provenance
File details
Details for the file pg_query-0.8-cp36-cp36m-manylinux1_i686.whl
.
File metadata
- Download URL: pg_query-0.8-cp36-cp36m-manylinux1_i686.whl
- Upload date:
- Size: 1.0 MB
- Tags: CPython 3.6m
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2545a6ee3b09cc10457f19aad20e4f939a4e39c8f244ba028f1fac34582edc05 |
|
MD5 | 03c3baf22c7947839a92602082df1b4c |
|
BLAKE2b-256 | 65608263e0874f02172355d3369f22a28b4c2804f8dd0156fef102ab10a5c901 |