Skip to main content

Pythonic wrapper around libpg_query and PostgreSQL prettifier

Project description

author:

Lele Gaifax

contact:

lele@metapensiero.it

license:

GNU General Public License version 3 or later

This is a Python 3 implementation of a wrapper to libpg_query, a C library that repackages the PostgreSQL language parser as a standalone static library.

I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.

When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:

  • target only Python 3.4+

  • target PostgreSQL 10

  • use a more dynamic approach to represent the parse tree, with a twofold advantage:

    1. it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag

    2. the representation is version agnostic, it can be adapted to newer/older Elephants in a snap

  • allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation

  • avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources[*] is available without the hassle of guessing how a symbol has been mapped

  • use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree

Introduction

At the lower level the module exposes two libpg_query functions, parse_sql() and parse_plpgsql(), that take respectively an SQL statement and a PLpgSQL statement and return a parse tree as a hierarchy of Python dictionaries, lists and scalar values. In some cases these scalars correspond to some C typedef enums, that are automatically extracted from the PostgreSQL headers and are available as pg_query.enums.

At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.

Every node is identified by a tag, a string label that characterize its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.

A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.

Finally, a Scalar carries a single value of some type, accessible through its value attribute.

On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the pgpp CLI tool, see below for an example.

Installation

As usual, the easiest way is with pip:

$ pip install pg_query

Alternatively you can clone the repository:

$ git clone https://github.com/lelit/pg_query.git --recursive

and install from there:

$ pip install ./pg_query

Development

There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.

Examples of usage

  • Parse an SQL statement and get its AST root node:

    >>> from pg_query import Node, parse_sql
    >>> root = Node(parse_sql('SELECT foo FROM bar'))
    >>> print(root)
    None=[1*{RawStmt}]
  • Recursively traverse the parse tree:

    >>> for node in root.traverse():
    ...   print(node)
    ...
    None[0]={RawStmt}
    stmt={SelectStmt}
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
    op=<0>
    targetList[0]={ResTarget}
    location=<7>
    val={ColumnRef}
    fields[0]={String}
    str=<'foo'>
    location=<7>

    As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind[] and <value> is a Scalar.

  • Get a particular node:

    >>> from_clause = root[0].stmt.fromClause
    >>> print(from_clause)
    fromClause=[1*{RangeVar}]
  • Obtain some information about a node:

    >>> range_var = from_clause[0]
    >>> print(range_var.node_tag)
    RangeVar
    >>> print(range_var.attribute_names)
    dict_keys(['relname', 'inh', 'relpersistence', 'location'])
    >>> print(range_var.parent_node)
    stmt={SelectStmt}
  • Iterate over nodes:

    >>> for a in from_clause:
    ...     print(a)
    ...     for b in a:
    ...         print(b)
    ...
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
  • Reformat a SQL statement[] from the command line:

    $ echo "select a,b,c from sometable" | pgpp
    SELECT a
         , b
         , c
    FROM sometable
    
    $ echo 'update "table" set value=123 where value is null' | pgpp
    UPDATE "table"
    SET value = 123
    WHERE value IS NULL
    
    $ echo "
    insert into t (id, description)
    values (1, 'this is short enough'),
           (2, 'this is too long, and will be splitted')" | pgpp -s 20
    INSERT INTO t (id, description)
    VALUES (1, 'this is short enough')
         , (2, 'this is too long, an'
               'd will be splitted')
  • Programmatically reformat a SQL statement:

    >>> from pg_query import prettify
    >>> print(prettify('delete from sometable where value is null'))
    DELETE FROM sometable
    WHERE value IS NULL

Documentation

Latest documentation is hosted by Read the Docs at http://pg-query.readthedocs.io/en/latest/

Changes

0.20 (2017-11-21)

  • Handle special de-reference (A_Indirection) cases

0.19 (2017-11-16)

  • Fix serialization of column labels containing double quotes

  • Fix corner issues surfaced implementing some more DDL statement printers

0.18 (2017-11-14)

  • Fix endless loop due to sloppy conversion of command line option

  • Install the command line tool as pgpp

0.17 (2017-11-12)

  • Rename printers.sql to printers.dml (backward incompatibility)

  • List printer functions in the documentation, referencing the definition of related node type

  • Fix inconsistent spacing in JOIN condition inside a nested expression

  • Fix representation of unbound arrays

  • Fix representation of interval data type

  • Initial support for DDL statements

  • Fix representation of string literals containing single quotes

0.16 (2017-10-31)

  • Update libpg_query to 10-1.0.0

0.15 (2017-10-12)

  • Fix indentation of boolean expressions in SELECT’s targets (issue #3)

0.14 (2017-10-09)

  • Update to latest libpg_query’s 10-latest branch, targeting PostgreSQL 10.0 final

0.13 (2017-09-17)

  • Fix representation of subselects requiring surrounding parens

0.12 (2017-08-22)

  • New option --version on the command line tool

  • Better enums documentation

  • Release the GIL while calling libpg_query functions

0.11 (2017-08-11)

  • Nicer indentation for JOINs, making OUTER JOINs stand out

  • Minor tweaks to lists rendering, with less spurious whitespaces

  • New option --no-location on the command line tool

0.10 (2017-08-11)

  • Support Python 3.4 and Python 3.5 as well as Python 3.6

0.9 (2017-08-10)

  • Fix spacing before the $ character

  • Handle type modifiers

  • New option --plpgsql on the command line tool, just for fun

0.8 (2017-08-10)

  • Add enums subpackages to the documentation with references to their related headers

  • New compact_lists_margin option to produce a more compact representation when possible (see issue #1)

0.7 (2017-08-10)

  • Fix sdist including the Sphinx documentation

0.6 (2017-08-10)

  • New option --parse-tree on the command line tool to show just the parse tree

  • Sphinx documentation, available online

0.5 (2017-08-09)

  • Handle some more cases when a name must be double-quoted

  • Complete the serialization of the WindowDef node, handling its frame options

0.4 (2017-08-09)

  • Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function

  • New option safety_belt for the prettify() function, to protect the innocents

  • Handle serialization of CoalesceExpr and MinMaxExpr

0.3 (2017-08-07)

  • Handle serialization of ParamRef nodes

  • Expose a prettify() helper function

0.2 (2017-08-07)

  • Test coverage at 99%

  • First attempt at automatic wheel upload to PyPI, let’s see…

0.1 (2017-08-07)

  • First release (“Hi daddy!”, as my soul would tag it)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pg_query-0.20.tar.gz (1.5 MB view details)

Uploaded Source

Built Distributions

pg_query-0.20-cp36-cp36m-manylinux1_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.6m

pg_query-0.20-cp36-cp36m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.6m

pg_query-0.20-cp35-cp35m-manylinux1_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.5m

pg_query-0.20-cp35-cp35m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.5m

pg_query-0.20-cp34-cp34m-manylinux1_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.4m

pg_query-0.20-cp34-cp34m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.4m

File details

Details for the file pg_query-0.20.tar.gz.

File metadata

  • Download URL: pg_query-0.20.tar.gz
  • Upload date:
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pg_query-0.20.tar.gz
Algorithm Hash digest
SHA256 d1164dafa1a0c575fafd98f6ae9fc1cb670f846607222f26583e1e8ff0ba5704
MD5 87c75b9b1e5c327938290cfb0c151cec
BLAKE2b-256 74add0b776ea29ccfbb92682f3e5f1f668263d5c527aa31c8b8d5c51f0a2f501

See more details on using hashes here.

Provenance

File details

Details for the file pg_query-0.20-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pg_query-0.20-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 1ebb4bebb33f3bb8f60e3da393ae2346a2c63b0358e1004c1ce82d2ddf739524
MD5 545271b02ceeb389f70935745318a281
BLAKE2b-256 d27dca47ff75c54eba3c28ccb4515ff7a79d09fe211802be1f86dfdac0ebc596

See more details on using hashes here.

Provenance

File details

Details for the file pg_query-0.20-cp36-cp36m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pg_query-0.20-cp36-cp36m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 17ed3134b0a7316b0647319ae53b3a2fdbe59ad4084792152a711d44b5d23cd1
MD5 db091c97781be5817e999f18e77ccb00
BLAKE2b-256 4a6ca909250b8a7c98863372082ce6eee3e9bba9f75641344e6ef49f8f97d106

See more details on using hashes here.

Provenance

File details

Details for the file pg_query-0.20-cp35-cp35m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pg_query-0.20-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 da7e44d57e7a1d3ca5bd259d5db65d9a1157f43246cd45302345230bc4c2d131
MD5 504af4df78d42492f00ce17f0b82aa40
BLAKE2b-256 bdf8118a50599afdc88c75710ac4a7c33ddcde34df24bc8d3f1ffa5530eeee03

See more details on using hashes here.

Provenance

File details

Details for the file pg_query-0.20-cp35-cp35m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pg_query-0.20-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 57f8e8fe6f6ecd190762fe67d1b305dfe0bcb0416c4633570112f63abd88b51b
MD5 cd643ef4808510c61355f8cbbdb17b9d
BLAKE2b-256 35a50f6850b1677aad151b8ee253b6f9efb47c20ce3133dba06c4a1f2b877a60

See more details on using hashes here.

Provenance

File details

Details for the file pg_query-0.20-cp34-cp34m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pg_query-0.20-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 5305f10cc59b460c796b7a4f0cd8f579b84a252f8a7b5aa00cc943a2e0dc76eb
MD5 1204151f0491966c06740bafac6f6458
BLAKE2b-256 ba63def6881b09645dee138ec71fa8c424205581b3bcc89d9d47085788405fb1

See more details on using hashes here.

Provenance

File details

Details for the file pg_query-0.20-cp34-cp34m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pg_query-0.20-cp34-cp34m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 918cfeacca52f7eb8a77e68f06adee26aa03b2e69e6c5ed54fca87f0d8b1510c
MD5 cfe627586eb35faeab438e5e41d589f0
BLAKE2b-256 4a9cc5fa24cdc6e38851ffba137db974ec4d31e536b59c7a9e2b3bc991fc5bc3

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page