Skip to main content

Pythonic wrapper around libpg_query and PostgreSQL prettifier

Project description

Author:

Lele Gaifax

Contact:

lele@metapensiero.it

License:

GNU General Public License version 3 or later

Status:

Build status Documentation status

This is a Python 3 implementation of a wrapper to libpg_query, a C library that repackages the PostgreSQL language parser as a standalone static library.

I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.

When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:

  • target only Python 3.4+

  • target PostgreSQL 10

  • use a more dynamic approach to represent the parse tree, with a twofold advantage:

    1. it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag

    2. the representation is version agnostic, it can be adapted to newer/older Elephants in a snap

  • allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation

  • avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources[*] is available without the hassle of guessing how a symbol has been mapped

  • use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree

Introduction

At the lower level the module exposes two libpg_query functions, parse_sql() and parse_plpgsql(), that take respectively an SQL statement and a PLpgSQL statement and return a parse tree as a hierarchy of Python dictionaries, lists and scalar values. In some cases these scalars correspond to some C typedef enums, that are automatically extracted from the PostgreSQL headers mentioned above and are available as pg_query.enums.

At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.

Every node is identified by a tag, a string label that characterize its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.

A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.

Finally, a Scalar carries a single value of some type, accessible through its value attribute.

On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the pgpp CLI tool, see below for an example.

Installation

As usual, the easiest way is with pip:

$ pip install pg_query

Alternatively you can clone the repository:

$ git clone https://github.com/lelit/pg_query.git --recursive

and install from there:

$ pip install ./pg_query

Development

There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.

Examples of usage

  • Parse an SQL statement and get its AST root node:

    >>> from pg_query import Node, parse_sql
    >>> root = Node(parse_sql('SELECT foo FROM bar'))
    >>> print(root)
    None=[1*{RawStmt}]
  • Recursively traverse the parse tree:

    >>> for node in root.traverse():
    ...   print(node)
    ...
    None[0]={RawStmt}
    stmt={SelectStmt}
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
    op=<0>
    targetList[0]={ResTarget}
    location=<7>
    val={ColumnRef}
    fields[0]={String}
    str=<'foo'>
    location=<7>

    As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind[] and <value> is a Scalar.

  • Get a particular node:

    >>> from_clause = root[0].stmt.fromClause
    >>> print(from_clause)
    fromClause=[1*{RangeVar}]
  • Obtain some information about a node:

    >>> range_var = from_clause[0]
    >>> print(range_var.node_tag)
    RangeVar
    >>> print(range_var.attribute_names)
    dict_keys(['relname', 'inh', 'relpersistence', 'location'])
    >>> print(range_var.parent_node)
    stmt={SelectStmt}
  • Iterate over nodes:

    >>> for a in from_clause:
    ...     print(a)
    ...     for b in a:
    ...         print(b)
    ...
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
  • Reformat a SQL statement[] from the command line:

    $ echo "select a,b,c from sometable" | pgpp
    SELECT a
         , b
         , c
    FROM sometable
    
    $ echo "select a,b,c from sometable" | pgpp -c
    SELECT a,
           b,
           c
    FROM sometable
    
    $ echo 'update "table" set value=123 where value is null' | pgpp
    UPDATE "table"
    SET value = 123
    WHERE value IS NULL
    
    $ echo "
    insert into t (id, description)
    values (1, 'this is short enough'),
           (2, 'this is too long, and will be splitted')" | pgpp -s 20
    INSERT INTO t (id, description)
    VALUES (1, 'this is short enough')
         , (2, 'this is too long, an'
               'd will be splitted')
  • Programmatically reformat a SQL statement:

    >>> from pg_query import prettify
    >>> print(prettify('delete from sometable where value is null'))
    DELETE FROM sometable
    WHERE value IS NULL

Documentation

Latest documentation is hosted by Read the Docs at http://pg-query.readthedocs.io/en/latest/

Changes

0.24 (2018-03-02)

  • Implement Null printer

0.23 (2017-12-28)

  • Implement some other DDL statements printers

  • New alternative style to print comma-separated-values lists, activated by a new --comma-at-eoln option on pgpp

0.22 (2017-12-03)

  • Implement TransactionStmt and almost all DROP xxx printers

0.21 (2017-11-22)

  • Implement NamedArgExpr printer

  • New alternative printers for a set of special functions, activated by a new --special-functions option on pgpp (issue #2)

0.20 (2017-11-21)

  • Handle special de-reference (A_Indirection) cases

0.19 (2017-11-16)

  • Fix serialization of column labels containing double quotes

  • Fix corner issues surfaced implementing some more DDL statement printers

0.18 (2017-11-14)

  • Fix endless loop due to sloppy conversion of command line option

  • Install the command line tool as pgpp

0.17 (2017-11-12)

  • Rename printers.sql to printers.dml (backward incompatibility)

  • List printer functions in the documentation, referencing the definition of related node type

  • Fix inconsistent spacing in JOIN condition inside a nested expression

  • Fix representation of unbound arrays

  • Fix representation of interval data type

  • Initial support for DDL statements

  • Fix representation of string literals containing single quotes

0.16 (2017-10-31)

  • Update libpg_query to 10-1.0.0

0.15 (2017-10-12)

  • Fix indentation of boolean expressions in SELECT’s targets (issue #3)

0.14 (2017-10-09)

  • Update to latest libpg_query’s 10-latest branch, targeting PostgreSQL 10.0 final

0.13 (2017-09-17)

  • Fix representation of subselects requiring surrounding parens

0.12 (2017-08-22)

  • New option --version on the command line tool

  • Better enums documentation

  • Release the GIL while calling libpg_query functions

0.11 (2017-08-11)

  • Nicer indentation for JOINs, making OUTER JOINs stand out

  • Minor tweaks to lists rendering, with less spurious whitespaces

  • New option --no-location on the command line tool

0.10 (2017-08-11)

  • Support Python 3.4 and Python 3.5 as well as Python 3.6

0.9 (2017-08-10)

  • Fix spacing before the $ character

  • Handle type modifiers

  • New option --plpgsql on the command line tool, just for fun

0.8 (2017-08-10)

  • Add enums subpackages to the documentation with references to their related headers

  • New compact_lists_margin option to produce a more compact representation when possible (see issue #1)

0.7 (2017-08-10)

  • Fix sdist including the Sphinx documentation

0.6 (2017-08-10)

  • New option --parse-tree on the command line tool to show just the parse tree

  • Sphinx documentation, available online

0.5 (2017-08-09)

  • Handle some more cases when a name must be double-quoted

  • Complete the serialization of the WindowDef node, handling its frame options

0.4 (2017-08-09)

  • Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function

  • New option safety_belt for the prettify() function, to protect the innocents

  • Handle serialization of CoalesceExpr and MinMaxExpr

0.3 (2017-08-07)

  • Handle serialization of ParamRef nodes

  • Expose a prettify() helper function

0.2 (2017-08-07)

  • Test coverage at 99%

  • First attempt at automatic wheel upload to PyPI, let’s see…

0.1 (2017-08-07)

  • First release (“Hi daddy!”, as my soul would tag it)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pg_query-0.24.tar.gz (1.5 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pg_query-0.24-cp36-cp36m-manylinux1_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.6m

pg_query-0.24-cp36-cp36m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.6m

pg_query-0.24-cp35-cp35m-manylinux1_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.5m

pg_query-0.24-cp35-cp35m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.5m

pg_query-0.24-cp34-cp34m-manylinux1_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.4m

pg_query-0.24-cp34-cp34m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.4m

File details

Details for the file pg_query-0.24.tar.gz.

File metadata

  • Download URL: pg_query-0.24.tar.gz
  • Upload date:
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pg_query-0.24.tar.gz
Algorithm Hash digest
SHA256 1742c046c969a2a368220251f3fd6bfd214d114abbaa831494e58a3930d51400
MD5 aaf4da666d444ce20a079ce3b322ff1a
BLAKE2b-256 9874c11822afb03a0ce8ffb80c459d2c6dd1fdffd372894162dbe181984aab51

See more details on using hashes here.

File details

Details for the file pg_query-0.24-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pg_query-0.24-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 14e3e1a49128a11c5d560265626a355f6c006b47889aed433a3d10dee9714443
MD5 6aa44fd5c9cb9a688d1d16be1140fbde
BLAKE2b-256 80a1f4d508018d86a5786988e5495ebe342b40dcfe9e74063beed68fb0c3f554

See more details on using hashes here.

File details

Details for the file pg_query-0.24-cp36-cp36m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pg_query-0.24-cp36-cp36m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 0307fd78e7d846c5a6c5c31dbbc6c46d9bf27bcdad9495d406b52ed9a08fdfc2
MD5 c584fed60caa3f73d2dbea4b68a92f89
BLAKE2b-256 3a7bdc0f3aa729c61f4c36f5272e5f3e4845cc0bec8b123126eb876a1899e188

See more details on using hashes here.

File details

Details for the file pg_query-0.24-cp35-cp35m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pg_query-0.24-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 6cab98daf6719ad7a22faa4e5c254f94823ad5de7dd9feb20882b850cc7912c0
MD5 ac602cf80ec81d106ecad68dad66473e
BLAKE2b-256 e33700cd9d8c437d4f510fb08490eea363d5a6833eabf92b5c99bf43bf1e712c

See more details on using hashes here.

File details

Details for the file pg_query-0.24-cp35-cp35m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pg_query-0.24-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 8980a9986e152c86ce65d7ca53ae5787b02c100d878c59ba53885890e005c4ea
MD5 27d1bea4a237c13a343fe4ca0b23a3cd
BLAKE2b-256 0bb965e8402bc69a75043ad31f0090acf04accc3f78286c1f7a6cf117b0d8059

See more details on using hashes here.

File details

Details for the file pg_query-0.24-cp34-cp34m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pg_query-0.24-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 707cc0d8dfa38449bc6210fcc7c4e2a8849768c711729262066ee1a6229950a0
MD5 254f6ffc5b11a66f94571b78940a4e70
BLAKE2b-256 fbf963d944effd16f7dc7e076c46052b756aad1ddb59c0a7391f9b93bb3f92d7

See more details on using hashes here.

File details

Details for the file pg_query-0.24-cp34-cp34m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pg_query-0.24-cp34-cp34m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 887798d82feb1acac37cfc773acba6f192371d37d8a7816a252734f313dc635e
MD5 701661c3edc43e625f1b33baa56ff0cf
BLAKE2b-256 d66c3c1b7d9cf109389bd98698904be0026df6a694662618531fb35540e0a467

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page