Skip to main content

Pythonic wrapper around libpg_query and SQL prettifier

Project description

author:

Lele Gaifax

contact:

lele@metapensiero.it

license:

GNU General Public License version 3 or later

This is a Python 3 implementation of a wrapper to libpg_query, a C library that repackages the PostgreSQL languages parser as a standalone static library.

I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.

When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:

  • target only Python 3.4+

  • target PostgreSQL 10, taking advantage of a work-in-progress branch of the libpg_query library

  • use a more dynamic approach to represent the parse tree, with a twofold advantage:

    1. it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag

    2. the representation is version agnostic, it can be adapted to newer/older Elephants in a snap

  • allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation

  • avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources[*] is available without the hassle of guessing how a symbol has been mapped

  • use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree

Introduction

At the lower level the module exposes two libpg_query functions, parse_sql() and parse_plpgsql(), that take respectively an SQL statement and a PLpgSQL statement and return a parse tree as a hierarchy of Python dictionaries, lists and scalar values. In some cases these scalars correspond to some C typedef enums, that are automatically extracted from the PostgreSQL headers and are available as pg_query.enums.

At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.

Every node is identified by a tag, a string label that characterize its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.

A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.

Finally, a Scalar carries a single value of some type, accessible through its value attribute.

On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the __main__ entry point of the package, see below for an example.

Installation

As usual, the easiest way is with pip:

$ pip install pg_query

Alternatively you can clone the repository:

$ git clone https://github.com/lelit/pg_query.git --recursive

and install from there:

$ pip install ./pg_query

Development

There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.

Examples of usage

  • Parse an SQL statement and get its AST root node:

    >>> from pg_query import Node, parse_sql
    >>> root = Node(parse_sql('SELECT foo FROM bar'))
    >>> print(root)
    None=[1*{RawStmt}]
  • Recursively traverse the parse tree:

    >>> for node in root.traverse():
    ...   print(node)
    ...
    None[0]={RawStmt}
    stmt={SelectStmt}
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
    op=<0>
    targetList[0]={ResTarget}
    location=<7>
    val={ColumnRef}
    fields[0]={String}
    str=<'foo'>
    location=<7>

    As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind[] and <value> is a Scalar.

  • Get a particular node:

    >>> from_clause = root[0].stmt.fromClause
    >>> print(from_clause)
    fromClause=[1*{RangeVar}]
  • Obtain some information about a node:

    >>> range_var = from_clause[0]
    >>> print(range_var.node_tag)
    RangeVar
    >>> print(range_var.attribute_names)
    dict_keys(['relname', 'inh', 'relpersistence', 'location'])
    >>> print(range_var.parent_node)
    stmt={SelectStmt}
  • Iterate over nodes:

    >>> for a in from_clause:
    ...     print(a)
    ...     for b in a:
    ...         print(b)
    ...
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
  • Reformat a SQL statement[] from the command line:

    $ echo "select a,b,c from sometable" | python -m pg_query
    SELECT a
         , b
         , c
    FROM sometable
    
    $ echo 'update "table" set value=123 where value is null' | python -m pg_query
    UPDATE "table"
    SET value = 123
    WHERE value IS NULL
  • Programmatically reformat a SQL statement:

    >>> from pg_query import prettify
    >>> print(prettify('delete from sometable where value is null'))
    DELETE FROM sometable
    WHERE value IS NULL

Documentation

Latest documentation is hosted by Read the Docs at http://pg-query.readthedocs.io/en/latest/

Changes

0.16 (2017-10-31)

  • Update libpg_query to 10-1.0.0

0.15 (2017-10-12)

  • Fix indentation of boolean expressions in SELECT’s targets (issue #3)

0.14 (2017-10-09)

  • Update to latest libpg_query’s 10-latest branch, targeting PostgreSQL 10.0 final

0.13 (2017-09-17)

  • Fix representation of subselects requiring surrounding parens

0.12 (2017-08-22)

  • New option --version on the command line tool

  • Better enums documentation

  • Release the GIL while calling libpg_query functions

0.11 (2017-08-11)

  • Nicer indentation for JOINs, making OUTER JOINs stand out

  • Minor tweaks to lists rendering, with less spurious whitespaces

  • New option --no-location on the command line tool

0.10 (2017-08-11)

  • Support Python 3.4 and Python 3.5 as well as Python 3.6

0.9 (2017-08-10)

  • Fix spacing before the $ character

  • Handle type modifiers

  • New option --plpgsql on the command line tool, just for fun

0.8 (2017-08-10)

  • Add enums subpackages to the documentation with references to their related headers

  • New compact_lists_margin option to produce a more compact representation when possible (see issue #1)

0.7 (2017-08-10)

  • Fix sdist including the Sphinx documentation

0.6 (2017-08-10)

  • New option --parse-tree on the command line tool to show just the parse tree

  • Sphinx documentation, available online

0.5 (2017-08-09)

  • Handle some more cases when a name must be double-quoted

  • Complete the serialization of the WindowDef node, handling its frame options

0.4 (2017-08-09)

  • Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function

  • New option safety_belt for the prettify() function, to protect the innocents

  • Handle serialization of CoalesceExpr and MinMaxExpr

0.3 (2017-08-07)

  • Handle serialization of ParamRef nodes

  • Expose a prettify() helper function

0.2 (2017-08-07)

  • Test coverage at 99%

  • First attempt at automatic wheel upload to PyPI, let’s see…

0.1 (2017-08-07)

  • First release (“Hi daddy!”, as my soul would tag it)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pg_query-0.16.tar.gz (1.5 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pg_query-0.16-cp36-cp36m-manylinux1_x86_64.whl (1.0 MB view details)

Uploaded CPython 3.6m

pg_query-0.16-cp36-cp36m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.6m

pg_query-0.16-cp35-cp35m-manylinux1_x86_64.whl (1.0 MB view details)

Uploaded CPython 3.5m

pg_query-0.16-cp35-cp35m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.5m

pg_query-0.16-cp34-cp34m-manylinux1_x86_64.whl (1.0 MB view details)

Uploaded CPython 3.4m

pg_query-0.16-cp34-cp34m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.4m

File details

Details for the file pg_query-0.16.tar.gz.

File metadata

  • Download URL: pg_query-0.16.tar.gz
  • Upload date:
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pg_query-0.16.tar.gz
Algorithm Hash digest
SHA256 6ea1877c47daef940446914078955c42a5db6873ef5a8909678e8886687c6e4f
MD5 ff6abad837eb9be75e6497b460d2d4be
BLAKE2b-256 4d8dafd1f5aba7e850ed503605d7a24bf50f99f0d7f76c605334eed13bc32e43

See more details on using hashes here.

File details

Details for the file pg_query-0.16-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pg_query-0.16-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 81dbde8a21ae0137f51abdc777bf25ca683e3f4c30d51568e61210e585efab76
MD5 b37a3fdd06aa3972204fffea33f55071
BLAKE2b-256 9cc0b6b900be26ea6e3d9e179289362b34e0506ef632aaa2eff22d38a99cbb59

See more details on using hashes here.

File details

Details for the file pg_query-0.16-cp36-cp36m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pg_query-0.16-cp36-cp36m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 8e5ba4c2fe46078c1bc9ca9f74d38fedd276de48aa179083f6a9aa4771f2f402
MD5 0b9ce40c753fa6aefe525a34f1b0bb55
BLAKE2b-256 6ef5c84416466627d22ce1fb149c83109a468d8dba19e47f585564cb416615c0

See more details on using hashes here.

File details

Details for the file pg_query-0.16-cp35-cp35m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pg_query-0.16-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 606514fdb6bd31933fcf016b8b4a0be00d4037b6f6ab49ba0013a12b64ff858c
MD5 f07be6629370fd3f2c33f4bb13be0841
BLAKE2b-256 424661594beb767bcfe12635588692ae8109d6b6af63cf0de8f70213c8867314

See more details on using hashes here.

File details

Details for the file pg_query-0.16-cp35-cp35m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pg_query-0.16-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 e83753c66016c362297ebcc64883d60bb3686e6b2a2a3a87f5bb0a26e789c360
MD5 ff3d0bbf74474440115e067e57128bbe
BLAKE2b-256 b4b68a5d6b3d10dbbb224f5a79520d1816b90fc979abf10864ec77f41b1a727e

See more details on using hashes here.

File details

Details for the file pg_query-0.16-cp34-cp34m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pg_query-0.16-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 a60576a8d19d6bc58843565f4f52f016737cefb84e250f1f2ab461c65eaec74e
MD5 0ff8f115358b55f1ea9445d55ba07cf8
BLAKE2b-256 26e939841e3bc5b04c08489d0d745da33771bacf4d88d73bd953d83dbef8ff24

See more details on using hashes here.

File details

Details for the file pg_query-0.16-cp34-cp34m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pg_query-0.16-cp34-cp34m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 9d198f617c43a8ecf59b8bcfe1125bbe156d7b0c9b0bdc2b1a575094568e0a9f
MD5 e3785047c8390a71b5a8ae6ba3b384b8
BLAKE2b-256 53bbff779747eb0da34d497cfbd01a6955274e2d7c54c5dd9fba52ea80622755

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page