Skip to main content

Expose libpg_query PostgreSQL parser with Pythonic mood

Project description

author:

Lele Gaifax

contact:

lele@metapensiero.it

license:

GNU General Public License version 3 or later

This is a Python 3.6+ implementation of a wrapper to libpg_query, a C library that repackages the PostgreSQL languages parser as a standalone static library.

It is similar to the more mature psqlparse, but has different goals:

  • it targets only Python 3 (more precisely, only 3.6 or higher, mainly because I’m lazy and the enums extraction code uses the auto() helper of the standard library enum module)

  • it aims to target PostgreSQL 10 (in beta 2 as I’m writing this), taking advantage of a work-in-progress branch of the libpg_query library

  • it uses a more dynamic approach to represent the parse tree; this has a twofold advantage:

    1. it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag

    2. the representation is version agnostic, it can be adapted to newer/older Elephants in a snap

  • it allows to explore the parse tree in both directions, as each node carries a reference to the node from which it stems: this is the main reason I started this project, because while implementing the pretty printing feature for psqlparse I realized that some kinds of nodes require that knowledge to determine their textual representation

  • it does not introduce arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources[*] is available without the hassle of guessing how a symbol has been mapped

  • not very important, but I guess it is much more performant, as basically it does not duplicates the original parse tree into every instance

Introduction

At the lower level the module exposes two libpg_query functions, parse_sql() and parse_plpgsql(), that take respectively an SQL statement and a PLpgSQL statement and return a parse tree as a hierarchy of Python dictionaries, lists and scalar values. In some cases these scalars correspond to some C typedef enums, that are automatically extracted from the PostgreSQL headers and are available as pg_query.enums.

At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.

Every node is identified by a tag, a string label that characterize its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.

A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.

Finally, a Scalar carries a single value of some type, accessible through its value attribute.

On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the __main__ entry point of the package, see below for an example.

Installation

Once this is released to PyPI, the following command will do:

$ pip install pg_query

Until then, you have to clone the repository:

$ git clone https://github.com/lelit/pg_query.git --recursive

and install from there:

$ pip install ./pg_query

Development

There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.

Examples of usage

  • Parse an SQL statement and get its AST root node:

    >>> from pg_query import Node, parse_sql
    >>> root = Node(parse_sql('SELECT foo FROM bar'))
    >>> print(root)
    None=[1*{RawStmt}]
  • Recursively traverse the parse tree:

    >>> for node in root.traverse():
    ...   print(node)
    ...
    None[0]={RawStmt}
    stmt={SelectStmt}
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
    op=<0>
    targetList[0]={ResTarget}
    location=<7>
    val={ColumnRef}
    fields[0]={String}
    str=<'foo'>
    location=<7>

    As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind[] and <value> is a Scalar.

  • Get a particular node:

    >>> from_clause = root[0].stmt.fromClause
    >>> print(from_clause)
    fromClause=[1*{RangeVar}]
  • Obtain some information about a node:

    >>> range_var = from_clause[0]
    >>> print(range_var.node_tag)
    RangeVar
    >>> print(range_var.attribute_names)
    dict_keys(['relname', 'inh', 'relpersistence', 'location'])
    >>> print(range_var.parent_node)
    stmt={SelectStmt}
  • Iterate over nodes:

    >>> for a in from_clause:
    ...     print(a)
    ...     for b in a:
    ...         print(b)
    ...
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
  • Reformat a SQL statement[] from the command line:

    $ echo "select a,b,c from sometable" | python -m pg_query
    SELECT a
         , b
         , c
    FROM sometable
    
    $ echo 'update "table" set value=123 where value is null' | python -m pg_query
    UPDATE "table"
    SET value = 123
    WHERE value IS NULL
  • Programmatically reformat a SQL statement:

    >>> from pg_query import prettify
    >>> print(prettify('delete from sometable where value is null'))
    DELETE FROM sometable
    WHERE value IS NULL

Changes

0.4 (2017-08-09)

  • Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function

  • New option safety_belt for the prettify() function, to protect the innocents

  • Handle serialization of CoalesceExpr and MinMaxExpr

0.3 (2017-08-07)

  • Handle serialization of ParamRef nodes

  • Expose a prettify() helper function

0.2 (2017-08-07)

  • Test coverage at 99%

  • First attempt at automatic wheel upload to PyPI, let’s see…

0.1 (2017-08-07)

  • First release (“Hi daddy!”, as my soul would tag it)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pg_query-0.4.tar.gz (1.4 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pg_query-0.4-cp36-cp36m-manylinux1_x86_64.whl (1.0 MB view details)

Uploaded CPython 3.6m

pg_query-0.4-cp36-cp36m-manylinux1_i686.whl (1.0 MB view details)

Uploaded CPython 3.6m

File details

Details for the file pg_query-0.4.tar.gz.

File metadata

  • Download URL: pg_query-0.4.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pg_query-0.4.tar.gz
Algorithm Hash digest
SHA256 83836045510b65fee2d506f0d24ef1b0333e652c7973e43048977cf4e57b3fbb
MD5 e71d5a9b465dd55fc08c8fea23b0340f
BLAKE2b-256 a6d4503f0e3c324b15e737a207090a8b06fac5cb367700ba805d9337b2e8ce12

See more details on using hashes here.

File details

Details for the file pg_query-0.4-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for pg_query-0.4-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 07e8299358d2cead4eac59fd23e86118bcce490e3d6231723e247ad3344b3b0e
MD5 8eb8bfc5a68c29bf11467276cd05fb45
BLAKE2b-256 50ccc8c6513700f75046eddba02e06ba18df4140fd86ccd093e1798a143ff711

See more details on using hashes here.

File details

Details for the file pg_query-0.4-cp36-cp36m-manylinux1_i686.whl.

File metadata

File hashes

Hashes for pg_query-0.4-cp36-cp36m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 88678d2d432d8dc97784f9a7abe310b966784bf40a8c2a0a30166ea34c7fbcee
MD5 90c08cd36bd52fd44f9ad8aa45ca9bb0
BLAKE2b-256 f518753187cf3a0ca37a272ec6492428135a6cfe24739a89ec704038ebf5a47a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page