Skip to main content

A tool to parse and analyze the structure for Postgres sql queries.

Project description

This is a Python package that parses a given sql query, matches the column and tables within your given metastore, and analyzes the query to generate a list of referenced columns within the metastore.

Quick Start

$ pip install sqlanalyzer

Example Usage

1. Format a query to follow the ANSI standards for SQL:

>>> from sqlanalyzer import column_parser
>>> query = """SELECT api.name, acct.customer_tier_c, acct.name FROM api_requests_by_account api
... LEFT JOIN accounts 
... acct ON api.user_id = acct.customer_api_id
... """
>>> formatter = column_parser.Parser(query)
>>> formatted = formatter.format_query(query)
>>> print(formatted)
SELECT api.name,
       acct.customer_tier_c,
       acct.name
FROM api_requests_by_account api
LEFT JOIN accounts acct ON api.user_id = acct.customer_api_id

2. Separate CTE's and extract alias names and queries:

>>> query = """WITH a AS
...   (SELECT DISTINCT anonymous_id,
...                    user_id
...    FROM customer_data.segment_identifies
...    WHERE dt >= '2018-07-01'),
...      b AS
...   (SELECT id,
...           email,
...           created
...    FROM customer_data.accounts)
... SELECT a.*,
...        b.*
... FROM a
... LEFT JOIN b ON a.user_id = b.id
... WHERE context_campaign_name IS NOT NULL
... """
>>> formatter = column_parser.Parser(query)
>>> cte_query = formatter.parse_cte(query)
>>> cte_query
{'a': "SELECT DISTINCT anonymous_id,\n                   user_id\n   FROM customer_data.segment_identifies\n   WHERE dt >= '2018-07-01'",
'b': 'SELECT id,\n          email,\n          created\n   FROM customer_data.accounts', 
'main_query': 'SELECT a.*,\n       b.*\nFROM a\nLEFT JOIN b ON a.user_id = b.id\nWHERE context_campaign_name IS NOT NULL\n'}
>>> cte_query.keys()
dict_keys(['a', 'b', 'main_query'])

3. Match table aliases with the actual database name:

>>> query = """SELECT *
... FROM api_requests.requests_by_account m
... INNER JOIN mapbox_customer_data.styles s ON m.metadata_version = s.id
... LEFT JOIN sfdc.users u ON m.csm = u.id
... """
>>> formatter = column_parser.Parser(query)
>>> formatted = formatter.format_query(query)
>>> table_alias_mapping = formatter.get_table_names(formatted.split('\n'))
>>> table_alias_mapping
{'m': 'api_requests.requests_by_account', 
's': 'mapbox_customer_data.styles', 
'u': 'sfdc.users'}

4. Analyze and parse complex query with subqueries, Common Table Expressions and a mix of the two types.

a) Parse multiple and deeply (3+ levels) nested subqueries:

>>> from sqlanalyzer import query_analyzer
>>> query = """SELECT *
... FROM
...   (SELECT a.*,
...           b.*
...    FROM
...      (SELECT DISTINCT anonymous_id,
...                       user_id
...       FROM customer_data.segment_identifies
...       WHERE dt >= '2018-07-01') a
...    LEFT JOIN
...      (SELECT id,
...              email,
...              created
...       FROM customer_data.accounts) b ON a.user_id = b.id
...    WHERE context_campaign_name IS NOT NULL )
... """
>>> analyzer = query_analyzer.Analyzer(query)
>>> analyzer.parse_query(query)
[{'level_1_main': 'SELECT * FROM no alias '}, 
{'level_2_main': 'SELECT a.*,        b.* WHERE context_campaign_name IS NOT NULL FROM a LEFT JOIN b ON a.user_id = b.id '}, 
{'a': "SELECT DISTINCT anonymous_id, user_id FROM customer_data.segment_identifies WHERE dt >= '2018-07-01'"}, 
{'b': 'SELECT id, email, created FROM customer_data.accounts'}]

b) Parse Common Table Expressions (CTE's):

>>> query = """WITH a AS
...   (SELECT DISTINCT anonymous_id,
...                    user_id
...    FROM customer_data.segment_identifies
...    WHERE dt >= '2018-07-01'),
...      b AS
...   (SELECT id,
...           email,
...           created
...    FROM customer_data.accounts)
... SELECT a.*,
...        b.*
... FROM a
... LEFT JOIN b ON a.user_id = b.id
... WHERE context_campaign_name IS NOT NULL
... """
>>> analyzer = query_analyzer.Analyzer(query)
>>> analyzer.parse_query(query)
[{'a': "SELECT DISTINCT anonymous_id,\n                   user_id\n   FROM customer_data.segment_identifies\n   WHERE dt >= '2018-07-01'"}, 
{'b': 'SELECT id,\n          email,\n          created\n   FROM customer_data.accounts'}, 
{'main_query': 'SELECT a.*,\n       b.*\nFROM a\nLEFT JOIN b ON a.user_id = b.id\nWHERE context_campaign_name IS NOT NULL'}]

c) Parse mixed type of nested queries and CTE's:

>>> query = """SELECT email,
...        COUNT(DISTINCT context_campaign_name)
... FROM
...   (WITH a AS
...      (SELECT DISTINCT anonymous_id,
...                       user_id
...       FROM customer_data.segment_identifies
...       WHERE dt >= '2018-07-01'),
...         b AS
...      (SELECT id,
...              email,
...              created
...       FROM customer_data.accounts) SELECT a.*,
...                                           b.*
...    FROM a
...    LEFT JOIN b ON a.user_id = b.id
...    WHERE context_campaign_name IS NOT NULL )
... WHERE user_id IN ('123',
...                   '234',
...                   '345')
... GROUP BY 1
... ORDER BY 2 DESC
... LIMIT 200
... """
>>> analyzer = query_analyzer.Analyzer(query)
>>> analyzer.parse_query(query)
[{'level_1_main': "SELECT email,        COUNT(DISTINCT context_campaign_name) WHERE user_id IN ('123',                   '234',                   '345') FROM no alias "}, 
{'no alias': [{'a': "SELECT DISTINCT anonymous_id,\n                   user_id\n   FROM customer_data.segment_identifies\n   WHERE dt >= '2018-07-01'"}, 
{'b': 'SELECT id,\n          email,\n          created\n   FROM customer_data.accounts'}, 
{'main_query': 'SELECT a.*,\n       b.*\nFROM a\nLEFT JOIN b ON a.user_id = b.id\nWHERE context_campaign_name IS NOT NULL'}]}]

Notes:

Upload instructions python3 -m pip install --user --upgrade setuptools wheel twine python3 setup.py sdist bdist_wheel twine check dist/* twine upload dist/*

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlanalyzer-0.5.47.tar.gz (9.7 kB view details)

Uploaded Source

Built Distribution

sqlanalyzer-0.5.47-py3-none-any.whl (13.8 kB view details)

Uploaded Python 3

File details

Details for the file sqlanalyzer-0.5.47.tar.gz.

File metadata

  • Download URL: sqlanalyzer-0.5.47.tar.gz
  • Upload date:
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for sqlanalyzer-0.5.47.tar.gz
Algorithm Hash digest
SHA256 de6b3559237fc0578b2e228cb7209ad729f62f0f1e6905b7e8596cfdcaca3171
MD5 4fd6c5beec451860be5003e6ed94f379
BLAKE2b-256 3f804467fb003a6a0102136206a012e8f62dbb6b7316833117b6ea17cdfbdab5

See more details on using hashes here.

File details

Details for the file sqlanalyzer-0.5.47-py3-none-any.whl.

File metadata

  • Download URL: sqlanalyzer-0.5.47-py3-none-any.whl
  • Upload date:
  • Size: 13.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for sqlanalyzer-0.5.47-py3-none-any.whl
Algorithm Hash digest
SHA256 0ea1185e452fbc0e2090452eb96310a0298b891dc7d784847316a2deaa210eb4
MD5 a4f0ab45d7dc85e92dff3655acecd2a1
BLAKE2b-256 161ef2d09fe575c36990f80c1b7681305f28dd50b15ab7a08b179851631964cb

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page