A formatter for Python code and SparkSQL queries.

Project description

pyspark-sql-formatter

A formatter for PySpark code that contains SQL queries. It relies on the Python formatter yapf and the SparkSQL formatter sparksqlformatter, which work independently of each other. Users can specify the configuration for each formatter separately.

Queries should be passed either as a variable, spark.sql(query), or as a plain string literal, spark.sql('xxx'). Calls whose argument is an expression, such as spark.sql('xxx'.format()) or spark.sql('xxx'.replace()), may raise exceptions.
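To check a script against these call shapes before formatting it, a small standalone sketch like the one below can help. It is not part of pysqlformatter and does not reproduce how the formatter detects queries. The helper name classify_spark_sql_calls is hypothetical, the session object is assumed to be named spark, and the code needs Python 3.8+ because it relies on ast.Constant.

```python
import ast


def classify_spark_sql_calls(source):
    """Return (line number, verdict) for each spark.sql(...) call in source.

    Hypothetical helper for illustration only; it assumes the Spark
    session variable is literally named `spark`.
    """
    results = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_spark_sql = (
            isinstance(func, ast.Attribute)
            and func.attr == "sql"
            and isinstance(func.value, ast.Name)
            and func.value.id == "spark"
        )
        if not is_spark_sql or not node.args:
            continue
        arg = node.args[0]
        if isinstance(arg, ast.Name):
            verdict = "variable"  # spark.sql(query)
        elif isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            verdict = "string literal"  # spark.sql('xxx')
        else:
            verdict = "unsupported expression"  # e.g. spark.sql('xxx'.format())
        results.append((node.lineno, verdict))
    return sorted(results)


sample = (
    "query = 'select * from t0'\n"
    "spark.sql(query)\n"
    "spark.sql('select 1')\n"
    "spark.sql('select {}'.format(col))\n"
)
calls = classify_spark_sql_calls(sample)
for lineno, verdict in calls:
    print(lineno, verdict)
```

Calls reported as "unsupported expression" are the ones worth rewriting, for example by building the string into a variable first and passing that variable to spark.sql().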

Installation

Install using pip

pip install pysqlformatter

Install from source

  1. Download source code.
  2. Navigate to the source code directory.
  3. Run python setup.py install or pip install . from that directory.

Compatibility

Supports Python 2.7 and 3.6+.
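A script that depends on the formatter can check the interpreter version up front. The following is a minimal sketch of that check. The function name is_supported is hypothetical, and it reads "3.6+" as Python 3.6 and any later 3.x release, which is an assumption.

```python
import sys


def is_supported(version_info):
    """Return True for Python 2.7 or Python 3.6 and later 3.x (assumed reading of '3.6+')."""
    major, minor = version_info[0], version_info[1]
    return (major == 2 and minor == 7) or (major == 3 and minor >= 6)


current_ok = is_supported(sys.version_info)
print("supported interpreter:", current_ok)
```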

Usage

pysqlformatter can be used as either a command-line tool or a Python library.

Use as command-line tool

usage: pysqlformatter [-h] [-f FILES [FILES ...]] [-i] [--query-names QUERY_NAMES [QUERY_NAMES ...]] [--python-style PYTHON_STYLE] [--sparksql-style SPARKSQL_CONFIG]

Formatter for Pyspark code and SparkSQL queries.

optional arguments:
  -h, --help            show this help message and exit
  -f FILES [FILES ...], --files FILES [FILES ...]
                        Paths to files to format.
  -i, --in-place        Format the files in place.
  --python-style PYTHON_STYLE
                        Style for Python formatting, interface to https://github.com/google/yapf.
  --sparksql-style SPARKSQL_CONFIG
                        Style for SparkSQL formatting, interface to https://github.com/largecats/sparksql-formatter.
  --query-names QUERY_NAMES [QUERY_NAMES ...]
                        String variables with names containing these strings will be formatted as SQL queries. Default to 'query'.
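The --query-names rule can be pictured with a short standalone sketch. It only illustrates the idea of matching string variables by a substring of their name and is not the formatter's actual implementation. The helper find_query_variables is hypothetical; it looks at top-level assignments only, treats the match as case-sensitive (an assumption), and needs Python 3.8+ for ast.Constant.

```python
import ast


def find_query_variables(source, query_names=("query",)):
    """List variables assigned a string literal whose name contains any of query_names.

    Hypothetical helper for illustration; only top-level assignments are inspected.
    """
    found = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
            continue  # skip non-string values such as numbers
        for target in node.targets:
            if isinstance(target, ast.Name) and any(q in target.id for q in query_names):
                found.append(target.id)
    return found


sample = (
    "user_query = 'select 1'\n"
    "name = 'bob'\n"
    "query2 = 'select 2'\n"
    "count_query = 3\n"
)
matched = find_query_variables(sample)
print(matched)
```

Under these assumptions, user_query and query2 match the default name 'query', while name (no match) and count_query (not a string) are left alone.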

E.g.,

$ pysqlformatter -f <path_to_file> --python-style='pep8' --sparksql-style="{'reservedKeywordUppercase': False}" --query-names query

Or using config files:

$ pysqlformatter -f <path_to_file> --python-style="<path_to_python_style_config_file>" --sparksql-style="<path_to_sparksql_config_file>" --query-names query
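In the two commands above, --sparksql-style receives either a Python dict literal or a path to a config file. The inline form uses single quotes and False, so it is Python syntax rather than JSON. The sketch below shows one way such an argument could be told apart with ast.literal_eval. The function parse_style_arg is hypothetical, and whether pysqlformatter handles the argument this way is an assumption.

```python
import ast


def parse_style_arg(value):
    """Return a dict for an inline style literal, else the value unchanged (treated as a path or style name).

    Hypothetical helper for illustration only.
    """
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value  # not a Python literal: treat as a file path or style name
    return parsed if isinstance(parsed, dict) else value


inline = parse_style_arg("{'reservedKeywordUppercase': False}")
as_path = parse_style_arg("/path/to/sparksql_config.cfg")
print(inline)
print(as_path)
```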

Use as Python library

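Call pysqlformatter.api.format_script() to format a script passed as a string. In the examples below, sparksqlConfig() denotes a default SparkSQL formatter configuration object; its import is not shown.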

>>> from pysqlformatter import api
>>> script = '''query = 'select * from t0'\nspark.sql(query)'''
>>> api.format_script(script=script, pythonStyle='pep8', sparksqlConfig=sparksqlConfig(), queryNames=['query'])
"query = '''\nSELECT\n    *\nFROM\n    t0\n'''\nspark.sql(query)\n"

Call pysqlformatter.api.format_file() to format a script stored in a file:

>>> from pysqlformatter import api
>>> api.format_file(filePath=<path_to_file>, pythonStyle='pep8', sparksqlConfig=sparksqlConfig(), queryNames=['query'], inPlace=False)
...
