sqlfmt is an opinionated CLI tool that formats your sql files
Project description
sqlfmt
sqlfmt formats your dbt SQL files so you don't have to. It is similar in nature to black, gofmt, and rustfmt (but for SQL). When you use sqlfmt:
- You never have to mention (or argue about) code style in code reviews again
- You make it easier to collaborate and solicit contributions from new people
- You will be able to read your team's code as if you wrote it
- You can forget about formatting your code, and spend your time on business logic instead
sqlfmt is not configurable, except for line length. It enforces a single style. sqlfmt maintains comments and some extra newlines, but largely ignores all indentation and line breaks in the input file.
sqlfmt is not a linter. It does not parse your code into an AST; it just lexes it and tracks a small subset of tokens that impact formatting. This lets us "do one thing and do it well:" sqlfmt is very fast, and easier to maintain and extend than linters that need a full SQL grammar.
sqlfmt is designed to work with templated SQL files that contain jinja tags and blocks. It formats the code that users look at, and therefore doesn't need to know anything about what happens after the templates are rendered.
For now, sqlfmt only works on select
statements (which is all you need if you use sqlfmt with a dbt project). In the future, it will be extended to DDL statements, as well.
Installation
Try it first
Want to test out sqlfmt on a query before you install it? Go to sqlfmt.com to use the interactive, web-based version.
Prerequisites
You will need Python 3.7-3.10 installed. You should use pipx or install into a virtual environment (maybe as a dev-dependency in your project). If you do not know how to install python and use pipx and/or virtual environments, go read about that first.
Install Using pipx (recommended)
pipx install shandy-sqlfmt
To install with the jinjafmt extra (which will also install the Python code formatter, black):
pipx install shandy-sqlfmt[jinjafmt]
Other Installation Options
You should use a virtual environment to isolate sqlfmt's dependencies from others on your system. We recommend poetry (poetry add -D shandy-sqlfmt[jinjafmt]
or poetry add -D shandy-sqlfmt
), or pipenv (pipenv install -d shandy-sqlfmt[jinjafmt]
, etc.), but a simple pip install shandy-sqlfmt
will also work.
Getting Started
Other prerequisites
sqlfmt is an alpha product and will not always produce the formatted output you might want. It might even break your SQL syntax. It is highly recommended to only run sqlfmt on files in a version control system (like git), so that it is easy for you to revert any changes made by sqlfmt. On your first run, be sure to make a commit before running sqlfmt.
Using sqlfmt
sqlfmt is a command-line tool. It works on any posix terminal and on Windows Powershell. If you have used black
, the sqlfmt commands will look familiar.
Note: The
$
in the code snippets below indicate that these are commands that can be typed into your terminal. The$
is the prompt symbol in bash; it is equivalent to>
on Windows or%
on zsh. Do not include the$
in the command you type into your terminal
To list commands and options:
$ sqlfmt --help
If you want to format all .sql
and .sql.jinja
files in your current working directory (and all nested directories), simply type:
$ sqlfmt .
You can also supply a path to a one or more files or directories as arguments:
$ sqlfmt /path/to/my/dir /path/to/a/file.sql
If you don't want to format the files you have on disk, you can run sqlfmt with the --check
option. sqlfmt will exit with code 1 if the files on disk are not properly formatted:
$ sqlfmt --check .
$ sqlfmt --check path/to/my/dir
If you want to print a diff of changes that sqlfmt would make to format a file (but not update the file on disk), you can use the --diff
option. --diff
also exits with 1 on changes:
$ sqlfmt --diff .
Disabling sqlfmt
If you would like sqlfmt to ignore a file, or part of a file, you can add -- fmt: off
and -- fmt: on
comments to your code (or # fmt: off
on MySQL or BigQuery). sqlfmt will not change any code between those comments; a single -- fmt: off
at the top of a file will keep the entire file intact.
The jinjafmt extra
sqlfmt loves properly-formatted jinja, too.
sqlfmt will safely attempt to import the Python code formatter, black. If it is successful (either because sqlfmt was installed with the jinjafmt extra or because black was installed separately in the same environment), it will use black to format the contents of jinja tags. If you do not want sqlfmt to use black to format your jinja, then specify the --no-jinjafmt
flag when running sqlfmt.
Installing sqlfmt with the jinjafmt extra will also install black. You can do this with pipx install sqlfmt[jinjafmt]
If you want to pin a specific black version, you should specify that separately, as a direct dependency of your project (in your Pipfile, pyproject.toml, etc.).
If sqlfmt was installed without the jinjafmt extra, and black is not otherwise installed, then sqlfmt will not attempt to format the contents of jinja tags, except for enforcing a single space inside each curly.
Configuring sqlfmt using pyproject.toml
Any command-line option for sqlfmt can also be set in a pyproject.toml
file, under a [tool.sqlfmt]
section header. Options passed at the command line will override the settings in the config file.
sqlfmt will search for the pyproject.toml
file using the files
passed to it as arguments. It starts in the lowest (most specific) common parent directory to all the files
and recurses up to the root directory. It will load settings from the first pyproject.toml
file it finds in this search.
Example of a pyproject.toml
file to override the default behaviors (run sqlfmt --help
for more options):
[tool.sqlfmt]
line_length = 100
check = true
Using sqlfmt with different SQL dialects
sqlfmt's rules are simple, which means it does not have to parse every single token in your query. This allows nearly all SQL dialects to be formatted using sqlfmt's default "polyglot" dialect.
ClickHouse
The exception to this is ClickHouse, which is case-sensitive where other dialects are not. By default, sqlfmt will lowercase all SQL keywords, database identifiers, aliases, etc. (basically anything that isn't quoted). This is bad for ClickHouse. To prevent the lowercasing of function names, database identifiers, and aliases, use the --dialect clickhouse
option when running sqlfmt. For example,
$ sqlfmt . --dialect clickhouse
This can also be configured using the pyproject.toml
file:
[tool.sqlfmt]
dialect = "clickhouse"
Note that with this option, sqlfmt will not lowercase most non-reserved keywords, even common ones like sum
or count
. See (and please join) this issue for a discussion of this topic.
Using sqlfmt with dbt
sqlfmt was built for dbt, so only minimal configuration is required. We recommend excluding your target
and dbt_packages
directories from formatting. You can do this with the command-line --exclude
option, or by setting exclude
in your pyproject.toml
file:
[tool.sqlfmt]
exclude=["target/**/*", "dbt_packages/**/*"]
Using sqlfmt with pre-commit
You can configure pre-commit to run sqlfmt on your repository before you commit changes.
Add the following config to your .pre-commit-config.yaml
file:
repos:
- repo: https://github.com/tconbeer/sqlfmt
rev: v0.10.0
hooks:
- id: sqlfmt
language_version: python
You should replace rev
with the latest available release, but we do suggest pinning to a specific rev, to avoid unexpected formatting changes.
Using sqlfmt with SQLFluff
You can (and should!) use SQLFluff to lint your SQL queries after they are formatted by sqlfmt. However, the two tools do not see eye-to-eye on formatting (by default), so to avoid lint errors, add the following to your .sqlfluff
config file:
[sqlfluff]
exclude_rules = L003, L018, L036
[sqlfluff:rules]
max_line_length = 88 # or whatever you set in sqlfmt
capitalisation_policy = lower
extended_capitalisation_policy = lower
[sqlfluff:indentation]
indented_joins = False
indented_using_on = True
template_blocks_indent = False
[sqlfluff:rules:L052]
multiline_newline = True
The sqlfmt style
The only thing you can configure with sqlfmt is the desired line length of the formatted file. You can do this with the --line-length
or -l
options. The default is 88.
Given the desired line length, sqlfmt has four objectives:
- Break and indent lines to make the syntactical structure of the code apparent
- Break lines so they are shorter than the desired line length, if possible
- Combine lines to use the least possible vertical space, without violating #1 and #2
- Standardize capitalization (to lowercase) and in-line whitespace
sqlfmt borrows elements from well-accepted styles from other programming languages. It places opening brackets on the same line as preceding function names (like black for python and 1TBS for C). It indents closing brackets to the same depth as the opening bracket (this is extended to statements that must be closed, like case
and end
).
The sqlfmt style is as simple as possible, with little-to-no special-casing of formatting concerns. While at first blush, this may not create a format that is as "nice" or "expressive" as hand-crafted indentation, over time, as you grow accustomed to the style, formatting becomes transparent and the consistency will allow you to jump between files, projects, and even companies much faster.
Why lowercase?
There are several reasons that sqlfmt insists on lowercase SQL keywords:
- We believe that SQL is code (this is a surprisingly controversial statement!). Shouting-case keywords perpetuate the myth that SQL isn't "real code", or isn't "modern" and somehow is trapped in the era of low-level imperative languages: BASIC, COBOL, and FORTRAN. The reality is that SQL is an incredibly powerful, declarative, and modern language. It should look like one
- Syntax highlighting for SQL makes shouting-case keywords redundant; the syntax highlighter in any text editor is going to be more consistent than any manual shout-casing. If you have a SQL query as a string inside of a block of code in another language, you may want to capitalize your keywords; sqlfmt only operates on dedicated SQL (and templated sql) files, so this is not relevant. However, even without syntax highlighting, the hierarchical and consistent indentation provided by sqlfmt provides sufficient visual structure without shout-casing keywords
- Even among people who like shout-cased keywords, there are disagreements between what gets shout-cased. SELECT, sure, but SUM? AS? OVER? AND? All-lowercase keywords eliminates this potential source of irregularity and disagreement.
- Research shows that generally, lowercase words are more readable
Why trailing commas?
- Using trailing commas follows the convention of every other written language and programming language
- Leading commas require placing the first field name on the same line as
select
, which can obscure that field - SQL query compilation is extremely fast; the "cost" of "last field" errors is very low. Some dialects (e.g., BigQuery) even allow a trailing comma in the final field of a select statement
- Trailing commas generalize better within
select
statements (e.g.group by
andpartition by
clauses) and in other kinds of SQL statements (e.g.insert
statements)
Examples
sqlfmt will put very short queries on a single line:
SELECT a,
b,
c
FROM my_table
becomes
select a, b, c from my_table
If a query doesn't fit on a single line, sqlfmt will format the query to make its hierarchy apparent. The main keywords in a select
statement are the top nodes in hierarchy. Individual fields are indented a single level; unless all fields fit on the same line as select
, they must all be individually split onto their own lines. This is properly formatted code:
with t as (select * from my_schema."my_QUOTED_ table!")
select
a_long_field_name,
another_long_field_name,
(one_field + another_field) as c,
a_final_field
from t
where one_field < another_field
Note that the main keywords, with
, select
, from
, and where
, are indented to the same depth. If their arguments fit on a single line (as in with
, from
, and where
), they stay on that line, with the keyword. However, unless all arguments for a keyword fit on one line, they are all wrapped to their own line, and indented:
with
a_long_cte_name as (
select my_field, sum(another_field) from my_schema."my_QUOTED_ table!"
)
select
a_long_field_name,
another_long_field_name,
(one_field + another_field) as c,
a_final_field
from a_long_cte_name
where
one_field < another_field
and two_field > another_field
and three_field = another_field
Any expressions wrapped in parentheses are similarly one-lined if possible, and split if they are too long.
This hierarchical indentation scales to arbitrarily complex and nested expressions. Another example of properly formatted code (at line length of 88):
select
a,
sum(a) over () as b,
row_number() over () as c,
count(case when a is null then 1 end) over (
partition by user_id, date_trunc('year', performed_at)
) as d,
first_value(
coalesce(one_field, another_field, yet_another_field) ignore nulls
) over (
partition by user_id
order by performed_at desc
rows between unbounded preceding and unbounded following
) as e
from my_table
Want more examples? See the tests/data
directory, or go to http://sqlfmt.com to see how sqlfmt will format your queries.
Contributing
Providing Feedback
We'd love to hear from you! Open an Issue to request new features, report bad formatting, or say hello.
Setting up Your Dev Environment and Running Tests
- Install Poetry if you don't have it already. You may also need or want pyenv, make, and gcc. A complete setup from a fresh install of Ubuntu can be found here
- Clone this repo into a directory (let's call it
sqlfmt
), thencd sqlfmt
- Use
poetry install -E jinjafmt
to install the project (editable) and its dependencies into a new virtual env. To runsqlfmt_primer
, you will need to install it (and its dependencies) by specifying it as an extra:poetry install -E jinjafmt -E sqlfmt_primer
- Use
poetry shell
to spawn a subshell - Type
make
to run all tests and linters, or runpytest
,black
,flake8
,isort
, andmypy
individually.
Note: If encountering a JSONDecodeError during poetry install
, you will want to clear the poetry cache with poetry cache clear pypi --all
, or upgrade to poetry >= 1.12 with poetry self upgrade
Updating primer repos to reflect formatting changes
- Make sure all changes are committed to sqlfmt
- Check out
main
in the repo and make sure youpull
changes locally - Check out the
unformatted
tag in the repo withgit checkout -b chore/apply-abc123 unformatted
whereabc123
is the hash of the most recent sqlfmt commit (from 1) - Run sqlfmt against the working tree, then
git add .
andgit commit -m "chore: apply sqlfmt abc123"
- We will have conflicts with main that we want to ignore, so merge main into this branch, ignoring anything on main:
git merge -s ours main
- Push and open a PR; squash and merge. Grab the commit SHA
- Paste the commit SHA as a ref into
primer.py
- Run
sqlfmt_primer -k
to clear the cache, then update the stats inprimer.py
to match the results
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for shandy_sqlfmt-0.10.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | cef5111147cbaf060c416486d392672424f2446762abab40602a82663b8e8cce |
|
MD5 | 6a1657477fec410ad3896088ea2009c4 |
|
BLAKE2b-256 | 317e0c305caa3ee0ac5cd622b58242f5232fc73a23abe359478bda97b546d23f |