Modular SQL Linting for Humans
Project description
The SQL Linter for humans
Bored of not having a good SQL linter that works with whichever dialect you're working with? SQLFluff is an extensible and modular linter designed to help you write good SQL and catch errors before it hits your database. SQLFluff can auto-fix most linting errors too.
Getting Started
To get started, install the package and run sqlfluff lint
or sqlfluff fix
.
$ pip install sqlfluff
$ echo " SELECT a + b FROM tbl; " > test.sql
$ sqlfluff lint test.sql
== [test.sql] FAIL
L: 1 | P: 1 | L003 | Single indentation uses a number of spaces not a multiple of 4
L: 1 | P: 14 | L006 | Operators should be surrounded by a single space unless at the start/end of a line
L: 1 | P: 27 | L001 | Unnecessary trailing whitespace
You can also have a play using SQLFluff online.
For full CLI usage and rules reference, see the docs.
Documentation
For full documentation visit docs.sqlfluff.com.
Releases
SQLFluff is still in an open alpha phase - expect the tool to change significantly over the coming months, and expect potentially non-backward compatible api changes to happen at any point. If you'd like to help please consider contributing.
- 0.4.x (ongoing development) will fully support dbt packages and macros with significant speed improvements and new rules.
- 0.3.x dropped support for python 2.7 and 3.4, and reworked the handling of indentation linting in a potentially not backward compatible way.
- 0.2.x added templating support and a big restructure of rules and changed how users might interact with sqlfluff on templated code.
- 0.1.x involved a major re-write of the parser, completely changing the behaviour of the tool with respect to complex parsing.
SQLFluff on Slack
We have a fast-growing community on Slack, come and join us!
https://join.slack.com/t/sqlfluff/shared_invite/zt-js0wsnvo-78gEI0phW9_LiItXAiJcqQ
Contributing
There's lots to do in this project, and we're just getting started. If you want to understand more about the architecture of SQLFluff, you can find more here.
If you'd like to contribute, check out the open issues on GitHub. You can also see the guide to contributing.
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Added
- Public API to enable people to import
sqlfluff
as a python module and callparse
,lint
andfix
within their own projects. See the docs for more information. (#501) - The ability to use
dbt
as a templating engine directly allowing richer and more accurate linting arounddbt
macros (and packages related todbt
). For more info see the docs. (#508) - Support for modulo (
%
) operator. (#447) - A limit in the internal fix routines to catch any infinite loops. (#494)
- Added the
.istype()
method on segments to more intelligently deal with type matching in rules when inheritance is at play. - Added the ability for the user to add their own rules when interacting
with the
Linter
directly usinguser_rules
. - Added L034 'Fields should be stated before aggregates / window functions' per dbt coding convenventions (#495)
- Templating tags, such as
{{ variables }}
,{# comments #}
and{% loops %}
(in jinja) now have placeholders in the parsed structure. Rule L003 (indentation), also now respects these placeholders so that their indentation is linted accordingly. For loop or block tags, they also generate anIndent
andDedent
tag accordingly (which can be enabled or disabled) with a configuration value so that indentation around these functions can be linted accordingly. (#541) - MyPy type linting into a large proportion of the core library. (#526, #580)
- Config values specific to a file can now be defined using a comment
line starting with
-- sqlfluff:
. (#541) - Added documentation for
--noqa:
use in rules. (#552) - Added
pre-commit
hooks forlint
andfix
. (#576) - Added a fix routine for Rule L019 (comma placement). (#575)
- Added Rule L031 to enforce "avoid using alias in the
FROM
/JOIN
clauses" from thedbt
coding conventions. (#473, #479) - Added Rule L032 to enforce "do not use
USING
" from thedbt
coding conventions. (#487) - Added Rule L033 to enforce "prefer
UNION ALL
toUNION *
" from thedbt
coding conventions. (#489) - Added Rule L034 to enforce "fields should be stated before aggregate/window functions" from the
dbt
coding conventions. (#495) - Adopted the
black
coding style. (#485) - Added validation and documentation for rule configuration options. (#462)
- Added documentation for which rules are fixable. (#594)
- Added
EPOCH
keyword for postgres dialect. (#522) - Added column index identifier in snowflake dialect. (#458)
- Added
USE
statement to the snowflake dialect. (#537) - Added
CODE_OF_CONDUCT
to the project. (#471) - Added
ISNULL
andNOTNULL
keywords to ansi dialect. (#441) - Added support for python 3.9. (#482)
- Added
requirements_dev.txt
for local testing/linting. (#500) - Added CLI option
--disregard-sqlfluffignores
to allow direct linting of files in the.sqlfluffignore
. (#486) - Added
dbt
incremental
macro. (#363) - Added links to cockroachlabs expression grammars in ansi dialect. (#592)
- Added favicon to the docs website. (#589)
Changed
- Big refactor of logging internally.
Linter
is now decoupled from logging so that it can be imported directly by subprojects without needing to worry about weird output or without the log handing getting in the way of your project. (#460) - Linting errors in the final file are now reported with their position in the source file rather than in the templated file. This means when using sqlfluff as a plugabble library within an IDE, the references match the file which is being edited. (#541)
- Created new Github Organisation (https://github.com/sqlfluff) and migrated from https://github.com/alanmcruickshank/sqlfluff to https://github.com/sqlfluff/sqlfluff. (#444)
- Changed the handling of
*
anda.b.*
expressions to have their own expressions. Any dependencies on this structure downstream will be broken. This also fixes the linting of both kinds of expressions with regard to L013 and L025. (#454) - Refactor of L022 to handle poorly formatted CTEs better. (#494)
- Internally added an
EphemeralSegment
to aid with parsing efficiency without altering the end structure of the query. (#491) - Split
ObjectReference
intoColumnReference
andTableReference
for more useful API access to the underlying structure. (#504) KeywordSegment
and the newSymbolSegment
both now inherit from_ProtoKeywordSegment
which allows symbols to match in a very similar way to keywords without later appearing with thetype
ofkeyword
. (#504)- Introduced the
Parser
class to parse a lexed query rather than relying on users to instantiate aFileSegment
directly. As a result theFileSegment
has been moved from the core parser directly into the dialects. Users can refer to it via theget_root_segment()
method of a dialect. (#510) - Several perfomance improvements through removing unused functionality, sensible caching and optimising loops within functions. (#526)
- Split up rule tests into separate
yml
files. (#553) - Allow escaped quotes in strings. (#557)
- Fixed
ESCAPE
parsing inLIKE
clause. (#566) - Fixed parsing of complex
BETWEEN
statements. (#498) - Fixed BigQuery
EXCEPT
clause parsing. (#472) - Fixed Rule L022 to respect leading comma configuration. (#455)
- Improved instructions on adding a virtual environment in the
README
. (#457) - Improved documentation for passing CLI defaults in
.sqlfluff
. (#452) - Fix bug with templated blocks +
capitalisation_policy = lower
. (#477) - Fix array accessors in snowflake dialect. (#442)
- Color
logging
warnings red. (#497) - Allow whitespace before a shorthand cast. (#544)
- Silenced warnings when fixing from stdin. (#522)
- Allow an underscore as the first char in a semi structured element key. (#596)
- Fix PostFunctionGrammar in the Snowflake dialect which was causing strange behaviour in L012. (#619)
Removed
- Dropped support for python 3.5. (#482)
- From the CLI, the
--no-safety
option has been removed, the default is now that all enabled rules will be fixed. (#583) - Removed
BaseSegment.grammar
,BaseSegment._match_grammar()
andBaseSegment._parse_grammar()
instead preferring references directly toBaseSegment.match_grammar
andBaseSegment.parse_grammar
. (#509) - Removed
EmptySegmentGrammar
and replaced with better non-code handling in theFileSegment
itself. (#509) - Remove the
ContainsOnly
grammar as it remained only as an anti-pattern. (#509) - Removed the
expected_string()
functionality from grammars and segments (#509) as it was poorly supported. - Removed
BaseSegment.as_optional()
as now this functionality happens mostly in grammars (includingRef
). (#509) - Removed
ColumnExpressionSegment
in favour ofColumnReference
. (#512) - Removed the
LambdaSegment
feature, instead replacing with an internal to the grammar module calledNonCodeMatcher
. (#512) - Case sensitivity as a feature for segment matching has been removed as not required for existing dialects. (#517)
- Dependency on
difflib
orcdifflib
, by relying on source mapping instead to apply fixes. (#541)
[0.3.6] - 2020-09-24
Added
sqlfluff dialects
command to get a readout of available dialects [+ associated docs].- More helpful error messages when trying to run in Python2.
- Window functions now parse with
IGNORE
/RESPECT
NULLS
. - Parsing of
current_timestamp
and similar functions. Thanks @dmateusp. - Snowflake
QUALIFY
clause.
Changed
- Respect user config directories. Thanks @sethwoodworth.
- Fix incorrect reporting of L013 with
*
. Thanks @dmateusp. - Fix incorrect reporting of L027 with column aliases. Thanks @pwildenhain.
- Simplification of application of fixes and correction of a case where fixes could be depleted. Thanks @NiallRees.
- Fix functions with a similar structure to
SUBSTRING
. - Refactor BigQuery
REPLACE
andEXCEPT
clauses. - Bigquery date parts corrected.
- Snowflake array accessors.
- Psotgres
NOTNULL
andISNULL
. - Bugfix in snowflake for keywords used in semistructured queries.
- Nested
WITH
statements now parse. - Performance improvements in the
fix
command. - Numeric literals starting with a decimal now parse.
- Refactor the jinja templater.
[0.3.5] - 2020-08-03
Added
- Patterns and Anti-patterns in documentation. Thanks @flpezet.
- Functions in
GROUP BY
. Thanks @flpezet.
Changed
- Deep bugfixes in the parser to handle simple matching better for a few edge cases. Also added some logging deeper in the parser.
- Added in the
SelectableGrammar
and some related segments to make it easier to refer to select-like things in other grammars. - Fixes to
CASE
statement parsing. Thanks @azhard. - Fix to snowflake
SAMPLE
implementation. Thanks @rkm3. - Numerous docs fixes. Thanks @SimonStJG, @flpezet, @s-pace, @nolanbconaway.
[0.3.4] - 2020-05-13
Changed
- Implementation of the bigquery
CREATE MODEL
syntax. Thanks @barrywhart. - Bugfixes for:
- Edge cases for L006
- False alarms on L025
ORDER BY x NULLS FIRST|LAST
FOR
keyword in bigquerySYSTEM_TIME
syntax.
[0.3.3] - 2020-05-11
Added
- Added the
--nofail
option toparse
andlint
commands to assist rollout. - Added the
--version
option to complement theversion
option already available on the cli. - Parsing for
ALTER TABLE
. - Warning for unset dialects when getting parsing errors.
- Configurable line lengths for output.
[0.3.2] - 2020-05-08
Added
- Support for the Teradata dialect. Thanks @Katzmann1983!
- A much more detailed getting started guide in the docs.
- For the
parse
command, added the--profiler
and--bench
options to help debugging performance issues. - Support for the
do
command in the jinja templater. - Proper parsing of the concatenate operator (
||
). - Proper indent handling of closing brackets.
- Logging and benchmarking of parse performance as part of the CI pipeline.
- Parsing of object references with defaults like
my_db..my_table
. - Support for the
INTERVAL '4 days'
style interval expression. - Configurable trailing or leading comma linting.
- Configurable indentation for
JOIN
clauses. - Rules now have their own logging interface to improve debugging ability.
- Snowflake and Postgres dialects.
- Support for a
.sqlfluffignore
file to ignore certain paths. - More generic interfaces for managing keywords in dialects, including
set
interfaces for managing and creating keywords and theRef.keyword()
method to refer to them, and the ability to refer directly to keyword names in most grammars using strings directly. IncludesSegmentGenerator
objects to bind dialect objects at runtime from sets. Thanks @Katzmann1983! - Rule
L029
for using unreserved keywords as variable names. - The jinja templater now allows macros loaded from files, and the
hydration of variables ending in
_path
in the config files. - JSON operators and the
DISTINCT ON ()
syntax for the postgres dialect.
Changed
- Refactor of whitespace and non-code handling so that segments are less greedy and default to not holding whitespace on ends. This allows more consistent linting rule application.
- Change config file reading to case-sensitive to support case sensitivity in jinja templating.
- Non-string values (including lists) now function in the python and jinja templating libraries.
- Validation of the match results of grammars has been reduced. In production cases the validation will still be done, but only on parse and not on match.
- At low verbosities, python level logging is also reduced.
- Some matcher rules in the parser can now be classified as simple which allows them to shortcut some of the matching routines.
- Yaml output now double quotes values with newlines or tab characters.
- Better handling on hanging and closing indents when linting rule L003.
- More capable handline of multi-line comments so that indentation and line length parsing works. This involves some deep changes to the lexer.
- Getting violations from the linter now automatically takes into account of ignore rules and filters.
- Several bugfixes, including catching potential infinite regress during fixing of files, if one fix would re-introduce a problem with another.
- Behaviour of the
Bracketed
grammar has been changed to treat its content as aSequence
rather than aOneOf
. - Move to
SandboxedEnvironment
rather thanEnvironment
for jinja templating for security. - Improve reporting of templating issues, especially for the jinja templater so that missing variables are rendered as blanks, but still reported as templating violations.
[0.3.1] - 2020-02-17
Added
- Support for
a.b.*
on top ofa.*
in select target expressions.
[0.3.0] - 2020-02-15
Changed
- Deprecated python 2.7 and python 3.4 which are now both past their maintenance horizon. The 0.2.x branch will remain available for continued development for these versions.
- Rule L003 is now significantly smarter in linting indentation with support for hanging indents and comparison to the most recent line which doesn't have an error. The old (more simple) functionality of directly checking whether an indent was a multiple of a preset value has been removed.
- Fixed the "inconsistent" bug in L010. Thanks @nolanbconaway.
- Updated logging of parsing and lexing errors to have more useful error codes.
- Changed parsing of expressions to favour functions over identifiers to fix the expression bug.
- Fixed the "inconsistent" bug in L010. Thanks @nolanbconaway.
- Moved where the
SELECT
keyword is parsed within a select statement, so that it belongs as part of the newly renamedselect_clause
(renamed from previouslyselect_target_group
). - Clarified handling of the
type
andname
properties of the BaseSegment class and its children.name
should be specific to a particular kind of segment, andtype
should express a wider group. Handling of thenewline
,whitespace
andcomma
segments has been updated so that we use thetype
property for most use cases rather thanname
.
Added
- Meta segments for indicating where things can be present in the parsed
tree. This is mostly illustrated using the
Indent
andDedent
segments used for indicating the position of theoretical indents in the structure. Several helper functions have been added across the codebase to handle this increase in the kinds of segments which might be encountered by various grammars. - Rule L016 has been added to lint long lines. In the
fix
phase of this rule, there is enough logic to try and reconstruct a sensible place for line breaks as re-flow the query. This will likely need further work and may still encounter places where it doesn't fix all errors but should be able to deal with the majority of simple cases. - BigQuery dialect, initially just for appropriate quoting.
- Added parsing of DDL statements such as
COMMIT
,DROP
,GRANT
,REVOKE
andROLLBACK
. Thanks @barrywhart. --format
option to theparse
command that allows a yaml output. This is mostly to make test writing easier in the development process but might also be useful for other things.- Parsing of set operations like
UNION
. - Support for the
diff-cover
tool. Thanks @barrywhart. - Enabled the
fix
command while usingstdin
. Thanks @nolanbconaway. - Rule to detect incorrect use of
DISTINCT
. Thanks @barrywhart. - Security fixes from DeepCover. Thanks @sanketsaurav.
- Automatic fix testing, to help support the newer more complicated rules.
- Interval literals
- Support for the
source
macro from dbt. Thanks @Dandandan - Support for functions with spaces between the function name and the brackets
and a linting rule
L017
to catch this. - Efficiency cache for faster pruning of the parse tree.
- Parsing of array notation as using in BigQuery and Postgres.
- Enable the
ignore
parameter on linting and fixing commands to ignore particular kinds of violations.
[0.2.4] - 2019-12-06
Added
- A
--code-only
option to theparse
command to spit out a more simplified output with only the code elements. - Rules can now optionally override the description of the violation
and pass that back via the
LintingResult
.
Changed
- Bugfix, correct missing files in
setup.py
install_requires
section. - Better parsing of the not equal operator.
- Added more exclusions to identifier reserved words to fix cross joins.
- At verbosity levels 2 or above, the root config is printed and then any diffs to that for specific files are also printed.
- Linting and parsing of directories now reports files in alphabetical order. Thanks @barrywhart.
- Better python 2.7 stability. Thanks @barrywhart.
- Fixing parsing of
IN
/NOT IN
andIS
/IS NOT
.
[0.2.3] - 2019-12-02
Changed
- Bugfix, default config not included.
[0.2.2] - 2019-12-02
Changed
- Tweek rule L005 to report more sensibly with newlines.
- Rework testing of rules to be more modular.
- Fix a config file bug if no root config file was present for some values. Thanks @barrywhart.
- Lexing rules are now part of the dialect rather than a global so that they can be overriden by other dialects when we get to that stage.
[0.2.0] - 2019-12-01
Added
- Templating support (jinja2, python or raw).
- Variables + Macros.
- The
fix
command is also sensitive to fixing over templates and will skip certain fixes if it feels that it's conflicted.
- Config file support, including specifying context for the templater.
- Documentation via Sphinx and readthedocs.
- Including a guide on the role of SQL in the real world. Assisted by @barrywhart.
- Documentation LINTING (given we're a linting project) introduced in CI.
- Reimplemented L006 & L007 which lint whitespace around operators.
- Ability to configure rule behaviour direclty from the config file.
- Implemented L010 to lint capitalisation of keywords.
- Allow casting in the parser using the
::
operator. - Implemented
GROUP BY
andLIMIT
. - Added
ORDER BY
using indexes and expressions. - Added parsing of
CASE
statements. - Support for window/aggregate functions.
- Added linting and parsing of alias expressions.
Changed
- Fixed a bug which could cause potential infinite recursion in configuration
- Changed how negative literals are handled, so that they're now a compound segment rather than being identified at the lexing stage. This is to allow the parser to resolve the potential ambiguity.
- Restructure of rule definitions to be more streamlined and also enable
autodocumentation. This includes a more complete
RuleSet
class which now holds the filtering code. - Corrected logging in fix mode not to duplicate the reporting of errors.
- Now allows insert statements with a nested
with
clause. - Fixed verbose logging during parsing.
- Allow the
Bracketed
grammar to optionally match empty brackets using the optional keyword.
[0.1.5] - 2019-11-11
Added
- Python 3.8 Support!
Changed
- Moved some of the responsibility for formatted logging into the linter to mean that we can log progressively in large directories.
- Fixed a bug in the grammar where one of the return values was messed up.
[0.1.4] - 2019-11-10
Added
- Added a
--exclude-rules
argument to most of the commands to allow rule users to exclude specific subset of rules, by @sumitkumar1209 - Added lexing for
!=
,~
and::
. - Added a new common segment:
LambdaSegment
which allows matching based on arbitrary functions which can be applied to segments. - Recursive Expressions for both arithmetic and functions, based heavily off the grammar provided by the guys at CockroachDB.
- An
Anything
grammar, useful in matching rather than in parsing to match anything.
Changed
- Complete rewrite of the bracket counting functions, using some centralised class methods
on the
BaseGrammar
class to support common matching features across multiple grammars. In particular this affects theDelimited
grammar which is now much simpler but does also require slightly more liberal use of terminators to match effectively. - Rather than passing around multiple variables during parsing and matching, there is now
a
ParseContext
object which contains things like the dialect and various depths. This simplifies the parsing and matching code significantly. - Bracket referencing is now done from the dialect directly, rather than in individual
Grammars (except the
Bracketed
grammar, which still implements it directly). This takes out some originally duplicated code. - Corrected the parsing of ordering keywords in and
ORDER BY
clause.
Removed
- Removed the
bracket_sensitive_forward_match
method from theBaseGrammar
. It was ugly and not flexible enough. It's been replaced by a suite of methods as described above.
[0.1.3] - 2019-10-30
Changed
- Tweak to the L001 rule so that it doesn't crash the whole thing.
[0.1.2] - 2019-10-30
Changed
- Fixed the errors raised by the lexer.
[0.1.1] - 2019-10-30
Changed
- Fixed which modules from sqlfluff are installed in the setup.py. This affects
the
version
command.
[0.1.0] - 2019-10-29
Changed
- Big Rewrite - some loss in functionality might be apparent compared to pre-0.1.0. Please submit any major problems as issues on github
- Changed unicode handling for better escape codes in python 2. Thanks @mrshu
- BIG rewrite of the parser, completely new architecture. This introduces
breaking changes and some loss of functionality while we catch up.
- In particular, matches now return partial matches to speed up parsing.
- The
Delimited
matcher has had a significant re-write with a major speedup and broken the dependency onSequence
. - Rewrite of
StartsWith
andSequence
to use partial matches properly. - Different treatment of numeric literals.
- Both
Bracketed
andDelimited
respect bracket counting. - MASSIVE rewrite of
Bracketed
.
- Grammars now have timers.
- Joins properly parsing,
- Rewrite of logging to selectively output commands at different levels
of verbosity. This uses the
verbosity_logger
method. - Added a command line
sqlfluff parse
option which runs just the parsing step of the process to better understand how a file is being parsed. This also has options to configure how deep we recurse. - Complete Re-write of the rules section, implementing new
crawlers
which implement the linting rules. Now with inbuilt fixers in them. - Old rules removed and re implemented so we now have parity with the old rule sets.
- Moved to using Ref mostly within the core grammar so that we can have recursion.
- Used recursion to do a first implementation of arithmetic parsing. Including a test for it.
- Moved the main grammar into a seperate dialect and renamed source and test files accordingly.
- Moved to file-based tests for the ansi dialect to make it easier to test using the tool directly.
- As part of file tests - expected outcomes are now encoded in yaml to make it easier to write new tests.
- Vastly improved readability and debugging potential of the _match logging.
- Added support for windows line endings in the lexer.
[0.0.7] - 2018-11-19
Added
- Added a
sqlfluff fix
as a command to implement auto-fixing of linting errors. For now onlyL001
is implemented as a rule that can fix things. - Added a
rules
command to introspect the available rules. - Updated the cli table function to use the
testwrap
library and also deal a lot better with longer values. - Added a
--rules
argument to most of the commands to allow rule users to focus their search on a specific subset of rules.
Changed
- Refactor the cli tests to use the click CliRunner. Much faster
[0.0.6] - 2018-11-15
Added
- Number matching
Changed
- Fixed operator parsing and linting (including allowing the exception of
(*)
)
[0.0.5] - 2018-11-15
Added
- Much better documentation including the DOCS.md
Changed
- Fixed comma parsing and linting
[0.0.4] - 2018-11-14
Added
- Added operator regexes
- Added a priority for matchers to resolve some ambiguity
- Added tests for operator regexes
- Added ability to initialise the memory in rules
[0.0.3] - 2018-11-14
Added
- Refactor of rules to allow rules with memory
- Adding comma linting rules (correcting the single character matchers)
- Adding mixed indentation linting rules
- Integration with CircleCI, CodeCov and lots of badges
Changed
- Changed import of version information to fix bug with importing config.ini
- Added basic violations/file reporting for some verbosities
- Refactor of rules to simplify definition
- Refactor of color cli output to make it more reusable
[0.0.2] - 2018-11-09
Added
- Longer project description
- Proper exit codes
- colorama for colored output
Changed
- Significant CLI changes
- Much improved output from CLI
[0.0.1] - 2018-11-07
Added
- Initial Commit! - VERY ALPHA
- Restructure into package layout
- Adding Tox and Pytest so that they work
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for sqlfluff-0.4.0a3-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 51323621b134c24f3793b2ca036508ebb29f01c35bda4a73f1171073587e3fd8 |
|
MD5 | 38f329c4a500618a0890beb339fffd85 |
|
BLAKE2b-256 | 982f207d41137034940cbca84f75933c49cc1645de1d579936cf16d2335a9473 |