A complete, production-ready SQLite SQL parser that returns detailed Abstract Syntax Trees (AST)
Project description
sqlite-ast-parser
A complete, production-ready SQLite SQL parser in Python that returns detailed Abstract Syntax Trees (AST) for all SQLite SQL statements.
๐ Read the Literate Programming Documentation - Comprehensive code walkthrough with explanations
Features
- Complete SQLite Coverage: Supports all SQLite statements (SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, DROP, transactions, etc.)
- Comprehensive AST: Detailed AST nodes for every SQL construct
- Position Tracking: All tokens and nodes include line/column information for error reporting
- Error Recovery: Collects multiple errors and provides helpful error messages
- Pure Python: No external dependencies
- Production Quality: Robust error handling and comprehensive node definitions
Architecture
The parser follows a scanner-buffer lexer and recursive descent parser design using precedence climbing for expressions.
Components
-
Lexer (
lexer.py): Mode-driven tokenizer- Handles strings, identifiers, comments, operators, keywords
- Position tracking for every token
- Multiple modes (NORMAL, STRING, COMMENT, etc.)
-
Parser (
parser.py): Recursive descent parser- Precedence climbing for expressions
- Error recovery with synchronization points
- Comprehensive support for all SQLite statements
-
AST Nodes (
ast_nodes.py): Complete node definitions- 80+ node types covering all SQLite constructs
- Position information on every node
- Dataclasses for easy manipulation
-
Error Handling (
errors.py): Production-quality errors- Position-aware error messages
- Context snippets showing error location
- Multiple error collection
-
Utilities (
utils.py): Constants and helpers- All 147 SQLite keywords
- Operator precedence table
- Token type definitions
Installation
pip install sqlite-ast-parser
No external dependencies needed - pure Python!
Usage
Basic Parsing
from sqlite_parser import parse_sql
# Parse a SQL statement
sql = "SELECT * FROM users WHERE age > 18"
ast = parse_sql(sql)
# ast is a list of Statement nodes
for statement in ast:
print(statement)
Tokenization Only
from sqlite_parser import tokenize_sql
sql = "SELECT id, name FROM users"
tokens = tokenize_sql(sql)
for token in tokens:
print(f"{token.type}: {token.value} at {token.position}")
Error Handling
from sqlite_parser import parse_sql
from sqlite_parser.errors import ParseError
try:
ast = parse_sql("SELECT FROM") # Invalid SQL
except ParseError as e:
print(f"Parse error: {e}")
# Output includes line/column and context
AST Structure
Statement Nodes
The parser returns a list of Statement nodes. Main statement types:
Data Manipulation (DML)
SelectStatement- SELECT queries (including compound SELECTs with UNION/INTERSECT/EXCEPT)InsertStatement- INSERT/REPLACE statementsUpdateStatement- UPDATE statementsDeleteStatement- DELETE statements
Data Definition (DDL)
CreateTableStatement- CREATE TABLEAlterTableStatement- ALTER TABLE (RENAME, ADD/DROP COLUMN)CreateIndexStatement- CREATE INDEXCreateViewStatement- CREATE VIEWCreateTriggerStatement- CREATE TRIGGERCreateVirtualTableStatement- CREATE VIRTUAL TABLEDrop*Statement- DROP TABLE/INDEX/VIEW/TRIGGER
Transaction Control
BeginStatement- BEGIN TRANSACTIONCommitStatement- COMMITRollbackStatement- ROLLBACKSavepointStatement- SAVEPOINTReleaseStatement- RELEASE SAVEPOINT
Database Management
AttachStatement- ATTACH DATABASEDetachStatement- DETACH DATABASEAnalyzeStatement- ANALYZEVacuumStatement- VACUUMReindexStatement- REINDEXExplainStatement- EXPLAIN [QUERY PLAN]PragmaStatement- PRAGMA
Expression Nodes
All expressions inherit from Expression. Main types:
Literals
NumberLiteral- integers and floatsStringLiteral- string constantsBlobLiteral- BLOB literals (X'hex')NullLiteral- NULLBooleanLiteral- TRUE/FALSECurrentTimeLiteral- CURRENT_TIME/DATE/TIMESTAMP
Identifiers
Identifier- simple column/table namesQualifiedIdentifier- schema.table.columnParameter- placeholders (?123, :name, @name, $name)
Operators
UnaryExpression- unary +, -, NOT, ~BinaryExpression- arithmetic, comparison, logical, bitwiseBetweenExpression- BETWEEN ... AND ...InExpression- IN (values) or IN (subquery)LikeExpression- LIKE/GLOB/REGEXP/MATCH
Functions and Special
FunctionCall- function calls with DISTINCT, FILTER, OVERCaseExpression- CASE WHEN ... THEN ... ELSE ... ENDCastExpression- CAST(expr AS type)CollateExpression- expr COLLATE collationExistsExpression- EXISTS (subquery)SubqueryExpression- (SELECT ...)WindowExpression- OVER (PARTITION BY ... ORDER BY ... ROWS/RANGE ...)
Example AST
sql = "SELECT id, name FROM users WHERE age > 18"
ast = parse_sql(sql)
# ast[0] is a SelectStatement with:
# - select_core.columns: [ResultColumn(Identifier("id")), ResultColumn(Identifier("name"))]
# - select_core.from_clause: FromClause(TableReference("users"))
# - select_core.where: WhereClause(BinaryExpression(>, Identifier("age"), NumberLiteral(18)))
Implementation Status
โ Complete
- Complete AST node definitions (80+ types)
- Full lexer implementation (tokenization)
- Full parser implementation (all SQLite statements)
- Error classes with position tracking
- Utility functions and constants (147 keywords, precedence table)
- Public API in
__init__.py - Comprehensive test suite (127 tests, 100% pass rate)
- Example scripts and debugging utilities
Development Guide
Adding New Statement Types
- Define AST node in
ast_nodes.py - Add parsing method in
parser.py - Add tests in
tests/
Example:
# In ast_nodes.py
@dataclass
class MyStatement(Statement):
field1: str
field2: Expression
# In parser.py
def parse_my_statement(self) -> MyStatement:
self.expect(TokenType.MY_KEYWORD)
field1 = self.consume(TokenType.IDENTIFIER).value
field2 = self.parse_expression()
return MyStatement(field1=field1, field2=field2)
Extending the Parser
The parser uses recursive descent with these key methods:
peek(n)- look ahead n tokensmatch(*types)- check if current token matchesconsume(type)- advance if matches, else errorexpect(type)- consume with detailed error message
Expression parsing uses precedence climbing:
parse_expression()- entry pointparse_binary_expression(min_prec)- handles operator precedenceparse_primary_expression()- literals, identifiers, function calls
Testing
# Example test
def test_select_statement():
sql = "SELECT * FROM users"
ast = parse_sql(sql)
assert len(ast) == 1
assert isinstance(ast[0], SelectStatement)
assert ast[0].select_core.columns[0].expression is None # * means no expression
Documentation
Additional documentation is available in the docs/ directory:
docs/sqlite_syntax_reference.md- Complete SQLite syntax referencedocs/lexer+parser_mental_model.md- Conceptual foundation of the lexer and parser designdocs/IMPLEMENTATION_COMPLETE.md- Implementation detailsdocs/TEST_RESULTS.md- Test results and coverage
Implementation Notes
Lexer Design
The lexer follows a mode-driven scanner-buffer model:
Modes:
NORMAL- default SQL parsingSTRING_SINGLE- inside 'string'STRING_DOUBLE- inside "identifier"BLOCK_COMMENT- inside /* comment */LINE_COMMENT- inside -- commentBRACKET_IDENTIFIER- inside [identifier]BACKTICK_IDENTIFIER- insideidentifier
Core Operations:
peek(n)- look ahead n charactersadvance()- consume character, add to bufferskip()- consume without bufferingemit(type)- create token with positionerror(msg)- report lexer error
Parser Design
The parser uses recursive descent with precedence climbing for expressions:
Expression Precedence (low to high):
- OR
- AND
- NOT
- Comparison (=, <, >, <=, >=, !=, <>, IS, IN, LIKE, BETWEEN)
- Bitwise (<<, >>, &, |)
- Addition (+, -, ||)
- Multiplication (*, /, %)
- COLLATE
- Unary (+, -, NOT, ~)
- Primary (literals, identifiers, function calls, subqueries)
Error Recovery:
- Panic mode: skip to synchronization points (semicolon, keywords)
- Error nodes: placeholder nodes to continue parsing
- Multiple error collection: don't stop at first error
Special SQLite Features
- Case Insensitivity: Keywords are case-insensitive
- Optional Semicolons: Statements don't require semicolons
- Flexible Quoting: Identifiers can use ", ', [], or `
- Parameters: Supports ?, ?N, :name, @name, $name
- Comments: Both -- line comments and /* block comments */
- BLOB Literals: X'hexadecimal'
- WITHOUT ROWID: Special table option
- STRICT: Type enforcement option
- Generated Columns: GENERATED ALWAYS AS (expr) STORED/VIRTUAL
Performance Considerations
- Lexer uses character-by-character scanning with lookahead
- Parser uses recursive descent (no backtracking except for disambiguation)
- Expression parsing uses precedence climbing (O(n) for expressions)
- No AST optimization passes (returns raw parse tree)
Limitations
- Does not validate semantic correctness (e.g., column existence)
- Does not resolve identifier scope
- Does not perform type checking
- Does not optimize or transform the AST
- Parser may accept some syntactically invalid SQL that SQLite would reject
References
- SQLite Official Docs: https://sqlite.org/lang.html
- SQLite Syntax Diagrams: https://sqlite.org/syntax.html
- See
docs/directory for additional documentation
Project Structure
SqliteASTParser/
โโโ README.md # This file
โโโ sqlite_syntax_reference.md # Complete SQLite syntax reference
โโโ lexer+parser_mental_model.md # Design mental model
โโโ sqlite_parser/ # Main package
โ โโโ __init__.py # Public API
โ โโโ ast_nodes.py # AST node definitions (โ
Complete)
โ โโโ errors.py # Error classes (โ
Complete)
โ โโโ utils.py # Constants and helpers (โ
Complete)
โ โโโ lexer.py # Tokenizer (๐ง In progress)
โ โโโ parser.py # Parser (๐ง In progress)
โโโ tests/ # Test suite
โ โโโ test_lexer.py
โ โโโ test_parser_select.py
โ โโโ test_parser_dml.py
โ โโโ test_parser_ddl.py
โ โโโ test_integration.py
โโโ examples/ # Example scripts
โโโ basic_usage.py
โโโ ast_traversal.py
โโโ pretty_print.py
Testing
The parser includes a comprehensive test suite with 127 tests covering all SQLite statement types:
pytest tests/ -v
Or run the full test suite:
python tests/test_all_statements.py
Contributing
Contributions welcome! Areas for potential improvement:
- Additional test cases for edge cases
- Performance optimizations
- Semantic analysis capabilities
- Pretty printer to format AST back to SQL
- AST transformation utilities
License
MIT License - see LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sqlite_ast_parser-0.1.0.tar.gz.
File metadata
- Download URL: sqlite_ast_parser-0.1.0.tar.gz
- Upload date:
- Size: 149.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
369fb1908f80bc5de71c0bd913f45b47bf05fe877a9e56aa227cf0fe945dd8d7
|
|
| MD5 |
3b369923ba26994c65b828a70ae7c1bc
|
|
| BLAKE2b-256 |
901d060d30d454300d5b3c48bcd7d547ceea30aa4a651db2b6e93e90052b899e
|
File details
Details for the file sqlite_ast_parser-0.1.0-py3-none-any.whl.
File metadata
- Download URL: sqlite_ast_parser-0.1.0-py3-none-any.whl
- Upload date:
- Size: 45.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
18570525a6c13dbcab7b6a6bca73b27af084c2e1a747c12e4f01ab4d37ca5730
|
|
| MD5 |
fdda7530d83c6b2438f0975d36a3a35d
|
|
| BLAKE2b-256 |
790303c77404751292d6814020edcd02c3c130eafbb0495d09ee5728fe73c62a
|