Advanced PostgreSQL SQL parser for extracting database schema metadata
Project description
PostgreSQL SQL Parser
SQLParser Project
Overview
SQLParser is a Python library designed to parse and analyze SQL scripts, focusing on Data Definition Language (DDL) statements and SELECT queries. It processes SQL statements to extract structured information about tables, columns, constraints, and query elements, providing a robust foundation for database schema analysis and query processing.
Features
- DDL Parsing: Supports parsing of
CREATE TABLE,ALTER TABLE, andCREATE INDEXstatements to extract table structures, columns, primary keys, foreign keys, and constraints. - SELECT Query Parsing: Handles
SELECTstatements, including those with Common Table Expressions (CTEs), to identify query structure and column expressions. - Tokenization: Utilizes an
AdvancedSQLLexerto tokenize SQL scripts accurately. - Structured Output: Represents parsed SQL components as structured objects (
Table,Column,PrimaryKey,ForeignKey,Constraint, etc.). - Extensible Architecture: Built with a modular design using a base parser class (
ParserBase) for easy extension and maintenance.
Source Installation
To use the SQLParser library, ensure you have Python 3.8+ installed. Clone the repository and install dependencies:
git clone https://github.com/devsunny/python-pgsql-parser.git
cd sqlparser
pip install -r requirements.txt
Pip installation
pip install python-pgsql-parser
Usage
The AdvancedSQLParser class is the main entry point for parsing SQL scripts. Below is an example of how to use it:
from sqlparser import AdvancedSQLParser
# Example SQL script
sql_script = """
CREATE TABLE users (
id INT PRIMARY KEY,
username VARCHAR(50) NOT NULL,
email VARCHAR(100),
FOREIGN KEY (email) REFERENCES accounts(email)
);
SELECT username, email FROM users;
"""
# Initialize parser and parse the script
parser = AdvancedSQLParser(sql_script)
# Access parsed tables
tables = parser.get_tables()
for table_name, table in tables.items():
print(f"Table: {table.name}")
for col_name, column in table.columns.items():
print(f" Column: {col_name}, Type: {column.col_type}, Nullable: {column.nullable}")
if table.primary_key:
print(f" Primary Key: {table.primary_key.columns}")
for fk in table.foreign_keys:
print(f" Foreign Key: {fk.columns} -> {fk.ref_table}({fk.ref_columns})")
Output
For the above SQL script, the parser will output structured information about the users table, its columns, primary key, foreign key, and the SELECT query components.
Project Structure
- sqlparser/models.py: Defines data models (
Token,TokenType,Table,Column,PrimaryKey,ForeignKey,Constraint, etc.) for representing SQL components. - sqlparser/sql_lexer.py: Implements
AdvancedSQLLexerfor tokenizing SQL scripts. - sqlparser/parser_base.py: Provides the base
ParserBaseclass with utility methods for token consumption and parsing. - sqlparser/advanced_statement_analyzer.py: Analyzes SQL statements and handles nested subqueries.
- sqlparser/column_def_parser.py: Parses column definitions, including data types, constraints, and foreign key references.
- sqlparser/advanced_ddl_statement_parser.py: Parses DDL statements (
CREATE TABLE,ALTER TABLE,CREATE INDEX). - sqlparser/sql_query_parser.py: Parses
SELECTqueries, including CTEs and column expressions. - sqlparser/advanced_sql_parser.py: Main parser class that orchestrates the parsing process and maintains the table registry.
Code Example
Below is a snippet from column_def_parser.py that demonstrates how column definitions are parsed:
class ColumnDefParser(ParserBase): def init(self, col_def: List[Token]): super().init(col_def) self.col_name = None self.col_type = None self.nullable = True self.default_value = None self.is_primary = False self.fk_ref = None self.max_char_length = None self.precision = 0 self.scale = 0 self.primary_key = None self.foreign_key = None self.constraint = None self.foreign_key_ref = None self._parse()
def _parse_column_def(self):
while self.current_pos < self.num_of_tokens:
tok = self._consume_one()
if self.col_name is None:
self.col_name = tok.value
elif self.col_type is None:
self.col_type = tok.value.upper()
elif tok.token_type == TokenType.OPEN_PAREN:
self._parse_precision_scale()
self._consume_one() # consume close_paren
elif self._is_keyword(tok, "NOT") and self._is_keyword(self._peek(), "NULL"):
self.nullable = False
self._consume_one() # consume null
# ... (additional parsing logic)
Dependencies
- Python 3.8+
typing(standard library)collections.OrderedDict(standard library)
Contributing
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create your feature branch (
git checkout -b feature/your-feature) - Commit your changes (
git commit -am 'Add some feature') - Push to the branch (
git push origin feature/your-feature) - Open a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
For issues and feature requests, please open an issue.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file python_pgsql_parser-0.2.0.tar.gz.
File metadata
- Download URL: python_pgsql_parser-0.2.0.tar.gz
- Upload date:
- Size: 27.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b8349e13148b3d176688195a90dca94b7cc04ff7d6f72cbe74b29325148b11ff
|
|
| MD5 |
451e592e42f2abd5103a0172e24524a1
|
|
| BLAKE2b-256 |
c7f0fbfded9ac2b3dbaab0b0f31314145bf4aaccc8a85caf2a5daaa02dd7d810
|
File details
Details for the file python_pgsql_parser-0.2.0-py3-none-any.whl.
File metadata
- Download URL: python_pgsql_parser-0.2.0-py3-none-any.whl
- Upload date:
- Size: 22.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f494ecf8e168639706c92aafcf976ed09bb0e64c8ebbc8c8ebc2c9c8debdb8a3
|
|
| MD5 |
19ff8b7bf0bf4277978ce5397623be16
|
|
| BLAKE2b-256 |
06343d7d7313b5e38124647486a36904cfbf3d4ae9e1ccba131d92cde01249cd
|