Skip to main content

Advanced PostgreSQL SQL parser for extracting database schema metadata

Project description

PostgreSQL SQL Parser

Python Version License: MIT PyPI Version

SQLParser Project

Overview

SQLParser is a Python library designed to parse and analyze SQL scripts, focusing on Data Definition Language (DDL) statements and SELECT queries. It processes SQL statements to extract structured information about tables, columns, constraints, and query elements, providing a robust foundation for database schema analysis and query processing.

Features

  • DDL Parsing: Supports parsing of CREATE TABLE, ALTER TABLE, and CREATE INDEX statements to extract table structures, columns, primary keys, foreign keys, and constraints.
  • SELECT Query Parsing: Handles SELECT statements, including those with Common Table Expressions (CTEs), to identify query structure and column expressions.
  • Tokenization: Utilizes an AdvancedSQLLexer to tokenize SQL scripts accurately.
  • Structured Output: Represents parsed SQL components as structured objects (Table, Column, PrimaryKey, ForeignKey, Constraint, etc.).
  • Extensible Architecture: Built with a modular design using a base parser class (ParserBase) for easy extension and maintenance.

Source Installation

To use the SQLParser library, ensure you have Python 3.8+ installed. Clone the repository and install dependencies:

git clone https://github.com/devsunny/python-pgsql-parser.git
cd sqlparser
pip install -r requirements.txt

Pip installation

pip install python-pgsql-parser

Usage

The AdvancedSQLParser class is the main entry point for parsing SQL scripts. Below is an example of how to use it:

from sqlparser import AdvancedSQLParser

# Example SQL script
sql_script = """
CREATE TABLE users (
    id INT PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100),
    FOREIGN KEY (email) REFERENCES accounts(email)
);
SELECT username, email FROM users;
"""

# Initialize parser and parse the script
parser = AdvancedSQLParser(sql_script)

# Access parsed tables
tables = parser.get_tables()
for table_name, table in tables.items():
    print(f"Table: {table.name}")
    for col_name, column in table.columns.items():
        print(f"  Column: {col_name}, Type: {column.col_type}, Nullable: {column.nullable}")
    if table.primary_key:
        print(f"  Primary Key: {table.primary_key.columns}")
    for fk in table.foreign_keys:
        print(f"  Foreign Key: {fk.columns} -> {fk.ref_table}({fk.ref_columns})")

Output

For the above SQL script, the parser will output structured information about the users table, its columns, primary key, foreign key, and the SELECT query components.

Project Structure

  • sqlparser/models.py: Defines data models (Token, TokenType, Table, Column, PrimaryKey, ForeignKey, Constraint, etc.) for representing SQL components.
  • sqlparser/sql_lexer.py: Implements AdvancedSQLLexer for tokenizing SQL scripts.
  • sqlparser/parser_base.py: Provides the base ParserBase class with utility methods for token consumption and parsing.
  • sqlparser/advanced_statement_analyzer.py: Analyzes SQL statements and handles nested subqueries.
  • sqlparser/column_def_parser.py: Parses column definitions, including data types, constraints, and foreign key references.
  • sqlparser/advanced_ddl_statement_parser.py: Parses DDL statements (CREATE TABLE, ALTER TABLE, CREATE INDEX).
  • sqlparser/sql_query_parser.py: Parses SELECT queries, including CTEs and column expressions.
  • sqlparser/advanced_sql_parser.py: Main parser class that orchestrates the parsing process and maintains the table registry.

Code Example

Below is a snippet from column_def_parser.py that demonstrates how column definitions are parsed:

from typing import List from .models import Token, TokenType, Column, PrimaryKey, ForeignKey, Constraint from .parser_base import ParserBase

class ColumnDefParser(ParserBase): def init(self, col_def: List[Token]): super().init(col_def) self.col_name = None self.col_type = None self.nullable = True self.default_value = None self.is_primary = False self.fk_ref = None self.max_char_length = None self.precision = 0 self.scale = 0 self.primary_key = None self.foreign_key = None self.constraint = None self.foreign_key_ref = None self._parse()

def _parse_column_def(self):
    while self.current_pos < self.num_of_tokens:
        tok = self._consume_one()
        if self.col_name is None:
            self.col_name = tok.value
        elif self.col_type is None:
            self.col_type = tok.value.upper()
        elif tok.token_type == TokenType.OPEN_PAREN:
            self._parse_precision_scale()
            self._consume_one()  # consume close_paren
        elif self._is_keyword(tok, "NOT") and self._is_keyword(self._peek(), "NULL"):
            self.nullable = False
            self._consume_one()  # consume null
        # ... (additional parsing logic)

Dependencies

  • Python 3.8+
  • typing (standard library)
  • collections.OrderedDict (standard library)

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/your-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin feature/your-feature)
  5. Open a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues and feature requests, please open an issue.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

python_pgsql_parser-0.2.0.tar.gz (27.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

python_pgsql_parser-0.2.0-py3-none-any.whl (22.3 kB view details)

Uploaded Python 3

File details

Details for the file python_pgsql_parser-0.2.0.tar.gz.

File metadata

  • Download URL: python_pgsql_parser-0.2.0.tar.gz
  • Upload date:
  • Size: 27.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for python_pgsql_parser-0.2.0.tar.gz
Algorithm Hash digest
SHA256 b8349e13148b3d176688195a90dca94b7cc04ff7d6f72cbe74b29325148b11ff
MD5 451e592e42f2abd5103a0172e24524a1
BLAKE2b-256 c7f0fbfded9ac2b3dbaab0b0f31314145bf4aaccc8a85caf2a5daaa02dd7d810

See more details on using hashes here.

File details

Details for the file python_pgsql_parser-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for python_pgsql_parser-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f494ecf8e168639706c92aafcf976ed09bb0e64c8ebbc8c8ebc2c9c8debdb8a3
MD5 19ff8b7bf0bf4277978ce5397623be16
BLAKE2b-256 06343d7d7313b5e38124647486a36904cfbf3d4ae9e1ccba131d92cde01249cd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page