SQL Dependency Extractor

These details have not been verified by PyPI

Project links

Project description

SQLDeps: SQL Dependency Extractor

SQLDeps Logo

A tool that automatically extracts and maps SQL dependencies and outputs using Large Language Models (LLMs).

Documentation: https://sqldeps.readthedocs.io/
Code repositoty: https://github.com/glue-lab/sqldeps

Overview

SQLDeps analyzes SQL scripts to identify:

Dependencies: Tables and columns that must exist BEFORE query execution
Outputs: Tables and columns permanently CREATED or MODIFIED by the query

It intelligently filters out temporary constructs like CTEs and derived tables, focusing only on the real database objects that matter.

Benefits

🛠️ Change Management: Safely modify schemas by identifying true dependencies
💾 Storage Optimization: Focus resources on essential tables and columns
🚢 Migration Planning: Precisely determine what needs to be migrated
📝 Project Documentation: Create comprehensive maps of database dependencies

Installation

pip install sqldeps

Quick Start

SQLDeps provides both API and CLI interfaces:

API: Enables flexibility for Python developers to integrate with their applications
CLI: Offers an easy-to-use command-line interface for quick analysis

API Usage

from sqldeps.llm_parsers import create_extractor

# Create extractor with default settings (framework="groq", model="llama-3.3-70b-versatile")
extractor = create_extractor()

# Extract dependencies and outputs from a SQL query
sql_query = """
WITH user_orders AS (
    SELECT o.user_id, COUNT(*) AS order_count
    FROM orders o
    JOIN users u ON o.user_id = u.id
    WHERE u.status = 'active'
    GROUP BY o.user_id
)

CREATE TABLE transactions.user_order_summary AS
SELECT * FROM user_orders;
"""
result = extractor.extract_from_query(sql_query)

# Print the results
print("Dependencies:")
print(result.dependencies)
print("\nOutputs:")
print(result.outputs)

# Or extract from a file
result = extractor.extract_from_file('path/to/query.sql')

# Convert to dictionary or DataFrame
dict_format = result.to_dict()
df_format = result.to_dataframe()

CLI Usage

# Basic example with default settings
sqldeps path/to/query.sql

# Specify framework and output format
sqldeps path/to/query.sql --framework=openai --model=gpt-4o-mini -o query_deps.csv

# Process a folder recursively with database validation
sqldeps data/sql_folder \
    --recursive \
    --framework=deepseek \
    --db-match-schema \
    --db-target-schemas public,sales \
    --db-credentials configs/database.yml \
    -o folder_deps.csv

Example

Given this SQL query:

-- Common Table Expression (CTE) to count user orders for active users
WITH user_orders AS (
    SELECT o.user_id, COUNT(*) AS order_count
    FROM orders o
    JOIN users u ON o.user_id = u.id
    WHERE u.status = 'active'
    GROUP BY o.user_id
)

-- Create a new table from the CTE
CREATE TABLE transactions.user_order_summary AS
SELECT * FROM user_orders;

-- Truncate an existing table before repopulating
TRUNCATE TABLE order_summary;

SQLDeps will extract:

{
  "dependencies": {
    "orders": ["user_id"],
    "users": ["id", "status"],
    "order_summary": []
  },
  "outputs": {
    "transactions.user_order_summary": ["*"],
    "order_summary": []
  }
}

Note how:

CTE (user_orders) is correctly excluded
Real source tables (orders, users) are included as dependencies
Target tables (transactions.user_order_summary, order_summary) are correctly identified as outputs
For TRUNCATE operations, the table appears in both dependencies and outputs because it must exist before truncating and the operation modifies the table

Supported Models

All models available on Groq, OpenAI, and DeepSeek.
For up-to-date pricing details, please check Groq, OpenAI, DeepSeek.

API Keys / Configuration

You'll need to set up API keys for your chosen LLM provider. Create a .env file in your project root (see the provided .env.example):

# LLM API Keys
GROQ_API_KEY=your_groq_api_key
OPENAI_API_KEY=your_openai_api_key
DEEPSEEK_API_KEY=your_deepseek_api_key

# Database credentials (for schema validation)
DB_HOST=localhost
DB_PORT=5432
DB_NAME=mydatabase
DB_USER=username
DB_PASSWORD=password

For custom database YAML configuration file (optional):

# database.yml
database:
  host: localhost
  port: 5432
  database: mydatabase
  username: username
  password: password

Advanced Usage

Database Schema Validation

from sqldeps.database import PostgreSQLConnector
from sqldeps.llm_parsers import create_extractor

# Extract dependencies
extractor = create_extractor(framework="openai", model="gpt-4o")
result = extractor.extract_from_file('query.sql')

# Connect to database and validate
conn = PostgreSQLConnector(
    host="localhost",
    port=5432,
    database="mydatabase",
    username="username"
)

# Match extracted dependencies against database schema
validated_schema = extractor.match_database_schema(
    result,
    db_connection=conn,
    target_schemas=["public", "sales"]
)

# View validation results (pandas DataFrame)
print(validated_schema)

Processing Multiple Files

# Extract dependencies from all SQL files in a folder
result = extractor.extract_from_folder('/path/to/sql_folder', recursive=True)

Using Custom Prompts

You can customize the prompts used to instruct the LLM:

# Create extractor with custom prompt
extractor = create_extractor(
    framework="groq",
    model="llama-3.3-70b-versatile",
    prompt_path="path/to/custom_prompt.yml"
)

The custom prompt YAML should include:

system_prompt: |
  Detailed instructions to the model...

user_prompt: |
  Extract SQL dependencies and outputs from this query:
  {sql}

Web Application

SQLDeps includes a Streamlit-based web interface for interactive exploration of single SQL files:

# Install with web app dependencies
pip install "sqldeps[app]"

# Run the app
streamlit run app/main.py

Note: The web application is currently designed for single-file extraction and demonstration purposes. For processing multiple files or entire folders, use the API or CLI.

Documentation

For comprehensive documentation, including API reference and examples, visit https://sqldeps.readthedocs.io.

Contributing

Contributions are welcome!

Found a bug? Please open an issue with detailed information.
Missing a feature? Feel free to suggest enhancements or submit a pull request.

Check out the issues page or submit a pull request.

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.1

May 12, 2025

0.1.0

Apr 4, 2025

This version

0.0.10

Mar 20, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqldeps-0.0.10.tar.gz (23.6 kB view details)

Uploaded Mar 20, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

sqldeps-0.0.10-py3-none-any.whl (28.3 kB view details)

Uploaded Mar 20, 2025 Python 3

File details

Details for the file sqldeps-0.0.10.tar.gz.

File metadata

Download URL: sqldeps-0.0.10.tar.gz
Upload date: Mar 20, 2025
Size: 23.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.9

File hashes

Hashes for sqldeps-0.0.10.tar.gz
Algorithm	Hash digest
SHA256	`11b16f01b8d6b442eef975fb2d29e5ad4eafe29bc2d413e599f4b2c643547f65`
MD5	`f4c6a9507a86eca81cfd2a8e60c803a9`
BLAKE2b-256	`46d66f1454bc6ff84e6f9b0fc947df72d16c16b61fbecaef6398c8e26f9375ba`

See more details on using hashes here.

File details

Details for the file sqldeps-0.0.10-py3-none-any.whl.

File metadata

Download URL: sqldeps-0.0.10-py3-none-any.whl
Upload date: Mar 20, 2025
Size: 28.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.9

File hashes

Hashes for sqldeps-0.0.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2b4f40e2aafa320bd842d956df650bd962c0057512a8740284314bfaaca40c82`
MD5	`6fcbcd35633783638a5d5bcafa204d91`
BLAKE2b-256	`e07ee4a87044be6ddef2703226ac3743f03248be294616702f288472959d4d7e`

See more details on using hashes here.

sqldeps 0.0.10

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

SQLDeps: SQL Dependency Extractor

Overview

Benefits

Installation

Quick Start

API Usage

CLI Usage

Example

Supported Models

API Keys / Configuration

Advanced Usage

Database Schema Validation

Processing Multiple Files

Using Custom Prompts

Web Application

Documentation

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes