SQL Dependency Extractor
Project description
SQLDeps: SQL Dependency Extractor
A tool that automatically extracts and maps SQL dependencies and outputs using Large Language Models (LLMs).
- Documentation: https://sqldeps.readthedocs.io/
- Code repositoty: https://github.com/glue-lab/sqldeps
Overview
SQLDeps analyzes SQL scripts to identify:
- Dependencies: Tables and columns that must exist BEFORE query execution
- Outputs: Tables and columns permanently CREATED or MODIFIED by the query
It intelligently filters out temporary constructs like CTEs and derived tables, focusing only on the real database objects that matter.
Benefits
- 🛠️ Change Management: Safely modify schemas by identifying true dependencies
- 💾 Storage Optimization: Focus resources on essential tables and columns
- 🚢 Migration Planning: Precisely determine what needs to be migrated
- 📝 Project Documentation: Create comprehensive maps of database dependencies
Installation
pip install sqldeps
Quick Start
SQLDeps provides both API and CLI interfaces:
- API: Enables flexibility for Python developers to integrate with their applications
- CLI: Offers an easy-to-use command-line interface for quick analysis
API Usage
from sqldeps.llm_parsers import create_extractor
# Create extractor with default settings (framework="groq", model="llama-3.3-70b-versatile")
extractor = create_extractor()
# Extract dependencies and outputs from a SQL query
sql_query = """
WITH user_orders AS (
SELECT o.user_id, COUNT(*) AS order_count
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.status = 'active'
GROUP BY o.user_id
)
CREATE TABLE transactions.user_order_summary AS
SELECT * FROM user_orders;
"""
result = extractor.extract_from_query(sql_query)
# Print the results
print("Dependencies:")
print(result.dependencies)
print("\nOutputs:")
print(result.outputs)
# Or extract from a file
result = extractor.extract_from_file('path/to/query.sql')
# Convert to dictionary or DataFrame
dict_format = result.to_dict()
df_format = result.to_dataframe()
CLI Usage
# Basic example with default settings
sqldeps path/to/query.sql
# Specify framework and output format
sqldeps path/to/query.sql --framework=openai --model=gpt-4o-mini -o query_deps.csv
# Process a folder recursively with database validation
sqldeps data/sql_folder \
--recursive \
--framework=deepseek \
--db-match-schema \
--db-target-schemas public,sales \
--db-credentials configs/database.yml \
-o folder_deps.csv
Example
Given this SQL query:
-- Common Table Expression (CTE) to count user orders for active users
WITH user_orders AS (
SELECT o.user_id, COUNT(*) AS order_count
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.status = 'active'
GROUP BY o.user_id
)
-- Create a new table from the CTE
CREATE TABLE transactions.user_order_summary AS
SELECT * FROM user_orders;
-- Truncate an existing table before repopulating
TRUNCATE TABLE order_summary;
SQLDeps will extract:
{
"dependencies": {
"orders": ["user_id"],
"users": ["id", "status"],
"order_summary": []
},
"outputs": {
"transactions.user_order_summary": ["*"],
"order_summary": []
}
}
Note how:
- CTE (
user_orders) is correctly excluded - Real source tables (
orders,users) are included as dependencies - Target tables (
transactions.user_order_summary,order_summary) are correctly identified as outputs - For
TRUNCATEoperations, the table appears in both dependencies and outputs because it must exist before truncating and the operation modifies the table
Supported Models
All models available on Groq, OpenAI, and DeepSeek.
For up-to-date pricing details, please check Groq, OpenAI, DeepSeek.
API Keys / Configuration
You'll need to set up API keys for your chosen LLM provider. Create a .env file in your project root (see the provided .env.example):
# LLM API Keys
GROQ_API_KEY=your_groq_api_key
OPENAI_API_KEY=your_openai_api_key
DEEPSEEK_API_KEY=your_deepseek_api_key
# Database credentials (for schema validation)
DB_HOST=localhost
DB_PORT=5432
DB_NAME=mydatabase
DB_USER=username
DB_PASSWORD=password
For custom database YAML configuration file (optional):
# database.yml
database:
host: localhost
port: 5432
database: mydatabase
username: username
password: password
Advanced Usage
Database Schema Validation
from sqldeps.database import PostgreSQLConnector
from sqldeps.llm_parsers import create_extractor
# Extract dependencies
extractor = create_extractor(framework="openai", model="gpt-4o")
result = extractor.extract_from_file('query.sql')
# Connect to database and validate
conn = PostgreSQLConnector(
host="localhost",
port=5432,
database="mydatabase",
username="username"
)
# Match extracted dependencies against database schema
validated_schema = extractor.match_database_schema(
result,
db_connection=conn,
target_schemas=["public", "sales"]
)
# View validation results (pandas DataFrame)
print(validated_schema)
Processing Multiple Files
# Extract dependencies from all SQL files in a folder
result = extractor.extract_from_folder('/path/to/sql_folder', recursive=True)
Using Custom Prompts
You can customize the prompts used to instruct the LLM:
# Create extractor with custom prompt
extractor = create_extractor(
framework="groq",
model="llama-3.3-70b-versatile",
prompt_path="path/to/custom_prompt.yml"
)
The custom prompt YAML should include:
system_prompt: |
Detailed instructions to the model...
user_prompt: |
Extract SQL dependencies and outputs from this query:
{sql}
Web Application
SQLDeps includes a Streamlit-based web interface for interactive exploration of single SQL files:
# Install with web app dependencies
pip install "sqldeps[app]"
# Run the app
streamlit run app/main.py
Note: The web application is currently designed for single-file extraction and demonstration purposes. For processing multiple files or entire folders, use the API or CLI.
Documentation
For comprehensive documentation, including API reference and examples, visit https://sqldeps.readthedocs.io.
Contributing
Contributions are welcome!
- Found a bug? Please open an issue with detailed information.
- Missing a feature? Feel free to suggest enhancements or submit a pull request.
Check out the issues page or submit a pull request.
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sqldeps-0.0.10.tar.gz.
File metadata
- Download URL: sqldeps-0.0.10.tar.gz
- Upload date:
- Size: 23.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
11b16f01b8d6b442eef975fb2d29e5ad4eafe29bc2d413e599f4b2c643547f65
|
|
| MD5 |
f4c6a9507a86eca81cfd2a8e60c803a9
|
|
| BLAKE2b-256 |
46d66f1454bc6ff84e6f9b0fc947df72d16c16b61fbecaef6398c8e26f9375ba
|
File details
Details for the file sqldeps-0.0.10-py3-none-any.whl.
File metadata
- Download URL: sqldeps-0.0.10-py3-none-any.whl
- Upload date:
- Size: 28.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2b4f40e2aafa320bd842d956df650bd962c0057512a8740284314bfaaca40c82
|
|
| MD5 |
6fcbcd35633783638a5d5bcafa204d91
|
|
| BLAKE2b-256 |
e07ee4a87044be6ddef2703226ac3743f03248be294616702f288472959d4d7e
|