Skip to main content

Enterprise-grade Jupyter extension for secure SQL query execution through internal services

Project description

SQL Extension for Jupyter

A powerful Jupyter extension designed to work seamlessly with the SyneHQ.com data platform, providing secure access to your connected data sources without the need for managing database credentials.

Release to PyPI

About SyneHQ Integration

This extension is specifically built for the SyneHQ.com data platform, which provides:

  • Zero-Credential Data Access: Connect to your data sources without exposing database credentials
  • Unified Data Platform: Access all your connected data sources through a single, secure interface
  • Enterprise-Grade Security: Built-in authentication, authorization, and audit logging
  • Multi-Platform Support: Works with your favorite data analysis platforms including Jupyter, R, and more

Key Features

🔐 Secure Connection Management

  • Credential-Free Access: Retrieve database connections securely through SyneHQ's internal services
  • Enterprise Authentication: Built-in support for SSO, OAuth, and enterprise identity providers
  • Connection Pooling: Efficient connection management with automatic retry and failover

🛡️ Security & Validation

  • SQL Injection Prevention: Advanced input validation and query sanitization
  • Query Safety Checks: Automatic detection of potentially harmful operations
  • Audit Logging: Complete query execution tracking for compliance and monitoring

📊 Rich Output Formatting

  • Pandas DataFrames: Native support for DataFrame output with automatic type inference
  • Interactive Tables: HTML tables with sorting, filtering, and pagination
  • JSON Export: Structured data output for API integrations
  • Custom Visualization: Support for charts and graphs integration

🔄 Advanced Query Features

  • Variable Assignment: Assign query results to Python variables using intuitive syntax
  • Python Variable Substitution: Use Python variables, expressions, and function calls directly in SQL queries
  • Type-Safe Formatting: Automatic type detection and SQL-safe formatting for all Python data types
  • Expression Evaluation: Evaluate complex Python expressions safely within SQL queries
  • Async Execution: Non-blocking query execution for better performance
  • Query Caching: Intelligent caching to reduce redundant database calls

📈 Performance & Monitoring

  • Execution Metrics: Detailed performance tracking and query optimization insights
  • Connection Health: Real-time monitoring of database connection status
  • Error Recovery: Automatic retry mechanisms with exponential backoff

Installation

Prerequisites

  • Python 3.8 or higher
  • Jupyter Notebook or JupyterLab
  • Access to SyneHQ.com data platform

Install via pip

pip install syne-sql-extension

Install from source

git clone https://github.com/synehq/jupyter-sql-extension.git
cd jupyter-sql-extension
pip install -e .

Load the extension in Jupyter

%load_ext syne_sql_extension

Quick Start

1. Connect to SyneHQ

%%sqlconnect --connection-id my_database
SELECT * FROM users LIMIT 10

2. Use with variables

# Assign results to a variable
%%sqlconnect --connection-id analytics_db --output users_df
SELECT user_id, name, email, created_at 
FROM users 
WHERE created_at >= '2024-01-01'

3. Parameterized queries

user_limit = 100
department = 'engineering'

%%sqlconnect --connection-id hr_db
SELECT * FROM employees 
WHERE department = {department} 
LIMIT {user_limit}

4. Different output formats

# DataFrame output (default)
%%sqlconnect --connection-id sales_db --format dataframe
SELECT product, SUM(revenue) as total_revenue 
FROM sales 
GROUP BY product

# HTML table
%%sqlconnect --connection-id sales_db --format html
SELECT * FROM products WHERE price > 100

# JSON output
%%sqlconnect --connection-id api_db --format json
SELECT config FROM settings WHERE active = true

Usage Examples

Basic Queries

# Simple select
%%sqlconnect --connection-id main_db
SELECT COUNT(*) as total_users FROM users

# Join multiple tables
%%sqlconnect --connection-id warehouse
SELECT 
    u.name,
    p.product_name,
    o.order_date,
    o.amount
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN products p ON o.product_id = p.id
WHERE o.order_date >= '2024-01-01'

Data Analysis Workflow

# Load data into DataFrame
%%sqlconnect --connection-id analytics
sales_data >> SELECT 
    DATE(order_date) as date,
    product_category,
    SUM(amount) as daily_revenue,
    COUNT(*) as order_count
FROM orders 
WHERE order_date >= '2024-01-01'
GROUP BY DATE(order_date), product_category
ORDER BY date DESC

# Analyze the data
print(f"Total revenue: ${sales_data['daily_revenue'].sum():,.2f}")
print(f"Average daily orders: {sales_data['order_count'].mean():.1f}")

# Create visualization
sales_data.groupby('product_category')['daily_revenue'].sum().plot(kind='bar')

Advanced Features

# Using Python variables in queries with enhanced syntax
start_date = '2024-01-01'
end_date = '2024-12-31'
min_revenue = 1000
user_ids = [1, 2, 3, 4, 5]

# Simple variable substitution
%%sqlconnect --connection-id finance
SELECT 
    customer_id,
    SUM(amount) as total_spent
FROM transactions 
WHERE transaction_date BETWEEN {start_date} AND {end_date}
GROUP BY customer_id
HAVING SUM(amount) >= {min_revenue}
ORDER BY total_spent DESC

# List variables with automatic formatting
%%sqlconnect --connection-id analytics
SELECT * FROM users WHERE id IN {user_ids}

# Type-specific formatting
%%sqlconnect --connection-id analytics
SELECT * FROM users WHERE id IN {user_ids:list}

# Expression evaluation
%%sqlconnect --connection-id finance
SELECT * FROM products WHERE price = {min_revenue * 1.5}

# Complex expressions with functions
from datetime import datetime, timedelta
%%sqlconnect --connection-id analytics
SELECT * FROM users WHERE created_at >= {datetime.now() - timedelta(days=30)}

Python Variable Support

The extension provides comprehensive Python variable substitution in SQL queries with multiple syntax options and safety features.

Syntax Options

1. Simple Variable Substitution

user_id = 123
user_name = "John Doe"

%%sqlconnect my_connection --api-key my_key
SELECT * FROM users WHERE id = {user_id}
SELECT * FROM users WHERE name = {user_name}

2. Type-Specific Formatting

user_ids = [1, 2, 3, 4, 5]
price = 99.99
created_date = datetime(2024, 1, 1)

%%sqlconnect my_connection --api-key my_key
SELECT * FROM users WHERE id IN {user_ids:list}
SELECT * FROM products WHERE price = {price:number}
SELECT * FROM users WHERE created_at >= {created_date:date}

3. Expression Evaluation

base_price = 100
discount_rate = 0.1
tax_rate = 0.08

%%sqlconnect my_connection --api-key my_key
SELECT * FROM products WHERE final_price = {base_price * (1 - discount_rate) * (1 + tax_rate)}

4. Function Calls and Complex Expressions

from datetime import datetime, timedelta

%%sqlconnect my_connection --api-key my_key
SELECT * FROM users WHERE created_at >= {datetime.now() - timedelta(days=30)}
SELECT * FROM products WHERE rounded_price = {round(99.99 * 1.15, 2)}

Supported Data Types

  • Strings: Automatically quoted and escaped
  • Numbers: Used as-is without quotes
  • Lists/Tuples: Formatted as SQL IN clauses
  • Booleans: Converted to strings
  • None: Converted to SQL NULL
  • Datetime objects: Formatted as ISO strings

Security Features

  • Safe Expression Evaluation: Only safe built-in functions are allowed
  • Pattern Blocking: Dangerous patterns like import, exec, eval are blocked
  • Function Blacklisting: Dangerous functions like os, sys, subprocess are blocked
  • Sandboxed Environment: Expressions run in a restricted environment

Complete Example

# Set up variables
start_date = datetime(2024, 1, 1)
end_date = datetime(2024, 12, 31)
min_age = 18
max_age = 65
active_statuses = ['active', 'premium']
excluded_users = [999, 1000, 1001]

# Complex query with multiple variable types
%%sqlconnect analytics_db --api-key my_key
SELECT 
    u.id,
    u.name,
    u.email,
    u.age,
    u.status,
    COUNT(o.id) as order_count,
    SUM(o.total) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at BETWEEN {start_date:date} AND {end_date:date}
  AND u.age BETWEEN {min_age} AND {max_age}
  AND u.status IN {active_statuses:list}
  AND u.id NOT IN {excluded_users:list}
GROUP BY u.id, u.name, u.email, u.age, u.status
HAVING COUNT(o.id) > 0
ORDER BY total_spent DESC
LIMIT 100

For detailed documentation on Python variable support, see PYTHON_VARIABLE_SUPPORT.md.

Connection Management

Available Connections

# List available connections
%%sqlconnect --list-connections --api-key {key}

Test Connection

# Test if connection is working
%%sqlconnect {connection_id}
SELECT 1

Error Handling

The extension provides comprehensive error handling with user-friendly messages:

%%sqlconnect --connection-id invalid_db
SELECT * FROM nonexistent_table

Common error scenarios:

  • Connection Errors: Invalid connection ID, network issues, authentication failures
  • Query Errors: SQL syntax errors, table not found, permission denied
  • Validation Errors: SQL injection attempts, unsafe operations
  • Timeout Errors: Long-running queries, connection timeouts

Security Features

SQL Injection Prevention

# ❌ This will be blocked
user_input = "'; DROP TABLE users; --"
%%sqlconnect --connection-id db
SELECT * FROM users WHERE name = '{user_input}'

# ✅ Use parameter binding instead
user_input = "John Doe"
%%sqlconnect --connection-id db
SELECT * FROM users WHERE name = {user_input}

Query Validation

The extension automatically validates queries for:

  • Potentially dangerous operations (DROP, DELETE, etc.)
  • SQL injection patterns
  • Syntax errors
  • Resource usage limits

Performance Optimization

Query Caching

# Enable caching for repeated queries
%%sqlconnect --connection-id db --cache
SELECT expensive_aggregation() FROM large_table

Async Execution

# Run multiple queries concurrently
import asyncio

async def run_queries():
    tasks = []
    for db in ['db1', 'db2', 'db3']:
        task = execute_query(f"%%sqlconnect --connection-id {db}\nSELECT COUNT(*) FROM table")
        tasks.append(task)
    
    results = await asyncio.gather(*tasks)
    return results

Troubleshooting

Common Issues

Extension not loading:

# Check if extension is properly installed
%load_ext syne_sql_extension

Connection failures:

  • Verify your SyneHQ API credentials
  • Check network connectivity to SyneHQ services
  • Ensure your workspace has access to the requested data sources

Query errors:

  • Validate SQL syntax
  • Check table and column names
  • Verify permissions for the data source

Debug Mode

# Enable debug logging
%%sqlconnect --connection-id db --debug
SELECT * FROM users

Getting Help

API Reference

Magic Command Options

Option Description Default
--connection-id SyneHQ connection identifier Required
--output Variable name for query results None
--format Output format (dataframe, html, json) dataframe
--timeout Query timeout in seconds 30
--cache Enable query caching false
--debug Enable debug logging false
--test Test connection without executing query false

Configuration Options

Setting Description Default
api_url SyneHQ API endpoint https://api.synehq.com
timeout Default query timeout 30
retry_attempts Number of retry attempts 3
cache_enabled Enable query caching true
cache_ttl Cache time-to-live (seconds) 300
output_format Default output format dataframe

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

git clone https://github.com/synehq/jupyter-sql-extension.git
cd jupyter-sql-extension
pip install -e ".[dev]"
pre-commit install

Running Tests

pytest tests/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support


Made with ❤️ by the SyneHQ team

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

syne_sql_extension-1.0.5.tar.gz (50.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

syne_sql_extension-1.0.5-py3-none-any.whl (192.1 kB view details)

Uploaded Python 3

File details

Details for the file syne_sql_extension-1.0.5.tar.gz.

File metadata

  • Download URL: syne_sql_extension-1.0.5.tar.gz
  • Upload date:
  • Size: 50.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.23

File hashes

Hashes for syne_sql_extension-1.0.5.tar.gz
Algorithm Hash digest
SHA256 5219ba83215dc422d6ad6127e38de6a447c35709535833e6ac33026a03eb2b88
MD5 25172db7dfe8bf327c6c10be02bd863c
BLAKE2b-256 b95b1e4f01b7dc6fe79cdb65b610c54a80df3a0981191092991496b783925c2c

See more details on using hashes here.

File details

Details for the file syne_sql_extension-1.0.5-py3-none-any.whl.

File metadata

File hashes

Hashes for syne_sql_extension-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 f4b5f50e8451718d9712e9d59ad64777992937042a5679019b96fb7eb1c6c723
MD5 5d031d5f1fa5cd701d9c58655b7c3baf
BLAKE2b-256 6a5717f7cbaed94732a9c673d972f42030cb6a91406b583ff7b35a04101237e0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page