Enterprise-grade Jupyter extension for secure SQL query execution through internal services
Project description
SQL Extension for Jupyter
A powerful Jupyter extension designed to work seamlessly with the SyneHQ.com data platform, providing secure access to your connected data sources without the need for managing database credentials.
About SyneHQ Integration
This extension is specifically built for the SyneHQ.com data platform, which provides:
- Zero-Credential Data Access: Connect to your data sources without exposing database credentials
- Unified Data Platform: Access all your connected data sources through a single, secure interface
- Enterprise-Grade Security: Built-in authentication, authorization, and audit logging
- Multi-Platform Support: Works with your favorite data analysis platforms including Jupyter, R, and more
Key Features
🔐 Secure Connection Management
- Credential-Free Access: Retrieve database connections securely through SyneHQ's internal services
- Enterprise Authentication: Built-in support for SSO, OAuth, and enterprise identity providers
- Connection Pooling: Efficient connection management with automatic retry and failover
🛡️ Security & Validation
- SQL Injection Prevention: Advanced input validation and query sanitization
- Query Safety Checks: Automatic detection of potentially harmful operations
- Audit Logging: Complete query execution tracking for compliance and monitoring
📊 Rich Output Formatting
- Pandas DataFrames: Native support for DataFrame output with automatic type inference
- Interactive Tables: HTML tables with sorting, filtering, and pagination
- JSON Export: Structured data output for API integrations
- Custom Visualization: Support for charts and graphs integration
🔄 Advanced Query Features
- Variable Assignment: Assign query results to Python variables using intuitive syntax
- Python Variable Substitution: Use Python variables, expressions, and function calls directly in SQL queries
- Type-Safe Formatting: Automatic type detection and SQL-safe formatting for all Python data types
- Expression Evaluation: Evaluate complex Python expressions safely within SQL queries
- Async Execution: Non-blocking query execution for better performance
- Query Caching: Intelligent caching to reduce redundant database calls
📈 Performance & Monitoring
- Execution Metrics: Detailed performance tracking and query optimization insights
- Connection Health: Real-time monitoring of database connection status
- Error Recovery: Automatic retry mechanisms with exponential backoff
Installation
Prerequisites
- Python 3.8 or higher
- Jupyter Notebook or JupyterLab
- Access to SyneHQ.com data platform
Install via pip
pip install syne-sql-extension
Install from source
git clone https://github.com/synehq/jupyter-sql-extension.git
cd jupyter-sql-extension
pip install -e .
Load the extension in Jupyter
%load_ext syne_sql_extension
Quick Start
1. Set up Authentication
# Option 1: Set global variable (recommended for Jupyter notebooks)
SYNE_OAUTH_KEY = 'your_api_key_here'
# Option 2: Use environment variable
# export SYNE_OAUTH_KEY='your_api_key_here'
# Option 3: Provide via command line (most explicit)
%%sql my_database --api-key your_api_key_here
SELECT * FROM users LIMIT 10
2. Connect to SyneHQ
%%sql my_database
SELECT * FROM users LIMIT 10
3. Use with variables
# Assign results to a variable
%%sql analytics_db --output users_df
SELECT user_id, name, email, created_at
FROM users
WHERE created_at >= '2024-01-01'
4. Parameterized queries
user_limit = 100
department = 'engineering'
%%sql hr_db
SELECT * FROM employees
WHERE department = {department}
LIMIT {user_limit}
5. Different output formats
# DataFrame output (default)
%%sql sales_db --format dataframe
SELECT product, SUM(revenue) as total_revenue
FROM sales
GROUP BY product
# HTML table
%%sql sales_db --format html
SELECT * FROM products WHERE price > 100
# JSON output
%%sql api_db --format json
SELECT config FROM settings WHERE active = true
Authentication
The extension supports multiple ways to provide your SyneHQ API key for authentication. The API key is resolved in the following order of preference:
1. Command Line (Most Explicit)
%%sql my_db --api-key your_api_key_here
SELECT * FROM users LIMIT 10
2. Global Variable (Recommended for Jupyter)
# Set once at the beginning of your notebook
SYNE_OAUTH_KEY = 'your_api_key_here'
# Then use without specifying the key
%%sql my_db
SELECT * FROM users LIMIT 10
3. Environment Variable
# Set in your shell environment
export SYNE_OAUTH_KEY='your_api_key_here'
# Or in your Jupyter environment
import os
os.environ['SYNE_OAUTH_KEY'] = 'your_api_key_here'
Security Best Practices
- Never hardcode API keys in your notebooks or commit them to version control
- Use environment variables for production deployments
- Use global variables for interactive development in Jupyter
- Rotate API keys regularly for enhanced security
Getting Your API Key
- Log in to your SyneHQ account
- Navigate to Teams > Choose the team >API Keys
- Generate a new API key with appropriate permissions
- Copy the key and use one of the authentication methods above
Usage Examples
Basic Queries
# Simple select
%%sql main_db
SELECT COUNT(*) as total_users FROM users
# Join multiple tables
%%sql warehouse
SELECT
u.name,
p.product_name,
o.order_date,
o.amount
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN products p ON o.product_id = p.id
WHERE o.order_date >= '2024-01-01'
Data Analysis Workflow
# Load data into DataFrame
%%sql analytics
sales_data >> SELECT
DATE(order_date) as date,
product_category,
SUM(amount) as daily_revenue,
COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE(order_date), product_category
ORDER BY date DESC
# Analyze the data
print(f"Total revenue: ${sales_data['daily_revenue'].sum():,.2f}")
print(f"Average daily orders: {sales_data['order_count'].mean():.1f}")
# Create visualization
sales_data.groupby('product_category')['daily_revenue'].sum().plot(kind='bar')
Advanced Features
# Using Python variables in queries with enhanced syntax
start_date = '2024-01-01'
end_date = '2024-12-31'
min_revenue = 1000
user_ids = [1, 2, 3, 4, 5]
# Simple variable substitution
%%sql finance
SELECT
customer_id,
SUM(amount) as total_spent
FROM transactions
WHERE transaction_date BETWEEN {start_date} AND {end_date}
GROUP BY customer_id
HAVING SUM(amount) >= {min_revenue}
ORDER BY total_spent DESC
# List variables with automatic formatting
%%sql analytics
SELECT * FROM users WHERE id IN {user_ids}
# Type-specific formatting
%%sql analytics
SELECT * FROM users WHERE id IN {user_ids:list}
# Expression evaluation
%%sql finance
SELECT * FROM products WHERE price = {min_revenue * 1.5}
# Complex expressions with functions
from datetime import datetime, timedelta
%%sql analytics
SELECT * FROM users WHERE created_at >= {datetime.now() - timedelta(days=30)}
Python Variable Support
The extension provides comprehensive Python variable substitution in SQL queries with multiple syntax options and safety features.
Syntax Options
1. Simple Variable Substitution
user_id = 123
user_name = "John Doe"
%%sql my_connection -k my_key
SELECT * FROM users WHERE id = {user_id}
SELECT * FROM users WHERE name = {user_name}
2. Type-Specific Formatting
user_ids = [1, 2, 3, 4, 5]
price = 99.99
created_date = datetime(2024, 1, 1)
%%sql my_connection -k my_key
SELECT * FROM users WHERE id IN {user_ids:list}
SELECT * FROM products WHERE price = {price:number}
SELECT * FROM users WHERE created_at >= {created_date:date}
3. Expression Evaluation
base_price = 100
discount_rate = 0.1
tax_rate = 0.08
%%sql my_connection -k my_key
SELECT * FROM products WHERE final_price = {base_price * (1 - discount_rate) * (1 + tax_rate)}
4. Function Calls and Complex Expressions
from datetime import datetime, timedelta
%%sql my_connection -k my_key
SELECT * FROM users WHERE created_at >= {datetime.now() - timedelta(days=30)}
SELECT * FROM products WHERE rounded_price = {round(99.99 * 1.15, 2)}
Supported Data Types
- Strings: Automatically quoted and escaped
- Numbers: Used as-is without quotes
- Lists/Tuples: Formatted as SQL IN clauses
- Booleans: Converted to strings
- None: Converted to SQL NULL
- Datetime objects: Formatted as ISO strings
Security Features
- Safe Expression Evaluation: Only safe built-in functions are allowed
- Pattern Blocking: Dangerous patterns like
import,exec,evalare blocked - Function Blacklisting: Dangerous functions like
os,sys,subprocessare blocked - Sandboxed Environment: Expressions run in a restricted environment
Complete Example
# Set up variables
start_date = datetime(2024, 1, 1)
end_date = datetime(2024, 12, 31)
min_age = 18
max_age = 65
active_statuses = ['active', 'premium']
excluded_users = [999, 1000, 1001]
# Complex query with multiple variable types
%%sql analytics_db -k my_key
SELECT
u.id,
u.name,
u.email,
u.age,
u.status,
COUNT(o.id) as order_count,
SUM(o.total) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at BETWEEN {start_date:date} AND {end_date:date}
AND u.age BETWEEN {min_age} AND {max_age}
AND u.status IN {active_statuses:list}
AND u.id NOT IN {excluded_users:list}
GROUP BY u.id, u.name, u.email, u.age, u.status
HAVING COUNT(o.id) > 0
ORDER BY total_spent DESC
LIMIT 100
For detailed documentation on Python variable support, see PYTHON_VARIABLE_SUPPORT.md.
Connection Management
Available Connections
# List available connections
%%sql --list-connections -k {key}
Test Connection
# Test if connection is working
%%sql {connection_id}
SELECT 1
Error Handling
The extension provides comprehensive error handling with user-friendly messages:
%%sql invalid_db
SELECT * FROM nonexistent_table
Common error scenarios:
- Connection Errors: Invalid connection ID, network issues, authentication failures
- Query Errors: SQL syntax errors, table not found, permission denied
- Validation Errors: SQL injection attempts, unsafe operations
- Timeout Errors: Long-running queries, connection timeouts
Security Features
SQL Injection Prevention
# ❌ This will be blocked
user_input = "'; DROP TABLE users; --"
%%sql db
SELECT * FROM users WHERE name = '{user_input}'
# ✅ Use parameter binding instead
user_input = "John Doe"
%%sql db
SELECT * FROM users WHERE name = {user_input}
Query Validation
The extension automatically validates queries for:
- Potentially dangerous operations (DROP, DELETE, etc.)
- SQL injection patterns
- Syntax errors
- Resource usage limits
Performance Optimization
Query Caching
# Enable caching for repeated queries
%%sql db --cache
SELECT expensive_aggregation() FROM large_table
Async Execution
# Run multiple queries concurrently
import asyncio
async def run_queries():
tasks = []
for db in ['db1', 'db2', 'db3']:
task = execute_query(f"%%sql {db}\nSELECT COUNT(*) FROM table")
tasks.append(task)
results = await asyncio.gather(*tasks)
return results
Troubleshooting
Common Issues
Extension not loading:
# Check if extension is properly installed
%load_ext syne_sql_extension
Connection failures:
- Verify your SyneHQ API credentials
- Check network connectivity to SyneHQ services
- Ensure your workspace has access to the requested data sources
Query errors:
- Validate SQL syntax
- Check table and column names
- Verify permissions for the data source
Debug Mode
# Enable debug logging
%%sql db --debug
SELECT * FROM users
Getting Help
- Check the SyneHQ Documentation
- Visit our GitHub Issues
- Contact support at support@synehq.com
API Reference
Magic Command Options
| Option | Description | Default |
|---|---|---|
--connection-id |
SyneHQ connection identifier | Required |
--api-key |
SyneHQ API key for authentication | Auto-detected |
--output |
Variable name for query results | None |
--format |
Output format (dataframe, html, json) | dataframe |
--timeout |
Query timeout in seconds | 30 |
--cache |
Enable query caching | false |
--debug |
Enable debug logging | false |
--test |
Test connection without executing query | false |
Configuration Options
| Setting | Description | Default |
|---|---|---|
api_url |
SyneHQ API endpoint | https://api.synehq.com |
timeout |
Default query timeout | 30 |
retry_attempts |
Number of retry attempts | 3 |
cache_enabled |
Enable query caching | true |
cache_ttl |
Cache time-to-live (seconds) | 300 |
output_format |
Default output format | dataframe |
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
git clone https://github.com/synehq/jupyter-sql-extension.git
cd jupyter-sql-extension
pip install -e ".[dev]"
pre-commit install
Running Tests
pytest tests/
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Documentation: docs.synehq.com
- Issues: GitHub Issues
- Email: support@synehq.com
- SyneHQ Platform: synehq.com
Made with ❤️ by the SyneHQ team
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file syne_sql_extension-1.0.11.tar.gz.
File metadata
- Download URL: syne_sql_extension-1.0.11.tar.gz
- Upload date:
- Size: 54.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.27 {"installer":{"name":"uv","version":"0.9.27","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c08eb8fc6259e4da6343ba92ab0733d82de5ea825feb0bf57eae8cb82a37bb85
|
|
| MD5 |
c39bf715b86bbd10b4f44cad120d8c2a
|
|
| BLAKE2b-256 |
1d0c3cdeb80dbcd262ee802a0d7472bb7976c49079b513b7b6de7f3e79bfa78b
|
File details
Details for the file syne_sql_extension-1.0.11-py3-none-any.whl.
File metadata
- Download URL: syne_sql_extension-1.0.11-py3-none-any.whl
- Upload date:
- Size: 201.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.27 {"installer":{"name":"uv","version":"0.9.27","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
be27b694e1c1efa32eec20c1feaa15a09cc4701f1abba832546c7ca8fdd879bf
|
|
| MD5 |
1db4100cfc0961c9e0050f9d22f19a54
|
|
| BLAKE2b-256 |
399185ffead59af1a25758516c118391e1901400066e011ba58a2adfcc6cf8ae
|