Skip to main content

A comprehensive Databricks SQL handler with Pydantic models, OAuth authentication, and connection management

Project description

Databricks SQL Handler

A comprehensive Python package for interacting with Databricks SQL warehouses using Pydantic models, OAuth authentication, and robust connection management.

Features

  • OAuth Authentication: Secure client credentials flow authentication
  • Pydantic Models: Type-safe data models for query results
  • Connection Management: Robust connection handling with automatic retry logic
  • Query Execution: Execute single or multiple queries with structured results
  • Built-in Models: Pre-defined models for common Databricks operations
  • Extensible: Easy to extend with custom data models
  • CLI Interface: Interactive command-line interface for testing and development

Installation

pip install dbxsql

Quick Start

Basic Usage

from dbxsql import QueryHandler, DatabricksSettings, NexsysRecord

# Configure settings (or use environment variables)
settings = DatabricksSettings(
    client_id="your_client_id",
    client_secret="your_client_secret", 
    server_hostname="your_databricks_hostname",
    http_path="/sql/1.0/warehouses/your_warehouse_id"
)

# Use as context manager
with QueryHandler(settings) as handler:
    # Execute a simple query
    result = handler.execute_query("SELECT * FROM my_table LIMIT 10")
    
    # Execute with Pydantic model parsing
    result = handler.execute_query(
        "SELECT * FROM nexsys_table LIMIT 10", 
        NexsysRecord
    )
    
    # Access structured data
    for record in result.data:
        print(f"ID: {record.id}, Name: {record.name}")

Environment Variables

Create a .env file:

DATABRICKS_CLIENT_ID=your_client_id
DATABRICKS_CLIENT_SECRET=your_client_secret
DATABRICKS_SERVER_HOSTNAME=your_hostname.databricks.com
DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/warehouse_id
DATABRICKS_LOG_LEVEL=INFO

CLI Usage

# Interactive mode
dbxsql --interactive

# Run example queries
dbxsql --examples

# Execute a single query
dbxsql --query "SELECT current_timestamp()"

Custom Models

Create your own Pydantic models:

from pydantic import BaseModel
from datetime import datetime
from typing import Optional

class MyCustomModel(BaseModel):
    id: int
    name: str
    created_at: datetime
    amount: Optional[float] = None

# Register the model
from dbxsql import register_model
register_model("my_custom", MyCustomModel)

# Use it in queries and get a list of MyCustomModels
result = handler.execute_query("SELECT * FROM my_table", "my_custom")

Configuration

All configuration can be done via environment variables with the DATABRICKS_ prefix or programmatically:

Setting Environment Variable Default Description
client_id DATABRICKS_CLIENT_ID Required OAuth client ID
client_secret DATABRICKS_CLIENT_SECRET Required OAuth client secret
server_hostname DATABRICKS_SERVER_HOSTNAME Required Databricks hostname
http_path DATABRICKS_HTTP_PATH Required SQL warehouse HTTP path
log_level DATABRICKS_LOG_LEVEL INFO Logging level
max_retries DATABRICKS_MAX_RETRIES 3 Query retry attempts
query_timeout DATABRICKS_QUERY_TIMEOUT 300 Query timeout (seconds)

API Reference

QueryHandler

Main class for executing queries and managing connections.

Methods

  • execute_query(query, model_class=None): Execute single query
  • execute_multiple_queries(queries, model_classes=None): Execute multiple queries
  • execute_query_with_retry(query, model_class=None): Execute with automatic retry
  • list_files(path): List files in Databricks path
  • show_tables(database=None): Show tables in database
  • test_connection(): Test database connectivity

Built-in Models

  • NexsysRecord: For NEXSYS system data
  • SalesRecord: For sales transaction data
  • FileInfo: For file listing results
  • TableInfo: For table information
  • GenericRecord: For unknown data structures

Error Handling

The package provides specific exceptions:

from dbxsql import (
    AuthenticationError,
    ConnectionError, 
    QueryExecutionError,
    SyntaxError,
    TimeoutError,
    DataParsingError
)

try:
    result = handler.execute_query("SELECT * FROM table")
except AuthenticationError:
    print("Authentication failed")
except QueryExecutionError as e:
    print(f"Query failed: {e}")

Troubleshooting

CERTIFICATE_VERIFY_FAILED

Sometimes, depending on an operational system the next error message could be returned:

dbxsql.exceptions.ConnectionError: Failed to connect to Databricks: Error during request to server: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1007)

This error can be fixed by setting an environment variable SSL_CERT_FILE. This can be done in a terminal:

  export SSL_CERT_FILE=/path/to/databricks.pem

or directly in the code:

import os
import certifi

from dbxsql import QueryHandler
from dbxsql.settings import settings
from dbxsql.main import ApplicationRunner

os.environ["SSL_CERT_FILE"]  = certifi.where()


with QueryHandler(settings) as handler:
    app_runner = ApplicationRunner(handler)
    app_runner.run_example_queries()

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbxsql-1.0.2.tar.gz (15.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dbxsql-1.0.2-py3-none-any.whl (19.8 kB view details)

Uploaded Python 3

File details

Details for the file dbxsql-1.0.2.tar.gz.

File metadata

  • Download URL: dbxsql-1.0.2.tar.gz
  • Upload date:
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.13

File hashes

Hashes for dbxsql-1.0.2.tar.gz
Algorithm Hash digest
SHA256 99e314e4199ecaed34f435c8dc06b8439ea125e393adbb76f24bd333403b8e61
MD5 a61984a3493c27b1c36526cb437565ee
BLAKE2b-256 46a7ff315f74a3a2fde82da8f6d991a0a689af4d00f0554ff3a5549e883660fd

See more details on using hashes here.

File details

Details for the file dbxsql-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: dbxsql-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 19.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.13

File hashes

Hashes for dbxsql-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 13476044c6dbc628cfa46189604bf19c56265a68832b1154348001c46545229c
MD5 790d39f0138b93b98b9e2a8cd590cb32
BLAKE2b-256 88b6da1f7e97ff577a182dceb8c41f7db9acb23b2e711b4ea300ca3dc30bad75

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page