Skip to main content

A comprehensive Databricks SQL handler with Pydantic models, OAuth authentication, and connection management

Project description

Databricks SQL Handler

A comprehensive Python package for interacting with Databricks SQL warehouses using Pydantic models, OAuth authentication, and robust connection management.

Features

  • OAuth Authentication: Secure client credentials flow authentication
  • Pydantic Models: Type-safe data models for query results
  • Connection Management: Robust connection handling with automatic retry logic
  • Query Execution: Execute single or multiple queries with structured results
  • Built-in Models: Pre-defined models for common Databricks operations
  • Extensible: Easy to extend with custom data models
  • CLI Interface: Interactive command-line interface for testing and development

Installation

pip install dbxsql

Quick Start

Basic Usage

from dbxsql import QueryHandler, DatabricksSettings, NexsysRecord

# Configure settings (or use environment variables)
settings = DatabricksSettings(
    client_id="your_client_id",
    client_secret="your_client_secret", 
    server_hostname="your_databricks_hostname",
    http_path="/sql/1.0/warehouses/your_warehouse_id"
)

# Use as context manager
with QueryHandler(settings) as handler:
    # Execute a simple query
    result = handler.execute_query("SELECT * FROM my_table LIMIT 10")
    
    # Execute with Pydantic model parsing
    result = handler.execute_query(
        "SELECT * FROM nexsys_table LIMIT 10", 
        NexsysRecord
    )
    
    # Access structured data
    for record in result.data:
        print(f"ID: {record.id}, Name: {record.name}")

Environment Variables

Create a .env file:

DATABRICKS_CLIENT_ID=your_client_id
DATABRICKS_CLIENT_SECRET=your_client_secret
DATABRICKS_SERVER_HOSTNAME=your_hostname.databricks.com
DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/warehouse_id
DATABRICKS_LOG_LEVEL=INFO

CLI Usage

# Interactive mode
dbxsql --interactive

# Run example queries
dbxsql --examples

# Execute a single query
dbxsql --query "SELECT current_timestamp()"

Custom Models

Create your own Pydantic models:

from pydantic import BaseModel
from datetime import datetime
from typing import Optional

class MyCustomModel(BaseModel):
    id: int
    name: str
    created_at: datetime
    amount: Optional[float] = None

# Register the model
from dbxsql import register_model
register_model("my_custom", MyCustomModel)

# Use it in queries and get a list of MyCustomModels
result = handler.execute_query("SELECT * FROM my_table", "my_custom")

Configuration

All configuration can be done via environment variables with the DATABRICKS_ prefix or programmatically:

Setting Environment Variable Default Description
client_id DATABRICKS_CLIENT_ID Required OAuth client ID
client_secret DATABRICKS_CLIENT_SECRET Required OAuth client secret
server_hostname DATABRICKS_SERVER_HOSTNAME Required Databricks hostname
http_path DATABRICKS_HTTP_PATH Required SQL warehouse HTTP path
log_level DATABRICKS_LOG_LEVEL INFO Logging level
max_retries DATABRICKS_MAX_RETRIES 3 Query retry attempts
query_timeout DATABRICKS_QUERY_TIMEOUT 300 Query timeout (seconds)

API Reference

QueryHandler

Main class for executing queries and managing connections.

Methods

  • execute_query(query, model_class=None): Execute single query
  • execute_multiple_queries(queries, model_classes=None): Execute multiple queries
  • execute_query_with_retry(query, model_class=None): Execute with automatic retry
  • list_files(path): List files in Databricks path
  • show_tables(database=None): Show tables in database
  • test_connection(): Test database connectivity

Built-in Models

  • NexsysRecord: For NEXSYS system data
  • SalesRecord: For sales transaction data
  • FileInfo: For file listing results
  • TableInfo: For table information
  • GenericRecord: For unknown data structures

Error Handling

The package provides specific exceptions:

from dbxsql import (
    AuthenticationError,
    ConnectionError, 
    QueryExecutionError,
    SyntaxError,
    TimeoutError,
    DataParsingError
)

try:
    result = handler.execute_query("SELECT * FROM table")
except AuthenticationError:
    print("Authentication failed")
except QueryExecutionError as e:
    print(f"Query failed: {e}")

Troubleshooting

CERTIFICATE_VERIFY_FAILED

Sometimes, depending on an operational system the next error message could be returned:

dbxsql.exceptions.ConnectionError: Failed to connect to Databricks: Error during request to server: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1007)

This error can be fixed by setting an environment variable SSL_CERT_FILE. This can be done in a terminal:

  export SSL_CERT_FILE=/path/to/databricks.pem

or directly in the code:

import os
import certifi

from dbxsql import QueryHandler
from dbxsql.settings import settings
from dbxsql.main import ApplicationRunner

os.environ["SSL_CERT_FILE"]  = certifi.where()


with QueryHandler(settings) as handler:
    app_runner = ApplicationRunner(handler)
    app_runner.run_example_queries()

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbxsql-1.0.3.tar.gz (15.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dbxsql-1.0.3-py3-none-any.whl (20.0 kB view details)

Uploaded Python 3

File details

Details for the file dbxsql-1.0.3.tar.gz.

File metadata

  • Download URL: dbxsql-1.0.3.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.13

File hashes

Hashes for dbxsql-1.0.3.tar.gz
Algorithm Hash digest
SHA256 1f2db10de797b5657ea129ea3f99d3d68e8fd28de40a185d9e4d11850a45a550
MD5 534790cea6f0c053f46128f4556d7099
BLAKE2b-256 50f4884af5b2b8de027f27ebaec2cbf5c66d50713ec056f4a64aa172bf6d8f0d

See more details on using hashes here.

File details

Details for the file dbxsql-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: dbxsql-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 20.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.13

File hashes

Hashes for dbxsql-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 e158f535251349dddb37ec769b1e3a9956ff9dcce1e492fb1a882963c69a63f4
MD5 aa510bc00661e127eacda78a61510174
BLAKE2b-256 c5f1720ca84fd3a639fdc266bb4ebcc11b580eafe83e294b9a104fa2cb59e25a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page