Skip to main content

A comprehensive Databricks SQL handler with Pydantic models, OAuth authentication, and connection management

Project description

Databricks SQL Handler

A comprehensive Python package for interacting with Databricks SQL warehouses using Pydantic models, OAuth authentication, and robust connection management.

Features

  • OAuth Authentication: Secure client credentials flow authentication
  • Pydantic Models: Type-safe data models for query results
  • Connection Management: Robust connection handling with automatic retry logic
  • Query Execution: Execute single or multiple queries with structured results
  • Built-in Models: Pre-defined models for common Databricks operations
  • Extensible: Easy to extend with custom data models
  • CLI Interface: Interactive command-line interface for testing and development

Installation

pip install dbxsql

Quick Start

Basic Usage

from dbxsql import QueryHandler, DatabricksSettings, NexsysRecord

# Configure settings (or use environment variables)
settings = DatabricksSettings(
    client_id="your_client_id",
    client_secret="your_client_secret", 
    server_hostname="your_databricks_hostname",
    http_path="/sql/1.0/warehouses/your_warehouse_id"
)

# Use as context manager
with QueryHandler(settings) as handler:
    # Execute a simple query
    result = handler.execute_query("SELECT * FROM my_table LIMIT 10")
    
    # Execute with Pydantic model parsing
    result = handler.execute_query(
        "SELECT * FROM nexsys_table LIMIT 10", 
        NexsysRecord
    )
    
    # Access structured data
    for record in result.data:
        print(f"ID: {record.id}, Name: {record.name}")

Environment Variables

Create a .env file:

DATABRICKS_CLIENT_ID=your_client_id
DATABRICKS_CLIENT_SECRET=your_client_secret
DATABRICKS_SERVER_HOSTNAME=your_hostname.databricks.com
DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/warehouse_id
DATABRICKS_LOG_LEVEL=INFO

CLI Usage

# Interactive mode
dbxsql --interactive

# Run example queries
dbxsql --examples

# Execute a single query
dbxsql --query "SELECT current_timestamp()"

Custom Models

Create your own Pydantic models:

from pydantic import BaseModel
from datetime import datetime
from typing import Optional

class MyCustomModel(BaseModel):
    id: int
    name: str
    created_at: datetime
    amount: Optional[float] = None

# Register the model
from dbxsql import register_model
register_model("my_custom", MyCustomModel)

# Use it in queries and get a list of MyCustomModels
result = handler.execute_query("SELECT * FROM my_table", "my_custom")

Configuration

All configuration can be done via environment variables with the DATABRICKS_ prefix or programmatically:

Setting Environment Variable Default Description
client_id DATABRICKS_CLIENT_ID Required OAuth client ID
client_secret DATABRICKS_CLIENT_SECRET Required OAuth client secret
server_hostname DATABRICKS_SERVER_HOSTNAME Required Databricks hostname
http_path DATABRICKS_HTTP_PATH Required SQL warehouse HTTP path
log_level DATABRICKS_LOG_LEVEL INFO Logging level
max_retries DATABRICKS_MAX_RETRIES 3 Query retry attempts
query_timeout DATABRICKS_QUERY_TIMEOUT 300 Query timeout (seconds)

API Reference

QueryHandler

Main class for executing queries and managing connections.

Methods

  • execute_query(query, model_class=None): Execute single query
  • execute_multiple_queries(queries, model_classes=None): Execute multiple queries
  • execute_query_with_retry(query, model_class=None): Execute with automatic retry
  • list_files(path): List files in Databricks path
  • show_tables(database=None): Show tables in database
  • test_connection(): Test database connectivity

Built-in Models

  • NexsysRecord: For NEXSYS system data
  • SalesRecord: For sales transaction data
  • FileInfo: For file listing results
  • TableInfo: For table information
  • GenericRecord: For unknown data structures

Error Handling

The package provides specific exceptions:

from dbxsql import (
    AuthenticationError,
    ConnectionError, 
    QueryExecutionError,
    SyntaxError,
    TimeoutError,
    DataParsingError
)

try:
    result = handler.execute_query("SELECT * FROM table")
except AuthenticationError:
    print("Authentication failed")
except QueryExecutionError as e:
    print(f"Query failed: {e}")

Troubleshooting

CERTIFICATE_VERIFY_FAILED

Sometimes, depending on an operational system the next error message could be returned:

dbxsql.exceptions.ConnectionError: Failed to connect to Databricks: Error during request to server: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1007)

This error can be fixed by setting an environment variable SSL_CERT_FILE. This can be done in a terminal:

  export SSL_CERT_FILE=/path/to/databricks.pem

or directly in the code:

import os
import certifi

from dbxsql import QueryHandler
from dbxsql.settings import settings
from dbxsql.main import ApplicationRunner

os.environ["SSL_CERT_FILE"]  = certifi.where()


with QueryHandler(settings) as handler:
    app_runner = ApplicationRunner(handler)
    app_runner.run_example_queries()

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbxsql-1.0.4.tar.gz (15.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dbxsql-1.0.4-py3-none-any.whl (20.0 kB view details)

Uploaded Python 3

File details

Details for the file dbxsql-1.0.4.tar.gz.

File metadata

  • Download URL: dbxsql-1.0.4.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.13

File hashes

Hashes for dbxsql-1.0.4.tar.gz
Algorithm Hash digest
SHA256 75aed56cb94e3abbb11cb96d4b164bd9ac41d23610da7e953c442d0143b275b4
MD5 cd991d7ee57839dd535f0a930afe3580
BLAKE2b-256 4560878a3ebbdb611eb6808c767b6b901fabc449520f0a7ea1921118acbf21b6

See more details on using hashes here.

File details

Details for the file dbxsql-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: dbxsql-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 20.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.13

File hashes

Hashes for dbxsql-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 a30dc1895f9e89308c22f3106cc7dee7089c9b14ba1f7cecbcac967de61edad3
MD5 fd953e5693014a18475ad80f97a848af
BLAKE2b-256 3ba17b7d9059493d6bd11f066c778e9d78262a9dc9232562ae4202f53241cb31

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page