
Production-focused Databricks API toolkit by Rehla Digital Inc for workspace and account automation on AWS.

Project description

Unified Databricks API

A single Python package for calling the Databricks Workspace and Account APIs and converting JSON responses into Pandas or PySpark DataFrames.

About Rehla Digital Inc

Rehla Digital Inc builds cloud and data engineering solutions that help teams standardize platform operations, accelerate delivery, and reduce integration risk. This package is maintained as part of that effort to provide a practical, production-oriented Databricks API toolkit.

Install

pip install rehla-dbx-tools

The package installs with a hyphenated name but is imported in Python with underscores:

from rehla_dbx_tools import DatabricksApiClient

Install the Spark extras if you plan to convert responses to PySpark DataFrames with to_spark():

pip install "rehla-dbx-tools[spark]"

Quick Start

from rehla_dbx_tools import DatabricksApiClient

client = DatabricksApiClient.from_env()
if client.workspace is not None:
    jobs = client.workspace.list_jobs()
    df = jobs.to_pandas()
    print(df.head())

# Force both workspace/account config to a target cloud
client = DatabricksApiClient.from_env_for_cloud("azure")

Simple host/token setup:

from rehla_dbx_tools import DatabricksApiClient

client = DatabricksApiClient.simple(
    host="https://dbc-xxxx.cloud.databricks.com",
    token="dapi...token...",
)

jobs = client.list_jobs(limit=25)
print("jobs:", len(jobs))

for run in client.list_recent_job_runs(limit=25):
    print(run.get("run_id"))
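Because list_jobs here returns a plain list (hence the len(jobs) call above), the results can be loaded straight into Pandas. A minimal sketch, assuming each entry is the raw Jobs API job dict:

import pandas as pd

# Assumes each entry is the raw Jobs API job dict (e.g. containing "job_id" and "settings").
jobs_df = pd.DataFrame(jobs)
print(jobs_df.head())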

The token can be omitted if you want guided authentication:

client = DatabricksApiClient.simple(
    host="https://dbc-xxxx.cloud.databricks.com",
    open_browser_for_token=True,  # opens Access Tokens page
    prompt_for_token=True,         # prompts to paste token
)

Windows SSO flow (Databricks CLI login):

client = DatabricksApiClient.from_windows_sso(
    host="https://dbc-xxxx.cloud.databricks.com",
)

Notebook Context Bootstrap

Inside Databricks notebooks:

from rehla_dbx_tools import DatabricksApiClient

client = DatabricksApiClient.from_notebook_context()
if client.workspace is not None:
    clusters = client.workspace.list_clusters()
    spark_df = clusters.to_spark()
    display(spark_df)

Account API

The account client is enabled when the DATABRICKS_ACCOUNT_HOST and DATABRICKS_ACCOUNT_ID environment variables are set.
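For example, before creating the client (placeholder values shown):

import os

# Placeholder values; use your own account console host and account ID.
os.environ["DATABRICKS_ACCOUNT_HOST"] = "https://accounts.cloud.databricks.com"
os.environ["DATABRICKS_ACCOUNT_ID"] = "00000000-0000-0000-0000-000000000000"

client = DatabricksApiClient.from_env()

With those set, from_env() attaches the account client: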

if client.account is not None:
    workspaces = client.account.list_workspaces()
    print(workspaces.to_pandas().head())

Version-Aware Generic Request

response = client.workspace.request_versioned(
    "GET",
    service="unity-catalog",
    endpoint="metastores",
    api_version="2.1",
)
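request_versioned is handy when a service such as Unity Catalog exposes more than one API version and you need to pin one explicitly. Assuming it returns the same response wrapper as the listing helpers above, the result can be converted the same way:

metastores_df = response.to_pandas()  # assumed: same wrapper as list_jobs() / list_clusters()
print(metastores_df.head())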

Expanded Convenience Wrappers

import getpass

if client.workspace is not None:
    run = client.workspace.run_job_now(job_id=123)
    runs = client.workspace.list_job_runs(job_id=123, active_only=True, limit=10)
    run_export = client.workspace.export_job_run(run_id=987, views_to_export="CODE")
    run_output = client.workspace.get_job_run_output(run_id=987)
    run_submit = client.workspace.submit_job_run({"run_name": "ad-hoc-check"})
    run_delete = client.workspace.delete_job_run(run_id=987)
    job_permissions = client.workspace.get_job_permissions(job_id=123)
    permission_levels = client.workspace.get_job_permission_levels(job_id=123)
    permission_update = client.workspace.update_job_permissions(
        job_id=123,
        access_control_list=[{"group_name": "admins", "permission_level": "CAN_MANAGE"}],
    )
    cluster_permissions = client.workspace.get_cluster_permissions(cluster_id="0123-abc")
    cluster_permission_levels = client.workspace.get_cluster_permission_levels(cluster_id="0123-abc")
    repo_permissions = client.workspace.get_repo_permissions(repo_id=12345)
    repo_permission_levels = client.workspace.get_repo_permission_levels(repo_id=12345)
    repair = client.workspace.repair_job_run(run_id=987, rerun_all_failed_tasks=True)
    cancel_all = client.workspace.cancel_all_job_runs(job_id=123, all_queued_runs=True)
    cluster = client.workspace.get_cluster(cluster_id="0123-abc")
    catalogs = client.workspace.list_catalogs(max_results=25)
    warehouses = client.workspace.list_sql_warehouses()
    dbfs_files = client.workspace.list_dbfs("dbfs:/tmp")
    token = client.workspace.create_token(lifetime_seconds=3600, comment="ci-short-lived")
    rotated_token = client.workspace.rotate_token(
        token_id_to_revoke="old-token-id",
        lifetime_seconds=3600,
        comment="ci-rotation",
    )
    repos = client.workspace.list_repos(path_prefix="/Repos/team")
    repo = client.workspace.get_repo(repo_id=12345)
    client.workspace.put_secret(
        scope="app-prod",
        key="api-token",
        string_value=getpass.getpass("Secret value: "),
    )
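These wrappers compose into small operational workflows. A minimal sketch of triggering a job and waiting for it to leave the active queue, assuming run_job_now returns the standard Jobs API payload with a run_id and that list_job_runs supports the same to_pandas() conversion as the other listing helpers:

import time

if client.workspace is not None:
    run = client.workspace.run_job_now(job_id=123)
    run_id = run["run_id"]  # assumed: raw Jobs API response dict

    # Poll until the run no longer appears among the job's active runs.
    while True:
        active = client.workspace.list_job_runs(job_id=123, active_only=True, limit=25)
        active_df = active.to_pandas()  # assumed: same wrapper as list_jobs()
        if active_df.empty or run_id not in set(active_df.get("run_id", [])):
            break
        time.sleep(30)

    print(client.workspace.get_job_run_output(run_id=run_id))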

if client.account is not None:
    ws = client.account.get_workspace(workspace_id=101)
    creds = client.account.list_credentials()
    storage_cfgs = client.account.list_storage_configurations()
    networks = client.account.list_networks()
    private_access = client.account.list_private_access_settings()
    vpc_endpoints = client.account.list_vpc_endpoints()
    cmks = client.account.list_customer_managed_keys()
    users = client.account.list_users()
    user = client.account.get_user("user-101")
    groups = client.account.list_groups()
    group = client.account.get_group("group-101")
    budgets = client.account.list_budget_policies()
    log_delivery_configs = client.account.list_log_delivery_configurations()

For detailed setup and examples, see docs/USAGE.md.


Download files

Download the file for your platform.

Source Distribution

rehla_dbx_tools-1.1.2.tar.gz (32.4 kB)

Built Distribution

rehla_dbx_tools-1.1.2-py3-none-any.whl (36.0 kB)

File details

Details for the file rehla_dbx_tools-1.1.2.tar.gz.

File metadata

  • Download URL: rehla_dbx_tools-1.1.2.tar.gz
  • Upload date:
  • Size: 32.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rehla_dbx_tools-1.1.2.tar.gz
  • SHA256: 277632035e6e07de8bb4512c8eb8655b94802f8b1e9918be7b2408ef12ef0bdd
  • MD5: 61a441b462d11254ac8ee4e9e6399e57
  • BLAKE2b-256: fd59d133998238b35be218d78b66944c95bb753d19be9512bb14fd74a08f70da

Provenance

The following attestation bundles were made for rehla_dbx_tools-1.1.2.tar.gz:

Publisher: workflow.yml on rehladigital/rehla_dbx_tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rehla_dbx_tools-1.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for rehla_dbx_tools-1.1.2-py3-none-any.whl
  • SHA256: 4e9de08ab1594547feea00cdf4d5a8c780a381c20e6c1b5c3e5333e986709168
  • MD5: c07a391062e9c8a1eadf52fff808c1cd
  • BLAKE2b-256: c7ce6a4b6d2d584a350d4ae3c705d1d6b59215dd38ca665ccb8ca797f1ca9b21

Provenance

The following attestation bundles were made for rehla_dbx_tools-1.1.2-py3-none-any.whl:

Publisher: workflow.yml on rehladigital/rehla_dbx_tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
