
marimo-databricks-connect

This package provides Databricks compatibility and widgets for marimo notebooks. The goal is to build notebooks that combine code (both Python and SQL), visualizations, and widgets into "command center" style one-stop-shop UIs that can monitor, triage, troubleshoot, and control our Databricks projects.

  • Connect to databricks using databricks-connect & spark (not sql warehouse)
  • Authenticate/configure spark using the default databricks-connect process (env vars, .databrickscfg etc)
  • Execution of both python & sql cells
  • Browsing of catalogs/schemas/tables/columns in the marimo data sources view
  • Browsing of external locations, volumes, and dbfs in the marimo storage browser
  • Notebook widgets to monitor and control specific instances of databricks capabilities (clusters, workflows, vector search, apps, etc.)
  • Widgets to browse & explore databricks capabilities (compute, workflows, unity catalog)

Why Marimo?

We already have databricks notebooks, jupyter, and python. Why should you try Marimo? Because it checks all the boxes:

The boxes: easy merges, an OSS editor, visualizations, running in normal Python, a REPL, and custom widgets. Plain Python scripts, Databricks Notebooks (which ignore magics and SQL when run as normal Python), and Jupyter each fall short on some of these; Marimo checks every one.

Unfortunately, out of the box, Marimo's Databricks support (especially for databricks-connect) isn't great. This package aims to enable all of the cool Marimo features for Databricks.

Pyspark

[screenshots: DataFrame, Streaming, and SQL cells rendered in marimo]
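
For orientation, a minimal sketch of the kind of cell those screenshots capture; it assumes the spark session from the quickstart below and the same sample table:

df = spark.sql("SELECT * FROM samples.nyctaxi.trips LIMIT 10")
df  # marimo renders the DataFrame as an interactive table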

Quickstart

Authenticate once on your machine:

az login
# or
databricks configure

Start Marimo (or use the VS Code extension):

marimo edit mynotebook.py
# or
marimo new

Then in any notebook in this folder:

import marimo as mo
from marimo_databricks_connect import (
    dbfs, dbutils, external_location, spark,
    exclude_catalogs, include_catalogs, show_all_catalogs,
    workflows_widget, compute_widget, unity_catalog_widget,
)

That single import gives you:

  • spark — a DatabricksSession on serverless compute (OAuth, no host/token config).

  • dbutils — bound to that session.

  • external_location — add external locations to browse in the storage panel.

  • include_catalogs / exclude_catalogs — show or hide catalogs in the data sources panel.

  • dbfs — an fsspec filesystem rooted at /Volumes that powers the marimo storage browser via Unity Catalog (no direct ADLS access); see the sketch after this list.

  • A registered SparkConnectEngine so marimo's data sources panel browses catalogs / schemas / tables, and SQL cells run on Spark when you pass engine=spark:

    mo.sql("SELECT * FROM samples.nyctaxi.trips LIMIT 100", engine=spark)
    
  • Streaming DataFrame support — streaming DataFrames (from spark.readStream) are automatically rendered with their schema and a helpful status message instead of silently failing.

  • StreamingQuery display — streaming queries (from .writeStream.start()) render a live status card with query name, ID, active state, progress metrics, and any exceptions.
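
Because dbfs is a standard fsspec filesystem, the usual fsspec calls work against Unity Catalog volumes. A minimal sketch; the volume and file paths here are hypothetical:

dbfs.ls("/Volumes/main/default/my_volume")  # list a volume's contents

# read a file through Unity Catalog (hypothetical path)
with dbfs.open("/Volumes/main/default/my_volume/data.csv", "r") as f:
    print(f.read()[:200])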

Streaming DataFrames

Streaming DataFrames (spark.readStream) cannot be collected or displayed as tables. This package automatically detects them and renders a schema summary with column names and types:

stream = spark.readStream.table("catalog.schema.my_table")
stream  # displays schema + STREAMING badge instead of an empty cell

Streaming queries (returned by .writeStream.start()) are also rendered with a status card showing the query name, ID, active/stopped state, progress metrics (batch ID, input rows, rows/sec), source and sink info, and any exceptions:

query = (
    stream.writeStream
    .format("memory")
    .trigger(availableNow=True)
    .queryName("preview")
    .start()
)
query  # displays status card with ACTIVE/STOPPED badge + progress

To preview actual data from a streaming source, write to a memory sink and read the results:

query.awaitTermination()  # wait for availableNow trigger to finish
spark.table("preview")    # now displays as a normal table

Browsing UC external locations

Add a cell to expose another root in the storage browser:

from marimo_databricks_connect import external_location

landing = external_location("finops_landing")                  # by UC name
raw     = external_location("abfss://c@acct.dfs.core.windows.net/data")  # by path

[screenshot: storage panel]

Each variable shows up as its own tree in the storage panel.

Filtering the data sources panel (catalogs / schemas)

With 1000+ UC catalogs the panel becomes unusable. By default only the current catalog (SELECT current_catalog()) is surfaced. Add catalogs (or specific schemas) explicitly with fnmatch globs:

from marimo_databricks_connect import (
    include_catalogs, exclude_catalogs, show_all_catalogs, reset_catalog_filter,
)

include_catalogs("main", "samples")            # exact names
include_catalogs("dev_*", "*_prod")             # globs
include_catalogs("main.bronze_*", "*_dev.silver")  # narrow to specific schemas

exclude_catalogs("system", "__databricks_*")    # always wins over includes

show_all_catalogs()                             # opt out of the allow-list
reset_catalog_filter()                          # back to defaults
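
To check which catalog the default filter surfaces, query it with plain Spark SQL (standard Spark API, nothing package-specific):

spark.sql("SELECT current_catalog()").show()  # the single catalog shown by default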

Filtering only affects the data sources panel; mo.sql(..., engine=spark) and spark.sql(...) can still query any catalog you have UC permissions for.

Persistent defaults

Set once per project in pyproject.toml:

[tool.marimo_databricks_connect]
include_catalogs = ["main", "dev_*"]
exclude_catalogs = ["system", "__databricks_internal"]
# show_all_catalogs = true

…or per shell with environment variables (these override pyproject.toml):

export MARIMO_DBC_INCLUDE_CATALOGS="main,dev_*"
export MARIMO_DBC_EXCLUDE_CATALOGS="system"
export MARIMO_DBC_SHOW_ALL_CATALOGS=1

[screenshot: filtered data sources panel]

Resource Specific Widgets

[screenshots: Databricks Apps, Cluster, Job, Schema, Serving Endpoint, Table, Vector Index, Vector Search, and Warehouse widgets]

Exploration Widgets

The package ships three interactive widgets built with anywidget for exploring your Databricks workspace directly inside marimo notebooks.

Unity Catalog widget

Browse catalogs, schemas, tables, columns, volumes, and more. Inspect table details, view sample data, explore table & column lineage, and check permissions. Also browse external locations (with drill-through into their contents), storage credentials, connections, and external metadata:

from marimo_databricks_connect import unity_catalog_widget

widget = unity_catalog_widget()
widget  # display in cell output

[screenshot: Unity Catalog widget]

Workflows widget

Browse jobs, drill into tasks, and view run history:

from marimo_databricks_connect import workflows_widget

widget = workflows_widget()
widget  # display in cell output

[screenshot: Workflows widget]

Compute widget

Browse clusters, SQL warehouses, vector search endpoints, instance pools, and cluster policies in a tabbed interface:

from marimo_databricks_connect import compute_widget

widget = compute_widget()
widget  # display in cell output

All widgets authenticate using the default Databricks auth chain (env vars, ~/.databrickscfg, az login, etc.) when no explicit client is provided.

[screenshot: Compute widget]
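
If you need a non-default profile instead of the default auth chain, you can build a WorkspaceClient yourself and hand it to a widget. The client keyword below is an assumption based on the "explicit client" mention above, so verify it against the package's signatures:

from databricks.sdk import WorkspaceClient
from marimo_databricks_connect import compute_widget

w = WorkspaceClient(profile="my-profile")  # any profile from ~/.databrickscfg
widget = compute_widget(client=w)          # "client" kwarg is an assumption
widget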

Running

marimo edit scratch/m.py
