databricks-connect compatibility & widgets for Marimo Notebooks
marimo-databricks-connect
This package provides compatibility and widgets for marimo notebooks and Databricks. The goal is to build notebooks that combine code (both Python and SQL), visualizations, and widgets into "command center" style, one-stop-shop UIs that can monitor, triage, troubleshoot, and control our Databricks projects.
- Connect to Databricks using databricks-connect & Spark (not a SQL warehouse)
- Authenticate/configure Spark using the default databricks-connect process (env vars, .databrickscfg, etc.)
- Execute both Python & SQL cells
- Autocomplete catalog/schema/table/column names
- Browse catalogs/schemas/tables/columns in the marimo data sources view
- Browse external locations, volumes, DBFS, and the workspace in the marimo storage browser
- Notebook widgets to monitor and control specific instances of Databricks capabilities (clusters, workflows, vector search, apps, etc.)
- Widgets to browse & explore Databricks capabilities (compute, workflows, Unity Catalog)
Why Marimo?
We already have Databricks notebooks, Jupyter, and plain Python. Why should you try Marimo? Because it checks all the boxes:
| Code/Format | Easy Merges | OSS Editor | Visualizations | Runs in Normal Python | REPL | Custom Widgets |
|---|---|---|---|---|---|---|
| Python | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Databricks Notebook | ✅ | ❌ | ✅ | ❌ (ignores magic and sql) | ✅ | ❌ |
| Jupyter | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Marimo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Unfortunately, out of the box, Marimo's Databricks support (especially for databricks-connect) isn't great. This package aims to enable all of the cool Marimo features for Databricks:
- PySpark
- DataFrames
- Streaming
- SQL
Quickstart
Authenticate once on your machine:
```shell
az login
# or
databricks configure
```
Start Marimo (or use the VS Code extension):
```shell
marimo edit mynotebook.py
# or
marimo new
```
Then in any notebook in this folder:
```python
import marimo as mo
from marimo_databricks_connect import (
    dbfs, dbutils, external_location, spark, workspace,
    exclude_catalogs, include_catalogs, show_all_catalogs,
    workflows_widget, compute_widget, unity_catalog_widget
)
```
That single import gives you:

- `spark` — a `DatabricksSession` on serverless compute (OAuth, no host/token config).
- `dbutils` — bound to that session.
- `external_location` — add external locations to browse in the UI.
- `include_catalogs` / `exclude_catalogs` — show/hide catalogs in the data sources UI.
- `dbfs` — an fsspec filesystem rooted at `/Volumes` that powers the marimo storage browser via Unity Catalog (no direct ADLS access); see the sketch after this list.
- `workspace` — a filesystem browser for the workspace.
- A registered `SparkConnectEngine`, so marimo's data sources panel browses catalogs / schemas / tables, and SQL cells run on Spark when you pass `engine=spark`:

  ```python
  mo.sql("SELECT * FROM samples.nyctaxi.trips LIMIT 100", engine=spark)
  ```

- SQL autocomplete — the engine feeds marimo's in-cell SQL completion with catalogs, schemas, tables, and columns. Discovery is done in bulk via `<catalog>.information_schema` (one query per catalog instead of N `SHOW`/`DESCRIBE` round trips) and cached in-process. Call `prefetch()` at the top of a notebook to warm the cache eagerly so suggestions appear on the first keystroke:

  ```python
  from marimo_databricks_connect import include_catalogs, prefetch, refresh_metadata

  include_catalogs("main", "samples")  # narrow scope (also makes columns eager)
  prefetch()                           # populate cache for everything visible
  # refresh_metadata("main")           # drop cache after schema changes
  ```

- Streaming DataFrame support — streaming DataFrames (from `spark.readStream`) are automatically rendered with their schema and a helpful status message instead of silently failing.
- StreamingQuery display — streaming queries (from `.writeStream.start()`) render a live status card with query name, ID, active state, progress metrics, and any exceptions.
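Since `dbfs` is an fsspec filesystem, you can also use it programmatically. A minimal sketch, assuming standard fsspec semantics; the volume path is a hypothetical example, not one shipped with the package:

```python
from marimo_databricks_connect import dbfs

# `dbfs` follows the fsspec filesystem interface; substitute a Unity Catalog
# volume path you actually have access to.
files = dbfs.ls("/Volumes/main/default/landing")   # list the contents of a UC volume
with dbfs.open("/Volumes/main/default/landing/sample.csv", "rb") as f:
    head = f.read(1024)                            # read the first 1 KB of a file
```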
Streaming DataFrames
Streaming DataFrames (spark.readStream) cannot be collected or displayed as
tables. This package automatically detects them and renders a schema summary
with column names and types:
```python
stream = spark.readStream.table("catalog.schema.my_table")
stream  # displays schema + STREAMING badge instead of an empty cell
```
Streaming queries (returned by .writeStream.start()) are also rendered with a
status card showing the query name, ID, active/stopped state, progress metrics
(batch ID, input rows, rows/sec), source and sink info, and any exceptions:
```python
query = (
    stream.writeStream
    .format("memory")
    .trigger(availableNow=True)
    .queryName("preview")
    .start()
)
query  # displays status card with ACTIVE/STOPPED badge + progress
```
To preview actual data from a streaming source, write to a memory sink and read the results:
```python
query.awaitTermination()  # wait for availableNow trigger to finish
spark.table("preview")    # now displays as a normal table
```
Browsing UC external locations
Add a cell to expose another root in the storage browser:
```python
from marimo_databricks_connect import external_location

landing = external_location("finops_landing")                        # by UC name
raw = external_location("abfss://c@acct.dfs.core.windows.net/data")  # by path
```
Each variable shows up as its own tree in the storage panel.
Filtering the data sources panel (catalogs / schemas)
With 1000+ UC catalogs the panel becomes unusable. By default only the
current catalog (SELECT current_catalog()) is surfaced. Add catalogs (or
specific schemas) explicitly with fnmatch globs:
```python
from marimo_databricks_connect import (
    include_catalogs, exclude_catalogs, show_all_catalogs, reset_catalog_filter,
)

include_catalogs("main", "samples")                # exact names
include_catalogs("dev_*", "*_prod")                # globs
include_catalogs("main.bronze_*", "*_dev.silver")  # narrow to specific schemas
exclude_catalogs("system", "__databricks_*")       # always wins over includes
show_all_catalogs()                                # opt out of the allow-list
reset_catalog_filter()                             # back to defaults
```
Filtering only affects the data sources panel — mo.sql(..., engine=spark)
and spark.sql(...) can still query any catalog you have UC permission for.
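For example, hiding the `system` catalog from the panel does not stop you from querying it. A small sketch, assuming your workspace exposes the standard `system.information_schema` views:

```python
import marimo as mo
from marimo_databricks_connect import exclude_catalogs, spark

exclude_catalogs("system")  # hide it from the data sources panel

# SQL cells (and spark.sql) can still reach it, given UC permissions:
mo.sql(
    "SELECT table_catalog, table_schema, table_name "
    "FROM system.information_schema.tables LIMIT 10",
    engine=spark,
)
```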
Persistent defaults
Set once per project in pyproject.toml:
```toml
[tool.marimo_databricks_connect]
include_catalogs = ["main", "dev_*"]
exclude_catalogs = ["system", "__databricks_internal"]
# show_all_catalogs = true
```
…or per shell with environment variables (these override pyproject.toml):
```shell
export MARIMO_DBC_INCLUDE_CATALOGS="main,dev_*"
export MARIMO_DBC_EXCLUDE_CATALOGS="system"
export MARIMO_DBC_SHOW_ALL_CATALOGS=1
```
Resource Specific Widgets
Databricks Apps
Cluster
Job
Schema
Genie
Chat with a Databricks AI/BI Genie space — ask natural-language questions, get back text answers and generated SQL, run the queries inline, and follow suggested next questions. Browse and resume past conversations.
```python
from marimo_databricks_connect import genie_widget

widget = genie_widget("01ef...space_id...")
widget
```
Serving Endpoint
Table
Vector Index
Vector Search
Warehouse
Selector widgets (mdc.ui.*)
First-class mo.ui-style selectors for every Databricks resource. Each one is
a searchable dropdown whose .value traitlet plugs straight into marimo's
reactive graph — picking a different option re-runs every cell that reads it,
just like mo.ui.dropdown:
```python
import marimo as mo
import marimo_databricks_connect as mdc

catalog = mdc.ui.catalog()
schema = mdc.ui.schema(catalog=catalog)  # auto-refreshes when catalog changes
table = mdc.ui.table(schema=schema)
column = mdc.ui.column(table=table)

mo.hstack([catalog, schema, table, column])
```
Then in any downstream cell:
```python
spark.table(table.value).select(column.value).limit(20)
```
Available selectors (all under `mdc.ui`):

| Factory | `value` is... |
|---|---|
| `mdc.ui.catalog()` | catalog name |
| `mdc.ui.schema(catalog=...)` | `catalog.schema` |
| `mdc.ui.table(schema=...)` | `catalog.schema.table` |
| `mdc.ui.column(table=...)` | column name |
| `mdc.ui.secret_scope()` | scope name |
| `mdc.ui.secret(scope=...)` | secret key (with `{{secrets/...}}` ref in `selected_meta`) |
| `mdc.ui.cluster()` | cluster id |
| `mdc.ui.warehouse()` | warehouse id |
| `mdc.ui.workflow()` | job id (str) |
| `mdc.ui.pipeline()` | DLT pipeline id |
| `mdc.ui.app()` | app name |
| `mdc.ui.serving_endpoint()` | endpoint name |
| `mdc.ui.vector_search()` | Vector Search endpoint name (alias: `vector_search_endpoint`) |
| `mdc.ui.vector_index(endpoint=...)` | three-part index name |
| `mdc.ui.genie_space()` | Genie space id |
| `mdc.ui.principal()` | userName / applicationId / group displayName |
Dependent selectors (`schema`, `table`, `column`, `secret`, `vector_index`)
accept either a literal string parent or another selector — when given a
selector they observe its `.value` and refetch automatically. All selectors
also expose `.selected_meta` (a parsed dict with extra metadata), `.options`
(a synced JSON list), a refresh button in the UI, and a `refresh()` method.
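A small sketch of those knobs, using a hypothetical `main.bronze` schema as the literal string parent:

```python
import marimo_databricks_connect as mdc

# Parent passed as a literal string instead of another selector;
# "main.bronze" is a hypothetical schema name.
table = mdc.ui.table(schema="main.bronze")

table.refresh()       # re-fetch the option list on demand
table.options         # synced JSON list of available options
table.selected_meta   # parsed dict of extra metadata for the current selection
```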
Exploration Widgets
The package ships interactive widgets built with anywidget for exploring your Databricks workspace directly inside marimo notebooks.
Unity Catalog widget
Browse catalogs, schemas, tables, columns, volumes, and more. Inspect table details, view sample data, explore table & column lineage, and check permissions. Also browse external locations (with drill-through into their contents), storage credentials, connections, and external metadata:
```python
from marimo_databricks_connect import unity_catalog_widget

widget = unity_catalog_widget()
widget  # display in cell output
```
Workflows widget
Browse jobs, drill into tasks, and view run history:
```python
from marimo_databricks_connect import workflows_widget

widget = workflows_widget()
widget  # display in cell output
```
Compute widget
Browse clusters, SQL warehouses, vector search endpoints, instance pools, and cluster policies in a tabbed interface:
```python
from marimo_databricks_connect import compute_widget

widget = compute_widget()
widget  # display in cell output
```
All widgets authenticate using the default Databricks auth chain (env vars, ~/.databrickscfg, az login, etc.) when no explicit client is provided.
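If you do want to pin the credentials yourself, here is a minimal sketch using the Databricks SDK; the `client=` keyword is an assumption about the widget factories (the exact parameter name may differ), and the `"DEFAULT"` profile is only an example:

```python
# Sketch only: passing an explicit SDK client instead of relying on the default auth chain.
from databricks.sdk import WorkspaceClient
from marimo_databricks_connect import compute_widget

client = WorkspaceClient(profile="DEFAULT")  # any auth method the SDK supports works here
widget = compute_widget(client=client)       # `client=` is an assumed keyword argument
widget
```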
Running
```shell
marimo edit scratch/m.py
```