Cross-workspace access visibility for Databricks — audit who can reach what across every workspace in your account.
Project description
databricks-access-audit
Databricks gives you no native way to answer "what can this identity access across all my workspaces?" — this tool does.
The problem
The Account Console shows you one workspace at a time. INFORMATION_SCHEMA shows you one metastore at a time. Neither resolves nested group memberships across workspaces. Neither tells you whether a personal grant duplicates what the group already provides. Neither tells you what workspace objects (jobs, clusters, dashboards, pipelines) a principal can reach.
You're a Databricks admin. Your boss walks over and asks:
- "What can alice@company.com access across all our workspaces?"
- "We're offboarding Bob — what exactly does he have direct access to before we deprovision him?"
- "Which groups have access to the
maincatalog? Who in those groups has their own redundant grants?" - "Our quarterly access review is due. Can you export who has what and compare it against last quarter?"
Answering any of these means opening each workspace, clicking through Unity Catalog, filtering by user, checking workspace assignments — then repeating for every workspace. You still can't see nested group inheritance. You still can't see whether a personal grant duplicates what the group already provides.
This tool answers those questions in one command, across all workspaces at once.
Two modes
| Mode | Entry point | Question it answers |
|---|---|---|
| Principal audit | --principal "alice@example.com" |
What can this user / SP / group access across every workspace? (Unity Catalog grants + workspace object ACLs) |
| Group audit | --group "data-engineers" |
What does this group access? Who in it has redundant personal grants the group already covers? |
--group and --principal are mutually exclusive. Use --group to audit a group's grants and find cleanup opportunities; use --principal to reverse-lookup what a specific identity can reach.
When to use this tool
| Scenario | Mode | Key flags |
|---|---|---|
| Offboarding — pull everything before deprovisioning | --principal |
--scan-workspace-objects --output csv |
| Access review — export permissions, compare to last quarter | either | --output csv --baseline last_quarter.json |
| Incident response — map what a compromised identity can reach | --principal |
--escalation-check --scan-workspace-objects |
| Permission hygiene — find redundant personal grants, generate cleanup SQL | --group |
--revoke-script |
| Stale access — flag grants with no recorded activity | --group |
--stale-days 90 --sql-warehouse-id ... |
| Compliance snapshot — prove permissions haven't drifted | either | --save-snapshot / --baseline |
Prerequisites
Python: 3.9 or later.
Databricks account: Unity Catalog must be enabled. The tool uses the Account API and the per-workspace Unity Catalog API — it does not scan Hive Metastore or workspace-local permissions.
Service Principal permissions: The SP running the audit requires elevated access at three distinct levels:
| Level | Required role / privilege | Why |
|---|---|---|
| Databricks account | Account Admin | Required to call the SCIM API (list groups, users, SPs), list workspaces, and read workspace permission assignments — all account-level endpoints enforce Account Admin |
| Each workspace | Workspace Admin (recommended) or Workspace User | The SP must be assigned to each workspace it will scan; Workspace Admin is the safest choice since some permission assignment APIs are not visible to plain users |
| Unity Catalog | Metastore Admin, or MANAGE on every catalog to audit |
The GET /permissions/catalog/{name} endpoint (and the equivalent schema/table endpoints) requires either Metastore Admin or MANAGE on the securable — BROWSE or SELECT are not sufficient to read grant lists |
The simplest setup that is guaranteed to work: grant the SP Account Admin, add it to each workspace as Workspace Admin, and assign it Metastore Admin. Scoping to minimum required permissions is possible but requires granting MANAGE on every individual catalog, which is difficult to maintain across a large account.
Installation
The package is not yet published to PyPI. Install from source:
git clone https://github.com/lukaleet/databricks-access-audit.git
cd databricks-access-audit
# Core install (raw HTTP client only - no extra dependencies beyond requests)
pip install -e .
# Recommended: include the Databricks SDK for automatic auth and retries
pip install -e ".[sdk]"
# Full development install
pip install -e ".[sdk,dev]"
Client backends
The tool ships two interchangeable API backends, selected automatically by create_client():
| Client | Requires | Auth | Pagination | Retries |
|---|---|---|---|---|
DatabricksAPIClient |
requests (always included) |
Manual OAuth + per-host token cache | Manual SCIM page walking | Exponential backoff (configurable) |
DatabricksSDKClient |
databricks-sdk (optional) |
Automatic via SDK | Automatic iterators | Built-in SDK retries |
create_client() returns DatabricksSDKClient when databricks-sdk is installed, and falls back to DatabricksAPIClient otherwise. Pass prefer_sdk=False or --no-sdk to force the raw HTTP backend.
Quick Start
Credentials
All three credential flags can be set as environment variables:
export DATABRICKS_CLIENT_ID="your-sp-client-id"
export DATABRICKS_CLIENT_SECRET="your-sp-secret"
export DATABRICKS_ACCOUNT_ID="your-account-id"
CLI - Principal Audit
# What can alice access across all workspaces?
databricks-access-audit --principal "alice@example.com" --cloud azure
# Include workspace object ACLs (jobs, clusters, dashboards, pipelines, ...)
databricks-access-audit \
--principal "alice@example.com" \
--cloud azure \
--scan-workspace-objects \
--escalation-check
# Deep scan including schema and table grants, export to CSV for a review
databricks-access-audit \
--principal "alice@example.com" \
--cloud azure \
--scan-schemas \
--scan-tables \
--scan-workspace-objects \
--output csv > alice_access_$(date +%F).csv
# Audit a service principal by display name or application ID
databricks-access-audit --principal "ETL-Bot" --cloud azure
# Audit a group (shows what it can access, not who is in it)
databricks-access-audit --principal "data-engineers" --cloud azure
CLI - Group Audit
# Scan all workspaces, catalog level only
databricks-access-audit --group "data-engineers" --cloud azure
# Deep scan including schema and table grants, with REVOKE script
databricks-access-audit \
--group "data-engineers" \
--cloud azure \
--scan-schemas \
--scan-tables \
--revoke-script
# JSON output (progress lines go to stdout before the JSON block)
databricks-access-audit --group "data-engineers" --output json
# Scan specific workspaces instead of all discovered ones
databricks-access-audit \
--group "data-engineers" \
--workspace-urls "https://adb-123.azuredatabricks.net,https://adb-456.azuredatabricks.net"
# Force raw HTTP client even when databricks-sdk is installed
databricks-access-audit --group "data-engineers" --no-sdk
All CLI flags
Credentials (or set via env vars):
--client-id DATABRICKS_CLIENT_ID
--client-secret DATABRICKS_CLIENT_SECRET
--account-id DATABRICKS_ACCOUNT_ID
Target (mutually exclusive):
--group NAME Display name of the group to audit
--principal ID User email, SP app-ID/display name, or group name
Scan scope:
--cloud azure | aws | gcp (default: azure)
--workspace-urls Comma-separated URLs; omit to scan all discovered workspaces
--scan-schemas Also scan schema-level grants
--scan-tables Also scan table/view-level grants (implies --scan-schemas)
--scan-workspace-objects
Scan workspace-level object permissions across 13 types:
jobs, clusters, cluster_policies, pipelines, sql_warehouses,
sql_queries, sql_alerts, lakeview_dashboards, genie_spaces,
mlflow_experiments, registered_models, serving_endpoints, apps.
Off by default — adds significant API calls per workspace.
--workspace-object-types LIST
Comma-separated subset of the 13 types to scan.
Default: all 13 types.
--workers N Parallel threads for workspace/schema/table scanning
(default: 8; set to 1 for sequential)
Output:
--output text | json | csv (default: text)
--revoke-script Print REVOKE SQL for redundant grants (group audit only)
Snapshot / diff:
--save-snapshot PATH Save a timestamped JSON snapshot of this run to PATH.
--baseline PATH Compare this run against a previous snapshot at PATH
and print what changed (new grants, removed grants,
new/removed members). Compatible with all --output
formats.
Permission elevation:
--auto-elevate Temporarily grant the audit SP Workspace Admin on any
workspace where it lacks that role, then restore the
prior state after the audit completes (success or
failure). Requires Account Admin. Metastore Admin
must still be granted manually.
--dry-run-elevation Preview which workspaces would be elevated without
writing any permission changes. Implies --auto-elevate.
Security analysis:
--escalation-check (principal audit) Flag ALL_PRIVILEGES and MANAGE
grants inherited transitively through group membership.
--stale-days N Flag member-direct catalog grants whose holders have
had no recorded activity in system.access.audit for
the last N days. Requires --sql-warehouse-id.
--sql-warehouse-id SQL warehouse ID used to query system.access.audit.
--sql-workspace-url Workspace URL whose audit table to query (defaults to
first discovered workspace).
--check-local-groups Scan workspace SCIM directories and flag groups that
exist only at workspace level (legacy local groups not
yet migrated to account SCIM).
Client:
--no-sdk Force raw HTTP client even if databricks-sdk is installed
--max-retries Retry attempts on 429/5xx (default: 5; raw client only)
--retry-base-delay Base delay in seconds (default: 1.0)
--retry-max-delay Maximum delay cap in seconds (default: 60.0)
Just-in-Time Permission Elevation
Getting Workspace Admin on every workspace is a hard prerequisite for a full audit but granting it permanently can be undesirable. The --auto-elevate flag automates the lifecycle:
- Before the audit — for each workspace where the SP lacks Workspace Admin, the tool calls the Account API to grant it temporarily.
- During the audit — all workspace and Unity Catalog APIs are called with Workspace Admin in place.
- After the audit — regardless of whether the audit succeeded or failed, the tool restores each workspace to its prior state:
- If the SP had no assignment before elevation, the assignment is deleted.
- If the SP had a USER-level assignment before elevation, it is restored to USER.
This cleanup guarantee is implemented as a context manager __exit__, so it runs unconditionally even if the audit raises an exception.
Usage
# Elevate automatically, then restore
databricks-access-audit --group "data-engineers" --cloud azure --auto-elevate
# Preview which workspaces would be elevated (no writes)
databricks-access-audit --group "data-engineers" --cloud azure --dry-run-elevation
Scope and limitations
| What is managed | How |
|---|---|
| Workspace Admin | Granted temporarily per-workspace; restored after the audit |
| Metastore Admin | Not managed. Must be granted manually before running the tool |
| Account Admin | Not managed. Hard prerequisite — must be in place before using --auto-elevate |
The SP can only be elevated on workspaces discovered through the Account API (workspaces with a known numeric workspace ID). Workspaces supplied via --workspace-urls have no known ID and are skipped with a warning — the SP must already be a Workspace Admin on those.
If cleanup fails (e.g. due to a network error), the tool:
- Logs an ERROR with explicit manual revocation instructions (workspace name, SP SCIM ID, and the
DELETE /permissionassignments/principals/{id}endpoint to call). - Raises
RuntimeErrorso the failure is never silently swallowed.
Programmatic usage
from databricks_access_audit import create_client, PermissionElevator, WorkspaceDiscovery
client = create_client(cloud="azure", client_id="...",
client_secret="...", account_id="...")
workspaces = WorkspaceDiscovery(client, "azure").discover()
with PermissionElevator(client, sp_application_id="<client-id>") as elev:
for ws in workspaces:
elev.ensure_workspace_admin(ws.workspace_id, ws.workspace_name)
# ... run audit ...
# prior workspace assignments are restored here, success or failure
Dry-run mode (preview only, no writes):
with PermissionElevator(client, sp_application_id="<client-id>", dry_run=True) as elev:
for ws in workspaces:
elev.ensure_workspace_admin(ws.workspace_id, ws.workspace_name)
Python API
Principal audit
from databricks_access_audit import create_client, PrincipalAuditor, WorkspaceDiscovery
client = create_client(cloud="azure", client_id="...",
client_secret="...", account_id="...")
auditor = PrincipalAuditor(
client,
workspace_discovery=WorkspaceDiscovery(client, "azure"),
cloud_provider="azure",
)
result = auditor.audit("alice@example.com", scan_schemas=True)
for g in result.groups:
tag = "direct" if g.is_direct else "transitive"
print(f" {g.group_name} ({tag}) - path: {' → '.join(g.path)}")
for r in result.workspace_roles:
print(f" {r.workspace_name}: {r.permission_level} via {r.via_group}")
for p in result.permissions:
print(f" [{p.securable_type}] {p.securable_name}: {', '.join(p.privileges)} via {p.via_group}")
if result.dead_end_groups:
print(f" Dead-end groups (no workspace access): {result.dead_end_groups}")
Group audit
from databricks_access_audit import (
create_client,
GroupMembershipResolver,
WorkspaceDiscovery,
CatalogPermissionScanner,
RedundancyDetector,
RevokeScriptGenerator,
)
client = create_client(
cloud="azure",
client_id="...",
client_secret="...",
account_id="...",
)
resolver = GroupMembershipResolver(client)
node = resolver.resolve_group("data-engineers")
members = resolver.get_all_members_flat(node) # {"users": [...], "service_principals": [...]}
workspaces = WorkspaceDiscovery(client, "azure").discover()
scanner = CatalogPermissionScanner(client, resolver)
grants = scanner.scan_all_workspaces(workspaces, "data-engineers", node, members)
redundancy = RedundancyDetector().detect_redundancy(grants, "data-engineers")
print(RevokeScriptGenerator.generate(redundancy, include_partial=True))
Databricks Notebook
Import Databricks Access Audit.ipynb into your workspace. The first cell installs the package — adjust the path to match your workspace location and run it, then Run All.
# If the package is cloned into your Databricks workspace:
%pip install -q "/Workspace/Users/your.name@company.com/databricks-access-audit-tool[sdk]"
# Once published to PyPI:
# %pip install -q "databricks-access-audit[sdk]"
Do not use
pip install -e(editable mode) on a Databricks cluster — the cluster's setuptools may not support PEP 660 and editable installs serve no purpose on a cluster where you are not editing the source.
Widgets
| Widget | Type | Description |
|---|---|---|
secret_scope |
text | (optional) Databricks secret scope whose keys client_id, client_secret, account_id take priority over plain-text widgets and environment variables |
client_id |
text | Service Principal application (client) ID |
client_secret |
text | Service Principal secret |
account_id |
text | Databricks account ID |
cloud |
dropdown | azure / aws / gcp |
target_group |
text | Group display name to audit |
principal_identifier |
text | (optional) User email, SP name/app-ID, or group name for the principal reverse-lookup |
workspace_urls |
text | (optional) Comma-separated workspace URLs; blank to scan all |
scan_schemas |
dropdown | true / false |
scan_tables |
dropdown | true / false |
scan_workspace_objects |
dropdown | true / false — scan 13 workspace object types (jobs, clusters, pipelines, SQL, Genie, MLflow, serving endpoints, apps) |
workspace_object_types |
text | Comma-separated object types; blank = all 13 types |
workers |
text | Number of parallel threads for workspace/schema/table scanning (default: 8) |
auto_elevate |
dropdown | true / false — temporarily grant Workspace Admin to the audit SP |
dry_run_elevation |
dropdown | true / false — log elevation actions without applying them |
escalation_check |
dropdown | true / false — flag ALL_PRIVILEGES / MANAGE grants (principal audit) |
stale_days |
text | Stale-grant threshold in days; 0 disables the check |
sql_warehouse_id |
text | SQL warehouse ID for stale grant queries |
sql_workspace_url |
text | Workspace URL used for stale grant queries |
check_local_groups |
dropdown | true / false — detect workspace-local groups not in account SCIM |
save_snapshot |
text | (optional) Path to write a JSON snapshot of this audit run |
baseline_snapshot |
text | (optional) Path to a prior snapshot; produces a diff instead of a full report |
export_delta_path |
text | (optional) Delta table path for historical export (e.g. abfss://container@account.dfs.core.windows.net/audit) |
Output DataFrames - Group Audit
| DataFrame | Contents |
|---|---|
df_grants |
All catalog-level grants (direct, upstream, member-direct) |
df_membership |
Users and SPs with IdP-sync source and full inheritance paths |
df_redundancy |
Redundancy analysis with revoke recommendations |
df_schema_grants |
Schema-level grants (populated when scan_schemas=true) |
df_table_grants |
Table/view-level grants (populated when scan_tables=true) |
df_top_members |
Members ranked by personal grant count, with redundancy level — the cleanup shortlist |
df_stale |
Member-direct grants with no recent audit-log activity (when stale_days>0) |
df_local_groups |
Workspace-local groups absent from account SCIM (when check_local_groups=true) |
df_workspace_objects |
Workspace object ACL grants across all 13 types (when scan_workspace_objects=true) |
_revoke_sql |
Auto-generated REVOKE SQL script (string) |
Output DataFrames - Principal Audit
| DataFrame | Contents |
|---|---|
df_pa_groups |
All group memberships (direct + transitive) with full inheritance path |
df_pa_ws |
Workspace access roles with the source group |
df_pa_perms |
UC permissions (catalog/schema/table) with granting group and workspace |
df_escalation |
ALL_PRIVILEGES / MANAGE escalation findings (when escalation_check=true) |
df_pa_workspace_objects |
Workspace object ACL grants across all 13 types (when scan_workspace_objects=true) |
Delta export
When export_delta_path is set, all DataFrames are appended to partitioned Delta tables under that path with audit_timestamp and target_group columns added. This supports incremental audit history and change tracking over time.
Output Reference
CLI - JSON (principal audit)
{
"principal": "Alice Smith",
"principal_type": "USER",
"timestamp": "2025-01-15T10:30:00",
"groups": [
{"name": "data-engineers", "direct": true, "path": ["alice@example.com", "data-engineers"]},
{"name": "all-data-team", "direct": false, "path": ["alice@example.com", "data-engineers", "all-data-team"]}
],
"workspace_roles": [
{"workspace": "prod-ws", "permission": "USER", "via_group": "data-engineers"}
],
"permissions": [
{"type": "CATALOG", "name": "main", "privileges": ["USE_CATALOG", "SELECT"],
"via_group": "data-engineers", "workspace": "prod-ws"}
],
"dead_end_groups": ["compliance-readers"],
"workspace_object_permissions": [
{"object_type": "jobs", "object_name": "nightly-etl", "workspace": "prod-ws",
"permission_level": "CAN_MANAGE", "grant_source": "DIRECT", "principal": "alice@example.com"}
]
}
CLI - JSON (group audit)
{
"group": "data-engineers",
"timestamp": "2025-01-15T10:30:00+00:00",
"users": 12,
"service_principals": 2,
"catalog_grants": 8,
"schema_grants": 0,
"table_grants": 0,
"full_redundancy": 3,
"partial_redundancy": 1,
"top_members": [
{"principal": "alice@example.com", "personal_grants": 3, "redundancy": "Full"},
{"principal": "bob@example.com", "personal_grants": 1, "redundancy": "Partial"}
]
}
Grant Classification
Each grant on a catalog, schema, or table is classified relative to the target group or principal:
| Classification | Meaning |
|---|---|
| Direct | The target group or principal itself holds this grant |
| Upstream | A parent (ancestor) group of the target holds this grant |
| Member Direct | An individual user or SP within the target group has a personal grant |
Principal matching handles backtick-quoted names, case-insensitive email addresses, display names, and service principal application IDs.
Azure AD B2B guest users (Azure only)
Azure AD B2B guest users have two workspace SCIM records with different userName values:
| Record | userName |
Used for |
|---|---|---|
| Account-synced | alice@gmail.com (the invitation email) |
SCIM identity, group memberships |
| Azure AD guest | alice_gmail.com#EXT#@tenant.onmicrosoft.com (the B2B guest UPN) |
Workspace object ACLs |
The principal audit automatically resolves both identities by searching workspace SCIM using the account externalId (shared by both records). The guest UPN is added as an alias so workspace object ACL entries stored under the B2B identity are correctly attributed to the principal.
Redundancy Detection
Redundancy is computed at the catalog level. Member-direct grants are compared against the group's effective privileges after privilege hierarchy expansion (e.g. ALL_PRIVILEGES implies SELECT, MODIFY implies SELECT):
| Level | Meaning | Recommended action |
|---|---|---|
| Full | Every member privilege is covered by the group | Safe to REVOKE — --revoke-script generates the SQL |
| Partial | Some overlap, some privileges unique to the member | Review: the tool lists which privileges are redundant and which are unique |
| None | No overlap | No action needed |
The --revoke-script flag generates copy-paste REVOKE SQL for both full and partial redundancy findings.
Architecture
databricks_access_audit/
├── __init__.py # Public API exports
├── __main__.py # python -m entry point
├── cli.py # argparse CLI (--group / --principal modes)
├── client.py # AuditClient protocol, DatabricksAPIClient, create_client()
├── sdk_client.py # DatabricksSDKClient (optional - requires databricks-sdk)
├── models.py # All dataclasses and enums
├── group_resolver.py # Recursive SCIM walker with bulk pre-fetch and caching
├── workspace.py # Multi-cloud workspace discovery and URL resolution
├── catalog_scanner.py # Catalog-level permission scanning
├── schema_scanner.py # Schema-level permission scanning
├── table_scanner.py # Table/view-level permission scanning
├── _classification.py # classify_grant() and build_member_lookups() shared helpers
├── redundancy.py # Privilege hierarchy expansion and redundancy detection
├── revoke.py # REVOKE SQL generation from RedundancyResult objects
├── principal_auditor.py # BFS reverse-lookup from user/SP/group
├── elevate.py # Just-in-time Workspace Admin elevation with cleanup guarantee
├── escalation.py # ALL_PRIVILEGES / MANAGE escalation detection
├── stale_checker.py # Stale grant detection via system.access.audit SQL
├── local_groups.py # Workspace-local (legacy) SCIM group detection
├── workspace_object_scanner.py # Workspace-level ACL scanning (13 types)
├── csv_output.py # CSV serialisation for group and principal audit results
└── snapshot.py # Snapshot build/save/load and delta comparison
Client protocol: AuditClient is a structural Protocol in client.py. Both DatabricksAPIClient and DatabricksSDKClient satisfy it — all scanners, resolvers, and the auditor accept either backend without modification.
SCIM performance: GroupMembershipResolver bulk-fetches all users and service principals in two paginated calls before resolving individual members, avoiding N+1 API calls. For accounts with fewer than ~10 000 users this is significantly faster than per-member lookups.
Identity Source Tagging (Internal vs. IdP-Synced)
Every user, service principal, and group in Databricks has an origin: either it was created directly inside Databricks (Databricks-managed / internal) or it was provisioned by an external identity provider via SCIM (IdP-synced / external).
The tool reads the SCIM externalId field — the standard SCIM indicator for IdP-provisioned principals — and tags every member accordingly.
| Source | Indicator | Examples |
|---|---|---|
IdP-synced (external) |
externalId present and non-empty |
Users/groups from Azure Entra ID, Okta, AWS IAM Identity Center, OneLogin |
Databricks-managed (internal) |
externalId absent or empty |
Accounts created in the Databricks UI, OAuth service principals, programmatically created SPs |
Group audit output
Users: 12 (8 IdP-synced, 4 Databricks-managed) | SPs: 2 (0 IdP-synced, 2 Databricks-managed)
JSON output adds users_external, users_internal, sps_external, sps_internal to the group audit result.
Principal audit output
Principal: alice@example.com (USER, external)
Group memberships (3, 2 IdP-synced, 1 Databricks-managed):
* data-engineers (direct, external)
- all-data-team (transitive, external)
- local-admins (transitive, internal)
JSON output adds "source": "external"/"internal" to each group in the groups array and a top-level "principal_source" field.
Why this matters
- Shadow accounts: Internal (Databricks-managed) users that don't correspond to any IdP identity may be orphaned service accounts, test accounts, or users who bypassed the provisioning process.
- Off-boarding gaps: If an IdP user was removed from the IdP but their Databricks account wasn't deprovisioned, they'll appear as
internal(no longer externally managed) — exactly the accounts that matter during an access review. - Compliance: Many security policies require that all human users be provisioned through the corporate IdP. Internal users are exceptions that need justification.
Python API
The source property is available on GroupMember, GroupNode, GroupMembership, and the principal_source property on PrincipalAuditResult:
from databricks_access_audit import PrincipalSource
node = resolver.resolve_group("data-engineers")
members = resolver.get_all_members_flat(node)
external = [u for u in members["users"] if u.source == PrincipalSource.EXTERNAL]
internal = [u for u in members["users"] if u.source == PrincipalSource.INTERNAL]
print(f"{len(external)} IdP-synced, {len(internal)} Databricks-managed")
Privilege Escalation Detection
The --escalation-check flag adds a security pass to the principal audit. After collecting all effective permissions it flags any grant that contains ALL_PRIVILEGES or MANAGE — the two privileges that represent meaningful escalation vectors in Unity Catalog:
| Privilege | Risk |
|---|---|
ALL_PRIVILEGES |
Grants unrestricted read/write/admin access to the securable and everything beneath it |
MANAGE |
Grants the ability to add and remove grants on the securable — can be used to self-escalate or escalate other principals |
databricks-access-audit --principal "alice@example.com" --cloud azure --escalation-check
Output (text):
Escalation risks (2):
! RISK [CATALOG] main: ALL_PRIVILEGES via data-engineers (transitive)
! RISK [CATALOG] staging: MANAGE via all-admins (transitive)
Output (JSON) adds an "escalation_findings" array to the principal audit result.
What it does not cover: workspace-level admin roles (WORKSPACE_ADMIN) are already visible in the workspace roles section. Databricks account-level admin is out of scope for the Unity Catalog scan.
Stale Grant Detection
The --stale-days N flag cross-references current member-direct catalog grants against system.access.audit — the Unity Catalog system table that records every API call, SQL command, and data-access event. Any grant holder with no recorded activity in the last N days is flagged as potentially stale.
Prerequisites
- System tables must be enabled for the account (the
systemcatalog must be visible in the metastore). - The audit SP must have
SELECTonsystem.access.audit(requires Metastore Admin or explicit grant). - A SQL warehouse with access to the system catalog must be available.
Usage
# Flag grants with no activity in 90 days
databricks-access-audit \
--group "data-engineers" \
--cloud azure \
--stale-days 90 \
--sql-warehouse-id "abc123def456" \
--sql-workspace-url "https://adb-123.azuredatabricks.net"
The tool queries:
SELECT principal, DATE(MAX(event_time)) AS last_seen_date
FROM system.access.audit
WHERE event_time >= DATEADD(DAY, -90, CURRENT_TIMESTAMP())
AND principal IS NOT NULL
GROUP BY 1
Grant holders absent from this result (no activity in the window) are returned as StaleFinding objects with last_access = None.
Stale findings do not automatically generate REVOKE SQL. Review them manually: absence from the audit log may indicate legitimate inactivity (e.g. a batch job that runs quarterly) rather than an unused grant.
Workspace-Local Group Detection
The --check-local-groups flag scans every workspace's SCIM directory and flags groups that exist only at the workspace level, absent from the account-level SCIM directory.
Workspace-local groups are a legacy artefact from before Unity Catalog. They cannot hold Unity Catalog grants, are not visible to the Account API, and are invisible to account-level SCIM tooling. If your account still has workspace-local groups, they will not appear in group audit results — this flag helps you find them.
databricks-access-audit --group "data-engineers" --cloud azure --check-local-groups
Output:
Workspace-local groups (2):
! legacy-analysts in 'prod-workspace' (5 members) - not in account SCIM
! old-read-only in 'staging-workspace' (2 members) - not in account SCIM
Works with both --group and --principal modes, and is available as a standalone Python API:
from databricks_access_audit import LocalGroupChecker, WorkspaceDiscovery
checker = LocalGroupChecker(client)
workspaces = WorkspaceDiscovery(client, "azure").discover()
findings = checker.check_all_workspaces(workspaces)
for f in findings:
print(f" {f.group_name} in '{f.workspace_name}' ({f.member_count} members) - workspace-local")
CSV Output
Pass --output csv to get audit results as comma-separated values — the format auditors and security leads need to share findings with people who won't run a CLI themselves.
# Group audit - export all grants to a spreadsheet
databricks-access-audit --group "data-engineers" --cloud azure \
--output csv > grants_$(date +%F).csv
# Principal audit - export all permissions
databricks-access-audit --principal "alice@example.com" --cloud azure \
--output csv > alice_permissions.csv
Group audit CSV contains up to three sections:
- Grants table — one row per grant (catalog, schema, or table level) with columns:
securable_type,workspace,securable_name,principal,principal_type,privileges(pipe-separated),grant_source,inherited_from. - Redundancy table (appended after a blank row when redundancies are found) — one row per redundant personal grant with
redundancy_levelandrecommendation. - Workspace objects table (appended when
--scan-workspace-objectsis used) — one row per workspace ACL grant withobject_type,object_id,object_name,workspace,principal,permission_level,grant_source.
Principal audit CSV contains:
- Permissions table — one row per effective Unity Catalog permission with
securable_type,securable_name,privileges,via_group,workspace. - Escalation table (appended when
--escalation-checkis used) — one row per escalation risk.
Combine with other flags — --scan-schemas --scan-tables expands scope before export, --escalation-check adds the escalation section to a principal audit CSV.
Diff / Delta Mode
Every audit run can save a timestamped JSON snapshot to disk. Pass a previous snapshot as --baseline and the tool reports exactly what changed: new grants, removed grants, new or removed group members.
This is the SOC 2 / ISO 27001 evidence workflow: "Prove permissions haven't drifted since the last quarterly review."
# Save the current state
databricks-access-audit --group "data-engineers" --cloud azure \
--save-snapshot snapshots/data-engineers_$(date +%F).json
# Three months later - run again and compare
databricks-access-audit --group "data-engineers" --cloud azure \
--baseline snapshots/data-engineers_2025-01-01.json
Output when changes are found:
============================================================
Diff: data-engineers (group)
Baseline: 2025-01-01T00:00:00+00:00
Current: 2025-04-01T12:34:56+00:00
============================================================
Grants added (1):
+ [CATALOG] main - bob@example.com (USE_CATALOG|SELECT)
Grants removed (1):
+ [CATALOG] staging - carol@example.com (MODIFY)
Members added (1):
+ Bob Jones (User)
============================================================
When nothing has changed:
No changes detected.
Snapshot format: plain JSON, human-readable without this library. Snapshots are versioned ("version": "1") and safe to store in version control alongside other compliance artefacts.
Change detection rules:
- A grant is "added" or "removed" based on a full-field fingerprint — any field change (including privilege modifications) is reported as a removal and addition pair, making every change explicit.
- Member identity is tracked by ID and type only — display-name changes are not flagged as membership churn.
--output csv and --baseline compose: --output csv --baseline previous.json exports the diff as CSV, one row per change, for import into a spreadsheet or SIEM.
--save-snapshot and --baseline are independent and can be combined in a single run:
databricks-access-audit --group "data-engineers" --cloud azure \
--baseline last_quarter.json \
--save-snapshot snapshots/data-engineers_$(date +%F).json
Python API:
from databricks_access_audit import (
build_group_snapshot, save_snapshot, load_snapshot, diff_snapshots,
)
# After running a group audit
snap = build_group_snapshot("data-engineers", members, catalog_grants, schema_grants, table_grants)
save_snapshot(snap, "snapshots/data-engineers_2025-04-01.json")
# Compare against a previous snapshot
baseline = load_snapshot("snapshots/data-engineers_2025-01-01.json")
diff = diff_snapshots(baseline, snap)
if diff.has_changes:
print(f" {len(diff.grants_added)} grants added")
print(f" {len(diff.grants_removed)} grants removed")
print(f" {len(diff.members_added)} members added")
print(f" {len(diff.members_removed)} members removed")
When Not to Use This Tool
If you only need to query permissions within a single workspace, Databricks Unity Catalog provides built-in system views you can query directly via SQL, without any external tooling:
-- Catalog-level grants in the current workspace
SELECT * FROM system.information_schema.catalog_privileges
WHERE grantee = 'data-engineers';
-- Schema-level grants
SELECT * FROM system.information_schema.schema_privileges
WHERE grantee = 'data-engineers';
-- Table-level grants
SELECT * FROM system.information_schema.table_privileges
WHERE grantee = 'data-engineers';
Use this tool when you need what INFORMATION_SCHEMA does not provide:
- Cross-workspace aggregation — scan all workspaces from a single entry point
- Recursive group membership resolution — nested groups, users, and SPs via SCIM
- Upstream group detection — grants inherited through ancestor groups
- Redundancy detection — personal grants that duplicate what the group already provides
- REVOKE script generation — automated cleanup recommendations
- Principal reverse lookup — map any identity to all accessible resources across the account
Known Limitations
- Unity Catalog only. The tool reads grants via the UC permission API and workspace object ACLs via the permissions REST API. Hive Metastore permissions and workspace-local table ACLs are not scanned.
- Workspace object scanning is opt-in.
--scan-workspace-objectscovers 13 object types: jobs, clusters, cluster policies, DLT pipelines, SQL warehouses, SQL queries, SQL alerts, Lakeview dashboards, Genie spaces, MLflow experiments, registered models (workspace registry), model serving endpoints, and Databricks Apps. Notebook ACLs are not covered — listing them requires recursively walking the workspace file tree (unbounded API calls). - Redundancy detection is catalog-level. Schema and table grants are reported but not included in the redundancy/REVOKE analysis.
- Stale detection is account-wide, not per-catalog.
system.access.auditdoes not always record per-catalog or per-object access; the stale check flags principals with no any recorded activity, which is a conservative but imprecise signal. - Escalation detection covers UC privileges only.
WORKSPACE_ADMINescalation through group membership is visible in the workspace roles section but not scored as an escalation finding. - Identity-federation workspace limitation.
/workspaces/{id}/permissionassignmentsonly returns explicit assignments. Principals who access a workspace solely through a built-in group like "account users" (no explicit assignment) will show no workspace roles, but the tool still scans their workspace object permissions using all discovered workspaces as a fallback.
Multi-Cloud Support
| Cloud | Account host | Workspace domain |
|---|---|---|
| Azure | accounts.azuredatabricks.net |
.azuredatabricks.net |
| AWS | accounts.cloud.databricks.com |
adb-<workspace-id>.cloud.databricks.com |
| GCP | accounts.gcp.databricks.com |
.gcp.databricks.com |
For AWS workspaces, the tool prefers the deployment_url field returned by the Account API (the canonical adb-<id> format) and falls back to constructing from workspace_id when that field is absent.
Only RUNNING workspaces are scanned. Workspaces in NOT_RUNNING, PROVISIONING, FAILED, BANNED, CANCELLING, or DELETED state are automatically skipped.
Development
pip install -e ".[sdk,dev]"
# Run all tests
pytest
# Run a single test file
pytest tests/test_catalog_scanner.py
# Run a single test by name
pytest tests/test_redundancy.py::test_full_redundancy
# Lint
ruff check .
Tests use the responses library for HTTP mocking — no real Databricks connection is required.
License
Apache 2.0 — see LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file databricks_access_audit-0.18.1.tar.gz.
File metadata
- Download URL: databricks_access_audit-0.18.1.tar.gz
- Upload date:
- Size: 143.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
df2695aa8b7e5a317fa7ee384da4bd8551b643bbbe36b8975c108e4098dd56a6
|
|
| MD5 |
169cfd58dd7e22300def932c1f934fa6
|
|
| BLAKE2b-256 |
ead052fd0223ecdd1512b54cdd5248c43eea377334fc587f5d3461be574ceef2
|
File details
Details for the file databricks_access_audit-0.18.1-py3-none-any.whl.
File metadata
- Download URL: databricks_access_audit-0.18.1-py3-none-any.whl
- Upload date:
- Size: 86.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
14995161a9dd73984d0d2bf4560220ee9ff90464fe52a32908d24ddc5fcbbac9
|
|
| MD5 |
ba60f6ada2ba8834ca29360c96ff8a48
|
|
| BLAKE2b-256 |
44e1ebd9858bd189993bf4ce9eed815b406f67dc84c04cf4f6fc3d45723aa2c2
|