A Microsoft Fabric Spark adapter plugin for dbt

Project description



dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

dbt is the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.

dbt-fabricspark

The dbt-fabricspark package contains all of the code enabling dbt to work with Apache Spark in Microsoft Fabric. This adapter connects to Fabric Lakehouses via Livy endpoints and supports both schema-enabled and non-schema Lakehouse configurations.

Current version: 1.9.4

Key Features

  • Livy session management with session reuse across dbt runs
  • Lakehouse with schema support — auto-detects schema-enabled lakehouses and uses three-part naming (lakehouse.schema.table)
  • Lakehouse without schema — standard two-part naming (lakehouse.table)
  • Materializations: table, view, incremental (append, merge, insert_overwrite), seed, snapshot (an incremental-strategy sketch follows this list)
  • Fabric Environment support via environmentId configuration
  • Security: credential masking, UUID validation, HTTPS + domain validation, thread-safe token refresh
  • Resilience: HTTP 5xx retry with exponential backoff, bounded polling with configurable timeouts
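
As a minimal illustration of selecting one of the incremental strategies above, the strategy can be set per model folder in dbt_project.yml (the project name my_project and folder events are hypothetical):

models:
  my_project:
    events:
      +materialized: incremental
      +incremental_strategy: merge   # or append / insert_overwrite
      +unique_key: event_id          # rows matching this key are updated on merge

With merge, unique_key identifies the rows to update; with append, new rows are simply inserted.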

Getting started

Installation

pip install dbt-fabricspark

Configuration

Connect to Apache Spark in Microsoft Fabric through a Livy endpoint by configuring your profiles.yml as shown below.

Lakehouse without Schema

For standard Lakehouses (schemas not enabled), use two-part naming and set the schema field to the lakehouse name:

fabric-spark-test:
  target: fabricspark-dev
  outputs:
    fabricspark-dev:
        # Connection
        type: fabricspark
        method: livy
        endpoint: https://api.fabric.microsoft.com/v1
        workspaceid: <your-workspace-id>
        lakehouseid: <your-lakehouse-id>
        lakehouse: my_lakehouse
        schema: my_lakehouse
        threads: 1

        # Authentication (CLI for local dev, SPN for CI/CD)
        authentication: CLI
        # client_id: <your-client-id>        # Required for SPN
        # tenant_id: <your-tenant-id>        # Required for SPN
        # client_secret: <your-client-secret> # Required for SPN

        # Fabric Environment (optional)
        # environmentId: <your-environment-id>

        # Session management
        reuse_session: true
        session_idle_timeout: "30m"
        # session_id_file: ./livy-session-id.txt  # Default path

        # Timeouts
        connect_retries: 1
        connect_timeout: 10
        http_timeout: 120                   # Seconds per HTTP request
        session_start_timeout: 600          # Max wait for session start (10 min)
        statement_timeout: 3600             # Max wait for statement result (1 hour)
        poll_wait: 10                       # Seconds between session start polls
        poll_statement_wait: 5              # Seconds between statement result polls

        # Retry & Shortcuts
        retry_all: true
        # create_shortcuts: false
        # shortcuts_json_str: '<json-string>'

        # Spark configuration (optional)
        # spark_config:
        #   name: "my-spark-session"
        #   spark.executor.memory: "4g"

In this mode:

  • Tables are referenced as lakehouse.table_name
  • The schema field should match the lakehouse name
  • All objects are created directly under the lakehouse

Lakehouse with Schema (Schema-Enabled)

For schema-enabled Lakehouses, you can organize tables into schemas within the lakehouse. The adapter auto-detects whether a lakehouse has schemas enabled via the Fabric REST API (properties.defaultSchema):

fabric-spark-test:
  target: fabricspark-dev
  outputs:
    fabricspark-dev:
        # Connection
        type: fabricspark
        method: livy
        endpoint: https://api.fabric.microsoft.com/v1
        workspaceid: <your-workspace-id>
        lakehouseid: <your-lakehouse-id>
        lakehouse: my_lakehouse
        schema: my_schema              # Different from lakehouse name
        threads: 1

        # Authentication (CLI for local dev, SPN for CI/CD)
        authentication: CLI
        # client_id: <your-client-id>        # Required for SPN
        # tenant_id: <your-tenant-id>        # Required for SPN
        # client_secret: <your-client-secret> # Required for SPN

        # Fabric Environment (optional)
        # environmentId: <your-environment-id>

        # Session management
        reuse_session: true
        session_idle_timeout: "30m"
        # session_id_file: ./livy-session-id.txt  # Default path

        # Timeouts
        connect_retries: 1
        connect_timeout: 10
        http_timeout: 120                   # Seconds per HTTP request
        session_start_timeout: 600          # Max wait for session start (10 min)
        statement_timeout: 3600             # Max wait for statement result (1 hour)
        poll_wait: 10                       # Seconds between session start polls
        poll_statement_wait: 5              # Seconds between statement result polls

        # Retry & Shortcuts
        retry_all: true
        # create_shortcuts: false
        # shortcuts_json_str: '<json-string>'

        # Spark configuration (optional)
        # spark_config:
        #   name: "my-spark-session"
        #   spark.executor.memory: "4g"

In this mode:

  • Tables are referenced using three-part naming: lakehouse.schema.table_name
  • The schema field specifies the target schema within the lakehouse
  • dbt's generate_schema_name and generate_database_name macros are lakehouse-aware
  • Schemas are created automatically via CREATE DATABASE IF NOT EXISTS lakehouse.schema
  • Incremental models use persisted staging tables (instead of temp views) to work around Spark's REQUIRES_SINGLE_PART_NAMESPACE limitation

Schema Detection

The adapter detects whether a lakehouse has schemas enabled using two complementary mechanisms:

  1. Runtime detection (Fabric REST API): During connection.open(), the adapter calls the Fabric REST API to fetch lakehouse properties. If the response contains defaultSchema, the lakehouse is treated as schema-enabled and three-part naming is used.

  2. Parse-time detection (profile heuristic): During manifest parsing (before any connection is opened), the adapter checks whether schema differs from lakehouse in your profile. When they differ (e.g., lakehouse: bronze, schema: dbo), the adapter infers schema-enabled mode, ensuring correct schema resolution at compile time (see the sketch below).
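
A minimal sketch of the two profile shapes the parse-time heuristic distinguishes (values are illustrative):

# Inferred as non-schema: schema mirrors the lakehouse name
lakehouse: my_lakehouse
schema: my_lakehouse

# Inferred as schema-enabled: schema differs from the lakehouse name
lakehouse: my_lakehouse
schema: dbo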

Important: For schema-enabled lakehouses, always set schema to a value different from lakehouse in your profile (e.g., schema: dbo). If schema equals lakehouse, the adapter cannot distinguish schema-enabled from non-schema mode at parse time, and the lakehouse name will be used as the schema name instead.

| Lakehouse Type | lakehouse | schema | Naming |
| --- | --- | --- | --- |
| Without schema | my_lakehouse | my_lakehouse | my_lakehouse.table_name |
| With schema | my_lakehouse | dbo | my_lakehouse.dbo.table_name |

Cross-Lakehouse Writes

A single profile can write to multiple lakehouses using the database config on individual models. The profile's lakehouse is the default target; set database on a model to redirect writes to a different lakehouse in the same workspace.

# profiles.yml: the profile targets the "bronze" lakehouse
fabric-spark:
  target: dev
  outputs:
    dev:
      type: fabricspark
      lakehouse: bronze
      schema: dbo
      # ... other settings

-- models/silver/silver_orders.sql: writes to the "silver" lakehouse
{{ config(
    materialized='table',
    database='silver',
    schema='dbo'
) }}

select * from {{ ref('bronze_orders') }}

In this example:

  • Seeds and bronze models write to bronze.dbo.* (the default lakehouse)
  • Silver models write to silver.dbo.* via database='silver'
  • Gold models write to gold.dbo.* via database='gold' (per-folder defaults are sketched after this list)
  • All three lakehouses must exist in the same Fabric workspace and have schemas enabled
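
Instead of repeating database in every model's config(), the same routing could be declared once per folder in dbt_project.yml; a sketch assuming a project named my_project (name hypothetical) with silver and gold model folders:

models:
  my_project:
    silver:
      +database: silver   # all models in models/silver/ write to the silver lakehouse
    gold:
      +database: gold     # all models in models/gold/ write to the gold lakehouse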

Configuration Reference

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| type | string | | Must be fabricspark |
| method | string | livy | Connection method |
| endpoint | string | https://api.fabric.microsoft.com/v1 | Fabric API endpoint URL |
| workspaceid | string | | Fabric workspace UUID |
| lakehouseid | string | | Lakehouse UUID |
| lakehouse | string | | Lakehouse name |
| schema | string | | Schema name. Must equal lakehouse for non-schema lakehouses; must differ from lakehouse for schema-enabled ones (e.g., dbo) |
| threads | int | 1 | Number of threads for parallel execution |
| **Authentication** | | | |
| authentication | string | CLI | Auth method: CLI, SPN, or fabric_notebook |
| client_id | string | | Service principal client ID (SPN only) |
| tenant_id | string | | Azure AD tenant ID (SPN only) |
| client_secret | string | | Service principal secret (SPN only) |
| accessToken | string | | Direct access token (optional) |
| **Environment** | | | |
| environmentId | string | | Fabric Environment ID for Spark configuration |
| spark_config | dict | {} | Spark session configuration (must include a name key) |
| **Session Management** | | | |
| reuse_session | bool | false | Keep Livy sessions alive for reuse across runs |
| session_id_file | string | ./livy-session-id.txt | Path to the file storing the session ID for reuse |
| session_idle_timeout | string | 30m | Livy session idle timeout (e.g., 30m, 1h) |
| **Timeouts & Polling** | | | |
| connect_retries | int | 1 | Number of connection retries |
| connect_timeout | int | 10 | Connection timeout in seconds |
| http_timeout | int | 120 | Seconds per HTTP request to the Fabric API |
| session_start_timeout | int | 600 | Max seconds to wait for session start |
| statement_timeout | int | 3600 | Max seconds to wait for a statement result |
| poll_wait | int | 10 | Seconds between session start polls |
| poll_statement_wait | int | 5 | Seconds between statement result polls |
| **Other** | | | |
| retry_all | bool | false | Retry all operations on failure |
| create_shortcuts | bool | false | Enable Fabric shortcut creation |
| shortcuts_json_str | string | | JSON string defining shortcuts |
| livy_mode | string | fabric | fabric for Fabric cloud, local for a local Livy server |
| livy_url | string | http://localhost:8998 | Local Livy URL (local mode only) |
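
As an example of the spark_config option (shown commented out in the profiles above), the name key is required and any further entries are passed as Spark properties; the executor setting here is illustrative:

spark_config:
  name: "my-spark-session"       # required session name
  spark.executor.memory: "4g"    # standard Spark property (illustrative value)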

Authentication Modes

| Mode | Value | Use Case | Required Fields |
| --- | --- | --- | --- |
| Azure CLI | CLI | Local development. Uses az login credentials. | None (run az login first) |
| Service Principal | SPN | CI/CD and automation. Uses Azure AD app registration. | client_id, tenant_id, client_secret |
| Fabric Notebook | fabric_notebook | Running dbt inside a Fabric notebook. Uses notebookutils.credentials. | None (runs in Fabric runtime) |
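
For SPN in CI/CD, a common pattern is to inject credentials through environment variables with dbt's env_var function rather than committing them to profiles.yml (the variable names are arbitrary):

authentication: SPN
client_id: "{{ env_var('FABRIC_CLIENT_ID') }}"
tenant_id: "{{ env_var('FABRIC_TENANT_ID') }}"
client_secret: "{{ env_var('FABRIC_CLIENT_SECRET') }}"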

Reporting bugs and contributing code

Join the dbt Community

Code of Conduct

Everyone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the dbt Code of Conduct.

Download files

Source Distribution

dbt_fabricspark-1.9.4.tar.gz (215.6 kB)

Built Distribution

dbt_fabricspark-1.9.4-py3-none-any.whl (66.2 kB)

File details: dbt_fabricspark-1.9.4.tar.gz

  • Size: 215.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.6 (Ubuntu 24.04, CI)

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 46a1a179972077a763f588c13298297e9933b43b9b132a02de8b5065a770220e |
| MD5 | 4d32e39230e99d6d76df7aeccae19854 |
| BLAKE2b-256 | 3d9436ad579305db248365356d312cdd792804217bfe33276978ec433b177b0f |

File details: dbt_fabricspark-1.9.4-py3-none-any.whl

  • Size: 66.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.6 (Ubuntu 24.04, CI)

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 4a887df030566bae86b572847e18c6dc4f7c0e4542453808fae8399467647838 |
| MD5 | be2ddc9db101542a46c2a407203a3475 |
| BLAKE2b-256 | 1a795e367b7138240db026b5f0e83415ef4550e9a1143a4f20fb11ecacde13bc |
