Open-source PySpark toolkit with connectors and CLI for Azure Storage, Databricks, Microsoft Fabric Lakehouses, Unity Catalog, and Hive Metastore.
spark-fuse
spark-fuse is an open-source toolkit for PySpark. It provides utilities, connectors, and tools to fuse your data workflows across Azure Storage (ADLS Gen2), Databricks, Microsoft Fabric Lakehouses (via OneLake/Delta), Unity Catalog, and Hive Metastore.
Features
- Connectors for ADLS Gen2 (abfss://), Fabric OneLake (onelake:// or abfss://...onelake.dfs.fabric.microsoft.com/...), and Databricks DBFS (dbfs:/).
- Unity Catalog and Hive Metastore helpers to create catalogs/schemas and register external Delta tables.
- SparkSession helpers with sensible defaults and environment detection (Databricks/Fabric/local).
- Typer-powered CLI: list connectors, preview datasets, register tables, submit Databricks jobs.
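The connector list above is keyed on URI schemes, so a path can be classified before any Spark call is made. The following is a minimal sketch of that idea, not spark-fuse's actual dispatch logic: the function name `guess_connector` and the connector labels are hypothetical, and only the schemes and the OneLake host come from the feature list.

```python
from urllib.parse import urlparse

# Hypothetical labels for illustration; they are not spark-fuse identifiers.
SCHEME_TO_CONNECTOR = {
    "abfss": "adls-gen2",
    "onelake": "fabric-onelake",
    "dbfs": "databricks-dbfs",
}

# Host used by OneLake when it is addressed through an abfss:// URI.
ONELAKE_HOST_SUFFIX = "onelake.dfs.fabric.microsoft.com"


def guess_connector(path: str) -> str:
    """Return a connector label for a storage path, or raise ValueError."""
    parsed = urlparse(path)
    scheme = parsed.scheme.lower()
    if scheme not in SCHEME_TO_CONNECTOR:
        raise ValueError(f"Unsupported path scheme: {path!r}")
    # An abfss:// URI that targets the OneLake endpoint belongs to Fabric.
    if scheme == "abfss" and (parsed.hostname or "").endswith(ONELAKE_HOST_SUFFIX):
        return "fabric-onelake"
    return SCHEME_TO_CONNECTOR[scheme]
```

Checking the host in addition to the scheme matters because ADLS Gen2 and OneLake both accept abfss:// URIs.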
Installation
- Create a virtual environment (recommended)
  - macOS/Linux:
        python3 -m venv .venv
        source .venv/bin/activate
        python -m pip install --upgrade pip
  - Windows (PowerShell):
        python -m venv .venv
        .\.venv\Scripts\Activate.ps1
        python -m pip install --upgrade pip
- From source (dev):
        pip install -e ".[dev]"
- From PyPI:
        pip install spark-fuse
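After installing, it can be useful to confirm which version ended up in the active environment. A small sketch using only the standard library follows; the helper name `installed_version` is hypothetical, and the distribution name `spark-fuse` is the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "spark-fuse") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "spark-fuse is not installed in this environment")
```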
Quickstart
- Create a SparkSession with helpful defaults
from spark_fuse.spark import create_session
spark = create_session(app_name="spark-fuse-quickstart")
- Read a Delta table from ADLS or OneLake
from spark_fuse.io.azure_adls import ADLSGen2Connector
df = ADLSGen2Connector().read(spark, "abfss://container@account.dfs.core.windows.net/path/to/delta")
df.show(5)
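The path passed to the connector follows the standard ADLS Gen2 URI layout, `abfss://<container>@<account>.dfs.core.windows.net/<path>`. As a hedged sketch, the helper below (a hypothetical name, not part of spark-fuse) assembles such a URI from its parts; the commented lines show the plain PySpark Delta read, which may or may not match what the connector does internally.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 abfss:// URI from container, account, and path."""
    if not container or not account:
        raise ValueError("container and account must be non-empty")
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    # Drop leading and trailing slashes so the result has exactly one separator.
    clean = path.strip("/")
    return f"{base}/{clean}" if clean else base


# With an active SparkSession and Delta Lake available, a plain PySpark read
# of the same location would look like this:
#   df = spark.read.format("delta").load(abfss_uri("container", "account", "path/to/delta"))
```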
- Register an external table in Unity Catalog
from spark_fuse.catalogs import unity
unity.create_catalog(spark, "analytics")
unity.create_schema(spark, catalog="analytics", schema="core")
unity.register_external_delta_table(
    spark,
    catalog="analytics",
    schema="core",
    table="events",
    location="abfss://container@account.dfs.core.windows.net/path/to/delta",
)
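For orientation, the three calls above correspond roughly to three SQL statements on a Unity Catalog-enabled workspace. The sketch below builds those statements as strings; it is an assumption about what such helpers might issue, not a description of spark-fuse's implementation, and the function names are hypothetical.

```python
from typing import List


def _quote(identifier: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def external_delta_table_sql(
    catalog: str, schema: str, table: str, location: str
) -> List[str]:
    """Return SQL that creates a catalog, a schema, and an external Delta table."""
    if "'" in location:
        # Keep the sketch simple: refuse locations that would need escaping.
        raise ValueError("location must not contain single quotes")
    cat, sch, tbl = _quote(catalog), _quote(schema), _quote(table)
    return [
        f"CREATE CATALOG IF NOT EXISTS {cat}",
        f"CREATE SCHEMA IF NOT EXISTS {cat}.{sch}",
        f"CREATE TABLE IF NOT EXISTS {cat}.{sch}.{tbl} USING DELTA LOCATION '{location}'",
    ]


# With an active SparkSession, each statement could be run via spark.sql(stmt).
```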
CLI Usage
spark-fuse --help
spark-fuse connectors
spark-fuse read --path abfss://container@account.dfs.core.windows.net/path/to/delta --show 5
spark-fuse uc-create --catalog analytics --schema core
spark-fuse uc-register-table --catalog analytics --schema core --table events --path abfss://.../delta
spark-fuse hive-register-external --database analytics_core --table events --path abfss://.../delta
spark-fuse fabric-register --table lakehouse_table --path onelake://workspace/lakehouse/Tables/events
spark-fuse databricks-submit --json job.json
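The CLI can also be driven from a Python script, for example in a scheduled job. The sketch below composes the `spark-fuse read` invocation shown above as an argument list; the helper names are hypothetical, and only the `--path` and `--show` flags are taken from the usage examples. It runs the command only if the `spark-fuse` executable is found on PATH.

```python
import shlex
import shutil
import subprocess
from typing import List


def read_command(path: str, show: int = 5) -> List[str]:
    """Build the argument list for `spark-fuse read`."""
    if show < 0:
        raise ValueError("show must be non-negative")
    return ["spark-fuse", "read", "--path", path, "--show", str(show)]


def run_if_available(argv: List[str]) -> bool:
    """Run the command when the executable exists; return whether it ran."""
    if shutil.which(argv[0]) is None:
        return False
    # Passing a list avoids shell interpolation of the path.
    subprocess.run(argv, check=True)
    return True


if __name__ == "__main__":
    cmd = read_command("abfss://container@account.dfs.core.windows.net/path/to/delta")
    print(shlex.join(cmd))  # display the command as it would be typed in a shell
    run_if_available(cmd)
```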
CI
- GitHub Actions runs ruff and pytest for Python 3.9–3.11.
License
- Apache 2.0
File details
Details for the file spark_fuse-0.1.7.tar.gz.
File metadata
- Download URL: spark_fuse-0.1.7.tar.gz
- Upload date:
- Size: 16.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e108a71c5519d6812bea57e76a3b451f29ca0e8f8e0ae192be3de0d008f8ccba |
| MD5 | 0fddbcfe5fac75d911bf8697c70f90f3 |
| BLAKE2b-256 | c2a5dc81dc0e0fb25d2fd0adf49cd7a9a32a0f4c9c7cacb7dbf9da6317e6dd50 |
Provenance
The following attestation bundles were made for spark_fuse-0.1.7.tar.gz:
Publisher: publish.yml on kevinsames/spark-fuse
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spark_fuse-0.1.7.tar.gz
- Subject digest: e108a71c5519d6812bea57e76a3b451f29ca0e8f8e0ae192be3de0d008f8ccba
- Sigstore transparency entry: 585367294
- Sigstore integration time:
- Permalink: kevinsames/spark-fuse@1ca515ced37c700bcfe315bc8763824ea9f1486d
- Branch / Tag: refs/tags/v.0.1.7
- Owner: https://github.com/kevinsames
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1ca515ced37c700bcfe315bc8763824ea9f1486d
- Trigger Event: release
File details
Details for the file spark_fuse-0.1.7-py3-none-any.whl.
File metadata
- Download URL: spark_fuse-0.1.7-py3-none-any.whl
- Upload date:
- Size: 25.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d80b54854af32a9bc0eaca081c56dbbaab70ffd0a9a3701efd5ba72e7c098435 |
| MD5 | 05fbbca36d1e2f276fc3be8cec8e213a |
| BLAKE2b-256 | 2354c3483d6d28433dc3c99d442d484b3fd85a29866bba90de67bff2ccf399f5 |
Provenance
The following attestation bundles were made for spark_fuse-0.1.7-py3-none-any.whl:
Publisher: publish.yml on kevinsames/spark-fuse
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spark_fuse-0.1.7-py3-none-any.whl
- Subject digest: d80b54854af32a9bc0eaca081c56dbbaab70ffd0a9a3701efd5ba72e7c098435
- Sigstore transparency entry: 585367307
- Sigstore integration time:
- Permalink: kevinsames/spark-fuse@1ca515ced37c700bcfe315bc8763824ea9f1486d
- Branch / Tag: refs/tags/v.0.1.7
- Owner: https://github.com/kevinsames
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1ca515ced37c700bcfe315bc8763824ea9f1486d
- Trigger Event: release