
Open-source PySpark toolkit with connectors and CLI for Azure Storage, Databricks, Microsoft Fabric Lakehouses, Unity Catalog, and Hive Metastore.

Project description

spark-fuse


spark-fuse is an open-source toolkit for PySpark — providing utilities, connectors, and tools to fuse your data workflows across Azure Storage (ADLS Gen2), Databricks, Microsoft Fabric Lakehouses (via OneLake/Delta), Unity Catalog, and Hive Metastore.

Features

  • Connectors for ADLS Gen2 (abfss://), Fabric OneLake (onelake:// or abfss://...onelake.dfs.fabric.microsoft.com/...), and Databricks DBFS (dbfs:/).
  • Unity Catalog and Hive Metastore helpers to create catalogs/schemas and register external Delta tables.
  • SparkSession helpers with sensible defaults and environment detection (Databricks/Fabric/local).
  • Typer-powered CLI: list connectors, preview datasets, register tables, submit Databricks jobs.
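Each connector is tied to a storage location. The sketch below uses only the standard library to show how the three URI families listed above can be told apart by scheme and host. The function name classify_path and the returned labels are hypothetical and not part of spark-fuse's API. The library's own dispatch logic may differ.

```python
from urllib.parse import urlparse

# Hypothetical backend labels, chosen for this sketch only.
ADLS_GEN2 = "adls_gen2"
ONELAKE = "onelake"
DBFS = "dbfs"

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def classify_path(path: str) -> str:
    """Map a storage URI to a backend label using only its scheme and host."""
    parsed = urlparse(path)
    scheme = parsed.scheme.lower()
    if scheme == "onelake":
        return ONELAKE
    if scheme == "abfss":
        # netloc looks like "<container>@<host>"; keep only the host part.
        host = parsed.netloc.rsplit("@", 1)[-1].lower()
        if host == ONELAKE_HOST or host.endswith("." + ONELAKE_HOST):
            return ONELAKE
        return ADLS_GEN2
    if scheme == "dbfs":
        return DBFS
    raise ValueError(f"unsupported storage path: {path!r}")
```

OneLake is also reachable through abfss://, so the scheme alone is not enough. The host check is what separates OneLake from ordinary ADLS Gen2 accounts.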

Installation

  • Create a virtual environment (recommended)
    • macOS/Linux:
      • python3 -m venv .venv
      • source .venv/bin/activate
      • python -m pip install --upgrade pip
    • Windows (PowerShell):
      • python -m venv .venv
      • .\.venv\Scripts\Activate.ps1
      • python -m pip install --upgrade pip
  • From a source checkout, for development (run in the repository root): pip install -e ".[dev]"
  • From PyPI: pip install spark-fuse
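You can check which spark-fuse version the active virtual environment picked up with the standard library's importlib.metadata. This is a generic check, not a spark-fuse command. The helper name installed_version is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "spark-fuse") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"spark-fuse {found}" if found else "spark-fuse is not installed here")
```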

Quickstart

  1. Create a SparkSession with helpful defaults
from spark_fuse.spark import create_session
spark = create_session(app_name="spark-fuse-quickstart")
  2. Read a Delta table from ADLS or OneLake
from spark_fuse.io.azure_adls import ADLSGen2Connector

df = ADLSGen2Connector().read(spark, "abfss://container@account.dfs.core.windows.net/path/to/delta")
df.show(5)
  3. Register an external table in Unity Catalog
from spark_fuse.catalogs import unity

unity.create_catalog(spark, "analytics")
unity.create_schema(spark, catalog="analytics", schema="core")
unity.register_external_delta_table(
    spark,
    catalog="analytics",
    schema="core",
    table="events",
    location="abfss://container@account.dfs.core.windows.net/path/to/delta",
)
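The sketch below builds the Databricks SQL statements that create a catalog, a schema, and an external Delta table at a given location. It is an assumption that unity.create_catalog, unity.create_schema, and unity.register_external_delta_table issue statements like these. The helper unity_register_sql is hypothetical and only generates strings, so it runs without Spark.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Backtick-quote a Spark SQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def unity_register_sql(catalog: str, schema: str, table: str, location: str) -> List[str]:
    """Return statements creating a catalog, a schema, and an external Delta table."""
    if "'" in location:
        # Keep the sketch simple: refuse locations that would need literal escaping.
        raise ValueError("location must not contain a single quote")
    cat = quote_ident(catalog)
    sch = f"{cat}.{quote_ident(schema)}"
    tbl = f"{sch}.{quote_ident(table)}"
    return [
        f"CREATE CATALOG IF NOT EXISTS {cat}",
        f"CREATE SCHEMA IF NOT EXISTS {sch}",
        # No column list: the table schema is taken from the existing Delta log.
        f"CREATE TABLE IF NOT EXISTS {tbl} USING DELTA LOCATION '{location}'",
    ]


if __name__ == "__main__":
    for stmt in unity_register_sql(
        "analytics", "core", "events",
        "abfss://container@account.dfs.core.windows.net/path/to/delta",
    ):
        print(stmt)  # on a Unity Catalog-enabled cluster: spark.sql(stmt)
```

On a Unity Catalog-enabled cluster, pass each generated statement to spark.sql in order.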

CLI Usage

  • spark-fuse --help
  • spark-fuse connectors
  • spark-fuse read --path abfss://container@account.dfs.core.windows.net/path/to/delta --show 5
  • spark-fuse uc-create --catalog analytics --schema core
  • spark-fuse uc-register-table --catalog analytics --schema core --table events --path abfss://.../delta
  • spark-fuse hive-register-external --database analytics_core --table events --path abfss://.../delta
  • spark-fuse fabric-register --table lakehouse_table --path onelake://workspace/lakehouse/Tables/events
  • spark-fuse databricks-submit --json job.json
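The page does not document what databricks-submit expects in job.json. Assuming it accepts a Databricks Jobs API 2.1 runs/submit request body, the sketch below generates a minimal payload that runs one Python file on an existing cluster. The function name, cluster ID, and file path are hypothetical placeholders.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_run_submit_payload(
    run_name: str,
    cluster_id: str,
    python_file: str,
    parameters: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build a one-task runs/submit body (assumed Jobs API 2.1 shape)."""
    return {
        "run_name": run_name,
        "tasks": [
            {
                "task_key": "main",
                # Run on an already-running cluster instead of defining new_cluster.
                "existing_cluster_id": cluster_id,
                "spark_python_task": {
                    "python_file": python_file,
                    "parameters": list(parameters or []),
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values; replace with a real cluster ID and script location.
    payload = build_run_submit_payload(
        run_name="spark-fuse-example",
        cluster_id="0123-456789-abcdefgh",
        python_file="dbfs:/jobs/etl.py",
        parameters=["--date", "2024-01-01"],
    )
    print(json.dumps(payload, indent=2))
```

Save the printed JSON as job.json and pass it with --json. Confirm the expected shape against the spark-fuse source before relying on it.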

CI

  • GitHub Actions runs ruff and pytest for Python 3.9–3.11.

License

  • Apache 2.0


Download files


Source Distribution

spark_fuse-0.1.6.tar.gz (14.5 kB)

Uploaded Source

Built Distribution


spark_fuse-0.1.6-py3-none-any.whl (22.6 kB)

Uploaded Python 3

File details

Details for the file spark_fuse-0.1.6.tar.gz.

File metadata

  • Download URL: spark_fuse-0.1.6.tar.gz
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spark_fuse-0.1.6.tar.gz
Algorithm Hash digest
SHA256 17c72debaa3a49230e4d89211a7bcbe3c7528ab302295032bc2c8d5d0fd6a391
MD5 d397bc70e40ff330696fca2a4bb41353
BLAKE2b-256 e1e5dd74d8aee809f9e1e461b78638c58ec0d30c66360b850d8dbd6fa9acf17c
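To check a downloaded archive against the SHA256 digest listed above, read the file through hashlib and compare hex digests. This is a generic standard-library sketch, not a spark-fuse feature. The function names are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a b"" sentinel keeps calling fh.read until EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For an intact download of the source distribution, verify_sha256("spark_fuse-0.1.6.tar.gz", "17c72debaa3a49230e4d89211a7bcbe3c7528ab302295032bc2c8d5d0fd6a391") should return True. pip's hash-checking mode (--require-hashes) performs the same comparison at install time.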


Provenance

The following attestation bundles were made for spark_fuse-0.1.6.tar.gz:

Publisher: publish.yml on kevinsames/spark-fuse

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file spark_fuse-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: spark_fuse-0.1.6-py3-none-any.whl
  • Size: 22.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spark_fuse-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 c1c2c2351f1cc672962d394c8b65bf9d2e0ab4ddfdc156e039f2a66a3ceba67c
MD5 c70160542d51548cc9fd424046390d1f
BLAKE2b-256 c3cd6ee9a0cd28c67a3c3e7064a006178fb892407e984aa82990677c7381614f


Provenance

The following attestation bundles were made for spark_fuse-0.1.6-py3-none-any.whl:

Publisher: publish.yml on kevinsames/spark-fuse

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
