Open-source PySpark toolkit with connectors and CLI for Azure Storage, Databricks, Microsoft Fabric Lakehouses, Unity Catalog, and Hive Metastore.

spark-fuse

spark-fuse is an open-source toolkit for PySpark. It provides utilities, connectors, and a CLI for running data workflows across Azure Storage (ADLS Gen2), Databricks, Microsoft Fabric Lakehouses (via OneLake/Delta), Unity Catalog, and Hive Metastore.

Features

  • Connectors for ADLS Gen2 (abfss://), Fabric OneLake (onelake:// or abfss://...onelake.dfs.fabric.microsoft.com/...), and Databricks DBFS (dbfs:/).
  • Unity Catalog and Hive Metastore helpers to create catalogs/schemas and register external Delta tables.
  • SparkSession helpers with sensible defaults and environment detection (Databricks/Fabric/local).
  • Typer-powered CLI: list connectors, preview datasets, register tables, submit Databricks jobs.
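
The URI schemes listed above are enough to tell the storage backends apart. The following is a minimal sketch of that idea, not spark-fuse's own code: the function name guess_storage_backend and the returned labels are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse


def guess_storage_backend(path: str) -> str:
    """Return a label for the backend a path most likely belongs to.

    Hypothetical helper for illustration; it is not part of spark-fuse.
    """
    parsed = urlparse(path)
    scheme = parsed.scheme.lower()
    host = (parsed.hostname or "").lower()

    if scheme == "onelake":
        return "fabric-onelake"
    if scheme == "abfss":
        # OneLake also exposes an abfss:// endpoint under this host suffix.
        if host.endswith("onelake.dfs.fabric.microsoft.com"):
            return "fabric-onelake"
        return "adls-gen2"
    if scheme == "dbfs":
        return "databricks-dbfs"
    return "unknown"


if __name__ == "__main__":
    for example in (
        "abfss://container@account.dfs.core.windows.net/path/to/delta",
        "onelake://workspace/lakehouse/Tables/events",
        "dbfs:/mnt/data",
    ):
        print(example, "->", guess_storage_backend(example))
```

Checking the host as well as the scheme matters because ADLS Gen2 and OneLake both use abfss://.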

Installation

  • Create a virtual environment (recommended)
    • macOS/Linux:
      • python3 -m venv .venv
      • source .venv/bin/activate
      • python -m pip install --upgrade pip
    • Windows (PowerShell):
      • python -m venv .venv
      • .\.venv\Scripts\Activate.ps1
      • python -m pip install --upgrade pip
  • From source (dev): pip install -e ".[dev]"
  • From PyPI: pip install spark-fuse
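
After installing, it can be useful to confirm that the package landed in the active virtual environment. A small sketch using only the standard library; the helper name installed_version is hypothetical, and the distribution name "spark-fuse" is taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("spark-fuse")
    if version is None:
        print("spark-fuse is not installed in this environment")
    else:
        print(f"spark-fuse {version} is installed")
```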

Quickstart

  1. Create a SparkSession with helpful defaults
from spark_fuse.spark import create_session
spark = create_session(app_name="spark-fuse-quickstart")
  2. Read a Delta table from ADLS or OneLake
from spark_fuse.io.azure_adls import ADLSGen2Connector

df = ADLSGen2Connector().read(spark, "abfss://container@account.dfs.core.windows.net/path/to/delta")
df.show(5)
  3. Register an external table in Unity Catalog
from spark_fuse.catalogs import unity

unity.create_catalog(spark, "analytics")
unity.create_schema(spark, catalog="analytics", schema="core")
unity.register_external_delta_table(
    spark,
    catalog="analytics",
    schema="core",
    table="events",
    location="abfss://container@account.dfs.core.windows.net/path/to/delta",
)
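
Registering an external Delta table usually comes down to a single SQL statement. The sketch below builds such a statement as a string so it can be inspected before running it with spark.sql(...). It is an assumption about the kind of DDL involved, not a copy of what unity.register_external_delta_table executes; the helper names quote_ident and external_delta_table_ddl are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def external_delta_table_ddl(catalog: str, schema: str, table: str, location: str) -> str:
    """Build a CREATE TABLE statement for an external Delta table.

    Hypothetical helper for illustration; spark-fuse may generate different SQL.
    """
    full_name = ".".join(quote_ident(part) for part in (catalog, schema, table))
    # Escape backslashes and single quotes so the location stays one string literal.
    escaped = location.replace("\\", "\\\\").replace("'", "\\'")
    return f"CREATE TABLE IF NOT EXISTS {full_name} USING DELTA LOCATION '{escaped}'"


if __name__ == "__main__":
    print(
        external_delta_table_ddl(
            "analytics",
            "core",
            "events",
            "abfss://container@account.dfs.core.windows.net/path/to/delta",
        )
    )
```

Because the table is external, the statement only records the location in the catalog; the Delta files at that path are left where they are.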

CLI Usage

  • spark-fuse --help
  • spark-fuse connectors
  • spark-fuse read --path abfss://container@account.dfs.core.windows.net/path/to/delta --show 5
  • spark-fuse uc-create --catalog analytics --schema core
  • spark-fuse uc-register-table --catalog analytics --schema core --table events --path abfss://.../delta
  • spark-fuse hive-register-external --database analytics_core --table events --path abfss://.../delta
  • spark-fuse fabric-register --table lakehouse_table --path onelake://workspace/lakehouse/Tables/events
  • spark-fuse databricks-submit --json job.json
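
The format of job.json is not described here. Assuming the file is passed through to the Databricks Jobs "runs/submit" endpoint, a payload could be generated as in the sketch below. The field names follow the Databricks Jobs API; the helper name build_submit_payload and every value (run name, file path, cluster size, runtime version, node type) are hypothetical placeholders.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def build_submit_payload(run_name: str, python_file: str, parameters: List[str]) -> Dict[str, Any]:
    """Build a one-task payload in the shape of a Databricks runs/submit request.

    Assumption: spark-fuse forwards job.json to the Jobs API without reshaping it.
    """
    return {
        "run_name": run_name,
        "tasks": [
            {
                "task_key": "main",
                "spark_python_task": {
                    "python_file": python_file,
                    "parameters": parameters,
                },
                # Placeholder cluster settings; adjust to your workspace.
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",
                    "node_type_id": "Standard_DS3_v2",
                    "num_workers": 1,
                },
            }
        ],
    }


if __name__ == "__main__":
    payload = build_submit_payload(
        run_name="spark-fuse-example",
        python_file="dbfs:/scripts/example_job.py",
        parameters=["--date", "2024-01-01"],
    )
    Path("job.json").write_text(json.dumps(payload, indent=2), encoding="utf-8")
    print("wrote job.json")
```

Run spark-fuse databricks-submit --help to check which fields the command actually expects before relying on this shape.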

CI

  • GitHub Actions runs ruff and pytest for Python 3.9–3.11.

License

  • Apache 2.0

Download files

Source Distribution

spark_fuse-0.1.5.tar.gz (13.7 kB)

Uploaded Source

Built Distribution

spark_fuse-0.1.5-py3-none-any.whl (22.6 kB)

Uploaded Python 3

File details

Details for the file spark_fuse-0.1.5.tar.gz.

File metadata

  • Download URL: spark_fuse-0.1.5.tar.gz
  • Size: 13.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spark_fuse-0.1.5.tar.gz
Algorithm    Hash digest
SHA256       fad5e24253085ec36bca17109e8d119b4ce2efaf71e450b6f2968c4473f05b24
MD5          b6f6e191b77271d048382603d19e90d0
BLAKE2b-256  9903ec3c7d6f8b0377342ec4036c9694566a472dc2bda8807dce6d9bf4d1b65c

Provenance

The following attestation bundles were made for spark_fuse-0.1.5.tar.gz:

Publisher: publish.yml on kevinsames/spark-fuse

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file spark_fuse-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: spark_fuse-0.1.5-py3-none-any.whl
  • Size: 22.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spark_fuse-0.1.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       435f72af9270d6202e1534486a6f44b7963f9337cf1e55b390f530545bf2d8db
MD5          31aaa20aa1409bc129c672b975d6b710
BLAKE2b-256  2b81c871dd0884587c021cb9d820df3521f6e0d585d479494f38da51236b8293

Provenance

The following attestation bundles were made for spark_fuse-0.1.5-py3-none-any.whl:

Publisher: publish.yml on kevinsames/spark-fuse
