Open-source PySpark toolkit with connectors and CLI for Azure Storage, Databricks, Microsoft Fabric Lakehouses, Unity Catalog, and Hive Metastore.

spark-fuse

spark-fuse is an open-source toolkit for PySpark. It provides utilities, connectors, and tools to fuse your data workflows across Azure Storage (ADLS Gen2), Databricks, Microsoft Fabric Lakehouses (via OneLake/Delta), Unity Catalog, and Hive Metastore.

Features

  • Connectors for ADLS Gen2 (abfss://), Fabric OneLake (onelake:// or abfss://...onelake.dfs.fabric.microsoft.com/...), and Databricks DBFS (dbfs:/).
  • Unity Catalog and Hive Metastore helpers to create catalogs/schemas and register external Delta tables.
  • SparkSession helpers with sensible defaults and environment detection (Databricks/Fabric/local).
  • Typer-powered CLI: list connectors, preview datasets, register tables, submit Databricks jobs.
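The URI forms listed in the connectors bullet can be told apart by scheme and host. The following is a minimal sketch of that idea in plain Python, not spark-fuse's own dispatch logic: the function name `classify_path` and the labels it returns are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse

# Host suffix used by Fabric OneLake when addressed through abfss://.
ONELAKE_HOST_SUFFIX = "onelake.dfs.fabric.microsoft.com"


def classify_path(path: str) -> str:
    """Return a storage label for a path (hypothetical helper, not a spark-fuse API)."""
    parsed = urlparse(path)
    scheme = parsed.scheme.lower()
    host = (parsed.hostname or "").lower()

    if scheme == "onelake":
        return "fabric"
    if scheme == "abfss":
        # abfss:// is shared by ADLS Gen2 and OneLake; the host tells them apart.
        return "fabric" if host.endswith(ONELAKE_HOST_SUFFIX) else "adls"
    if scheme == "dbfs":
        return "databricks"
    raise ValueError(f"No known storage scheme in path: {path!r}")
```

Checking the host for `abfss://` paths matters because the scheme alone cannot distinguish an ADLS Gen2 account from a OneLake workspace.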

Installation

  • Create a virtual environment (recommended)
    • macOS/Linux:
      • python3 -m venv .venv
      • source .venv/bin/activate
      • python -m pip install --upgrade pip
    • Windows (PowerShell):
      • python -m venv .venv
      • .\.venv\Scripts\Activate.ps1
      • python -m pip install --upgrade pip
  • From source (dev): pip install -e ".[dev]"
  • From PyPI: pip install spark-fuse
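After installing, it can be useful to confirm that the package is visible in the active virtual environment. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical and is not part of spark-fuse.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "spark-fuse") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "spark-fuse is not installed in this environment")
```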

Quickstart

  1. Create a SparkSession with helpful defaults
     from spark_fuse.spark import create_session

     spark = create_session(app_name="spark-fuse-quickstart")
  2. Read a Delta table from ADLS or OneLake
     from spark_fuse.io.azure_adls import ADLSGen2Connector

     df = ADLSGen2Connector().read(spark, "abfss://container@account.dfs.core.windows.net/path/to/delta")
     df.show(5)
  3. Register an external table in Unity Catalog
     from spark_fuse.catalogs import unity

     unity.create_catalog(spark, "analytics")
     unity.create_schema(spark, catalog="analytics", schema="core")
     unity.register_external_delta_table(
         spark,
         catalog="analytics",
         schema="core",
         table="events",
         location="abfss://container@account.dfs.core.windows.net/path/to/delta",
     )
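To make the Unity Catalog step more concrete, the sketch below builds the kind of SQL statements that registering an external Delta table typically comes down to on Databricks. It is an assumption about the general approach, not a description of what spark-fuse executes internally, and the helpers `quote_ident` and `external_delta_table_sql` are hypothetical names introduced here.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def external_delta_table_sql(catalog: str, schema: str, table: str, location: str) -> list:
    """Build catalog, schema, and table DDL as strings (hypothetical helper)."""
    if "'" in location:
        # Keep the sketch simple: refuse locations that would need escaping.
        raise ValueError("location must not contain single quotes")
    cat = quote_ident(catalog)
    sch = f"{cat}.{quote_ident(schema)}"
    tbl = f"{sch}.{quote_ident(table)}"
    return [
        f"CREATE CATALOG IF NOT EXISTS {cat}",
        f"CREATE SCHEMA IF NOT EXISTS {sch}",
        f"CREATE TABLE IF NOT EXISTS {tbl} USING DELTA LOCATION '{location}'",
    ]


statements = external_delta_table_sql(
    catalog="analytics",
    schema="core",
    table="events",
    location="abfss://container@account.dfs.core.windows.net/path/to/delta",
)
for statement in statements:
    print(statement)
```

With a live session, each string could be passed to `spark.sql(...)`. Building the statements separately makes them easy to review or log before anything is executed.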

CLI Usage

  • spark-fuse --help
  • spark-fuse connectors
  • spark-fuse read --path abfss://container@account.dfs.core.windows.net/path/to/delta --show 5
  • spark-fuse uc-create --catalog analytics --schema core
  • spark-fuse uc-register-table --catalog analytics --schema core --table events --path abfss://.../delta
  • spark-fuse hive-register-external --database analytics_core --table events --path abfss://.../delta
  • spark-fuse fabric-register --table lakehouse_table --path onelake://workspace/lakehouse/Tables/events
  • spark-fuse databricks-submit --json job.json
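The `databricks-submit` command takes a JSON file, but its expected schema is not described here. The sketch below assumes the file follows the shape of a Databricks Jobs API one-time run submission (`run_name` plus a `tasks` list); that is an assumption, so check the project's own documentation before relying on it. The function name `build_submit_payload`, the script path, and the cluster ID placeholder are all hypothetical.

```python
import json
from typing import List, Optional


def build_submit_payload(
    run_name: str,
    python_file: str,
    cluster_id: str,
    parameters: Optional[List[str]] = None,
) -> dict:
    """Build a one-task run payload (assumed shape, hypothetical helper)."""
    return {
        "run_name": run_name,
        "tasks": [
            {
                "task_key": "main",
                "existing_cluster_id": cluster_id,
                "spark_python_task": {
                    "python_file": python_file,
                    "parameters": list(parameters or []),
                },
            }
        ],
    }


payload = build_submit_payload(
    run_name="spark-fuse-example",
    python_file="dbfs:/path/to/job.py",  # placeholder path
    cluster_id="<cluster-id>",  # placeholder, replace with a real cluster ID
    parameters=["--date", "2024-01-01"],
)
job_json = json.dumps(payload, indent=2)
print(job_json)
```

Saving the printed text as job.json would give a file to pass to `--json`, provided the assumed payload shape matches what the command expects.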

CI

  • GitHub Actions runs ruff and pytest for Python 3.9–3.11.

License

  • Apache 2.0


Download files

Source Distribution

spark_fuse-0.1.3.tar.gz (13.7 kB)

Uploaded Source

Built Distribution

spark_fuse-0.1.3-py3-none-any.whl (22.6 kB)

Uploaded Python 3

File details

Details for the file spark_fuse-0.1.3.tar.gz.

File metadata

  • Download URL: spark_fuse-0.1.3.tar.gz
  • Upload date:
  • Size: 13.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spark_fuse-0.1.3.tar.gz
Algorithm    Hash digest
SHA256       043f15f88f2ef259dcb77b32140d1cd2ac6549bd3f50fdd7e2fc2b7d7464d2b6
MD5          3fa8b99c71310e85dd39a7e4720e947b
BLAKE2b-256  443a0149967c7966205789c07615de57a71819cce19a32a7c4368df62637bc3a

File details

Details for the file spark_fuse-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: spark_fuse-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 22.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spark_fuse-0.1.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       d3b1cf3ce6cf21ff0cc3e0921cf6b7847f73e312354b242644d1a24d2a46f40b
MD5          8f849a7b8d16e44658ff6bf11b348f48
BLAKE2b-256  d4e29d7edb347b717c918880b63142f2b426db2ef4dcf3c23ed3dcd2fc24584c

