ADBC (Arrow Database Connectivity) driver for Apache Spark Connect

adbc-driver-spark (Python)

Apache Arrow ADBC driver for Apache Spark Connect.

It lets you run SQL against a Spark Connect server and get results back as Apache Arrow, through the standard ADBC and DBAPI 2.0 (PEP 249) interfaces. The package bundles a native shared library built from Go, so it needs neither a JVM nor PySpark.

Install

pip install adbc-driver-spark

Optional extras: adbc-driver-spark[pandas] for fetch_df(), adbc-driver-spark[polars] for Polars output.

Quickstart (DBAPI 2.0)

import adbc_driver_spark.dbapi as dbapi

with dbapi.connect("sc://localhost:15002") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT id, id * 2 AS doubled FROM range(5)")
        print(cur.fetchall())
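        # Arrow / pandas in one shot. A fetch consumes the result set,
        # so run the query again before fetching in another format:
        cur.execute("SELECT * FROM range(1000)")
        table = cur.fetch_arrow_table()   # pyarrow.Table
        cur.execute("SELECT * FROM range(1000)")
        df = cur.fetch_df()               # pandas.DataFrame (needs [pandas])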
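
Because the cursor follows PEP 249, generic DBAPI helpers should work with it unchanged. The sketch below is one such helper; `rows_as_dicts` is a hypothetical name, not part of this package, and the demo uses the standard-library `sqlite3` module as a stand-in driver so it runs without a Spark Connect server.

```python
import contextlib
import sqlite3  # stand-in PEP 249 driver so this sketch runs without a Spark server


def rows_as_dicts(conn, sql):
    """Run sql on any PEP 249 connection and return rows as dicts keyed by column name."""
    # closing() works whether or not the driver's cursor is a context manager.
    with contextlib.closing(conn.cursor()) as cur:
        cur.execute(sql)
        # cursor.description holds one 7-item sequence per column; item 0 is the name.
        names = [col[0] for col in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]


# Demo against an in-memory SQLite database (assumption: used only as a stand-in).
demo = sqlite3.connect(":memory:")
demo_rows = rows_as_dicts(demo, "SELECT 1 AS id, 2 AS doubled")
demo.close()
```

With the driver installed and a server running, the same helper would presumably be called on a connection returned by `dbapi.connect(...)`.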

Connect to a secured server with a bearer token (TLS is implied):

conn = dbapi.connect("sc://my-host:443", token="my-jwt-token")
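
To keep the token out of source code, the connection arguments can be read from the environment. This is a minimal sketch: `connect_args_from_env` and the `SPARK_CONNECT_TOKEN` variable are hypothetical names chosen for illustration, while `SPARK_CONNECT_URI` is the variable the test suite already uses.

```python
import os


def connect_args_from_env(environ=None):
    """Return (uri, kwargs) suitable for dbapi.connect(uri, **kwargs)."""
    environ = os.environ if environ is None else environ
    # Fall back to the local default used in the quickstart.
    uri = environ.get("SPARK_CONNECT_URI", "sc://localhost:15002")
    kwargs = {}
    # SPARK_CONNECT_TOKEN is an assumed variable name, not defined by the driver.
    token = environ.get("SPARK_CONNECT_TOKEN")
    if token:
        kwargs["token"] = token
    return uri, kwargs


# Usage (requires the driver and a reachable server):
#   import adbc_driver_spark.dbapi as dbapi
#   uri, kwargs = connect_args_from_env()
#   conn = dbapi.connect(uri, **kwargs)
```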

Low-level ADBC

import adbc_driver_spark

# Driver options are passed by their string keys via db_kwargs.
db = adbc_driver_spark.connect(
    "sc://localhost:15002",
    db_kwargs={adbc_driver_spark.DatabaseOptions.USER_AGENT.value: "my-app/1.0"},
)
db.close()  # release the handle when done

Options

See adbc_driver_spark.DatabaseOptions. Everything that can go in the connection string (sc://host:port/;token=...;use_ssl=true) can also be passed via db_kwargs.
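
The connection-string shape above can be assembled programmatically. The sketch below is a hypothetical helper (`build_spark_connect_uri` is not part of the package); it assumes booleans are written as lowercase `true`/`false` and performs no escaping of values.

```python
def build_spark_connect_uri(host, port=15002, **params):
    """Assemble sc://host:port/;key=value;... from keyword parameters."""
    uri = f"sc://{host}:{port}/"
    for key, value in params.items():
        # Assumption: boolean options are spelled lowercase, as in use_ssl=true.
        if isinstance(value, bool):
            value = "true" if value else "false"
        uri += f";{key}={value}"
    return uri


# Keyword order is preserved, so parameters appear in the order given.
example_uri = build_spark_connect_uri("my-host", 443, token="my-jwt-token", use_ssl=True)
```

Passing the same options through db_kwargs avoids string assembly altogether and is likely the safer choice when values contain `;` or `=`.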

Development

The native library libadbc_driver_spark.{so,dylib,dll} must sit inside the adbc_driver_spark/ package directory (or on the loader path) at runtime. From a source checkout:

make python-dev      # builds the Go shared lib, copies it into the package,
                     # and `pip install -e python`
pytest python/tests  # unit tests run without a server; integration tests
                     # are skipped unless SPARK_CONNECT_URI is set
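
One way such a skip can be written is sketched below with the standard-library `unittest` module, which pytest also collects. This is an illustration under stated assumptions, not the project's actual test code: the class name and the `environ` attribute are hypothetical, and the test body only checks that one row comes back.

```python
import os
import unittest


class SparkConnectIntegrationTest(unittest.TestCase):
    # Overridable mapping so the skip logic can be exercised without touching os.environ.
    environ = os.environ

    def setUp(self):
        self.uri = self.environ.get("SPARK_CONNECT_URI")
        if not self.uri:
            self.skipTest("SPARK_CONNECT_URI is not set")

    def test_select_one(self):
        # Imported lazily so unit-only runs do not need the native library.
        import adbc_driver_spark.dbapi as dbapi

        with dbapi.connect(self.uri) as conn:
            with conn.cursor() as cur:
                cur.execute("SELECT 1")
                self.assertEqual(len(cur.fetchall()), 1)
```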

See the project documentation for the full guide.

Download files

Source Distribution

adbc_driver_spark-0.2.0.tar.gz (9.9 kB): Source

Built Distributions

adbc_driver_spark-0.2.0-py3-none-win_amd64.whl (6.5 MB): Python 3, Windows x86-64

adbc_driver_spark-0.2.0-py3-none-manylinux_2_34_x86_64.whl (6.8 MB): Python 3, manylinux: glibc 2.34+, x86-64

adbc_driver_spark-0.2.0-py3-none-manylinux_2_34_aarch64.whl (6.1 MB): Python 3, manylinux: glibc 2.34+, ARM64

adbc_driver_spark-0.2.0-py3-none-macosx_15_0_universal2.whl (6.4 MB): Python 3, macOS 15.0+, universal2 (ARM64, x86-64)

adbc_driver_spark-0.2.0-py3-none-macosx_14_0_universal2.whl (5.9 MB): Python 3, macOS 14.0+, universal2 (ARM64, x86-64)

File details

Details for the file adbc_driver_spark-0.2.0.tar.gz.

File metadata

  • Download URL: adbc_driver_spark-0.2.0.tar.gz
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for adbc_driver_spark-0.2.0.tar.gz
Algorithm Hash digest
SHA256 deb104df6a7c8160baa73b658da0e058b1814b512bdeb8ea93257a4920bc22bf
MD5 1dad2e0b755a6a808dd6062464c44821
BLAKE2b-256 795becac9238f5ca0a9ffc52d1c670519e14847460439e0997e2f97c6eda210e

See more details on using hashes here.
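
A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below is a minimal illustration; `sha256_hex` and `matches_published_hash` are hypothetical helper names, and the self-check at the end hashes in-memory bytes rather than a real download.

```python
import hashlib
import io


def sha256_hex(fileobj, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a binary file object, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file on disk with a published hex digest (case-insensitive)."""
    with open(path, "rb") as f:
        return sha256_hex(f) == expected_hex.strip().lower()


# Self-check on in-memory bytes; a real call would pass the downloaded file's path.
demo_digest = sha256_hex(io.BytesIO(b"abc"))
```

For example, `matches_published_hash("adbc_driver_spark-0.2.0.tar.gz", "deb104df...")` would be called with the full SHA256 value listed above.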

File details

Details for the file adbc_driver_spark-0.2.0-py3-none-win_amd64.whl.

File hashes

Hashes for adbc_driver_spark-0.2.0-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 ef8f15da5be34870912b93482405e9d9544b08d389f7cd66bedd06da2bfa1d8e
MD5 0254c3fe46c08d0ab6bb2384dc2f456a
BLAKE2b-256 14a0d9669025559b1615a89bd04ff6880358c329d6827e06005238493755066f

File details

Details for the file adbc_driver_spark-0.2.0-py3-none-manylinux_2_34_x86_64.whl.

File hashes

Hashes for adbc_driver_spark-0.2.0-py3-none-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 705750c01e52c71d22793d1a3926c7437455472a5208c34fe6c47c30061060f2
MD5 aa3cfec415216163572284297377a757
BLAKE2b-256 7498219731bd121e27d0a9792bda8fe724804a71ae71db77ba8a5c3773f693e6

File details

Details for the file adbc_driver_spark-0.2.0-py3-none-manylinux_2_34_aarch64.whl.

File hashes

Hashes for adbc_driver_spark-0.2.0-py3-none-manylinux_2_34_aarch64.whl
Algorithm Hash digest
SHA256 aab82f88c3951998c5360062fadee6f37bddaea3543a7391eb3081792dd8a691
MD5 fd93bd90c31cb90ba863067fc69940c7
BLAKE2b-256 c40b28039c33a744b9d192f5ef0dbfe51d7b17226c1ed5a81e61d060a5185255

File details

Details for the file adbc_driver_spark-0.2.0-py3-none-macosx_15_0_universal2.whl.

File hashes

Hashes for adbc_driver_spark-0.2.0-py3-none-macosx_15_0_universal2.whl
Algorithm Hash digest
SHA256 2e885c37bac16025a0e492e9486b86dcdb6a1ca133c8fa2a758cee82721e5556
MD5 a5f8b30f476fcf34eb3b01d340d7dd2c
BLAKE2b-256 12d6b00b8eb66e34c303bea73d0335905a162c94dc4a21127e8e8cca9983921c

File details

Details for the file adbc_driver_spark-0.2.0-py3-none-macosx_14_0_universal2.whl.

File hashes

Hashes for adbc_driver_spark-0.2.0-py3-none-macosx_14_0_universal2.whl
Algorithm Hash digest
SHA256 e0cdb0ead4eb1cf657358f09a5512d4a5b86b05e18f62d171aacc4265a9a69fe
MD5 f150e5a23932d92eeb0469aebad41662
BLAKE2b-256 508152d39c3a19d34b7e589c1479263974ecd0c712697c227ecd41a7ca7730f7
