ADBC (Arrow Database Connectivity) driver for Apache Spark Connect

Project description

adbc-driver-spark (Python)

Apache Arrow ADBC driver for Apache Spark Connect.

It lets you run SQL against a Spark Connect server and get the results back as Apache Arrow data, through the standard ADBC and DBAPI 2.0 (PEP 249) interfaces. The package bundles a native shared library compiled from Go, so it needs neither a JVM nor PySpark.

Install

pip install adbc-driver-spark

Optional extras: adbc-driver-spark[pandas] for fetch_df(), adbc-driver-spark[polars] for Polars output.
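Because fetch_df() needs pandas and Polars output needs polars, a small pre-flight check can report which optional dependencies are importable before you call those methods. This helper is hypothetical (it is not part of adbc_driver_spark) and assumes each extra installs a module with the same name as the extra.

```python
import importlib.util

# Hypothetical mapping from each optional extra to the module it is assumed
# to install; pandas and polars import under their package names.
EXTRAS = {"pandas": "pandas", "polars": "polars"}


def available_extras() -> dict[str, bool]:
    """Report which optional extras are importable, without importing them."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in EXTRAS.items()
    }


missing = sorted(name for name, ok in available_extras().items() if not ok)
if missing:
    # Quote the requirement so shells such as zsh do not expand the brackets.
    print(f"pip install 'adbc-driver-spark[{','.join(missing)}]'")
```

find_spec only locates a module, so the check stays cheap even when pandas or polars is installed.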

Quickstart (DBAPI 2.0)

import adbc_driver_spark.dbapi as dbapi

with dbapi.connect("sc://localhost:15002") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT id, id * 2 AS doubled FROM range(5)")
        print(cur.fetchall())
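        # Arrow / pandas: each fetch_* call consumes the result set,
        # so run the query again before fetching it in another format.
        cur.execute("SELECT * FROM range(1000)")
        table = cur.fetch_arrow_table()   # pyarrow.Table
        cur.execute("SELECT * FROM range(1000)")
        df = cur.fetch_df()               # pandas.DataFrame (needs [pandas])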
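Since the cursor follows PEP 249, generic DBAPI code works on it unchanged. For example, cursor.description supplies the column names that pair with fetchall() rows. The sketch below uses the standard library's sqlite3 module purely as a stand-in cursor so it runs without a Spark Connect server; the helper name rows_as_dicts is hypothetical.

```python
import sqlite3
from typing import Any


def rows_as_dicts(cur) -> list[dict[str, Any]]:
    """Pair each fetched row with the column names from a PEP 249 cursor."""
    # description is a sequence of 7-item tuples; item 0 is the column name.
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


# sqlite3 stands in for a Spark Connect cursor so this sketch runs offline.
with sqlite3.connect(":memory:") as conn:
    cur = conn.cursor()
    cur.execute("SELECT 1 AS id, 2 AS doubled UNION ALL SELECT 2, 4")
    rows = rows_as_dicts(cur)

print(rows)
```

With the Spark driver, the same helper would be called on a cursor from dbapi.connect("sc://localhost:15002") after cur.execute(...).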

To connect to a secured server, pass a bearer token (supplying a token implies TLS):

conn = dbapi.connect("sc://my-host:443", token="my-jwt-token")

Low-level ADBC

import adbc_driver_spark

db = adbc_driver_spark.connect(
    "sc://localhost:15002",
    db_kwargs={adbc_driver_spark.DatabaseOptions.USER_AGENT.value: "my-app/1.0"},
)
db.close()

Options

See adbc_driver_spark.DatabaseOptions, ConnectionOptions, and StatementOptions. Everything that can go in the connection string (sc://host:port/;token=...;use_ssl=true) can also be passed via db_kwargs.
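Because connection-string parameters and db_kwargs are interchangeable, it can help to assemble the string programmatically. This is a hypothetical helper (not part of adbc_driver_spark) that follows the sc://host:port/;key=value form above. The default port 15002 matches the quickstart, booleans are written in lowercase as in use_ssl=true, and values containing ';' are rejected because that character separates parameters.

```python
def spark_connect_uri(host: str, port: int = 15002, **params: object) -> str:
    """Build an sc://host:port/;key=value;... string (hypothetical helper)."""
    base = f"sc://{host}:{port}"
    if not params:
        # Same shape as the quickstart's "sc://localhost:15002".
        return base
    pairs = []
    for key, value in params.items():
        # Spell booleans in lowercase, as in the documented use_ssl=true.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        if ";" in text:
            raise ValueError(f"value for {key!r} may not contain ';'")
        pairs.append(f"{key}={text}")
    return base + "/;" + ";".join(pairs)


print(spark_connect_uri("my-host", 443, token="my-jwt-token", use_ssl=True))
# sc://my-host:443/;token=my-jwt-token;use_ssl=true
```

The resulting string can be passed to dbapi.connect() like any other connection string.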

Development

The native library libadbc_driver_spark.{so,dylib,dll} must sit inside the adbc_driver_spark/ package directory (or on the loader path) at runtime. From a source checkout:

make python-dev      # builds the Go shared lib, copies it into the package,
                     # and runs `pip install -e python`
pytest python/tests  # unit tests run without a server; integration tests
                     # are skipped unless SPARK_CONNECT_URI is set
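Because the shared library has to be present at runtime, a quick pre-flight check can confirm that it landed in the package directory. The sketch is hypothetical: the helper names are made up, the file name is derived from sys.platform using the .so/.dylib/.dll suffixes listed above, and it only looks inside the package directory, not on the wider loader path.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Optional

# Hypothetical mapping from sys.platform to the suffixes the README lists;
# any platform not named here is assumed to use ".so".
_SUFFIXES = {"win32": "dll", "darwin": "dylib"}


def native_lib_name(platform: str = sys.platform) -> str:
    """Expected file name of the bundled shared library on `platform`."""
    return f"libadbc_driver_spark.{_SUFFIXES.get(platform, 'so')}"


def find_native_lib(package_dir: Path, platform: str = sys.platform) -> Optional[Path]:
    """Return the library path if it sits in `package_dir`, else None."""
    candidate = package_dir / native_lib_name(platform)
    return candidate if candidate.is_file() else None


def installed_package_dir() -> Optional[Path]:
    """Locate the adbc_driver_spark package directory without importing it."""
    spec = importlib.util.find_spec("adbc_driver_spark")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    pkg = installed_package_dir()
    lib = find_native_lib(pkg) if pkg else None
    print(lib or "libadbc_driver_spark not found in the package; run `make python-dev`")
```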

See the project documentation for the full guide.
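The development notes say integration tests are skipped unless SPARK_CONNECT_URI is set. A minimal sketch of such a gate uses the standard library's unittest.skipUnless, which pytest also honors. The class name, test name, and query are illustrative and not taken from the project's test suite.

```python
import os
import unittest

# Read once at import time; an unset or empty variable means "no server".
SPARK_CONNECT_URI = os.environ.get("SPARK_CONNECT_URI", "")


@unittest.skipUnless(SPARK_CONNECT_URI, "SPARK_CONNECT_URI is not set")
class SparkConnectIntegrationTest(unittest.TestCase):
    """Hypothetical integration test that only runs against a live server."""

    def test_range_row_count(self):
        # Imported inside the test so a skipped run never loads the native library.
        import adbc_driver_spark.dbapi as dbapi

        with dbapi.connect(SPARK_CONNECT_URI) as conn:
            with conn.cursor() as cur:
                cur.execute("SELECT id FROM range(3)")
                self.assertEqual(len(cur.fetchall()), 3)
```

Running pytest with SPARK_CONNECT_URI=sc://localhost:15002 set would enable the test; without it, the test is reported as skipped.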


Download files

Download the file for your platform.

Source Distribution

adbc_driver_spark-0.1.0.tar.gz (9.7 kB)

Built Distributions


adbc_driver_spark-0.1.0-py3-none-win_amd64.whl (6.5 MB): Python 3, Windows x86-64

adbc_driver_spark-0.1.0-py3-none-manylinux_2_34_x86_64.whl (6.8 MB): Python 3, manylinux (glibc 2.34+) x86-64

adbc_driver_spark-0.1.0-py3-none-manylinux_2_34_aarch64.whl (6.1 MB): Python 3, manylinux (glibc 2.34+) ARM64

adbc_driver_spark-0.1.0-py3-none-macosx_15_0_universal2.whl (6.4 MB): Python 3, macOS 15.0+ universal2 (ARM64, x86-64)

adbc_driver_spark-0.1.0-py3-none-macosx_14_0_universal2.whl (5.9 MB): Python 3, macOS 14.0+ universal2 (ARM64, x86-64)

File details

Details for the file adbc_driver_spark-0.1.0.tar.gz.

File metadata

  • Download URL: adbc_driver_spark-0.1.0.tar.gz
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for adbc_driver_spark-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5c62c372478a25474d52145139d7e8c357b269be3df2781633f31357f8bad840
MD5 d86408226cdf638d207962e1b385851a
BLAKE2b-256 98eb2d9a9a7695856bc9adc7bbd8e3abc2bfd0c55f592584dbb3a5254027b47a

File details

Details for the file adbc_driver_spark-0.1.0-py3-none-win_amd64.whl.

File hashes

Hashes for adbc_driver_spark-0.1.0-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 e7b437c18b8f10c6e9d5299e35d93c493091733d29129605610675d922de7f1e
MD5 3c6d62daf2828f0903f5c8d3c9f9ec47
BLAKE2b-256 50e8e0631530771402accaa37742d623a2205db07f2e37a4c1772b7cb9e8ebba

File details

Details for the file adbc_driver_spark-0.1.0-py3-none-manylinux_2_34_x86_64.whl.

File hashes

Hashes for adbc_driver_spark-0.1.0-py3-none-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 bfa2295be97d5aa8195219c05ba513c40609cff8aff9b191bb18fbffabf20bb8
MD5 f98328848a8703d0f2e184d7ce98739e
BLAKE2b-256 dd2fc05982c67150c4ea2649aa80f5e02f62fab28d4a0e4c453ebc97db4d62a4

File details

Details for the file adbc_driver_spark-0.1.0-py3-none-manylinux_2_34_aarch64.whl.

File hashes

Hashes for adbc_driver_spark-0.1.0-py3-none-manylinux_2_34_aarch64.whl
Algorithm Hash digest
SHA256 f28660f99d61125dace32dc686518ab13dbaf96c41fb9ac76d54ef7b2d5f812c
MD5 09f8b5d31e6c46a5a8a83e67fdd4dfb2
BLAKE2b-256 a7f4065df7341701b77c088e72069f9eb410f84621dd1ab114e0d31c97d4396a

File details

Details for the file adbc_driver_spark-0.1.0-py3-none-macosx_15_0_universal2.whl.

File hashes

Hashes for adbc_driver_spark-0.1.0-py3-none-macosx_15_0_universal2.whl
Algorithm Hash digest
SHA256 7116b96a91ef449e8b0d8453f9095b60a73949be7f899a63e8cdec8bd9ed7947
MD5 a8869084dfab1586d3823b72f450eba8
BLAKE2b-256 0d90f4fab324cad51204d69031f83545b763d8bc7b74b7719b3d82c71132ff04

File details

Details for the file adbc_driver_spark-0.1.0-py3-none-macosx_14_0_universal2.whl.

File hashes

Hashes for adbc_driver_spark-0.1.0-py3-none-macosx_14_0_universal2.whl
Algorithm Hash digest
SHA256 9755714427be26ed1ea7e7e9bf8c120ec27e439e076cda8f2bdba425f13212ba
MD5 03b783efe1d0528f44e2a6415f7a49a7
BLAKE2b-256 2d2b694eb7d511d29b581aee2b2171a8059acf58fbd2d3b5148cd36345cfe613
