A custom PySpark extension for writing data to DuckDB

Project description

DuckDB Extension for PySpark

DuckDB allows only one process at a time to open a database file for writing. Writing directly from PySpark, where several workers write in parallel, can therefore fail with locking errors.

This custom PySpark extension provides a way to write DataFrames to DuckDB that avoids those write conflicts.
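
The project description does not say how the extension avoids the conflict. One common pattern is to funnel all rows through a single writer. The sketch below illustrates only that idea with the standard library: the threads stand in for Spark tasks, and `sink` stands in for a function that writes a batch over one DuckDB connection. The function name `write_through_single_writer` and its parameters are hypothetical and are not part of this package.

```python
import queue
import threading
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, str]
_DONE = object()  # sentinel telling the writer that no more batches will arrive


def write_through_single_writer(
    partitions: Iterable[Iterable[Row]],
    sink: Callable[[List[Row]], None],
) -> None:
    """Funnel rows from concurrent producers through one writer thread."""
    batches: queue.Queue = queue.Queue()

    def produce(rows: Iterable[Row]) -> None:
        # Producers never touch the sink; they only enqueue their batch.
        batches.put(list(rows))

    def consume() -> None:
        # The only place the sink is called, so writes never overlap.
        while True:
            batch = batches.get()
            if batch is _DONE:
                break
            sink(batch)

    writer = threading.Thread(target=consume)
    writer.start()

    producers = [threading.Thread(target=produce, args=(p,)) for p in partitions]
    for thread in producers:
        thread.start()
    for thread in producers:
        thread.join()

    batches.put(_DONE)  # all producers are done; let the writer exit
    writer.join()


written: List[Row] = []
write_through_single_writer(
    partitions=[[(1, "Alice")], [(2, "Bob")], [(3, "Carol")]],
    sink=written.extend,
)
print(sorted(written))  # arrival order across producers is not deterministic
```

Because the sink is called from exactly one thread, the number of producers can grow without ever creating a second concurrent writer.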

Features

  • Seamlessly write PySpark DataFrames to DuckDB
  • Supports overwrite and append modes
  • Automatically detects and adds new columns when appending data
  • Simple integration with PySpark's DataFrameWriter API
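
To make the overwrite, append, and new-column behaviour concrete, here is a minimal sketch of the SQL such a writer might issue before inserting rows. It is an assumption about the general approach, not the extension's actual implementation: `plan_write` and `quote_ident` are hypothetical helpers, and the type names are supplied by the caller. The statements themselves (`CREATE OR REPLACE TABLE`, `ALTER TABLE ... ADD COLUMN`) are standard DuckDB SQL.

```python
from typing import Dict, List, Optional


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def plan_write(
    table: str,
    incoming: Dict[str, str],
    existing: Optional[List[str]],
    mode: str = "append",
) -> List[str]:
    """Return the DDL statements to run before inserting rows.

    incoming: column name -> DuckDB type for the DataFrame being written.
    existing: column names already in the table, or None if it does not exist.
    """
    if mode not in ("overwrite", "append"):
        raise ValueError(f"unsupported mode: {mode!r}")

    column_defs = ", ".join(f"{quote_ident(c)} {t}" for c, t in incoming.items())
    tbl = quote_ident(table)

    if mode == "overwrite":
        # Replace the table wholesale with the incoming schema.
        return [f"CREATE OR REPLACE TABLE {tbl} ({column_defs})"]

    if existing is None:
        # Appending to a table that does not exist yet: create it.
        return [f"CREATE TABLE {tbl} ({column_defs})"]

    # Appending to an existing table: add only the columns it lacks.
    known = set(existing)
    return [
        f"ALTER TABLE {tbl} ADD COLUMN {quote_ident(c)} {t}"
        for c, t in incoming.items()
        if c not in known
    ]


statements = plan_write(
    "users",
    {"id": "BIGINT", "name": "VARCHAR", "email": "VARCHAR"},
    existing=["id", "name"],
)
print(statements)
```

In DuckDB, rows that existed before an `ADD COLUMN` hold `NULL` in the new column unless a default is given, so earlier data stays valid after an append that widens the schema.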

Installation

You can install the package using pip:

pip install duckdb-spark


## Usage

Here’s how you can use the extension:

```python
from pyspark.sql import SparkSession
from duckdb_extension import DuckDBWriter

# Initialize Spark Session
spark = SparkSession.builder.appName("DuckDBExample").getOrCreate()

# Create Sample DataFrame
df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])

# Write to DuckDB using the custom extension
df.write.duckdb_extension("my_db.duckdb", "users", mode="overwrite")
```


Download files

Source Distribution

duckdb_spark-1.0.2.tar.gz (3.6 kB view details)

Uploaded Source

Built Distribution

duckdb_spark-1.0.2-py3-none-any.whl (4.1 kB view details)

Uploaded Python 3

File details

Details for the file duckdb_spark-1.0.2.tar.gz.

File metadata

  • Download URL: duckdb_spark-1.0.2.tar.gz
  • Size: 3.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for duckdb_spark-1.0.2.tar.gz
Algorithm Hash digest
SHA256 07dfbf7453ec0b6d69293567236cdc915b201323dca122715b3b78cec28f11f4
MD5 84241e49f15531cceb0ddc5e95327194
BLAKE2b-256 938d065f4e01d1673912ae3d144cfa93d6f942caade801121647f13a28c5d566

File details

Details for the file duckdb_spark-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: duckdb_spark-1.0.2-py3-none-any.whl
  • Size: 4.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for duckdb_spark-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2325267a4fded8d544294c563692f09820877a62d38799632e88dc10cba4216a
MD5 35172619135ea885fd0ff4397633f37b
BLAKE2b-256 a372df11fec2ee0d99ed2f2d10005ac3039d049c07f9f17ace923897fb919932
