A custom PySpark extension for writing data to DuckDB

DuckDB Extension for PySpark

DuckDB allows only one process at a time to open a database file for writing, so writing directly from PySpark, where multiple workers write in parallel, can fail with locking errors.

This custom PySpark extension provides a reliable way to write DataFrames to DuckDB without running into those concurrency issues.
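
To illustrate the single-writer constraint, here is a minimal sketch of one way to avoid lock conflicts: collect the DataFrame on the driver and write it through a single DuckDB connection. This is not necessarily how this package works internally. The helper name `write_via_driver` and the view name `incoming` are assumptions made for illustration, and the approach assumes the data fits in driver memory.

```python
def write_via_driver(df, database, table_name, mode="append"):
    """Hypothetical helper: write a PySpark DataFrame to DuckDB from the driver only."""
    if mode not in ("append", "overwrite"):
        raise ValueError(f"Unsupported mode: {mode!r}")

    import duckdb  # imported lazily so the module loads without DuckDB installed

    # Collect all rows onto the driver; only this one process touches the file.
    pandas_df = df.toPandas()

    con = duckdb.connect(database)
    try:
        # Expose the pandas DataFrame to SQL under an assumed view name.
        con.register("incoming", pandas_df)
        if mode == "overwrite":
            con.execute(
                f'CREATE OR REPLACE TABLE "{table_name}" AS SELECT * FROM incoming'
            )
        else:
            # Create an empty table with the same schema on first use, then append.
            con.execute(
                f'CREATE TABLE IF NOT EXISTS "{table_name}" AS '
                "SELECT * FROM incoming LIMIT 0"
            )
            con.execute(f'INSERT INTO "{table_name}" SELECT * FROM incoming')
    finally:
        con.close()
```

Because every write goes through one connection in one process, no two Spark tasks ever compete for the database file lock. The trade-off is that the write is no longer distributed.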

Features

  • Seamlessly write PySpark DataFrames to DuckDB
  • Supports overwrite and append modes
  • Automatically detects and adds new columns when appending data
  • Simple integration with PySpark's DataFrameWriter API
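
As a rough illustration of what detecting and adding new columns on append could involve, the sketch below compares a table's existing columns with the incoming schema and produces the `ALTER TABLE` statements needed to close the gap. The function `plan_new_columns` and its arguments are hypothetical and are not part of this package's API. The package's actual logic may differ.

```python
from typing import Dict, List


def plan_new_columns(
    table_name: str,
    existing_columns: List[str],
    incoming_schema: Dict[str, str],
) -> List[str]:
    """Hypothetical helper: list ALTER TABLE statements for columns missing from the table."""
    # DuckDB treats identifiers case-insensitively, so compare in lower case.
    existing = {name.lower() for name in existing_columns}
    statements = []
    for column, sql_type in incoming_schema.items():
        if column.lower() not in existing:
            statements.append(
                f'ALTER TABLE "{table_name}" ADD COLUMN "{column}" {sql_type}'
            )
    return statements


statements = plan_new_columns(
    "employe_tbl",
    existing_columns=["id", "name"],
    incoming_schema={"id": "INTEGER", "name": "VARCHAR", "salary": "DOUBLE"},
)
print(statements)
# ['ALTER TABLE "employe_tbl" ADD COLUMN "salary" DOUBLE']
```

Running the returned statements before the insert would leave existing rows with `NULL` in the new column, which is DuckDB's default for `ADD COLUMN` without a default value.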

Installation

You can install the package using pip:

pip install duckdb-spark


## Usage

```python
from pyspark.sql import SparkSession
from duckdb_extension import register_duckdb_extension

spark = SparkSession.builder.appName("DuckDB Example").getOrCreate()

# Register the DuckDB extension
register_duckdb_extension(spark)

# Read the source CSV, treating the first row as column names
df = spark.read.csv("employe.csv", header=True)

# Use the custom extension to write the DataFrame to DuckDB and specify the table name
df.write.duckdb_extension(
    database="./company_database.duckdb",
    table_name="employe_tbl",
    mode="append",
)
```
