Spark Structured Streaming connector utility for GCP BigQuery using the Storage API.
Project description
bigquery-streaming-connector
This library provides a BigQuery streaming connector for PySpark.
The underlying connector uses the BigQuery Storage API to pull BigQuery table data at scale across Spark workers.
The Storage API is cheaper and faster than the traditional BigQuery query API, enabling faster and cheaper incremental BigQuery migration in a continuous fashion.
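As a rough illustration of why Storage API reads scale, the sketch below mimics a read session being split into independent streams that workers consume in parallel. This is plain Python with hypothetical helper names (`split_into_streams`, `read_stream`), not the connector's or the Storage API's actual interface:

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_streams(rows, n_streams):
    """Partition rows round-robin into independent read streams,
    loosely mimicking how a Storage API read session is split."""
    streams = [[] for _ in range(n_streams)]
    for i, row in enumerate(rows):
        streams[i % n_streams].append(row)
    return streams

def read_stream(stream):
    """Stand-in for one Spark worker consuming one stream."""
    return list(stream)

rows = [{"id": i} for i in range(10)]
streams = split_into_streams(rows, 4)

# Streams are independent, so they can be read concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(read_stream, streams))

all_ids = sorted(r["id"] for part in parts for r in part)
print(all_ids)  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In the real connector, this parallelism comes from Storage API read streams mapped onto Spark partitions rather than threads.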
Prerequisite
pip install bigquery_spark_streaming_connector-0.1.0-py3-none-any.whl
Pyspark usage:
from streaming_connector import bq_stream_register

query = (spark.readStream.format("bigquery-streaming")
    .option("project_id", <bq_project_id>)
    .option("incremental_checkpoint_field", <table_incremental_ts_based_col>)
    .option("dataset", <bq_dataset_name>)
    .option("table", <bq_table_name>)
    .load())
## The above ingests table data incrementally using the provided timestamp-based field; the latest value is checkpointed using offset semantics.
## Without the incremental input field, a full table ingestion is performed.
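The offset semantics described above can be sketched in plain Python (a hypothetical helper, not the connector's internal code): each micro-batch selects only rows whose timestamp field exceeds the last checkpointed value, then advances the checkpoint to the maximum timestamp seen.

```python
def next_batch(table_rows, ts_field, checkpoint):
    """Return rows newer than the checkpoint, plus the advanced checkpoint."""
    batch = [r for r in table_rows if r[ts_field] > checkpoint]
    new_checkpoint = max((r[ts_field] for r in batch), default=checkpoint)
    return batch, new_checkpoint

rows = [{"id": 1, "ts": 10}, {"id": 2, "ts": 20}]

# First micro-batch: nothing checkpointed yet, so all rows are ingested.
batch, ckpt = next_batch(rows, "ts", 0)
print(len(batch), ckpt)  # → 2 20

# New rows arrive; the next batch ingests only rows past the checkpoint.
rows.append({"id": 3, "ts": 30})
batch, ckpt = next_batch(rows, "ts", ckpt)
print(len(batch), ckpt)  # → 1 30
```

If a row arrives with a timestamp at or below the checkpoint, it is skipped, which is why the checkpoint field should be a monotonically increasing timestamp column.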
Download files
Source Distributions
No source distribution files are available for this release.
Built Distribution
File details
Details for the file bigquery_spark_streaming_connector-0.1.0-py3-none-any.whl.
File metadata
- Download URL: bigquery_spark_streaming_connector-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | feb8f34d96acd1c56e492e803dab5bd143a7540ca5b8ac8449d5bfeb056c4ad3
MD5 | be0922e374b964102873ce27ac6a6884
BLAKE2b-256 | 137e7946cb8700a76cd4f3c759c99885c23e10e4015b8329a460bb29bfd94a0c