Spark Structured Streaming connector utility for GCP BigQuery using the Storage API services.
bigquery-streaming-connector
This library provides a BigQuery streaming connector for PySpark Structured Streaming.
The connector uses the BigQuery Storage API to pull BigQuery table data at scale across Spark workers.
The Storage API is cheaper and faster than the traditional BigQuery query API, enabling incremental, continuous BigQuery migration at lower cost.
Prerequisites
Requires Spark 4.0.0, or Databricks Runtime 15.3 and above.
pip install bigquery-spark-streaming-connector
PySpark usage:
from streaming_connector import bq_stream_register  # makes the "bigquery-streaming" format available

query = (spark.readStream.format("bigquery-streaming")
    .option("project_id", <bq_project_id>)
    .option("incremental_checkpoint_field", <table_incremental_ts_based_col>)
    .option("dataset", <bq_dataset_name>)
    .option("table", <bq_table_name>)
    .option("service_auth_json_file_name", <service_account_json_file_name>)
    .option("max_parallel_conn", <max_parallel_threads_to_pull_data>)  # defaults to a maximum of 1000
    .load())
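To actually run the stream, attach a sink with writeStream. The sketch below is illustrative, using standard Structured Streaming options: the parquet sink, the paths, and the trigger interval are placeholder choices, and checkpointLocation is where Spark persists the stream's progress.

(query.writeStream
    .format("parquet")                                    # any Structured Streaming sink works here
    .option("path", "/tmp/bq_stream_output")              # placeholder output location
    .option("checkpointLocation", "/tmp/bq_stream_ckpt")  # stream offsets are persisted here
    .trigger(processingTime="1 minute")
    .start())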
## With incremental_checkpoint_field set, table data is ingested incrementally using the given timestamp-based column, and the latest value seen is checkpointed using offset semantics.
## Without the incremental field, a full table ingestion is performed.
## The service account JSON file must be available to every Spark executor in the '/home' folder, e.g. via --files /home/<file_name>.json or via an init script (on Databricks Spark); see the example below.
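For example, on a plain (non-Databricks) Spark deployment, the key file can be shipped to every executor with spark-submit's --files flag; the application script name below is a placeholder.

spark-submit --files /home/<service_account_json_file_name>.json <your_streaming_app>.py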