Snowflake loader for mkpipe.
mkpipe-loader-snowflake
Snowflake loader plugin for MkPipe. Writes Spark DataFrames into Snowflake tables using the native spark-snowflake connector, which stages data through internal cloud storage (S3/Azure/GCS) and is significantly faster than JDBC for large datasets.
Documentation
For more detailed documentation, please visit the GitHub repository.
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Connection Configuration
```yaml
connections:
  snowflake_target:
    variant: snowflake
    host: myaccount.snowflakecomputing.com
    port: 443
    database: MY_DATABASE
    schema: MY_SCHEMA
    user: myuser
    password: mypassword
    warehouse: MY_WAREHOUSE
```
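As a rough illustration of how these connection fields could map onto the spark-snowflake connector's options (`sfURL`, `sfUser`, `sfPassword`, `sfDatabase`, `sfSchema`, `sfWarehouse`), here is a minimal Python sketch. The function `to_spark_snowflake_options` is hypothetical and not part of mkpipe's API; the loader's real mapping may differ, and key-pair authentication is deliberately left out.

```python
from typing import Any, Dict, Mapping


def to_spark_snowflake_options(conn: Mapping[str, Any]) -> Dict[str, str]:
    """Translate a mkpipe-style connection mapping into spark-snowflake options.

    Hypothetical helper for illustration only; not mkpipe's implementation.
    """
    options = {
        # This sketch ignores 'port' and assumes the default HTTPS port 443.
        "sfURL": str(conn["host"]),
        "sfUser": str(conn["user"]),
        "sfDatabase": str(conn["database"]),
        "sfSchema": str(conn["schema"]),
        "sfWarehouse": str(conn["warehouse"]),
    }
    if "password" in conn:
        # Plain user/password authentication.
        options["sfPassword"] = str(conn["password"])
    elif "private_key_file" not in conn:
        raise ValueError("connection needs either 'password' or 'private_key_file'")
    # Key-pair auth is not mapped here: spark-snowflake's pem_private_key option
    # takes the key body as a string, so the encrypted .p8 file would first have
    # to be decrypted with private_key_file_pwd.
    return options
```

The resulting dict could then be handed to a Spark writer, for example `df.write.format("net.snowflake.spark.snowflake").options(**opts).option("dbtable", "STG_EVENTS").mode("overwrite").save()`, assuming the spark-snowflake and Snowflake JDBC jars are on the Spark classpath.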
With RSA key pair authentication:
```yaml
connections:
  snowflake_target:
    variant: snowflake
    host: myaccount.snowflakecomputing.com
    port: 443
    database: MY_DATABASE
    schema: MY_SCHEMA
    user: myuser
    warehouse: MY_WAREHOUSE
    private_key_file: /path/to/rsa_key.p8
    private_key_file_pwd: mypassphrase
```
Table Configuration
```yaml
pipelines:
  - name: pg_to_snowflake
    source: pg_source
    destination: snowflake_target
    tables:
      - name: public.events
        target_name: STG_EVENTS
        replication_method: full
        batchsize: 50000
```
Write Parallelism & Throughput
The Snowflake loader inherits from `JdbcLoader`. Two table parameters control write performance:
```yaml
- name: public.events
  target_name: STG_EVENTS
  replication_method: full
  batchsize: 50000        # rows per JDBC batch insert (default: 10000)
  write_partitions: 4     # coalesce DataFrame to N partitions before writing
```
How they work
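- `batchsize`: number of rows buffered before sending a single `INSERT` batch to Snowflake. Larger batches reduce round-trips and staging overhead.
- `write_partitions`: calls `coalesce(N)` on the DataFrame before writing, reducing the number of concurrent JDBC connections to Snowflake. Because `coalesce` only merges partitions, a value larger than the DataFrame's current partition count has no effect.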
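To make the interaction between the two settings concrete, the sketch below estimates how many partitions a write uses (and therefore at most how many JDBC connections are open at once) and how many batch round-trips it needs. It assumes rows are spread evenly across partitions, which real DataFrames rarely guarantee, and the function name `estimate_write` is hypothetical.

```python
from typing import Optional, Tuple


def _ceil_div(n: int, d: int) -> int:
    """Integer ceiling division for non-negative n and positive d."""
    return (n + d - 1) // d


def estimate_write(total_rows: int, current_partitions: int,
                   write_partitions: Optional[int] = None,
                   batchsize: int = 10000) -> Tuple[int, int]:
    """Return (partitions used for the write, total INSERT batches).

    Assumes an even row distribution; purely illustrative.
    """
    if write_partitions is None:
        partitions = current_partitions
    else:
        # coalesce(N) can only merge partitions, never split them.
        partitions = min(current_partitions, write_partitions)
    # Spread rows as evenly as possible: `extra` partitions hold one more row.
    base, extra = divmod(total_rows, partitions)
    batches = (extra * _ceil_div(base + 1, batchsize)
               + (partitions - extra) * _ceil_div(base, batchsize))
    return partitions, batches
```

With 1,000,000 rows spread over 200 partitions, `estimate_write(1_000_000, 200, 4, 50_000)` returns `(4, 20)`: four connections, each sending five 50,000-row batches, versus `(4, 100)` at the default `batchsize` of 10,000.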
Performance Notes
- Snowflake warehouse size is the primary write-performance lever. A larger warehouse processes inserts faster regardless of partition count.
- JDBC writes to Snowflake stage data internally before committing. A large `batchsize` (50,000+) reduces staging overhead.
- For very large loads, consider using Snowflake's native `COPY INTO` via an external stage (S3/GCS) instead of JDBC; it is significantly faster but requires additional infrastructure.
- `write_partitions: 4–8` is a good default to balance throughput and connection count.
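If you take the `COPY INTO` route, the load itself is a single SQL statement run after Spark has written files (for example Parquet) to the bucket behind an external stage. The helper below is a hypothetical sketch that only assembles that statement; the stage name `my_s3_stage` and the path are placeholders, and creating the stage and its storage integration is outside its scope.

```python
import re

# Conservative checks so the sketch never splices arbitrary text into SQL.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}")
_PATH = re.compile(r"[A-Za-z0-9_\-./=]*")


def build_copy_into(table: str, stage: str, path: str = "") -> str:
    """Assemble a COPY INTO statement loading Parquet files from a named stage."""
    for ident in (table, stage):
        if not _IDENT.fullmatch(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    if not _PATH.fullmatch(path):
        raise ValueError(f"unsafe stage path: {path!r}")
    location = f"@{stage}/{path}" if path else f"@{stage}"
    return (
        f"COPY INTO {table}\n"
        f"FROM {location}\n"
        "FILE_FORMAT = (TYPE = PARQUET)\n"
        # Match Parquet columns to table columns by name instead of position.
        "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
    )
```

Run through any Snowflake client (for example a `snowflake-connector-python` cursor's `execute`), the statement loads every Parquet file under the stage path in one server-side operation rather than row batches over JDBC.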
All Table Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | string | required | Source table name |
| `target_name` | string | required | Snowflake destination table name |
| `replication_method` | `full` / `incremental` | `full` | Replication strategy |
| `batchsize` | int | `10000` | Rows per JDBC batch insert |
| `write_partitions` | int | (none) | Coalesce DataFrame to N partitions before writing |
| `dedup_columns` | list | (none) | Columns used for `mkpipe_id` hash deduplication |
| `tags` | list | `[]` | Tags for selective pipeline execution |
| `pass_on_error` | bool | `false` | Skip table on error instead of failing |
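The parameter table can be read as a typed schema. A minimal Python sketch, assuming a hypothetical `TableConfig` dataclass (mkpipe's own config classes may look different), mirrors the defaults listed above and rejects obviously invalid values:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TableConfig:
    """Hypothetical typed view of one entry under `tables:`."""
    name: str                                   # source table, e.g. public.events
    target_name: str                            # Snowflake table, e.g. STG_EVENTS
    replication_method: str = "full"            # "full" or "incremental"
    batchsize: int = 10000                      # rows per JDBC batch insert
    write_partitions: Optional[int] = None      # None means no coalesce
    dedup_columns: Optional[List[str]] = None   # columns hashed into mkpipe_id
    tags: List[str] = field(default_factory=list)
    pass_on_error: bool = False

    def __post_init__(self) -> None:
        if self.replication_method not in ("full", "incremental"):
            raise ValueError(f"unknown replication_method: {self.replication_method!r}")
        if self.batchsize <= 0:
            raise ValueError("batchsize must be positive")
        if self.write_partitions is not None and self.write_partitions <= 0:
            raise ValueError("write_partitions must be positive when set")
```

A parsed YAML table entry (for example from PyYAML's `yaml.safe_load`) can be passed as `TableConfig(**entry)`; because dataclasses reject unknown keyword arguments, a misspelled key such as `batch_size` fails loudly with a `TypeError` instead of being silently ignored.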
File details
Details for the file mkpipe_loader_snowflake-0.5.0.tar.gz.
File metadata
- Download URL: mkpipe_loader_snowflake-0.5.0.tar.gz
- Upload date:
- Size: 7.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `176344902d53f0caf79141e25d2bba4d4b6d3dd43d25d25f8db1b14b07c21b47` |
| MD5 | `8c46148ddc2b731cc7d216f5a7c8709d` |
| BLAKE2b-256 | `4df945373b3622da20012bdf4f449a82ce17a0ace61aa32dd07572e615cb033f` |
File details
Details for the file mkpipe_loader_snowflake-0.5.0-py3-none-any.whl.
File metadata
- Download URL: mkpipe_loader_snowflake-0.5.0-py3-none-any.whl
- Upload date:
- Size: 8.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ebc6d90fa9eb67ef6c3e64518e2b58abd6a8bcf141b154e1900b4f89b5c2f8cd` |
| MD5 | `c4da45dc0268d092727fbcfa4d0a4de2` |
| BLAKE2b-256 | `2b7a2f075ff2744680b78f12c60973184ecfe6eaa2ddff3dddd4f343a92ef499` |
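To check a downloaded file against the published SHA256 digests, a short standard-library sketch is enough. The dictionary simply copies the two SHA256 values listed for version 0.5.0, and `verify_download` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digests copied from the file details for version 0.5.0.
EXPECTED_SHA256 = {
    "mkpipe_loader_snowflake-0.5.0.tar.gz":
        "176344902d53f0caf79141e25d2bba4d4b6d3dd43d25d25f8db1b14b07c21b47",
    "mkpipe_loader_snowflake-0.5.0-py3-none-any.whl":
        "ebc6d90fa9eb67ef6c3e64518e2b58abd6a8bcf141b154e1900b4f89b5c2f8cd",
}


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Union[str, Path]) -> bool:
    """Return True only if the file's SHA256 matches the digest published for its name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

Alternatively, pip's hash-checking mode performs the same check at install time: a requirements line such as `mkpipe-loader-snowflake==0.5.0 --hash=sha256:<digest>`, with one `--hash` option per acceptable file, makes pip refuse any download whose digest does not match.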