# mkpipe-loader-snowflake

Snowflake loader plugin for MkPipe. Writes Spark DataFrames into Snowflake tables using the native Snowflake Spark connector (`spark-snowflake`), which stages data via internal cloud storage (S3/Azure/GCS) and is significantly faster than JDBC for large datasets.
## Documentation

For more detailed documentation, please visit the GitHub repository.

## License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Connection Configuration
connections:
snowflake_target:
variant: snowflake
host: myaccount.snowflakecomputing.com
port: 443
database: MY_DATABASE
schema: MY_SCHEMA
user: myuser
password: mypassword
warehouse: MY_WAREHOUSE
With RSA key pair authentication:

```yaml
connections:
  snowflake_target:
    variant: snowflake
    host: myaccount.snowflakecomputing.com
    port: 443
    database: MY_DATABASE
    schema: MY_SCHEMA
    user: myuser
    warehouse: MY_WAREHOUSE
    private_key_file: /path/to/rsa_key.p8
    private_key_file_pwd: mypassphrase
```
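For reference, this is how an encrypted PKCS#8 key like the one above is typically decoded before being handed to a Snowflake client. The path and passphrase mirror the config values and are illustrative; the loader performs the equivalent step for you.

```python
# Sketch: decoding an encrypted PKCS#8 key (rsa_key.p8) the way Snowflake
# clients typically expect. Path/passphrase mirror the config above; the
# loader handles this internally, so this is illustration only.
from cryptography.hazmat.primitives import serialization

with open("/path/to/rsa_key.p8", "rb") as f:
    private_key = serialization.load_pem_private_key(
        f.read(),
        password=b"mypassphrase",  # private_key_file_pwd
    )

# Snowflake drivers expect the key as unencrypted DER-encoded PKCS#8 bytes.
private_key_der = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)
```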
## Table Configuration

```yaml
pipelines:
  - name: pg_to_snowflake
    source: pg_source
    destination: snowflake_target
    tables:
      - name: public.events
        target_name: STG_EVENTS
        replication_method: full
        batchsize: 50000
```
## Write Strategy

Control how data is written to Snowflake:

```yaml
- name: public.events
  target_name: STG_EVENTS
  write_strategy: upsert   # append | replace | upsert | merge
  write_key: [id]          # required for upsert/merge
```
| Strategy | Snowflake Behavior |
|---|---|
| `append` | Insert via Spark connector (default for incremental) |
| `replace` | Overwrite table via Spark connector (default for full). Use `if_exists: append` to preserve the existing table |
| `upsert` | Write to a temp table, then `MERGE INTO target USING temp ON ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...` |
| `merge` | Same as `upsert` for Snowflake |
> **Note:** `upsert`/`merge` require `write_key`. The loader creates a temp table, writes the data there, executes a `MERGE` statement, then drops the temp table.
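To make that flow concrete, here is a minimal sketch using `snowflake-connector-python`. The temp table and column names (`STG_EVENTS_TMP`, `ID`, `PAYLOAD`) are hypothetical; the loader's actual implementation may differ.

```python
# Minimal sketch of the upsert flow: the batch has already been staged
# into a temp table; merge it into the target on the write_key column(s),
# then drop the temp table. Table/column names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myaccount",
    user="myuser",
    password="mypassword",
    database="MY_DATABASE",
    schema="MY_SCHEMA",
    warehouse="MY_WAREHOUSE",
)
try:
    cur = conn.cursor()
    cur.execute("""
        MERGE INTO STG_EVENTS AS t
        USING STG_EVENTS_TMP AS s
        ON t.ID = s.ID  -- write_key: [id]
        WHEN MATCHED THEN UPDATE SET t.PAYLOAD = s.PAYLOAD
        WHEN NOT MATCHED THEN INSERT (ID, PAYLOAD) VALUES (s.ID, s.PAYLOAD)
    """)
    cur.execute("DROP TABLE IF EXISTS STG_EVENTS_TMP")
finally:
    conn.close()
```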
## Write Parallelism & Throughput

The Snowflake loader uses the native Spark connector. Two parameters control write performance:

```yaml
- name: public.events
  target_name: STG_EVENTS
  replication_method: full
  batchsize: 50000       # rows per batch insert (default: 10000)
  write_partitions: 4    # coalesce DataFrame to N partitions before writing
```
### How they work

- `batchsize`: number of rows buffered before sending to Snowflake. Larger batches reduce round-trips and staging overhead.
- `write_partitions`: calls `coalesce(N)` on the DataFrame before writing, controlling the number of concurrent write operations to Snowflake.
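As an illustration of what `write_partitions` translates to, here is a hedged PySpark sketch. The option names follow the public `spark-snowflake` connector; `df` stands in for the DataFrame mkpipe has extracted, and the values echo the config above.

```python
# Sketch: what write_partitions roughly translates to when writing through
# the spark-snowflake connector. `df` and the option values are illustrative;
# batchsize is applied by the loader when buffering rows, not shown here.
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "myuser",
    "sfPassword": "mypassword",
    "sfDatabase": "MY_DATABASE",
    "sfSchema": "MY_SCHEMA",
    "sfWarehouse": "MY_WAREHOUSE",
}

(
    df.coalesce(4)  # write_partitions: 4 -> at most 4 concurrent writers
    .write.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "STG_EVENTS")
    .mode("append")
    .save()
)
```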
### Performance Notes

- Snowflake warehouse size is the primary write performance lever. A larger warehouse processes inserts faster regardless of partition count.
- The Spark connector stages data internally before committing. A large `batchsize` (50,000+) reduces staging overhead.
- For very large loads, consider using Snowflake's native `COPY INTO` via an external stage (S3/GCS) instead; that is significantly faster but requires additional infrastructure. A sketch follows this list.
- `write_partitions: 4–8` is a good default to balance throughput and connection count.
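As a rough sketch of the `COPY INTO` alternative (not something this loader does for you), assuming Parquet files have already landed in a hypothetical external stage `@events_stage`:

```python
# Sketch: loading pre-staged Parquet files with COPY INTO via
# snowflake-connector-python. The stage name and file layout are
# hypothetical; mkpipe does not manage the external stage for you.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myaccount",
    user="myuser",
    password="mypassword",
    database="MY_DATABASE",
    schema="MY_SCHEMA",
    warehouse="MY_WAREHOUSE",
)
try:
    conn.cursor().execute("""
        COPY INTO STG_EVENTS
        FROM @events_stage/events/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
finally:
    conn.close()
```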
## All Table Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | string | required | Source table name |
| `target_name` | string | required | Snowflake destination table name |
| `replication_method` | `full` / `incremental` | `full` | Replication strategy |
| `batchsize` | int | `10000` | Rows per batch insert |
| `write_partitions` | int | — | Coalesce DataFrame to N partitions before writing |
| `write_strategy` | string | — | `append`, `replace`, `upsert`, `merge` |
| `write_key` | list | — | Key columns for `upsert`/`merge` (required for those strategies) |
| `if_exists` | string | — | `replace` (drop+create) or `append` (preserve table). Inherits from settings |
| `dedup_columns` | list | — | Columns used for `mkpipe_id` hash deduplication |
| `tags` | list | `[]` | Tags for selective pipeline execution |
| `pass_on_error` | bool | `false` | Skip table on error instead of failing |