PostgreSQL loader for mkpipe.
Project description
mkpipe-loader-postgres
PostgreSQL loader plugin for MkPipe. Writes Spark DataFrames into PostgreSQL tables via JDBC.
Documentation
For more detailed documentation, please visit the GitHub repository.
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Connection Configuration
connections:
pg_target:
variant: postgres
host: localhost
port: 5432
database: mydb
schema: public
user: myuser
password: mypassword
Table Configuration
pipelines:
- name: source_to_pg
source: my_source
destination: pg_target
tables:
- name: source_table
target_name: public.stg_table
replication_method: full
batchsize: 10000
- name: source_table
target_name: public.stg_table
replication_method: incremental
iterate_column: updated_at
write_strategy: upsert
write_key: [id]
Write Strategy
Control how data is written to PostgreSQL:
- name: source_table
target_name: public.stg_table
write_strategy: upsert # append | replace | upsert | merge
write_key: [id] # required for upsert/merge
| Strategy | PostgreSQL Behavior |
|---|---|
append |
Plain INSERT via JDBC (default for incremental) |
replace |
Drop and recreate table, then insert (default for full) |
upsert |
INSERT ... ON CONFLICT (write_key) DO UPDATE via temp table |
merge |
Same as upsert for PostgreSQL |
Note:
upsert/mergerequireswrite_key. The loader writes to a temp table first, then executes a singleINSERT ... ON CONFLICTstatement to merge into the target.
Write Parallelism & Throughput
Two parameters control write performance:
- name: source_table
target_name: public.stg_table
replication_method: full
batchsize: 10000 # rows per JDBC batch insert (default: 10000)
write_partitions: 4 # coalesce DataFrame to N partitions before writing
How they work
batchsize: rows buffered before sending oneINSERTstatement. PostgreSQL handles 5,000–10,000 well; very large batches (>100K) can increase memory pressure.write_partitions: callscoalesce(N)on the DataFrame, reducing concurrent JDBC connections to PostgreSQL.
Performance Notes
- PostgreSQL's
COPYprotocol is faster than JDBC for bulk loads, but mkpipe uses JDBC for portability. - For large loads,
write_partitions: 4–8withbatchsize: 10000is a reliable baseline. - If the target table has many indexes or constraints, writes will be slower — consider disabling indexes during bulk loads.
All Table Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
name |
string | required | Source table name |
target_name |
string | required | PostgreSQL destination table name |
replication_method |
full / incremental |
full |
Replication strategy |
batchsize |
int | 10000 |
Rows per JDBC batch insert |
write_partitions |
int | — | Coalesce DataFrame to N partitions before writing |
write_strategy |
string | — | append, replace, upsert, merge |
write_key |
list | — | Key columns for upsert/merge (required) |
dedup_columns |
list | — | Columns used for mkpipe_id hash deduplication |
tags |
list | [] |
Tags for selective pipeline execution |
pass_on_error |
bool | false |
Skip table on error instead of failing |
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mkpipe_loader_postgres-0.6.0.tar.gz.
File metadata
- Download URL: mkpipe_loader_postgres-0.6.0.tar.gz
- Upload date:
- Size: 7.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c06e351891e001ec48d36bbdc8dc8b94463ad44c1512956d84caadfd69b3d411
|
|
| MD5 |
dcd530383e363c34980ab995c86ca31d
|
|
| BLAKE2b-256 |
89b800aa19b0044fb3d26cf1b9d8c5058dfed2fee4ff965b688497c14399015a
|
File details
Details for the file mkpipe_loader_postgres-0.6.0-py3-none-any.whl.
File metadata
- Download URL: mkpipe_loader_postgres-0.6.0-py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
917859791579c3fe4e902715a2d361e94266749fa0d6c4cbd019a3ad95c33e3e
|
|
| MD5 |
fc3facef53fc79f1a702e5d1310aa9f2
|
|
| BLAKE2b-256 |
4df00c73ed134aaefc2d24a3f06847e508288f5ff102a5477259c8b2500ca476
|