Skip to main content

PostgreSQL loader for mkpipe.

Project description

mkpipe-loader-postgres

PostgreSQL loader plugin for MkPipe. Writes Spark DataFrames into PostgreSQL tables via JDBC.

Documentation

For more detailed documentation, please visit the GitHub repository.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.


Connection Configuration

connections:
  pg_target:
    variant: postgres
    host: localhost
    port: 5432
    database: mydb
    schema: public
    user: myuser
    password: mypassword

Table Configuration

pipelines:
  - name: source_to_pg
    source: my_source
    destination: pg_target
    tables:
      - name: source_table
        target_name: public.stg_table
        replication_method: full
        batchsize: 10000

      - name: source_table
        target_name: public.stg_table
        replication_method: incremental
        iterate_column: updated_at
        write_strategy: upsert
        write_key: [id]

Write Strategy

Control how data is written to PostgreSQL:

      - name: source_table
        target_name: public.stg_table
        write_strategy: upsert       # append | replace | upsert | merge
        write_key: [id]              # required for upsert/merge
Strategy PostgreSQL Behavior
append Plain INSERT via JDBC (default for incremental)
replace Drop and recreate table, then insert (default for full)
upsert INSERT ... ON CONFLICT (write_key) DO UPDATE via temp table
merge Same as upsert for PostgreSQL

Note: upsert/merge requires write_key. The loader writes to a temp table first, then executes a single INSERT ... ON CONFLICT statement to merge into the target.


Write Parallelism & Throughput

Two parameters control write performance:

      - name: source_table
        target_name: public.stg_table
        replication_method: full
        batchsize: 10000        # rows per JDBC batch insert (default: 10000)
        write_partitions: 4     # coalesce DataFrame to N partitions before writing

How they work

  • batchsize: rows buffered before sending one INSERT statement. PostgreSQL handles 5,000–10,000 well; very large batches (>100K) can increase memory pressure.
  • write_partitions: calls coalesce(N) on the DataFrame, reducing concurrent JDBC connections to PostgreSQL.

Performance Notes

  • PostgreSQL's COPY protocol is faster than JDBC for bulk loads, but mkpipe uses JDBC for portability.
  • For large loads, write_partitions: 4–8 with batchsize: 10000 is a reliable baseline.
  • If the target table has many indexes or constraints, writes will be slower — consider disabling indexes during bulk loads.

All Table Parameters

Parameter Type Default Description
name string required Source table name
target_name string required PostgreSQL destination table name
replication_method full / incremental full Replication strategy
batchsize int 10000 Rows per JDBC batch insert
write_partitions int Coalesce DataFrame to N partitions before writing
write_strategy string append, replace, upsert, merge
write_key list Key columns for upsert/merge (required)
dedup_columns list Columns used for mkpipe_id hash deduplication
tags list [] Tags for selective pipeline execution
pass_on_error bool false Skip table on error instead of failing

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mkpipe_loader_postgres-0.6.0.tar.gz (7.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mkpipe_loader_postgres-0.6.0-py3-none-any.whl (8.4 kB view details)

Uploaded Python 3

File details

Details for the file mkpipe_loader_postgres-0.6.0.tar.gz.

File metadata

  • Download URL: mkpipe_loader_postgres-0.6.0.tar.gz
  • Upload date:
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for mkpipe_loader_postgres-0.6.0.tar.gz
Algorithm Hash digest
SHA256 c06e351891e001ec48d36bbdc8dc8b94463ad44c1512956d84caadfd69b3d411
MD5 dcd530383e363c34980ab995c86ca31d
BLAKE2b-256 89b800aa19b0044fb3d26cf1b9d8c5058dfed2fee4ff965b688497c14399015a

See more details on using hashes here.

File details

Details for the file mkpipe_loader_postgres-0.6.0-py3-none-any.whl.

File metadata

File hashes

Hashes for mkpipe_loader_postgres-0.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 917859791579c3fe4e902715a2d361e94266749fa0d6c4cbd019a3ad95c33e3e
MD5 fc3facef53fc79f1a702e5d1310aa9f2
BLAKE2b-256 4df00c73ed134aaefc2d24a3f06847e508288f5ff102a5477259c8b2500ca476

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page