Skip to main content

PostgreSQL loader for mkpipe.

Project description

mkpipe-loader-postgres

PostgreSQL loader plugin for MkPipe. Writes Spark DataFrames into PostgreSQL tables via JDBC.

Documentation

For more detailed documentation, please visit the GitHub repository.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.


Connection Configuration

connections:
  pg_target:
    variant: postgres
    host: localhost
    port: 5432
    database: mydb
    schema: public
    user: myuser
    password: mypassword

Table Configuration

pipelines:
  - name: source_to_pg
    source: my_source
    destination: pg_target
    tables:
      - name: source_table
        target_name: public.stg_table
        replication_method: full
        batchsize: 10000

      - name: source_table
        target_name: public.stg_table
        replication_method: incremental
        iterate_column: updated_at
        write_strategy: upsert
        write_key: [id]

Write Strategy

Control how data is written to PostgreSQL:

      - name: source_table
        target_name: public.stg_table
        write_strategy: upsert       # append | replace | upsert | merge
        write_key: [id]              # required for upsert/merge
Strategy PostgreSQL Behavior
append Plain INSERT via JDBC (default for incremental)
replace Drop and recreate table, then insert (default for full). Use if_exists: append to preserve existing table
upsert INSERT ... ON CONFLICT (write_key) DO UPDATE via temp table
merge Same as upsert for PostgreSQL

Note: upsert/merge requires write_key. The loader writes to a temp table first, then executes a single INSERT ... ON CONFLICT statement to merge into the target.


Write Parallelism & Throughput

Two parameters control write performance:

      - name: source_table
        target_name: public.stg_table
        replication_method: full
        batchsize: 10000        # rows per JDBC batch insert (default: 10000)
        write_partitions: 4     # coalesce DataFrame to N partitions before writing

How they work

  • batchsize: rows buffered before sending one INSERT statement. PostgreSQL handles 5,000–10,000 well; very large batches (>100K) can increase memory pressure.
  • write_partitions: calls coalesce(N) on the DataFrame, reducing concurrent JDBC connections to PostgreSQL.

Performance Notes

  • PostgreSQL's COPY protocol is faster than JDBC for bulk loads, but mkpipe uses JDBC for portability.
  • For large loads, write_partitions: 4–8 with batchsize: 10000 is a reliable baseline.
  • If the target table has many indexes or constraints, writes will be slower — consider disabling indexes during bulk loads.

All Table Parameters

Parameter Type Default Description
name string required Source table name
target_name string required PostgreSQL destination table name
replication_method full / incremental full Replication strategy
batchsize int 10000 Rows per JDBC batch insert
write_partitions int Coalesce DataFrame to N partitions before writing
write_strategy string append, replace, upsert, merge
write_key list Key columns for upsert/merge (required)
if_exists string replace (drop+create) or append (preserve table). Inherits from settings
dedup_columns list Columns used for mkpipe_id hash deduplication
tags list [] Tags for selective pipeline execution
pass_on_error bool false Skip table on error instead of failing

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mkpipe_loader_postgres-0.7.0.tar.gz (7.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mkpipe_loader_postgres-0.7.0-py3-none-any.whl (8.5 kB view details)

Uploaded Python 3

File details

Details for the file mkpipe_loader_postgres-0.7.0.tar.gz.

File metadata

  • Download URL: mkpipe_loader_postgres-0.7.0.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for mkpipe_loader_postgres-0.7.0.tar.gz
Algorithm Hash digest
SHA256 f23c709a67c00b2fa89923f5f127ed4ad5a7f7cc7ca32a9d041498d45c45ee12
MD5 447415f590d5f587ccd565600af2b681
BLAKE2b-256 372b737462ad06cfc9c875beab47cdcc7cd763d9e94dc50a1fb469a79c5f63ef

See more details on using hashes here.

File details

Details for the file mkpipe_loader_postgres-0.7.0-py3-none-any.whl.

File metadata

File hashes

Hashes for mkpipe_loader_postgres-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d543353b32c4c4084e7788d9e868917d9cf26fc5ca6347fb80205eeaf4c086c5
MD5 98ff47700a406006e4132d457701e103
BLAKE2b-256 2453364778a1ea6604e46d5461babf3d4ffcdb8a67533e1d967f2697cb60cd68

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page