PostgreSQL loader for mkpipe.

mkpipe-loader-postgres

PostgreSQL loader plugin for MkPipe. Writes Spark DataFrames into PostgreSQL tables via JDBC.

Documentation

For more detailed documentation, please visit the GitHub repository.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.


Connection Configuration

connections:
  pg_target:
    variant: postgres
    host: localhost
    port: 5432
    database: mydb
    schema: public
    user: myuser
    password: mypassword
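
As a rough illustration of how these fields relate to a JDBC target, the sketch below builds a PostgreSQL JDBC URL and a properties dictionary from a connection entry. The helper name build_jdbc_target is hypothetical, and passing the schema through the driver's currentSchema URL parameter is an assumption; mkpipe's internal handling may differ.

```python
from typing import Dict, Tuple
from urllib.parse import quote, urlencode


def build_jdbc_target(conn: Dict[str, object]) -> Tuple[str, Dict[str, str]]:
    """Return (jdbc_url, properties) for one connection entry.

    Hypothetical helper: illustrates the standard PostgreSQL JDBC URL
    shape, not mkpipe's actual implementation.
    """
    host = conn["host"]
    port = conn.get("port", 5432)
    database = quote(str(conn["database"]), safe="")
    url = f"jdbc:postgresql://{host}:{port}/{database}"

    # Assumption: the schema is passed via the driver's currentSchema parameter.
    schema = conn.get("schema")
    if schema:
        url += "?" + urlencode({"currentSchema": schema})

    props = {
        "user": str(conn["user"]),
        "password": str(conn["password"]),
        "driver": "org.postgresql.Driver",
    }
    return url, props


pg_target = {
    "variant": "postgres",
    "host": "localhost",
    "port": 5432,
    "database": "mydb",
    "schema": "public",
    "user": "myuser",
    "password": "mypassword",
}

jdbc_url, jdbc_props = build_jdbc_target(pg_target)
print(jdbc_url)  # jdbc:postgresql://localhost:5432/mydb?currentSchema=public
```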

Table Configuration

pipelines:
  - name: source_to_pg
    source: my_source
    destination: pg_target
    tables:
      - name: source_table
        target_name: public.stg_table
        replication_method: full
        batchsize: 10000

Write Parallelism & Throughput

Two parameters control write performance:

      - name: source_table
        target_name: public.stg_table
        replication_method: full
        batchsize: 10000        # rows per JDBC batch insert (default: 10000)
        write_partitions: 4     # coalesce DataFrame to N partitions before writing

How they work

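  • batchsize: number of rows buffered into one JDBC batch and sent to PostgreSQL in a single round trip. PostgreSQL handles 5,000–10,000 well; very large batches (>100K) can increase memory pressure.
  • write_partitions: calls coalesce(N) on the DataFrame before writing. Each partition writes over its own JDBC connection, so a lower N means fewer concurrent connections to PostgreSQL. coalesce can only reduce the partition count, never increase it.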
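
To make the interaction between the two parameters concrete, here is a small back-of-the-envelope sketch. It assumes rows are spread evenly across partitions, which real data may not be, and the function name estimate_write_plan is hypothetical rather than part of mkpipe.

```python
import math


def estimate_write_plan(total_rows: int, write_partitions: int, batchsize: int = 10000) -> dict:
    """Estimate connections and JDBC batches for a write.

    Assumes an even row distribution across partitions (an idealization).
    """
    if total_rows < 0 or write_partitions <= 0 or batchsize <= 0:
        raise ValueError("total_rows must be >= 0; write_partitions and batchsize must be > 0")

    rows_per_partition = math.ceil(total_rows / write_partitions)
    batches_per_partition = math.ceil(rows_per_partition / batchsize)
    return {
        # Upper bound: one JDBC connection per partition being written.
        "max_concurrent_connections": write_partitions,
        "rows_per_partition": rows_per_partition,
        "batches_per_partition": batches_per_partition,
        "total_batches": batches_per_partition * write_partitions,
    }


plan = estimate_write_plan(1_000_000, write_partitions=4, batchsize=10000)
print(plan)
```

For 1,000,000 rows with write_partitions: 4 and batchsize: 10000, this gives 250,000 rows and 25 batches per partition, or 100 batches in total. The number of connections actually open at once also depends on how many Spark tasks run concurrently.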

Performance Notes

  • PostgreSQL's COPY protocol is faster than JDBC for bulk loads, but mkpipe uses JDBC for portability.
  • For large loads, write_partitions: 4–8 with batchsize: 10000 is a reliable baseline.
  • If the target table has many indexes or constraints, writes will be slower. PostgreSQL cannot disable an index in place, so for bulk loads consider dropping indexes beforehand and recreating them afterward.

All Table Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | required | Source table name |
| target_name | string | required | PostgreSQL destination table name |
| replication_method | full / incremental | full | Replication strategy |
| batchsize | int | 10000 | Rows per JDBC batch insert |
| write_partitions | int | | Coalesce DataFrame to N partitions before writing |
| dedup_columns | list | | Columns used for mkpipe_id hash deduplication |
| tags | list | [] | Tags for selective pipeline execution |
| pass_on_error | bool | false | Skip table on error instead of failing |
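
The following sketch shows one way the documented defaults could be applied to a table entry before it is used. It is not mkpipe's own validation code: resolve_table_config is a hypothetical name, and treating the unlisted defaults for write_partitions and dedup_columns as None is an assumption.

```python
from typing import Any, Dict

REQUIRED = ("name", "target_name")

# Defaults from the parameter table; None marks defaults the table leaves
# unstated (assumption: unset means "not applied").
DEFAULTS: Dict[str, Any] = {
    "replication_method": "full",
    "batchsize": 10000,
    "write_partitions": None,
    "dedup_columns": None,
    "tags": [],
    "pass_on_error": False,
}


def resolve_table_config(table: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in documented defaults and run a few basic checks."""
    missing = [key for key in REQUIRED if key not in table]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    unknown = sorted(set(table) - set(DEFAULTS) - set(REQUIRED))
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")

    resolved = {**DEFAULTS, **table}
    resolved["tags"] = list(resolved["tags"])  # copy so the shared default list is never mutated

    if resolved["replication_method"] not in ("full", "incremental"):
        raise ValueError("replication_method must be 'full' or 'incremental'")
    if not isinstance(resolved["batchsize"], int) or resolved["batchsize"] <= 0:
        raise ValueError("batchsize must be a positive integer")
    return resolved


resolved = resolve_table_config(
    {"name": "source_table", "target_name": "public.stg_table", "write_partitions": 4}
)
print(resolved)
```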
