A Python utility function for ingesting data into SQLAlchemy-defined PostgreSQL tables, automatically migrating them as needed, and minimising locking.
pg-bulk-ingest
A Python utility function for ingesting data into SQLAlchemy-defined PostgreSQL tables, automatically migrating them as needed and allowing concurrent reads as much as possible.
Allowing concurrent writes is not an aim of pg-bulk-ingest. It is designed for ETL pipelines where PostgreSQL is used as a data warehouse and the only writes to the table come from pg-bulk-ingest. It assumes that only one instance of pg-bulk-ingest runs against a given table at any one time.
Features
pg-bulk-ingest exposes a single function as its API that:
- Creates the tables if necessary
- Migrates any existing tables if necessary, minimising locking
- Ingests data in batches, where each batch is ingested in its own transaction
- Handles "high-watermarking" to carry on from where a previous ingest finished or errored
- Optionally performs an "upsert", matching rows on primary key
- Optionally deletes all existing rows before ingestion
- Optionally calls a callback just before each batch is visible to other database clients
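To make the feature list concrete, here is a minimal in-memory sketch of how batched ingest, high-watermarking, upserting on primary key, optional deletion of existing rows and a "before visible" callback can fit together. It is not pg-bulk-ingest's API: the function `ingest_in_memory`, the `(high_watermark, rows)` batch shape, the `state` dict and the dict standing in for a table are all hypothetical, and copying a dict stands in for a database transaction.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical shapes for illustration only (not pg-bulk-ingest's API).
Row = Tuple[int, str]  # (primary key, value)
Batch = Tuple[str, List[Row]]  # (high watermark reached by this batch, rows)


def ingest_in_memory(
    table: Dict[int, str],
    state: Dict[str, Optional[str]],
    get_batches: Callable[[Optional[str]], Iterable[Batch]],
    upsert: bool = True,
    delete_existing: bool = False,
    on_before_visible: Optional[Callable[[str, List[Row]], None]] = None,
) -> None:
    """Simulate batched ingest into `table`, a dict keyed on primary key."""
    if delete_existing:
        # Simplification: a real database would do this inside a transaction.
        table.clear()
    # Ask the source only for data after the last stored high watermark.
    for batch_high_watermark, rows in get_batches(state.get("high_watermark")):
        # Stage the batch on a copy; the copy stands in for one transaction.
        staged = dict(table)
        for pk, value in rows:
            if pk in staged and not upsert:
                raise ValueError(f"duplicate primary key {pk!r}")
            # Insert, or (when upserting) overwrite the row with the same key.
            staged[pk] = value
        if on_before_visible is not None:
            on_before_visible(batch_high_watermark, rows)
        # "Commit": the batch and its high watermark become visible together.
        table.clear()
        table.update(staged)
        state["high_watermark"] = batch_high_watermark


# Usage: a hypothetical source whose batches are keyed by date strings.
ALL_BATCHES: List[Batch] = [
    ("2024-01-01", [(1, "a"), (2, "b")]),
    ("2024-01-02", [(2, "B"), (3, "c")]),
]


def source(high_watermark: Optional[str]) -> Iterable[Batch]:
    # Yield only batches strictly after the given high watermark.
    for batch_high_watermark, rows in ALL_BATCHES:
        if high_watermark is None or batch_high_watermark > high_watermark:
            yield batch_high_watermark, rows


table: Dict[int, str] = {}
state: Dict[str, Optional[str]] = {"high_watermark": None}
seen: List[str] = []
ingest_in_memory(table, state, source, on_before_visible=lambda hw, rows: seen.append(hw))
print(table)  # {1: 'a', 2: 'B', 3: 'c'}
print(state)  # {'high_watermark': '2024-01-02'}
```

Because the watermark only advances once a batch has been applied, a run that fails part-way can be restarted, and the source is then asked only for data after the last applied batch; re-running after a complete ingest does nothing.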
Visit the pg-bulk-ingest documentation for usage instructions.
Hashes for pg_bulk_ingest-0.0.49-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5cc58f73d67aec11e371ff8395827727ad793f6d3efc444e0988c15bbd5dba54
MD5 | d06ada84eec16e7e2fe06b997d3ddd50
BLAKE2b-256 | 2f3cd65e4cfdcc0857f7e4baf08d16ab90729250ad419b5a0a950616c21cbcf8
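To check a downloaded wheel against the SHA256 digest listed above, a short standard-library sketch could look like the following. The helper names `sha256_of` and `wheel_matches` are hypothetical, and the sketch assumes you pass it the path of the wheel you downloaded.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for pg_bulk_ingest-0.0.49-py3-none-any.whl.
EXPECTED_SHA256 = "5cc58f73d67aec11e371ff8395827727ad793f6d3efc444e0988c15bbd5dba54"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def wheel_matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `wheel_matches(Path("pg_bulk_ingest-0.0.49-py3-none-any.whl"))` returns `True` for an intact download. pip can also enforce this itself in hash-checking mode, using `--hash=sha256:...` entries in a requirements file together with `--require-hashes`.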