Skip to main content

A Python library for upserting data into postgres.

Project description

pg_upsert

ci/cd Documentation Status PyPI Latest Release PyPI Downloads Python Version Support

pg_upsert is a Python package that provides a method to interactively update and insert (upsert) rows of a base table or base tables from the staging table(s) of the same name. The package is designed to work exclusively with PostgreSQL databases.

The program will perform initial table checks in the form of not-null, primary key, foreign key, and check constraint checks. If any of these checks fail, the program will exit with an error message. If all checks pass, the program will display the number of rows to be inserted and updated, and ask for confirmation before proceeding (when the interactive flag is set to True). If the user confirms, the program will perform the upserts and display the number of rows inserted and updated. If the user does not confirm, the program will exit without performing any upserts.

Credits

This project was created using inspiration from ExecSQL and the example script pg_upsert.sql. The goal of this project is to provide a Python implementation of pg_upsert.sql without the need for ExecSQL.

Usage

Python

import logging

from pg_upsert import PgUpsert

logger = logging.getLogger("pg_upsert")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())

# Run PgUpsert using a URI
PgUpsert(
    uri="postgresql://user@localhost:5432/database", # Note the missing password. pg_upsert will prompt for the password.
    encoding="utf-8",
    tables=("genres", "books", "authors", "book_authors"),
    stg_schema="staging",
    base_schema="public",
    do_commit=True,
    upsert_method="upsert",
    interactive=True,
    exclude_cols=("rev_user", "rev_time", "created_at", "updated_at"),
    exclude_null_check_cols=("rev_user", "rev_time", "created_at", "updated_at", "alias"),
).run()


# Run PgUpsert using an existing connection
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="database",
    user="user",
    password="password",
)

PgUpsert(
    conn=conn,
    encoding="utf-8",
    tables=("genres", "books", "authors", "book_authors"),
    stg_schema="staging",
    base_schema="public",
    do_commit=True,
    upsert_method="upsert",
    interactive=True,
    exclude_cols=("rev_user", "rev_time", "created_at", "updated_at"),
    exclude_null_check_cols=("rev_user", "rev_time", "created_at", "updated_at", "alias"),
).run()

CLI

pg_upsert can be run from the command line. There are two key ways to run pg_upsert from the command line: using a configuration file or using command line arguments.

Command Line Arguments

Running pg_upsert --help will display the following help message:

usage: pg_upsert [--help] [--version] [--debug] [--docs] [-q] [-l LOGFILE] [-e EXCLUDE] [-n NULL] [-c] [-i] [-m {upsert,update,insert}] [-h HOST] [-p PORT] [-d DATABASE] [-u USER] [-s STG_SCHEMA] [-b BASE_SCHEMA]
                 [--encoding ENCODING] [-f CONFIG_FILE] [-t TABLE [TABLE ...]]

Run not-NULL, Primary Key, Foreign Key, and Check Constraint checks on staging tables then update and insert (upsert) data from staging tables to base tables.

options:
  --help                show this help message and exit
  --version             show program's version number and exit
  --debug               display debug output
  --docs                open the documentation in a web browser
  -q, --quiet           suppress all console output
  -l LOGFILE, --logfile LOGFILE
                        write log to LOGFILE
  -e EXCLUDE, --exclude-columns EXCLUDE
                        comma-separated list of columns to exclude from null checks
  -n NULL, --null-columns NULL
                        comma-separated list of columns to exclude from null checks
  -c, --commit          commit changes to database
  -i, --interactive     display interactive GUI of important table information
  -m {upsert,update,insert}, --upsert-method {upsert,update,insert}
                        method to use for upsert
  -h HOST, --host HOST  database host
  -p PORT, --port PORT  database port
  -d DATABASE, --database DATABASE
                        database name
  -u USER, --user USER  database user
  -s STG_SCHEMA, --staging-schema STG_SCHEMA
                        staging schema name
  -b BASE_SCHEMA, --base-schema BASE_SCHEMA
                        base schema name
  --encoding ENCODING   encoding of the database
  -f CONFIG_FILE, --config-file CONFIG_FILE
                        path to configuration yaml file
  -t TABLE [TABLE ...], --table TABLE [TABLE ...]
                        table name(s)

Configuration File

To use a configuration file, create a YAML file with the format below. This example is also provided in the pg_upsert.example.yaml file. The configuration file can be passed to pg_upsert using the -f or --config-file flag.

debug: false
quiet: false
commit: false
interactive: false
upsert_method: "upsert"
logfile: "pg_upsert.log"
host: "localhost"
port: 5432
user: "postgres"
database: "postgres"
staging_schema: "staging"
base_schema: "public"
encoding: "utf-8"
table:
  - "authors"
  - "publishers"
  - "books"
  - "book_authors"
  - "genres"
exclude_columns:
  - "alias"
  - "rev_time"
  - "rev_user"
null_columns:
  - "alias"
  - "created_at"
  - "updated_at"

Then, run pg_upsert -f pg_upsert.yaml.

If the user specifies a configuration file and command line arguments, the configuration file will override any command line arguments specified.

Docker

docker pull ghcr.io/geocoug/pg_upsert:latest

Once the image is pulled, you can run the image using either of the cli options. Below is an example:

docker run -it --rm ghcr.io/geocoug/pg_upsert:latest -v $(pwd):/app pg_upsert --help

Contributing

  1. Fork the repository
  2. Create a new branch (git checkout -b feature-branch)
  3. Create a Python virtual environment (python -m venv .venv)
  4. Activate the virtual environment (source .venv/bin/activate)
  5. Install dependencies (pip install -r requirements.txt)
  6. Install pre-commit hooks (python -m pre-commit install --install-hooks)
  7. Make your changes and run tests (make test)
  8. Push your changes to the branch (git push origin feature-branch)
  9. Create a pull request

Running Tests Locally

Running tests locally requires a PostgreSQL database. The easiest way to set up a PostgreSQL database is to use Docker. The following command will create a PostgreSQL database called dev with the user docker and password docker.

docker run --name postgres -e POSTGRES_USER=docker -e POSTGRES_PASSWORD=docker -e POSTGRES_DB=dev -p 5432:5432 -d postgres

Once initialized, import the test data by running the following command.

docker exec -i postgres psql -U docker -d dev < tests/data.sql

Create a .env file in the root directory with the following content, modifying the values as needed.

POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=dev
POSTGRES_USER=docker
POSTGRES_PASSWORD=docker

Now you can run the tests using make test.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pg_upsert-1.4.3.tar.gz (67.5 kB view details)

Uploaded Source

Built Distribution

pg_upsert-1.4.3-py3-none-any.whl (53.1 kB view details)

Uploaded Python 3

File details

Details for the file pg_upsert-1.4.3.tar.gz.

File metadata

  • Download URL: pg_upsert-1.4.3.tar.gz
  • Upload date:
  • Size: 67.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for pg_upsert-1.4.3.tar.gz
Algorithm Hash digest
SHA256 db5e42998660d38291f4ffd2beebe4f3b876574ccfbeea8906ad7beb2204e9cb
MD5 34124d92df60634fda170777e72f024e
BLAKE2b-256 4e56a0995a15b4f2e63bbb456941cac666ae21d8e1b2ab2c805c2defeb037dc5

See more details on using hashes here.

Provenance

The following attestation bundles were made for pg_upsert-1.4.3.tar.gz:

Publisher: ci-cd.yml on geocoug/pg-upsert

Attestations:

File details

Details for the file pg_upsert-1.4.3-py3-none-any.whl.

File metadata

  • Download URL: pg_upsert-1.4.3-py3-none-any.whl
  • Upload date:
  • Size: 53.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for pg_upsert-1.4.3-py3-none-any.whl
Algorithm Hash digest
SHA256 be905ad2dda1d3a451948007b53d7293d72da79ddd6c2fd9331d113d1f98956a
MD5 f7dc0cf49bf86312d873985ee4605b57
BLAKE2b-256 4f97bca9cf2bda9ef1c7abc9d4652cc5c6bc2e9dd840b8abaf6cfc7e5db18692

See more details on using hashes here.

Provenance

The following attestation bundles were made for pg_upsert-1.4.3-py3-none-any.whl:

Publisher: ci-cd.yml on geocoug/pg-upsert

Attestations:

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page