A Python library for validating and upserting data into PostgreSQL with automated QA checks.


pg-upsert


pg-upsert is a Python package for validating and upserting data from staging tables into base tables in PostgreSQL. It runs automated QA checks, reports errors with rich formatted output, and performs dependency-aware upserts.

[Screenshot: pg-upsert terminal output with interactive compare table]

Why Use pg-upsert?

  • 7 Automated QA Checks – Validates NOT NULL, PRIMARY KEY, UNIQUE, FOREIGN KEY, CHECK CONSTRAINT, column existence, and column type compatibility before any modifications occur.
  • Interactive Confirmation – Two UI backends: Textual TUI (terminal) and Tkinter (desktop). Auto-detected or choose with --ui auto|textual|tkinter.
  • Structured Results – run() returns an UpsertResult with per-table stats, QA errors, and JSON serialization (--output=json for CI/CD pipelines).
  • Schema Validation – The --check-schema flag validates column existence and type compatibility without running data checks or upserts.
  • Flexible Upsert Strategies – Supports upsert, update, and insert methods.
  • Dependency-Aware Ordering – Tables are processed in FK dependency order automatically.
  • Rich Output – Colored pass/fail indicators, formatted tables, and dual console+logfile output.
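The dependency-aware ordering can be pictured as a topological sort over the FK graph. This is only an illustration of the idea, not pg-upsert's actual implementation (pg-upsert derives the graph from FK constraints itself); the dependency map below is a hypothetical one for the example tables used throughout this README:

```python
from graphlib import TopologicalSorter

# Hypothetical FK dependency map: each table maps to the set of
# tables it references, so referenced (parent) tables sort first.
deps = {
    "genres": set(),
    "publishers": set(),
    "authors": set(),
    "books": {"genres", "publishers"},
    "book_authors": {"books", "authors"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # parent tables always precede the tables that reference them
```

Processing tables in this order means every FK target row exists before the referencing rows are upserted.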

Usage

Python API

from pg_upsert import PgUpsert

result = PgUpsert(
    uri="postgresql://user@localhost:5432/mydb",
    tables=("genres", "publishers", "books", "authors", "book_authors"),
    staging_schema="staging",
    base_schema="public",
    do_commit=True,
    upsert_method="upsert",
    exclude_cols=("rev_user", "rev_time"),
    exclude_null_check_cols=("book_alias",),
).run()

# UpsertResult provides structured access to results
print(result.qa_passed)       # True if all QA checks passed
print(result.committed)       # True if changes were committed
print(result.total_updated)   # Total rows updated across all tables
print(result.total_inserted)  # Total rows inserted across all tables
print(result.to_json())       # JSON serialization for CI/CD
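In a CI/CD pipeline, the JSON from to_json() (or --output=json) can gate a deploy step. A minimal sketch, assuming the payload mirrors the attributes shown above (the exact JSON schema may differ; check the emitted output):

```python
import json

# Hypothetical payload shaped like the UpsertResult attributes above.
payload = '{"qa_passed": true, "committed": true, "total_updated": 12, "total_inserted": 3}'

result = json.loads(payload)
if not result["qa_passed"]:
    raise SystemExit(1)  # fail the CI job if any QA check failed
print(f"{result['total_inserted']} inserted, {result['total_updated']} updated")
```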

Using an existing connection:

import psycopg2
from pg_upsert import PgUpsert

conn = psycopg2.connect(host="localhost", port=5432, dbname="mydb", user="user", password="pass")

ups = PgUpsert(
    conn=conn,
    tables=("genres", "publishers", "books"),
    staging_schema="staging",
    base_schema="public",
    do_commit=True,
)
result = ups.run()

# Drop all ups_* temp objects (connection stays open)
ups.cleanup()

QA-only mode (no upsert):

from pg_upsert import PgUpsert

ups = PgUpsert(
    uri="postgresql://user@localhost:5432/mydb",
    tables=("genres", "books"),
    staging_schema="staging",
    base_schema="public",
).qa_all()

if not ups.qa_passed:
    for err in ups.qa_errors:
        print(f"{err.table}: {err.check_type.value} - {err.details}")

Schema compatibility check (column existence and type mismatches only):

from pg_upsert import PgUpsert

ups = PgUpsert(
    uri="postgresql://user@localhost:5432/mydb",
    tables=("genres", "books"),
    staging_schema="staging",
    base_schema="public",
).qa_column_existence().qa_type_mismatch()

if ups.qa_errors:
    for err in ups.qa_errors:
        print(f"{err.table}: {err.check_type.value} - {err.details}")
else:
    print("Schemas are compatible")

Pipeline callbacks (per-table progress):

from pg_upsert import PgUpsert, CallbackEvent

def on_event(event):
    if event.event == CallbackEvent.QA_TABLE_COMPLETE:
        print(f"QA {'passed' if event.qa_passed else 'failed'} for {event.table}")
    elif event.event == CallbackEvent.UPSERT_TABLE_COMPLETE:
        print(f"{event.table}: {event.rows_inserted} inserted, {event.rows_updated} updated")

result = PgUpsert(
    uri="postgresql://user@localhost:5432/mydb",
    tables=("genres", "books"),
    staging_schema="staging",
    base_schema="public",
    do_commit=True,
    callback=on_event,
).run()

CLI

pg-upsert -h localhost -p 5432 -d mydb -u user \
  -s staging -b public \
  -t genres -t publishers -t books -t authors -t book_authors \
  -x rev_user -x rev_time \
  --commit

Option                   Description
-h, --host               Database host
-p, --port               Database port (default: 5432)
-d, --database           Database name
-u, --user               Database user (see Authentication)
-s, --staging-schema     Staging schema name (default: staging)
-b, --base-schema        Base schema name (default: public)
-e, --encoding           Database connection encoding (default: utf-8)
-t, --table              Table name to process (repeatable)
-x, --exclude-columns    Columns to exclude from upsert (repeatable)
-n, --null-columns       Columns to skip during NOT NULL checks (repeatable)
-m, --upsert-method      upsert, update, or insert (default: upsert)
-c, --commit             Commit changes (default: roll back)
-i, --interactive        Prompt for confirmation at each step
-l, --logfile            Write log to file (appends, does not overwrite)
-o, --output             Output format: text (default) or json
--check-schema           Validate column existence and types only, then exit
--compact                Use compact grid format for QA summary
--ui                     Interactive UI: auto (default), textual, or tkinter
-f, --config-file        Path to YAML configuration file
-g, --generate-config    Generate a template config file
-v, --version            Show version and exit
--docs                   Open documentation in browser
--debug                  Enable debug output

[!NOTE] Explicit CLI flags always take precedence over configuration file values; the config file never overrides them.

Configuration File

Create a YAML config file (see pg-upsert.example.yaml):

debug: false
commit: false
interactive: false
upsert_method: "upsert"  # Options: "upsert", "insert", "update"
logfile: "pg_upsert.log"
host: "localhost"
port: 5432
user: "docker"
database: "dev"
staging_schema: "staging"
base_schema: "public"
encoding: "utf-8"
tables:
  - "authors"
  - "publishers"
  - "books"
  - "book_authors"
  - "genres"
exclude_columns:
  - "rev_time"
  - "rev_user"
null_columns:
  - "book_alias"
output: "text"  # Options: "text", "json"
check_schema: false
compact: false
ui_mode: "auto"  # Options: "auto", "textual", "tkinter"

Run with: pg-upsert -f config.yaml

Docker

docker run -it --rm \
  -v $(pwd):/app \
  ghcr.io/geocoug/pg-upsert:latest \
  -h host.docker.internal -p 5432 -d dev -u docker \
  -s staging -b public -t genres --commit

QA Checks

pg-upsert runs 7 types of QA checks on staging data before upserting:

Check              What it validates
Column Existence   All base table columns exist in the staging table (respects --exclude-columns)
Column Type        No hard type incompatibilities between staging and base (uses PostgreSQL's pg_cast catalog)
NOT NULL           Non-nullable base columns have no NULL values in staging
Primary Key        No duplicate values in PK columns
Unique             No duplicate values in UNIQUE-constrained columns (NULLs allowed per PostgreSQL semantics)
Foreign Key        All FK references point to existing rows in the referenced table
Check Constraint   All CHECK constraint expressions evaluate to true

See the QA Checks Reference for detailed documentation.
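To make the Primary Key check concrete: conceptually it asks whether any PK tuple occurs more than once in staging (pg-upsert runs this in SQL; the pure-Python sketch below, with hypothetical row data, only illustrates the semantics):

```python
from collections import Counter

def duplicate_pk_values(rows, pk_cols):
    """Return PK tuples that appear more than once (these would fail the PK check)."""
    counts = Counter(tuple(row[c] for c in pk_cols) for row in rows)
    return [pk for pk, n in counts.items() if n > 1]

staging = [
    {"book_id": 1, "title": "Dune"},
    {"book_id": 2, "title": "Emma"},
    {"book_id": 1, "title": "Dune (2nd ed.)"},  # duplicate key
]
print(duplicate_pk_values(staging, ["book_id"]))  # [(1,)]
```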

Authentication

pg-upsert resolves the database password in this order:

  1. Password in URI (Python API only) — postgresql://user:pass@host/db
  2. PGPASSWORD environment variable — standard PostgreSQL convention, works with both CLI and API
  3. Interactive prompt — if neither of the above is set

For CI/CD pipelines, use PGPASSWORD to avoid interactive prompts:

PGPASSWORD=secret pg-upsert -h host -d db -u user \
  -s staging -b public -t books \
  --output json --commit

pg-upsert also supports PostgreSQL's .pgpass file via psycopg2.
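A .pgpass entry uses PostgreSQL's standard colon-separated format (the file must be private, e.g. chmod 600 ~/.pgpass, or libpq ignores it):

```
# hostname:port:database:username:password
localhost:5432:mydb:user:secret
```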

Exit Codes

Code   Meaning
0      QA passed and upsert completed (or user cancelled)
1      QA failed, schema check failed, or error

Contributing

See CONTRIBUTING.md for development setup, available recipes, testing, and release process.

git clone https://github.com/geocoug/pg-upsert
cd pg-upsert
just sync
just test
