Skip to main content

A Python library for validating and upserting data into PostgreSQL with automated QA checks.

Project description

pg-upsert

ci/cd codecov Documentation Status PyPI Latest Release PyPI Downloads Python Version Support

pg-upsert is a Python package for validating and upserting data from staging tables into base tables in PostgreSQL. It runs automated QA checks, reports errors with rich formatted output, and performs dependency-aware upserts.

pg-upsert terminal output with interactive compare table

Why Use pg-upsert?

  • 7 Automated QA Checks – Validates NOT NULL, PRIMARY KEY, UNIQUE, FOREIGN KEY, CHECK CONSTRAINT, column existence, and column type compatibility before any modifications occur.
  • Interactive Confirmation – Two UI backends: Textual TUI (terminal) and Tkinter (desktop). Auto-detected or choose with --ui auto|textual|tkinter. The compare-tables dialog includes a Highlight Diffs toggle that tints matching/changed rows and flags the exact cells that differ, skipping any columns excluded from the upsert.
  • Structured Resultsrun() returns an UpsertResult with per-table stats, QA errors, and JSON serialization (--output=json for CI/CD pipelines).
  • Exportable Fix Sheets--export-failures <dir> writes an actionable report of failing rows: one row per unique violating staging row with an _issues column listing every problem (NULL in 'genre', duplicate PK, FK violation, etc.) so users can open it in Excel and fix the data. Supports CSV (file per table), JSON (nested), and XLSX (sheets per table) via --export-format.
  • Schema Validation--check-schema flag validates column existence and type compatibility without running data checks or upserts.
  • Flexible Upsert Strategies – Supports upsert, update, and insert methods.
  • Dependency-Aware Ordering – Tables are processed in FK dependency order automatically.
  • Rich Output – Colored pass/fail indicators, formatted tables, and dual console+logfile output.

Usage

Python API

from pg_upsert import PgUpsert

result = PgUpsert(
    uri="postgresql://user@localhost:5432/mydb",
    tables=("genres", "publishers", "books", "authors", "book_authors"),
    staging_schema="staging",
    base_schema="public",
    do_commit=True,
    upsert_method="upsert",
    exclude_cols=("rev_user", "rev_time"),
    exclude_null_check_cols=("book_alias",),
).run()

# UpsertResult provides structured access to results
print(result.qa_passed)       # True if all QA checks passed
print(result.committed)       # True if changes were committed
print(result.total_updated)   # Total rows updated across all tables
print(result.total_inserted)  # Total rows inserted across all tables
print(result.to_json())       # JSON serialization for CI/CD

Using an existing connection:

import psycopg2
from pg_upsert import PgUpsert

conn = psycopg2.connect(host="localhost", port=5432, dbname="mydb", user="user", password="pass")

ups = PgUpsert(
    conn=conn,
    tables=("genres", "publishers", "books"),
    staging_schema="staging",
    base_schema="public",
    do_commit=True,
)
result = ups.run()

# Drop all ups_* temp objects (connection stays open)
ups.cleanup()

QA-only mode (no upsert):

from pg_upsert import PgUpsert

ups = PgUpsert(
    uri="postgresql://user@localhost:5432/mydb",
    tables=("genres", "books"),
    staging_schema="staging",
    base_schema="public",
).qa_all()

if not ups.qa_passed:
    for err in ups.qa_errors:
        print(f"{err.table}: {err.check_type.value}{err.details}")

Schema compatibility check (column existence and type mismatches only):

from pg_upsert import PgUpsert

ups = PgUpsert(
    uri="postgresql://user@localhost:5432/mydb",
    tables=("genres", "books"),
    staging_schema="staging",
    base_schema="public",
).qa_column_existence().qa_type_mismatch()

if ups.qa_errors:
    for err in ups.qa_errors:
        print(f"{err.table}: {err.check_type.value}{err.details}")
else:
    print("Schemas are compatible")

Pipeline callbacks (per-table progress):

from pg_upsert import PgUpsert, CallbackEvent

def on_event(event):
    if event.event == CallbackEvent.QA_TABLE_COMPLETE:
        print(f"QA {'passed' if event.qa_passed else 'failed'} for {event.table}")
    elif event.event == CallbackEvent.UPSERT_TABLE_COMPLETE:
        print(f"{event.table}: {event.rows_inserted} inserted, {event.rows_updated} updated")

result = PgUpsert(
    uri="postgresql://user@localhost:5432/mydb",
    tables=("genres", "books"),
    staging_schema="staging",
    base_schema="public",
    do_commit=True,
    callback=on_event,
).run()

CLI

pg-upsert -h localhost -p 5432 -d mydb -u user \
  -s staging -b public \
  -t genres -t publishers -t books -t authors -t book_authors \
  -x rev_user -x rev_time \
  --commit
Option Description
-h, --host Database host
-p, --port Database port (default: 5432)
-d, --database Database name
-u, --user Database user (see Authentication)
-s, --staging-schema Staging schema name (default: staging)
-b, --base-schema Base schema name (default: public)
-e, --encoding Database connection encoding (default: utf-8)
-t, --table Table name to process (repeatable)
-x, --exclude-columns Columns to exclude from upsert (repeatable)
-n, --null-columns Columns to skip during NOT NULL checks (repeatable)
-m, --upsert-method upsert, update, or insert (default: upsert)
-c, --commit Commit changes (default: roll back)
-i, --interactive Prompt for confirmation at each step
-l, --logfile Write log to file (appends, does not overwrite)
-o, --output Output format: text (default) or json
--check-schema Validate column existence and types only, then exit
--compact Use compact grid format for QA summary
--ui Interactive UI: auto (default), textual, or tkinter
--export-failures Directory to write a QA failure fix sheet into
--export-format Fix sheet format: csv (default), json, or xlsx
--export-max-rows Max rows to capture per check per table (default 1000)
-f, --config-file Path to YAML configuration file
-g, --generate-config Generate a template config file
-v, --version Show version and exit
--docs Open documentation in browser
--debug Enable debug output

[!NOTE] CLI arguments take precedence over configuration file values. Explicit CLI flags are never overridden by the config file.

Configuration File

Create a YAML config file (see pg-upsert.example.yaml):

debug: false
commit: false
interactive: false
upsert_method: "upsert"  # Options: "upsert", "insert", "update"
logfile: "pg_upsert.log"
host: "localhost"
port: 5432
user: "docker"
database: "dev"
staging_schema: "staging"
base_schema: "public"
encoding: "utf-8"
tables:
  - "authors"
  - "publishers"
  - "books"
  - "book_authors"
  - "genres"
exclude_columns:
  - "rev_time"
  - "rev_user"
null_columns:
  - "book_alias"
output: "text"  # Options: "text", "json"
check_schema: false
compact: false
ui_mode: "auto"  # Options: "auto", "textual", "tkinter"

Run with: pg-upsert -f config.yaml

Docker

docker run -it --rm \
  -v $(pwd):/app \
  ghcr.io/geocoug/pg-upsert:latest \
  -h host.docker.internal -p 5432 -d dev -u docker \
  -s staging -b public -t genres --commit

QA Checks

pg-upsert runs 7 types of QA checks on staging data before upserting:

Check What it validates
Column Existence All base table columns exist in the staging table (respects --exclude-columns)
Column Type No hard type incompatibilities between staging and base (uses PostgreSQL's pg_cast catalog)
NOT NULL Non-nullable base columns have no NULL values in staging
Primary Key No duplicate values in PK columns
Unique No duplicate values in UNIQUE-constrained columns (NULLs allowed per PostgreSQL semantics)
Foreign Key All FK references point to existing rows in the referenced table
Check Constraint All CHECK constraint expressions evaluate to true

[!NOTE] pg-upsert is constraint-driven. Data checks pass vacuously when the base table has no constraints of that type, and tables without a primary key are skipped during the upsert step (a warning is printed). To upsert against a table, make sure the base table has a PK. See Running Without Constraints for details.

See the QA Checks Reference for detailed documentation.

Authentication

pg-upsert resolves the database password in this order:

  1. Password in URI (Python API only) — postgresql://user:pass@host/db
  2. PGPASSWORD environment variable — standard PostgreSQL convention, works with both CLI and API
  3. Interactive prompt — if neither of the above is set

For CI/CD pipelines, use PGPASSWORD to avoid interactive prompts:

PGPASSWORD=secret pg-upsert -h host -d db -u user \
  -s staging -b public -t books \
  --output json --commit

pg-upsert also supports PostgreSQL's .pgpass file via psycopg2.

Exit Codes

Code Meaning
0 QA passed and upsert completed (or user cancelled)
1 QA failed, schema check failed, or error

Contributing

See CONTRIBUTING.md for development setup, available recipes, testing, and release process.

git clone https://github.com/geocoug/pg-upsert
cd pg-upsert
just sync
just test

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pg_upsert-1.21.0.tar.gz (1.6 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pg_upsert-1.21.0-py3-none-any.whl (95.2 kB view details)

Uploaded Python 3

File details

Details for the file pg_upsert-1.21.0.tar.gz.

File metadata

  • Download URL: pg_upsert-1.21.0.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pg_upsert-1.21.0.tar.gz
Algorithm Hash digest
SHA256 8b70b83002287b8a47df2598bc69aa99e62be7b0c27d5efedb26075b334b9119
MD5 12c15f2f4d204936eb12fa10ec95b1db
BLAKE2b-256 6c910ba3482070e3fd1a5fdd43e5132b43f2fbfd64df135b3988f8744964a34e

See more details on using hashes here.

Provenance

The following attestation bundles were made for pg_upsert-1.21.0.tar.gz:

Publisher: ci-cd.yml on geocoug/pg-upsert

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pg_upsert-1.21.0-py3-none-any.whl.

File metadata

  • Download URL: pg_upsert-1.21.0-py3-none-any.whl
  • Upload date:
  • Size: 95.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pg_upsert-1.21.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4fca873166f3407f609c709cc1ef576bfd21824a9a059929e19c8c1fea919ad7
MD5 3b812860cb3480db900461e864e4f3e0
BLAKE2b-256 9979bfd849e7c99d405da687a74a9bf2afcbaace1bc76841c16bb0cea3683507

See more details on using hashes here.

Provenance

The following attestation bundles were made for pg_upsert-1.21.0-py3-none-any.whl:

Publisher: ci-cd.yml on geocoug/pg-upsert

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page