A Python library for validating and upserting data into PostgreSQL with automated QA checks.
Project description
pg-upsert
pg-upsert is a Python package for validating and upserting data from staging tables into base tables in PostgreSQL. It runs automated QA checks, reports errors with rich formatted output, and performs dependency-aware upserts.
Why Use pg-upsert?
- 7 Automated QA Checks – Validates NOT NULL, PRIMARY KEY, UNIQUE, FOREIGN KEY, CHECK CONSTRAINT, column existence, and column type compatibility before any modifications occur.
- Interactive Confirmation – Two UI backends: Textual TUI (terminal) and Tkinter (desktop). Auto-detected or choose with
--ui auto|textual|tkinter. - Structured Results –
run()returns anUpsertResultwith per-table stats, QA errors, and JSON serialization (--output=jsonfor CI/CD pipelines). - Schema Validation –
--check-schemaflag validates column existence and type compatibility without running data checks or upserts. - Flexible Upsert Strategies – Supports
upsert,update, andinsertmethods. - Dependency-Aware Ordering – Tables are processed in FK dependency order automatically.
- Rich Output – Colored pass/fail indicators, formatted tables, and dual console+logfile output.
Usage
Python API
from pg_upsert import PgUpsert
result = PgUpsert(
uri="postgresql://user@localhost:5432/mydb",
tables=("genres", "publishers", "books", "authors", "book_authors"),
staging_schema="staging",
base_schema="public",
do_commit=True,
upsert_method="upsert",
exclude_cols=("rev_user", "rev_time"),
exclude_null_check_cols=("book_alias",),
).run()
# UpsertResult provides structured access to results
print(result.qa_passed) # True if all QA checks passed
print(result.committed) # True if changes were committed
print(result.total_updated) # Total rows updated across all tables
print(result.total_inserted) # Total rows inserted across all tables
print(result.to_json()) # JSON serialization for CI/CD
Using an existing connection:
import psycopg2
from pg_upsert import PgUpsert
conn = psycopg2.connect(host="localhost", port=5432, dbname="mydb", user="user", password="pass")
ups = PgUpsert(
conn=conn,
tables=("genres", "publishers", "books"),
staging_schema="staging",
base_schema="public",
do_commit=True,
)
result = ups.run()
# Drop all ups_* temp objects (connection stays open)
ups.cleanup()
QA-only mode (no upsert):
from pg_upsert import PgUpsert
ups = PgUpsert(
uri="postgresql://user@localhost:5432/mydb",
tables=("genres", "books"),
staging_schema="staging",
base_schema="public",
).qa_all()
if not ups.qa_passed:
for err in ups.qa_errors:
print(f"{err.table}: {err.check_type.value} — {err.details}")
Schema compatibility check (column existence and type mismatches only):
from pg_upsert import PgUpsert
ups = PgUpsert(
uri="postgresql://user@localhost:5432/mydb",
tables=("genres", "books"),
staging_schema="staging",
base_schema="public",
).qa_column_existence().qa_type_mismatch()
if ups.qa_errors:
for err in ups.qa_errors:
print(f"{err.table}: {err.check_type.value} — {err.details}")
else:
print("Schemas are compatible")
Pipeline callbacks (per-table progress):
from pg_upsert import PgUpsert, CallbackEvent
def on_event(event):
if event.event == CallbackEvent.QA_TABLE_COMPLETE:
print(f"QA {'passed' if event.qa_passed else 'failed'} for {event.table}")
elif event.event == CallbackEvent.UPSERT_TABLE_COMPLETE:
print(f"{event.table}: {event.rows_inserted} inserted, {event.rows_updated} updated")
result = PgUpsert(
uri="postgresql://user@localhost:5432/mydb",
tables=("genres", "books"),
staging_schema="staging",
base_schema="public",
do_commit=True,
callback=on_event,
).run()
CLI
pg-upsert -h localhost -p 5432 -d mydb -u user \
-s staging -b public \
-t genres -t publishers -t books -t authors -t book_authors \
-x rev_user -x rev_time \
--commit
| Option | Description |
|---|---|
-h, --host |
Database host |
-p, --port |
Database port (default: 5432) |
-d, --database |
Database name |
-u, --user |
Database user (see Authentication) |
-s, --staging-schema |
Staging schema name (default: staging) |
-b, --base-schema |
Base schema name (default: public) |
-e, --encoding |
Database connection encoding (default: utf-8) |
-t, --table |
Table name to process (repeatable) |
-x, --exclude-columns |
Columns to exclude from upsert (repeatable) |
-n, --null-columns |
Columns to skip during NOT NULL checks (repeatable) |
-m, --upsert-method |
upsert, update, or insert (default: upsert) |
-c, --commit |
Commit changes (default: roll back) |
-i, --interactive |
Prompt for confirmation at each step |
-l, --logfile |
Write log to file (appends, does not overwrite) |
-o, --output |
Output format: text (default) or json |
--check-schema |
Validate column existence and types only, then exit |
--compact |
Use compact grid format for QA summary |
--ui |
Interactive UI: auto (default), textual, or tkinter |
-f, --config-file |
Path to YAML configuration file |
-g, --generate-config |
Generate a template config file |
-v, --version |
Show version and exit |
--docs |
Open documentation in browser |
--debug |
Enable debug output |
[!NOTE] CLI arguments take precedence over configuration file values. Explicit CLI flags are never overridden by the config file.
Configuration File
Create a YAML config file (see pg-upsert.example.yaml):
debug: false
commit: false
interactive: false
upsert_method: "upsert" # Options: "upsert", "insert", "update"
logfile: "pg_upsert.log"
host: "localhost"
port: 5432
user: "docker"
database: "dev"
staging_schema: "staging"
base_schema: "public"
encoding: "utf-8"
tables:
- "authors"
- "publishers"
- "books"
- "book_authors"
- "genres"
exclude_columns:
- "rev_time"
- "rev_user"
null_columns:
- "book_alias"
output: "text" # Options: "text", "json"
check_schema: false
compact: false
ui_mode: "auto" # Options: "auto", "textual", "tkinter"
Run with: pg-upsert -f config.yaml
Docker
docker run -it --rm \
-v $(pwd):/app \
ghcr.io/geocoug/pg-upsert:latest \
-h host.docker.internal -p 5432 -d dev -u docker \
-s staging -b public -t genres --commit
QA Checks
pg-upsert runs 7 types of QA checks on staging data before upserting:
| Check | What it validates |
|---|---|
| Column Existence | All base table columns exist in the staging table (respects --exclude-columns) |
| Column Type | No hard type incompatibilities between staging and base (uses PostgreSQL's pg_cast catalog) |
| NOT NULL | Non-nullable base columns have no NULL values in staging |
| Primary Key | No duplicate values in PK columns |
| Unique | No duplicate values in UNIQUE-constrained columns (NULLs allowed per PostgreSQL semantics) |
| Foreign Key | All FK references point to existing rows in the referenced table |
| Check Constraint | All CHECK constraint expressions evaluate to true |
See the QA Checks Reference for detailed documentation.
Authentication
pg-upsert resolves the database password in this order:
- Password in URI (Python API only) —
postgresql://user:pass@host/db PGPASSWORDenvironment variable — standard PostgreSQL convention, works with both CLI and API- Interactive prompt — if neither of the above is set
For CI/CD pipelines, use PGPASSWORD to avoid interactive prompts:
PGPASSWORD=secret pg-upsert -h host -d db -u user \
-s staging -b public -t books \
--output json --commit
pg-upsert also supports PostgreSQL's .pgpass file via psycopg2.
Exit Codes
| Code | Meaning |
|---|---|
| 0 | QA passed and upsert completed (or user cancelled) |
| 1 | QA failed, schema check failed, or error |
Contributing
See CONTRIBUTING.md for development setup, available recipes, testing, and release process.
git clone https://github.com/geocoug/pg-upsert
cd pg-upsert
just sync
just test
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pg_upsert-1.20.0.tar.gz.
File metadata
- Download URL: pg_upsert-1.20.0.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
25621af73e0aa7f77a3935ce8131651cc4a180b78c622b5697b1938d0d9598d2
|
|
| MD5 |
e6ef5af93e80fdd7fe9c7a3db876b904
|
|
| BLAKE2b-256 |
3a0e9b76dbad3e885468952cbd6786117c940647748a66f045748c0301793275
|
Provenance
The following attestation bundles were made for pg_upsert-1.20.0.tar.gz:
Publisher:
ci-cd.yml on geocoug/pg-upsert
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
pg_upsert-1.20.0.tar.gz -
Subject digest:
25621af73e0aa7f77a3935ce8131651cc4a180b78c622b5697b1938d0d9598d2 - Sigstore transparency entry: 1237316377
- Sigstore integration time:
-
Permalink:
geocoug/pg-upsert@a6f1a4f53979d80c78be5fe104624073c86df065 -
Branch / Tag:
refs/tags/v1.20.0 - Owner: https://github.com/geocoug
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
ci-cd.yml@a6f1a4f53979d80c78be5fe104624073c86df065 -
Trigger Event:
push
-
Statement type:
File details
Details for the file pg_upsert-1.20.0-py3-none-any.whl.
File metadata
- Download URL: pg_upsert-1.20.0-py3-none-any.whl
- Upload date:
- Size: 81.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
187eb56ce7d2683cbdd9277f3490c6eaa440809c7dc84e66c0bc5d26825bf87e
|
|
| MD5 |
6f730121e468364b94db6e090d0fc622
|
|
| BLAKE2b-256 |
e1f8f3d1358ea17f354c4536065407344a7df073d2e4608c12f6554b015c8837
|
Provenance
The following attestation bundles were made for pg_upsert-1.20.0-py3-none-any.whl:
Publisher:
ci-cd.yml on geocoug/pg-upsert
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
pg_upsert-1.20.0-py3-none-any.whl -
Subject digest:
187eb56ce7d2683cbdd9277f3490c6eaa440809c7dc84e66c0bc5d26825bf87e - Sigstore transparency entry: 1237316421
- Sigstore integration time:
-
Permalink:
geocoug/pg-upsert@a6f1a4f53979d80c78be5fe104624073c86df065 -
Branch / Tag:
refs/tags/v1.20.0 - Owner: https://github.com/geocoug
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
ci-cd.yml@a6f1a4f53979d80c78be5fe104624073c86df065 -
Trigger Event:
push
-
Statement type: