PostgreSQL schema inventory, SQL extraction, and documentation generation tool
Project description
pgcarter
A production-quality PostgreSQL schema inventory, SQL extraction,
documentation generation, and database-shape analysis tool. pgcarter is
both a schema extraction tool and a lightweight database discovery and
profiling platform.
๐ Full documentation: https://blu3pr1n7.github.io/PGCarter/
Quick start
uv pip install -e . # or: pip install -e .
# 1. Extract a schema inventory (SQL + JSON + docs)
pgcarter index --database mydb --output-dir ./inventory
# 2. Profile it offline (no database needed)
pgcarter analyze --input ./inventory/json --output ./analysis
pgcarter connects to a PostgreSQL database and produces three independent
outputs:
- Executable SQL โ deterministic, ordered PostgreSQL DDL files that can be
run to recreate database structures (plus a single ordered
apply.sql). - Structured metadata โ JSON files describing every extracted asset.
- Documentation โ Markdown rendered entirely from user-provided Jinja2 templates.
A fourth capability, the analyze subcommand, performs automated database
shape analysis โ table sizes, column characteristics, data-quality indicators,
relationships, and schema-design patterns. See
Database analysis & profiling.
It never exports table data. This is a schema and metadata extraction utility only โ no
INSERT, noCOPY, no row contents. The analyzer is likewise read-only: every profiling query is a validatedSELECT.
Design boundaries
The two output layers are strictly separated:
- The SQL generation layer depends on metadata models only โ never on templates.
- The documentation layer is entirely template-driven (Jinja2) and generates no SQL. No Markdown is embedded in Python source.
PostgreSQL โโโถ Extractors โโโถ Metadata Models โโฌโโถ SQL Generator โโโถ sql/
โโโถ Jinja Renderer โโโถ docs/
โโโโถ json/
Installation
Requires Python 3.12+. Uses psycopg
(psycopg3) and jinja2. No SQLAlchemy.
# with uv (recommended)
uv venv --python 3.12
uv pip install -e ".[dev]"
# or with pip
python3.12 -m venv .venv && . .venv/bin/activate
pip install -e ".[dev]"
Usage
The CLI (built with Typer) has two subcommands:
index (schema extraction) and analyze (shape analysis & profiling).
pgcarter index \
--host localhost \
--port 5432 \
--database mydb \
--user postgres \
--password secret \
--output-dir ./inventory \
--templates-dir ./templates
| Argument | Default | Notes |
|---|---|---|
--host |
localhost |
|
--port |
5432 |
|
--database |
(required) | |
--user |
postgres |
|
--password |
โ | falls back to PGPASSWORD |
--output-dir |
the database name | created if missing; must be writable |
--templates-dir |
./templates |
must exist for docs to be generated |
--schema |
public |
repeatable; architecture supports multiple schemas |
--log-level |
INFO |
DEBUG/INFO/WARNING/ERROR (or LOG_LEVEL) |
--pretty / --no-pretty |
JSON | colourised console logs for local dev (or LOG_PRETTY) |
The same --log-level / --pretty options apply to every subcommand.
The connection never writes: it issues SET default_transaction_read_only = on.
Output structure
inventory/
โโโ sql/
โ โโโ apply.sql # full schema, dependency-ordered, one run
โ โโโ database.sql
โ โโโ extensions.sql
โ โโโ roles.sql
โ โโโ schemas/
โ โโโ public.sql
โ โโโ public/
โ โโโ tables/ indexes/ functions/
โ โโโ views/ triggers/ sequences/
โ โโโ permissions/
โโโ json/
โ โโโ database.json schemas.json tables.json indexes.json
โ โโโ views.json functions.json triggers.json sequences.json
โ โโโ extensions.json roles.json permissions.json
โ โโโ relationships.json
โ โโโ relationships.dot # Graphviz relationship graph
โ โโโ schemas/<schema>.json
โ โโโ tables/<table>.json
โโโ docs/
โ โโโ index.md database.md roles.md permissions.md
โ โโโ schemas/<schema>/{schema.md, tables/, views/, functions/, triggers/, indexes/}
โโโ report.json # extracted / skipped / warnings / errors
Every SQL file starts with a provenance header:
-- Generated by pgcarter
-- Database: <database>
-- Object: <object>
-- Generated: <timestamp>
Every generated document starts with the equivalent HTML-comment header.
apply.sql โ guaranteed executable
Individual object files are executable when applied in dependency order. For
convenience sql/apply.sql is a single, dependency-ordered script
(extensions โ roles โ schemas โ sequences โ tables (FK-sorted) โ sequence
ownership โ indexes โ views (dependency-sorted) โ functions โ triggers โ
grants) that recreates the whole schema in one run. Role creation is guarded so
the script is re-runnable. This is exercised by the integration suite, which
replays apply.sql into a fresh database.
Extraction scope
Reads from PostgreSQL system catalogs (pg_catalog) and information_schema.
Extracts: database, schemas, tables (columns, types, defaults, nullability,
identity and generated columns, comments), constraints (PK/FK/unique/check/
exclusion), indexes (incl. partial & expression), views & materialized views,
functions & procedures (signature, parameters, return type, language,
volatility, security mode, owner), triggers, sequences, extensions, roles
(attributes, memberships), privileges (database/schema/table/column/sequence/
function), and a relationship graph (foreign keys, view/trigger/sequence
dependencies).
Default scope is the public schema; pass --schema (repeatable) for others.
Documentation templates
Documentation is completely template-driven. Templates are discovered
dynamically โ if a template is missing, a warning is logged and processing
continues. Provide a templates/ directory:
templates/
โโโ index.md.j2 # home page (database overview)
โโโ database.md.j2 # database detail
โโโ schema.md.j2 # per schema
โโโ table.md.j2 # per table
โโโ indexes.md.j2 # per-schema index overview
โโโ function.md.j2 # per function/procedure
โโโ view.md.j2 # per view / materialized view
โโโ trigger.md.j2 # per trigger
โโโ permissions.md.j2 # privilege matrix
โโโ roles.md.j2 # roles
Note: the spec lists
index.md.j2twice. Hereindex.md.j2is the home page andindexes.md.j2renders the per-schema index overview, avoiding the filename collision.
Templates receive the full structured context (database, schemas, tables,
indexes, views, functions, triggers, sequences, extensions,
roles, permissions, relationships, generated_at) plus object-specific
variables (e.g. table, function, schema_tables).
Error handling
Connection failures, insufficient privileges, and unsupported objects are
handled gracefully: a failure in one extractor is captured and the run
continues. Everything is summarised in report.json:
{
"database": "mydb",
"summary": { "extracted": {"tables": 120}, "skipped_count": 1,
"warning_count": 0, "error_count": 0, "generated_file_count": 412 },
"skipped": [], "warnings": [], "errors": [], "generated_files": []
}
Exit codes: 0 success, 2 completed with recorded errors, 1 fatal error.
Database analysis & profiling
The analyze subcommand turns the inventory into an understanding of the
database's shape: what kinds of datasets exist, how big tables are, column
characteristics, data-quality signals, relationships, and likely design issues.
It is not a data dump โ it profiles structure and statistics.
The existing JSON inventory is the analyzer's first input source, so it runs in two modes.
Offline mode (structure only, no database)
Analyze an existing JSON inventory with no connection. Identifies possible checks from table structure, column names, data types, constraints, relationships, and indexes โ and records the exact read-only SQL each check would run online.
pgcarter analyze --input ./inventory/json
Online mode (connect and profile)
Connects to PostgreSQL and enriches the structural analysis with row counts, null statistics, cardinality estimates, value distributions, freshness checks, and size metrics.
pgcarter analyze \
--database mydb \
--schema public \
--output ./analysis \
--sample-size 10000
| Argument | Default | Notes |
|---|---|---|
--input |
โ | JSON inventory directory โ offline mode |
--database |
โ | connect & profile โ online mode |
--host/--port/--user/--password |
localhost/5432/postgres/PGPASSWORD |
online connection |
--schema |
public |
repeatable |
--output |
./analysis |
output directory |
--templates-dir |
./templates |
Jinja2 templates for analysis docs |
--config |
โ | analysis YAML (enabled checks, thresholds) |
--sample-size |
โ | row cap for expensive per-column scans |
--statement-timeout |
0 |
per-query timeout (ms); an overrun is logged and skipped |
Provide --input (offline), --database (online), or both (use the JSON as the
inventory base while connecting for statistics).
Output
analysis/
โโโ report.json # full analysis (tables, metrics, checks, warnings)
โโโ report.md # human-readable summary (rendered from a template)
โโโ warnings.json # every non-informational finding
โโโ tables/
โ โโโ users.json # per-table metrics, columns, checks
โ โโโ orders.json
โโโ run-report.json # run summary (extracted counts / errors)
โโโ docs/analysis/
โโโ index.md # rendered overview
โโโ warnings.md
โโโ tables/<table>.md
A worked example lives in examples/analysis/.
Checks
Checks are plugin-style: each is a class registered with @register, and
adding a new check requires nothing more than dropping a new class into the
relevant pgcarter/analyzer/checks/ module. Categories:
- Tables โ
table_size,row_count(empty / extremely large), andgrowth_indicators(freshness window fromcreated_at/updated_at/โฆ), plus structuraltable_structure(missing primary key, very wide tables). - Columns โ
null_analysis,cardinality(unique identifiers, low cardinality, text enums),distribution(min/max/avg),string_profiling(avg/min/max length), and name heuristics:identifier_detection,timestamp_detection,email_detection,status_columns,suspicious_columns. - Indexes โ
duplicate_indexes,missing_fk_indexes,unused_indexes. - Relationships โ
heavily_referenced(fan-in importance),relationship_depth,orphan_relationships(rows with a missing parent). - Quality โ
duplicate_primary_keys,duplicate_unique_values,unused_tables.
Each finding has a severity (info/warning/critical); warnings and
criticals are collected into warnings.json.
Resilience & progress
Online runs degrade gracefully. A table the connecting role cannot read
(permission denied), a missing object, or a query that exceeds
--statement-timeout is logged as a single warning and skipped โ recorded
under skipped in run-report.json, never escalated to a run error (the exit
code stays 0). The denied relation is remembered, so its remaining columns are
skipped without further round trips, and it still receives structural analysis.
Per-table progress is logged ([n/N] analyzing โฆ) so long runs are observable.
For large databases, prefer --sample-size (bounds per-column scans) and
--statement-timeout (caps any single query); together they keep a run bounded
and visible.
Query safety & performance
Every generated query is mechanically validated to be a single read-only
SELECT/WITH with no INSERT/UPDATE/DELETE/DROP/TRUNCATE/ALTER/โฆ
(see pgcarter/analyzer/queries.py::assert_safe). Identifiers are quoted and
schema-qualified. Row-count and size checks prefer PostgreSQL's own statistics
(pg_class.reltuples, pg_total_relation_size) over scanning, and
--sample-size N bounds expensive per-column aggregates with a LIMIT
subquery. Identical queries from different checks share one round trip.
Configuration
analysis:
enabled_checks: # omit to run every check
- null_analysis
- cardinality
- table_size
thresholds:
high_null_percentage: 80
low_cardinality_limit: 10
large_table_rows: 10000000
unique_ratio: 0.99
long_text_length: 10000
# sample_size: 10000 # --sample-size overrides
pgcarter analyze --input ./inventory/json --config analysis.yml
A ready-to-edit example is in analysis.yml.
Analysis templates
Like the rest of the project, analysis documentation is completely
template-driven โ no Markdown is embedded in Python. Add to templates/:
templates/
โโโ analysis.md.j2 # overview (report.md + docs/analysis/index.md)
โโโ table_analysis.md.j2 # per table
โโโ column_analysis.md.j2 # per-column fragment (included by table_analysis)
โโโ warnings.md.j2 # warnings list
Logging
Logging is built on structlog. The default is
structured JSON on stdout, ready for log aggregators (Datadog, ELK/
OpenSearch, CloudWatch, Loki, โฆ):
{"event": "database_connecting", "host": "localhost", "database": "mydb",
"level": "info", "logger": "pgcarter.extractor.connection",
"timestamp": "2026-06-23T10:30:00.123456Z"}
For local development, enable colourised console output:
pgcarter index --database mydb --pretty
# or, for any process:
LOG_PRETTY=true LOG_LEVEL=DEBUG pgcarter analyze --input ./inventory/json
2026-06-23T10:30:00Z [info] database_connecting host=localhost database=mydb
Configuration:
| Source | Production (default) | Local dev |
|---|---|---|
| CLI flag | --no-pretty |
--pretty |
| Env var | LOG_PRETTY=false |
LOG_PRETTY=true |
| Level | --log-level / LOG_LEVEL (default INFO) |
Using it in code
Setup happens once, in pgcarter/logging_config.py:
from pgcarter.logging_config import configure_logging, get_logger
configure_logging(pretty_logs=False, level="INFO") # JSON to stdout
log = get_logger(__name__)
log.info("user_created", user_id=123, plan="enterprise") # structured
log.exception("payment_failed", payment_id=123) # structured traceback
Both stdlib logging.getLogger(...) and structlog get_logger(...)
render through the same pipeline, so existing log.info("Extracting %s", x)
calls keep working alongside structured events.
Attach request/job correlation (or any global metadata) with contextvars โ it is added to every subsequent event automatically:
import structlog
structlog.contextvars.bind_contextvars(service="pgcarter", request_id="abc123")
Never log secrets, tokens, or sensitive row data โ logging stays advisory and metadata-only, like the rest of the tool.
Documentation site
The project documentation (this README's content plus installation, usage, CLI, architecture, and API reference) is published with MkDocs Material to GitHub Pages: https://blu3pr1n7.github.io/PGCarter/.
Build or preview it locally:
uv pip install -e ".[docs]" # or: make docs
make docs-serve # live preview at http://127.0.0.1:8000
make docs-build # strict build (fails on warnings/broken links)
The .github/workflows/docs.yml workflow builds the site with
mkdocs build --strict and deploys it on every push to master.
One-time manual step: GitHub Pages must be set to publish from GitHub Actions before the first deploy. In the repository, go to Settings โ Pages โ Build and deployment and set Source to โGitHub Actionsโ. (This cannot be enabled from the workflow itself; it is a repository setting.) Once set, the
docsworkflow publishes automatically.
Development
make dev # install with dev dependencies
make test # unit tests (no database needed)
make lint # ruff
make typecheck # mypy
make coverage # unit tests with coverage
Integration / e2e tests (dockerised database)
A dedicated, disposable PostgreSQL 16 instance is provided via
docker-compose.yml (host port 55432, isolated from any other local
Postgres). The seed schema in tests/fixtures/seed.sql exercises every asset
category.
make db-up # start the test DB and wait until healthy
make test-integration # run integration tests against it
make e2e # run the index CLI end-to-end into ./build/e2e
make e2e-analyze # run an online analysis into ./build/analysis
make test-all # unit + integration
make db-down # tear down (removes the data)
The integration suite asserts that no row data is ever emitted and that the
generated apply.sql replays cleanly into a fresh database.
Project layout
pgcarter/
โโโ cli.py # Typer entry point (index + analyze subcommands)
โโโ config.py # configuration + defaults
โโโ main.py # orchestration
โโโ report.py # run report
โโโ logging_config.py # structured logging
โโโ models/ # dataclass metadata models (single source of truth)
โโโ extractor/ # one module per asset category + orchestrator
โโโ sql/ # deterministic DDL generation (template-free)
โโโ docs/ # Jinja2 renderer (no SQL)
โโโ writers/ # SQL / JSON / DOT / apply.sql writers
โโโ analyzer/ # database shape analysis & profiling
โโโ runner.py # analyze orchestration (offline/online)
โโโ config.py # analysis config (enabled checks, thresholds)
โโโ models.py # analysis result models
โโโ heuristics.py # column name/type semantics
โโโ queries.py # read-only SQL builders + safety guard
โโโ queries/ # *.sql templates (table_stats, column_stats, relationships)
โโโ rules.py # Check base, registry, AnalysisContext
โโโ checks/ # plugin checks: tables/columns/indexes/relationships/quality/statistics
โโโ engine.py # runs checks, assembles the report
โโโ loader.py # reconstruct an Inventory from JSON (offline input)
โโโ writer.py # analysis JSON + Jinja2 docs
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pgcarter-0.1.0.tar.gz.
File metadata
- Download URL: pgcarter-0.1.0.tar.gz
- Upload date:
- Size: 223.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.21 {"installer":{"name":"uv","version":"0.11.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"CachyOS Linux","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a0a5e3f7201ca5cfd851948598a7188ccf6c2e7629758e0e3735eac3e29e19ce
|
|
| MD5 |
0e7d3949ac587a2603e7c1e98ad9e161
|
|
| BLAKE2b-256 |
77aed4894ec0b17208bd26acfe8484064418c0f5d9af20e3eb19325c49d0c066
|
File details
Details for the file pgcarter-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pgcarter-0.1.0-py3-none-any.whl
- Upload date:
- Size: 82.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.21 {"installer":{"name":"uv","version":"0.11.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"CachyOS Linux","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6d7b4a62ba9bf4a5772af39ba625b249196087114f05fae3fdfee7ed6f576e99
|
|
| MD5 |
463917021703fea267441af815db2c0e
|
|
| BLAKE2b-256 |
61b96d88e623869f6a16d0d7a8b6d33f351a6b2d8e0b7970bcb1e96839cabc7a
|