Governance compiler for PostgreSQL
Tarkin: govern your data
An open source governance compiler for PostgreSQL.
Why?
Data governance is tough. Regulations aren't made with implementation in mind, and data engineers often aren't the ones making those decisions anyway.
The result:
- A lot of ad-hoc work in triggers, functions, and manual grants
- Constant required fixes
- Limited documentation (or a ton of work to write it all)
Or, you take an off-the-shelf solution that's a complete black-box.
But it's not rocket science, just a lot of work. I made this to help. And also so that I don't have to write out column GRANTs ever again.
How it works
By design, Tarkin is an open book: open source, fully accessible code, fully human-readable output. The point of data governance is to keep things secure, so having any part of the process live in a black box is counterproductive.
Tarkin is run through a Command Line Interface (CLI) tool built in Python with Typer. Some commands generate YAML files for you to view and modify, or SQL scripts for you to validate, and others apply those scripts once you've decided they're ready. Nothing happens without your direct approval. And don't just take my word for it: check the GitHub repo yourself.
Installation
pip install tarkin
Requires Python 3.11+ and PostgreSQL 14+. Row-Level Security (RLS) support requires PostgreSQL 15+, and MAINTAIN privileges require PostgreSQL 16+.
Quick start
# Inspect a live database and generate a governance YAML
tarkin inspect --profile mydb
# Edit out/mydb_model.yaml to configure governance
# Compile and validate
tarkin validate out/mydb_model.yaml
tarkin build out/mydb_model.yaml --profile mydb
# Apply
tarkin attach --profile mydb
# Remove
tarkin detach --profile mydb --keep-versioning
Run tarkin help or tarkin --help for the full command reference.
Credentials
Tarkin uses a credentials.toml file (default: ~/.tarkin/credentials.toml) to store connection profiles:
[mydb]
host = "localhost"
port = 5432
database = "myapp"
username = "postgres"
password = "secret"
# Optional: HMAC key for HMAC256 column hashing
# hmac_key = "your-strong-secret-key"
Versioning columns
When a column has versioned: true, Tarkin adds __valid_from__ and __valid_to__ columns to the shadow table to maintain a full history of changes. These columns are intentionally not exposed through the public-facing view — the view layer presents only the declared columns. The _current view variant (e.g. users_current) is created automatically and filters to live records (__valid_to__ = 'infinity').
Because a versioned table keeps historical rows that reuse the same key values, its original single-row primary key is replaced with a partial unique index covering only the live row (__valid_to__ = 'infinity'). As a consequence, a versioned table cannot be the target of a foreign key; tarkin validate rejects any configuration that does this.
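To see why the partial unique index matters, here is a small self-contained sketch. It uses SQLite (via Python's sqlite3 module) as a stand-in for PostgreSQL because it also supports partial indexes. The table layout is simplified and is not the SQL Tarkin generates; only the column names, the idx_<table>_current index name, and the _current view name come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id             INTEGER NOT NULL,
        email          TEXT    NOT NULL,
        __valid_from__ TEXT    NOT NULL,
        __valid_to__   TEXT    NOT NULL DEFAULT 'infinity'
    );
    -- Uniqueness is enforced only for the live row of each id.
    CREATE UNIQUE INDEX idx_users_current
        ON users (id) WHERE __valid_to__ = 'infinity';
    -- The view hides the versioning columns and the historical rows.
    CREATE VIEW users_current AS
        SELECT id, email FROM users WHERE __valid_to__ = 'infinity';
    """
)

# A closed historical row and a live row can share id = 1.
conn.execute(
    "INSERT INTO users VALUES (1, 'old@example.com', '2024-01-01', '2024-06-01')"
)
conn.execute(
    "INSERT INTO users VALUES (1, 'new@example.com', '2024-06-01', 'infinity')"
)

# A second live row for the same id violates the partial unique index.
try:
    conn.execute(
        "INSERT INTO users VALUES (1, 'dup@example.com', '2024-07-01', 'infinity')"
    )
    second_live_row_rejected = False
except sqlite3.IntegrityError:
    second_live_row_rejected = True

current_rows = conn.execute("SELECT id, email FROM users_current").fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(second_live_row_rejected, current_rows, total_rows)
```

The foreign key restriction follows from the same design: in PostgreSQL a foreign key must reference columns covered by a non-partial unique constraint, and id alone is no longer unique across the whole table.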
On tarkin detach:
- With --keep-versioning: the __valid_from__ and __valid_to__ columns are retained in the restored table, along with all historical records. This is the safe default when history is valuable.
- With --drop-versioning: only current records (__valid_to__ = 'infinity') are retained and the versioning columns are dropped. This operation is destructive and irreversible.
The versioning index (idx_<table>_current) is dropped when --drop-versioning is used, and retained otherwise.
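The difference between the two detach flags is easiest to see on a couple of rows. The sketch below models the outcome on in-memory dictionaries; rows_after_detach is a hypothetical helper for illustration, not a Tarkin function, and the real work happens in SQL.

```python
VERSION_COLUMNS = ("__valid_from__", "__valid_to__")


def rows_after_detach(rows, keep_versioning):
    """Return the rows a restored table would hold after detach."""
    if keep_versioning:
        # --keep-versioning: every row and both versioning columns survive.
        return [dict(row) for row in rows]
    # --drop-versioning: only live rows survive, without versioning columns.
    return [
        {col: val for col, val in row.items() if col not in VERSION_COLUMNS}
        for row in rows
        if row["__valid_to__"] == "infinity"
    ]


history = [
    {"id": 1, "email": "old@example.com",
     "__valid_from__": "2024-01-01", "__valid_to__": "2024-06-01"},
    {"id": 1, "email": "new@example.com",
     "__valid_from__": "2024-06-01", "__valid_to__": "infinity"},
]

kept = rows_after_detach(history, keep_versioning=True)
dropped = rows_after_detach(history, keep_versioning=False)
print(len(kept), dropped)
```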
Retention columns
When a table has retention_days set, Tarkin adds __expires_at__ and __erase_on_expiry__ columns to the shadow table to support time-based data expiry. These columns are intentionally not exposed through the public-facing view — the view layer presents only the declared columns.
__expires_at__ is a timestamptz column with a default of now() + interval '<retention_days> days', computed at INSERT time. __erase_on_expiry__ is a bool column defaulting to true. A partial index on __expires_at__ WHERE __erase_on_expiry__ = true is created to keep the scheduled sweep performant.
Setting __erase_on_expiry__ = false on any individual row exempts it from scheduled deletion — this is the mechanism for legal holds when a record must be retained beyond its normal expiry.
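As a rough model of how the two columns interact, the sketch below computes the default expiry and checks whether a row is eligible for the sweep. The function names are hypothetical, and timedelta arithmetic on UTC timestamps only approximates PostgreSQL's interval arithmetic, which follows the session time zone across daylight-saving changes.

```python
from datetime import datetime, timedelta, timezone


def default_expires_at(inserted_at, retention_days):
    """Approximate now() + interval '<retention_days> days' at INSERT time."""
    return inserted_at + timedelta(days=retention_days)


def is_due_for_erasure(row, now):
    """A row is swept only if it has expired and is not under a hold."""
    return row["__expires_at__"] <= now and row["__erase_on_expiry__"]


inserted = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 3, 1, tzinfo=timezone.utc)

normal_row = {
    "__expires_at__": default_expires_at(inserted, 30),
    "__erase_on_expiry__": True,
}
# Legal hold: same expiry, but the row is exempt from scheduled deletion.
held_row = {**normal_row, "__erase_on_expiry__": False}

print(is_due_for_erasure(normal_row, now), is_due_for_erasure(held_row, now))
```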
When retention_schedule is configured on the database, Tarkin registers a pg_cron job named tarkin_retention_<database> that calls __META__.tarkin_erase_expired_records() on the configured cron schedule. That function sweeps all tables registered in __META__.tarkin_retention, finds rows where __expires_at__ <= now() AND __erase_on_expiry__ = true, and applies the table's erase_strategy (delete, nullify, or obfuscate). Each sweep is logged to __META__.tarkin_erasures with was_scheduled = true.
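The sweep described above can be sketched in plain Python to make the control flow concrete. Everything here is an illustration under assumptions: the real function runs inside PostgreSQL, the erase_columns key and the "[erased]" placeholder are made up, what nullify and obfuscate do to each column is my guess, and the only log field taken from the description is was_scheduled.

```python
from datetime import datetime, timezone

STRATEGIES = ("delete", "nullify", "obfuscate")


def erase_expired_records(tables, now):
    """Apply each table's erase strategy to expired, non-held rows."""
    erasure_log = []
    for name, table in tables.items():
        strategy = table["erase_strategy"]
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown erase_strategy: {strategy!r}")
        remaining, affected = [], 0
        for row in table["rows"]:
            due = row["__expires_at__"] <= now and row["__erase_on_expiry__"]
            if not due:
                remaining.append(row)
                continue
            affected += 1
            if strategy == "delete":
                continue  # the row is dropped entirely
            # Assumed semantics: blank out only the configured columns.
            replacement = None if strategy == "nullify" else "[erased]"
            remaining.append({
                col: (replacement if col in table["erase_columns"] else val)
                for col, val in row.items()
            })
        table["rows"] = remaining
        erasure_log.append({
            "table": name,              # hypothetical log field
            "strategy": strategy,       # hypothetical log field
            "rows_affected": affected,  # hypothetical log field
            "was_scheduled": True,
        })
    return erasure_log


past = datetime(2024, 1, 1, tzinfo=timezone.utc)
future = datetime(2030, 1, 1, tzinfo=timezone.utc)
now = datetime(2025, 1, 1, tzinfo=timezone.utc)

tables = {
    "sessions": {"erase_strategy": "delete", "erase_columns": [], "rows": [
        {"id": 1, "__expires_at__": past, "__erase_on_expiry__": True},
        {"id": 2, "__expires_at__": past, "__erase_on_expiry__": False},
        {"id": 3, "__expires_at__": future, "__erase_on_expiry__": True},
    ]},
    "users": {"erase_strategy": "nullify", "erase_columns": ["email"], "rows": [
        {"id": 1, "email": "a@example.com",
         "__expires_at__": past, "__erase_on_expiry__": True},
    ]},
}

log = erase_expired_records(tables, now)
print(log)
```

Note that in this sketch a nullified or obfuscated row still matches the expiry condition on the next sweep. How the real function avoids reprocessing such rows is not covered here.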
On tarkin detach:
- The pg_cron job is unscheduled (guarded by a check that pg_cron is installed, since it may have been removed independently)
- The partial index idx_<table>_expires_at is dropped
- The __expires_at__ and __erase_on_expiry__ columns are dropped from the shadow table before the schema rename
Unlike versioning, there is no keep/drop flag — retention columns are always removed on detach. Any records that had not yet expired are restored to the table without expiry metadata, and the operator is responsible for any cleanup.
Security
See SECURITY.md for:
- Release integrity and SBOM verification
- HMAC key management and rotation
- Shadow schema model and detach guarantees
- Column masking security notes (xxhash vs SHA vs HMAC)
- Sensitive column enforcement
- pgaudit configuration and restoration
- Known limitations
Reference
See REFERENCE.md for an overview of all available CLI commands.
License
Apache 2.0. See LICENSE.
Download files
File details
Details for the file tarkin-0.1.0.tar.gz.
File metadata
- Download URL: tarkin-0.1.0.tar.gz
- Upload date:
- Size: 431.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 70758eace44bf8a7bb31942e4e081073d73e7152ebeb50bdbff3a95ade811dcd |
| MD5 | db9814a20074e0d854e22db5a6ab2815 |
| BLAKE2b-256 | c9d2d2eca9ae6b1fdca93b16121b8478eef45de2522be39478cbb8a1899bb2a9 |
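If you download the archive by hand, you can compare it against the SHA256 digest listed above before installing. This is a minimal sketch using only hashlib; the helper names are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib
import io

# SHA256 digest published above for tarkin-0.1.0.tar.gz.
EXPECTED_SHA256 = "70758eace44bf8a7bb31942e4e081073d73e7152ebeb50bdbff3a95ade811dcd"


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so large files stay out of memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected=EXPECTED_SHA256):
    """Return True if the file at path matches the expected digest."""
    with open(path, "rb") as fh:
        return sha256_of_stream(fh) == expected


# Quick self-check on an in-memory stream instead of a real download.
demo_digest = sha256_of_stream(io.BytesIO(b"abc"))
print(demo_digest)
```

After downloading, verify_file("tarkin-0.1.0.tar.gz") should return True for an untampered copy of the source distribution.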
Provenance
The following attestation bundles were made for tarkin-0.1.0.tar.gz:
Publisher: publish.yaml on BProgramming/tarkin
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tarkin-0.1.0.tar.gz
- Subject digest: 70758eace44bf8a7bb31942e4e081073d73e7152ebeb50bdbff3a95ade811dcd
- Sigstore transparency entry: 1588672456
- Sigstore integration time:
- Permalink: BProgramming/tarkin@53b4741cd070bc53728c95b63945bebceb1cc674
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/BProgramming
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@53b4741cd070bc53728c95b63945bebceb1cc674
- Trigger Event: release
File details
Details for the file tarkin-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tarkin-0.1.0-py3-none-any.whl
- Upload date:
- Size: 82.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f4dd25cca2a750915dfba253da9a1b0166cbbfd0dbce54d5266f67b80a1a719 |
| MD5 | 9245972d956013fb97258ff26e2a3ee1 |
| BLAKE2b-256 | fa28c9e349ee5adb1264164be9259e7a8a0b0e21ffeed404e47ffa3faf098075 |
Provenance
The following attestation bundles were made for tarkin-0.1.0-py3-none-any.whl:
Publisher: publish.yaml on BProgramming/tarkin
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tarkin-0.1.0-py3-none-any.whl
- Subject digest: 5f4dd25cca2a750915dfba253da9a1b0166cbbfd0dbce54d5266f67b80a1a719
- Sigstore transparency entry: 1588672475
- Sigstore integration time:
- Permalink: BProgramming/tarkin@53b4741cd070bc53728c95b63945bebceb1cc674
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/BProgramming
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@53b4741cd070bc53728c95b63945bebceb1cc674
- Trigger Event: release