Governance compiler for PostgreSQL

Tarkin: govern your data

An open source governance compiler for PostgreSQL.

Careful, Princess. Disclosing the Rebellion's location like that is strictly in violation of GDPR.

Why?

Data governance is tough. Regulations aren't made with implementation in mind, and data engineers often aren't the ones making those decisions anyway.

The result:

  • A lot of ad-hoc work in triggers, functions, and manual grants
  • Constant required fixes
  • Limited documentation (or a ton of work to write it all)

Or, you take an off-the-shelf solution that's a complete black box.

But it's not rocket science, just a lot of work. I made this to help. And also so that I don't have to write out column GRANTs ever again.

How it works

By design, Tarkin is an open book: open source, fully accessible code, fully human-readable output. The point of data governance is to keep things secure, so having any aspect of the process live in a black box is counterproductive.

Tarkin is run through a Command Line Interface (CLI) tool built in Python with Typer. Some commands generate YAMLs for you to view and modify, or SQL scripts for you to validate, and others apply those scripts once you've decided they're ready. Nothing happens without your direct approval. And don't just take my word for it — check the GitHub repo yourself.

Installation

pip install tarkin

Requires Python 3.11+ and PostgreSQL 14+. Row-Level Security (RLS) features require PostgreSQL 15+, and MAINTAIN privileges require PostgreSQL 16+.

Quick start

# Inspect a live database and generate a governance YAML
tarkin inspect --profile mydb

# Edit out/mydb_model.yaml to configure governance

# Compile and validate
tarkin validate out/mydb_model.yaml
tarkin build out/mydb_model.yaml --profile mydb

# Apply
tarkin attach --profile mydb

# Remove
tarkin detach --profile mydb --keep-versioning

Run tarkin help or tarkin --help for the full command reference.

Credentials

Tarkin uses a credentials.toml file (default: ~/.tarkin/credentials.toml) to store connection profiles:

[mydb]
host     = "localhost"
port     = 5432
database = "myapp"
username = "postgres"
password = "secret"

# Optional: HMAC key for HMAC256 column hashing
# hmac_key = "your-strong-secret-key"

Versioning columns

When a column has versioned: true, Tarkin adds __valid_from__ and __valid_to__ columns to the shadow table to maintain a full history of changes. These columns are intentionally not exposed through the public-facing view — the view layer presents only the declared columns. The _current view variant (e.g. users_current) is created automatically and filters to live records (__valid_to__ = 'infinity').

Because a versioned table keeps historical rows that reuse the same key values, its original primary key, which allows only one row per key, is replaced with a partial unique index covering only the live row (__valid_to__ = 'infinity'). PostgreSQL foreign keys must reference a primary key or a non-partial unique constraint, so a versioned table cannot be the target of a foreign key; tarkin validate rejects any configuration that does this.
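The versioning model described above can be sketched in a few lines. This example uses SQLite (through Python's standard sqlite3 module) as a stand-in for PostgreSQL, since both support partial unique indexes. The table layout, the update_email helper, and the use of text timestamps are assumptions made for illustration; the SQL that Tarkin actually generates will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id             INTEGER NOT NULL,
        email          TEXT    NOT NULL,
        __valid_from__ TEXT    NOT NULL,
        __valid_to__   TEXT    NOT NULL DEFAULT 'infinity'
    );
    -- Only one live row per key; historical rows may repeat the key.
    CREATE UNIQUE INDEX idx_users_current
        ON users (id) WHERE __valid_to__ = 'infinity';
    -- The view hides the versioning columns and shows live rows only.
    CREATE VIEW users_current AS
        SELECT id, email FROM users WHERE __valid_to__ = 'infinity';
    """
)


def update_email(conn, user_id, new_email, now):
    """Hypothetical helper: close the live row, then insert the new version."""
    conn.execute(
        "UPDATE users SET __valid_to__ = ? "
        "WHERE id = ? AND __valid_to__ = 'infinity'",
        (now, user_id),
    )
    conn.execute(
        "INSERT INTO users (id, email, __valid_from__) VALUES (?, ?, ?)",
        (user_id, new_email, now),
    )


conn.execute(
    "INSERT INTO users (id, email, __valid_from__) "
    "VALUES (1, 'a@example.com', '2024-01-01')"
)
update_email(conn, 1, "b@example.com", "2024-06-01")

history = conn.execute(
    "SELECT email, __valid_to__ FROM users ORDER BY __valid_from__"
).fetchall()
current = conn.execute("SELECT id, email FROM users_current").fetchall()

# A second live row for the same key violates the partial unique index.
try:
    conn.execute(
        "INSERT INTO users (id, email, __valid_from__) "
        "VALUES (1, 'c@example.com', '2024-07-01')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The history query returns both versions of the row, while users_current returns only the live one without the versioning columns.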

On tarkin detach:

  • With --keep-versioning: the __valid_from__ and __valid_to__ columns are retained in the restored table, along with all historical records. This is the safe default when history is valuable.
  • With --drop-versioning: only current records (__valid_to__ = 'infinity') are retained and the versioning columns are dropped. This operation is destructive and irreversible.

The versioning index (idx_<table>_current) is dropped when --drop-versioning is used, and retained otherwise.
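The difference between the two detach modes can be modelled on plain Python data. This is only a sketch of the row-level outcome described above, with rows as dictionaries and a hypothetical restore_rows function; it does not reflect how Tarkin performs the detach in SQL.

```python
INFINITY = "infinity"
VERSIONING_COLUMNS = {"__valid_from__", "__valid_to__"}


def restore_rows(rows: list[dict], keep_versioning: bool) -> list[dict]:
    """Hypothetical model of what a detached table ends up containing."""
    if keep_versioning:
        # --keep-versioning: every row and both versioning columns survive.
        return [dict(r) for r in rows]
    # --drop-versioning: keep live rows only and strip the versioning columns.
    return [
        {k: v for k, v in r.items() if k not in VERSIONING_COLUMNS}
        for r in rows
        if r["__valid_to__"] == INFINITY
    ]


shadow_rows = [
    {"id": 1, "email": "a@example.com",
     "__valid_from__": "2024-01-01", "__valid_to__": "2024-06-01"},
    {"id": 1, "email": "b@example.com",
     "__valid_from__": "2024-06-01", "__valid_to__": INFINITY},
]

kept = restore_rows(shadow_rows, keep_versioning=True)
dropped = restore_rows(shadow_rows, keep_versioning=False)
```

With --drop-versioning the historical row is gone for good, which is why a backup beforehand is worth considering.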

Retention columns

When a table has retention_days set, Tarkin adds __expires_at__ and __erase_on_expiry__ columns to the shadow table to support time-based data expiry. These columns are intentionally not exposed through the public-facing view — the view layer presents only the declared columns.

__expires_at__ is a timestamptz column with a default of now() + interval '<retention_days> days', computed at INSERT time. __erase_on_expiry__ is a bool column defaulting to true. A partial index on __expires_at__ WHERE __erase_on_expiry__ = true is created to keep the scheduled sweep performant.

Setting __erase_on_expiry__ = false on any individual row exempts it from scheduled deletion — this is the mechanism for legal holds when a record must be retained beyond its normal expiry.

When retention_schedule is configured on the database, Tarkin registers a pg_cron job named tarkin_retention_<database> that calls __META__.tarkin_erase_expired_records() on the configured cron schedule. That function sweeps all tables registered in __META__.tarkin_retention, finds rows where __expires_at__ <= now() AND __erase_on_expiry__ = true, and applies the table's erase_strategy (delete, nullify, or obfuscate). Each sweep is logged to __META__.tarkin_erasures with was_scheduled = true.

On tarkin detach:

  • The pg_cron job is unscheduled (guarded by a check that pg_cron is installed, since it may have been removed independently)
  • The partial index idx_<table>_expires_at is dropped
  • The __expires_at__ and __erase_on_expiry__ columns are dropped from the shadow table before the schema rename

Unlike versioning, there is no keep/drop flag — retention columns are always removed on detach. Any records that had not yet expired are restored to the table without expiry metadata, and the operator is responsible for any cleanup.

Security

See SECURITY.md for:

  • Release integrity and SBOM verification
  • HMAC key management and rotation
  • Shadow schema model and detach guarantees
  • Column masking security notes (xxhash vs SHA vs HMAC)
  • Sensitive column enforcement
  • pgaudit configuration and restoration
  • Known limitations

Reference

See REFERENCE.md for an overview of all available CLI commands.

License

Apache 2.0. See LICENSE.
