Skip to main content

No project description provided

Project description

Padmy

Python versions Latest PyPI version CI

CLI utility functions for PostgreSQL such as sampling and anonymization.

Installation

Just run

uv add padmy

1. Database Exploration

You can get information about a database by running

uvx padmy analyze --db test --schemas test

or using the docker image

 docker run -it \
   --network host \
   ghcr.io/soren/padmy:latest analyze --db test --schemas test

For instance, the following table definition will output:

CREATE TABLE table1
(
    id SERIAL PRIMARY KEY
);

CREATE TABLE table2
(
    id        SERIAL PRIMARY KEY,
    table1_id INT REFERENCES table1
);

CREATE TABLE table3
(
    id        SERIAL PRIMARY KEY,
    table1_id INT REFERENCES table1,
    table2_id INT REFERENCES table2
);

CREATE TABLE table4
(
    id        SERIAL PRIMARY KEY,
    table1_id INT REFERENCES table1
);
INSERT INTO table1(id)
SELECT generate_series(0, 10);

Default

Network schema

Network Schema (if --show-graphs is specified)

Network schema

2. Sampling

You can quickly sample (ie: take a subset) of a database by running

uvx padmy sample \
  --db test --to-db test-sampled \
  --sample 20 \
  --schemas public

This will sample the test database into a new test-sampled database, copy of the original one, keeping if possible (see: Annexe) 20% of the original database.

You can choose how to sample with more granularity by passing a configuration file. Here is an example:

# We want a default sampling size of 20% of each table count
sample: 20
# We want to sample `schema_1` and `schema_2`
schemas:
  - schema_1
  # We want a default size of 30% for the tables of this schema
  - name: schema_2
    sample: 30

tables:
  # We want a sample size of 10% for this table
  - schema: public
    table: table_3
    sample: 10

3. Anonymization

You can scrub PII from selected columns with padmy anonymize. Field types map to Faker generators (or NULL to blank the column). Available types:

Type Behavior
EMAIL faker.email() — supports domain: extra arg
NULL Sets the column to NULL (useful for hashes, tokens, anything you don't want to fake)
FIRST_NAME faker.first_name()
LAST_NAME faker.last_name()
NAME faker.name() (full name)
PHONE_NUMBER faker.phone_number()
DATE_OF_BIRTH faker.date_of_birth() — supports minimum_age: / maximum_age:
TEXT faker.text() — supports max_nb_chars:
WORD faker.word()

Example config:

tables:
  - schema: public
    table: users
    fields:
      - column: email
        type: EMAIL
        domain: example.com         # extra arg forwarded to faker.email
      - column: password_hash
        type: "NULL"                # quoted: YAML's bare NULL parses as null
      - column: first_name
        type: FIRST_NAME
      - column: birthdate
        type: DATE_OF_BIRTH
        minimum_age: 18
        maximum_age: 80

Run with:

uvx padmy anonymize --db test -f config.yml

4. Migration utils

Setting up

This library includes a migration utility to help you evolve your data model. In order to use it, start by setting up the migration table:

uvx padmy -vv migrate setup --db postgres

This will create the public.migration table that stores all the migration / rollback that will be applied.

Setting up the Schemas

Now that we are all setup, let's create our first sql file that will create the schema:

rm -rf /tmp/sql
mkdir /tmp/sql
uvx padmy -v migrate new-sql 1 --sql-dir /tmp/sql
tree /tmp/sql

Add CREATE SCHEMA general; to the file.

echo "CREATE SCHEMA general;" >> /tmp/sql/0001_new_file.sql
cat /tmp/sql/0001_new_file.sql

Then apply the modifications to the database:

uvx padmy -v migrate apply-sql --sql-dir /tmp/sql --db postgres 

[!WARNING] This will run through all the files in the /tmp/sql folder (in order) run them. Sql files here need to be IDEMPOTENT

Creating a first migration

Now, lets create our first migration:

migration_dir="/tmp/migrations"
rm -rf "$migration_dir"
mkdir "$migration_dir" # You can choose a different folder to store your migrations
uvx padmy -v migrate new --sql-dir "$migration_dir" --author padmy
tree "$migration_dir"

This will create 2 new files:

  • up: {timestamp}-{migration_id}-up.sql that contains your migration to apply to the database.
  • down: {timestamp}-{migration_id}-down.sql that contains the code to revert your changes.

[!NOTE]
A migration contains:

  • a reference to the previous file (if any): -- Prev-file: {timestamp}-{migration_id}
  • the name of the author who created the migration: -- Author: {author}
  • (optionally) should the down migration be ignored: -- Skip-verify: {reason}

Let's now modify the up.sql file with:

CREATE TABLE IF NOT EXISTS general.test
(
    id  int primary key,
    foo int
);

CREATE TABLE IF NOT EXISTS general.test2
(
    id  serial primary key,
    foo text
);
file="$migration_dir/$(ls "$migration_dir" | grep up.sql)"
printf "\n" >> "$file"
cat <<'SQL' >> "$file"
CREATE TABLE IF NOT EXISTS general.test (
    id  INT PRIMARY KEY,
    foo INT
);

CREATE TABLE IF NOT EXISTS general.test2 (
    id  SERIAL PRIMARY KEY,
    foo TEXT
);
SQL

and check that the migration is valid:

uvx padmy -v migrate verify --sql-dir /tmp/migrations --schemas general

Because we did not add anything to the down.sql file, the command returns an error. Let's modify it to make the command pass:

DROP table general.test;
DROP table general.test2;
uvx padmy -vv migrate verify --sql-dir /tmp/migrations

We are all good !

Optional: You can also verify that the order of the migration is correct by running:

uvx padmy -vv migrate verify-files --sql-dir /tmp/migrations --no-raise

5. Comparing databases schemas

You can compare two databases by running:

uvx padmy -vv schema-diff --db soren --schemas schema_1,schema_2

If differences are found, the command will output the differences between the two databases.

Known limitations

Exact sample size

Sometimes, we cannot guaranty that the sampled table will have the exact expected size.

For instance let's say we want 10% of table1 and 10% of table2, given the following table definitions:

CREATE TABLE table1
(
    id SERIAL PRIMARY KEY
);

CREATE TABLE table2
(
    id        SERIAL PRIMARY KEY,
    table1_id INT NOT NULL REFERENCES table1
);

INSERT INTO table1(id)
VALUES (1);

INSERT INTO table2(table1_id)
SELECT 1
FROM generate_series(1, 10);

In this case, it's not possible to have less that 100% of table 1 since it has only 1 key on which depend all the table1_id rows of table2.

Cyclic foreign keys

Cyclic foreign keys (table with a FK on another table that reference the previous one) are not supported. Here is an example.

CREATE TABLE table1
(
    id        SERIAL PRIMARY KEY,
    table2_id INT NOT NULL
);

CREATE TABLE table2
(
    id        SERIAL PRIMARY KEY,
    table1_id INT NOT NULL
);

ALTER TABLE table1
    ADD CONSTRAINT table1_table2_id_fk
        FOREIGN KEY (table2_id) REFERENCES table2;

ALTER TABLE table2
    ADD CONSTRAINT table2_table1_id_fk
        FOREIGN KEY (table1_id) REFERENCES table1;

Cyclic dependencies

You can display cycling dependencies in a database by running:

uvx padmy -vv analyze --db test --schemas test --show-graph

(Note:: you'll need to have installed the network extra )

Self referencing foreign keys

Foreign keys referencing another column in the same table are ignored.

CREATE TABLE table1
(
    id        SERIAL PRIMARY KEY,
    parent_id INT REFERENCES table1
);

Annexes

Showing Network in Jupyter

You can display the network visualization in Jupyter.

Start by launching a JupyterLab session with:

uv run --group notebook jupyter lab

Then create a new notebook and run the following code:

from dash import Dash
from padmy.sampling import network, viz, sampling
from padmy.utils import init_connection
import asyncpg

PG_URL = 'postgresql://postgres:postgres@localhost:5432/test'

app = Dash(__name__)

db = sampling.Database(name='test')

async with asyncpg.create_pool(PG_URL, init=init_connection) as pool:
    await db.explore(pool, ['public'])

g = network.convert_db(db)

app.layout = viz.get_layout(g,
                            style={'width': '100%', 'height': '800px'},
                            layout='klay')

app.run_server(mode='jupyterlab')  # or mode='inline'

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

padmy-0.26.1.tar.gz (35.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

padmy-0.26.1-py3-none-any.whl (43.0 kB view details)

Uploaded Python 3

File details

Details for the file padmy-0.26.1.tar.gz.

File metadata

  • Download URL: padmy-0.26.1.tar.gz
  • Upload date:
  • Size: 35.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for padmy-0.26.1.tar.gz
Algorithm Hash digest
SHA256 c0ae4cf651d12e7609f7120c00a91c014ea2a1e58df6f1710d378e17ff31e885
MD5 c8de6a1d4d746e12ccbb18fa116aba8d
BLAKE2b-256 586fadccc4eb13e5ce51b80aefed00195c81bdf1e087b373891e16ce78de7d37

See more details on using hashes here.

File details

Details for the file padmy-0.26.1-py3-none-any.whl.

File metadata

  • Download URL: padmy-0.26.1-py3-none-any.whl
  • Upload date:
  • Size: 43.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for padmy-0.26.1-py3-none-any.whl
Algorithm Hash digest
SHA256 21bc8bab0b38bc17089857ad5b332c989a5414cc93b6872802fe94efd7e20bb9
MD5 c3ba5b4e54c4fa3b9511611c8560e9a1
BLAKE2b-256 0cf316d9977423c2db716ce9db947af6b6a11395298cb1612c44a85836857f72

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page