Padmy
CLI utility functions for PostgreSQL, such as sampling and anonymization.
Installation
Just run
uv add padmy
1. Database Exploration
You can get information about a database by running
uvx padmy analyze --db test --schemas test
or using the docker image
docker run -it \
--network host \
tracktor/padmy:latest analyze --db test --schemas test
For instance, the following table definitions will produce the output below:
CREATE TABLE table1
(
id SERIAL PRIMARY KEY
);
CREATE TABLE table2
(
id SERIAL PRIMARY KEY,
table1_id INT REFERENCES table1
);
CREATE TABLE table3
(
id SERIAL PRIMARY KEY,
table1_id INT REFERENCES table1,
table2_id INT REFERENCES table2
);
CREATE TABLE table4
(
id SERIAL PRIMARY KEY,
table1_id INT REFERENCES table1
);
INSERT INTO table1(id)
SELECT generate_series(0, 10);
Example output: the default analysis view, plus a network schema graph if --show-graphs is specified.
2. Sampling
You can quickly sample (i.e. take a subset of) a database by running
uvx padmy sample \
--db test --to-db test-sampled \
--sample 20 \
--schemas public
This will sample the test database into a new test-sampled database, a copy of the
original one, keeping (where possible, see Known limitations) 20% of the original data.
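After sampling, a quick sanity check is to compare approximate row counts between the two databases. This is plain PostgreSQL rather than a padmy command; run it against both test and test-sampled (the counts are estimates from the statistics collector):

-- Approximate per-table row counts; run against both databases and compare
SELECT relname    AS table_name,
       n_live_tup AS approx_rows
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY relname;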
You can choose how to sample with more granularity by passing a configuration file. Here is an example:
# We want a default sampling size of 20% of each table count
sample: 20
# We want to sample `schema_1` and `schema_2`
schemas:
  - schema_1
  # We want a default size of 30% for the tables of this schema
  - name: schema_2
    sample: 30
tables:
  # We want a sample size of 10% for this table
  - schema: public
    table: table_3
    sample: 10
3. Migration utils
Setting up
This library includes a migration utility to help you evolve your data model. In order to use it, start by setting up the migration table:
uvx padmy -vv migrate setup --db postgres
This will create the public.migration table that stores all the migrations / rollbacks that
are applied.
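The exact layout of public.migration is managed by padmy itself; purely as an illustration of the kind of information a migration table typically tracks (this is not the actual schema), it could look something like:

-- Hypothetical sketch only: the real public.migration table is created by
-- `padmy migrate setup` and its columns may differ.
CREATE TABLE IF NOT EXISTS public.migration
(
    id         SERIAL PRIMARY KEY,
    file_name  TEXT        NOT NULL, -- e.g. {timestamp}-{migration_id}-up.sql
    applied_at TIMESTAMPTZ NOT NULL DEFAULT now()
);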
Setting up the Schemas
Now that we are all set up, let's create our first SQL file, which will create the schema:
rm -rf /tmp/sql
mkdir /tmp/sql
uvx padmy -v migrate new-sql 1 --sql-dir /tmp/sql
tree /tmp/sql
Add CREATE SCHEMA general; to the file.
echo "CREATE SCHEMA general;" >> /tmp/sql/0001_new_file.sql
cat /tmp/sql/0001_new_file.sql
Then apply the modifications to the database:
uvx padmy -v migrate apply-sql --sql-dir /tmp/sql --db postgres
[!WARNING] This will run through all the files in the
/tmp/sql folder (in order) and execute them. SQL files here need to be IDEMPOTENT.
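Since every apply-sql run re-executes these files, write them so that running them twice is harmless, for instance by using IF NOT EXISTS guards:

-- Idempotent variant of the statement above: safe to run repeatedly
CREATE SCHEMA IF NOT EXISTS general;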
Creating a first migration
Now, let's create our first migration:
migration_dir="/tmp/migrations"
rm -rf "$migration_dir"
mkdir "$migration_dir" # You can choose a different folder to store your migrations
uvx padmy -v migrate new --sql-dir "$migration_dir" --author padmy
tree "$migration_dir"
This will create 2 new files:
- up: {timestamp}-{migration_id}-up.sql, which contains your migration to apply to the database.
- down: {timestamp}-{migration_id}-down.sql, which contains the code to revert your changes.
[!NOTE]
A migration contains:
- a reference to the previous file (if any): -- Prev-file: {timestamp}-{migration_id}
- the name of the author who created the migration: -- Author: {author}
- (optionally) whether the down migration should be ignored: -- Skip-verify: {reason}
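Put together, the header of an up file could look like the following (the timestamp, migration id and reason below are made-up placeholders):

-- Prev-file: 20240101120000-a1b2c3
-- Author: padmy
-- Skip-verify: data-only migration, nothing to roll back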
Let's now modify the up.sql file with:
CREATE TABLE IF NOT EXISTS general.test
(
id int primary key,
foo int
);
CREATE TABLE IF NOT EXISTS general.test2
(
id serial primary key,
foo text
);
file="$migration_dir/$(ls "$migration_dir" | grep up.sql)"
printf "\n" >> "$file"
cat <<'SQL' >> "$file"
CREATE TABLE IF NOT EXISTS general.test (
id INT PRIMARY KEY,
foo INT
);
CREATE TABLE IF NOT EXISTS general.test2 (
id SERIAL PRIMARY KEY,
foo TEXT
);
SQL
and check that the migration is valid:
uvx padmy -v migrate verify --sql-dir /tmp/migrations --schemas general
Because we did not add anything to the down.sql file, the command returns an error.
Let's modify it to make the command pass:
DROP table general.test;
DROP table general.test2;
uvx padmy -vv migrate verify --sql-dir /tmp/migrations
We are all good!
Optional: You can also verify that the order of the migration files is correct by running:
uvx padmy -vv migrate verify-files --sql-dir /tmp/migrations --no-raise
4. Comparing database schemas
You can compare two databases by running:
uvx padmy -vv schema-diff --db tracktor --schemas schema_1,schema_2
If any are found, the command will output the differences between the two databases.
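As an illustration of what gets flagged, suppose the same table was created with slightly different definitions on each side; a missing column like the one below is the kind of difference the command reports (the table is a made-up example, and the exact output format is defined by padmy):

-- First database
CREATE TABLE schema_1.users
(
    id    SERIAL PRIMARY KEY,
    email TEXT NOT NULL
);

-- Second database: the email column is missing, which schema-diff would report
CREATE TABLE schema_1.users
(
    id SERIAL PRIMARY KEY
);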
Known limitations
Exact sample size
Sometimes, we cannot guarantee that the sampled table will have the exact expected size.
For instance, let's say we want 10% of table1 and 10% of table2, given the following table definitions:
CREATE TABLE table1
(
id SERIAL PRIMARY KEY
);
CREATE TABLE table2
(
id SERIAL PRIMARY KEY,
table1_id INT NOT NULL REFERENCES table1
);
INSERT INTO table1(id)
VALUES (1);
INSERT INTO table2(table1_id)
SELECT 1
FROM generate_series(1, 10);
In this case, it's not possible to keep less than 100% of table1, since it contains a single key on
which all the table1_id values of table2 depend.
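To make this concrete: 10% of table2 means keeping one of its 10 rows, and that row still references table1.id = 1, so the single row of table1 must be kept (100% instead of the requested 10%). You can see that all rows of table2 share the same parent key:

-- Every row of table2 references the single row of table1,
-- so that row cannot be dropped without breaking the foreign key.
SELECT count(*)                  AS table2_rows,
       count(DISTINCT table1_id) AS distinct_parents
FROM table2;
-- table2_rows | distinct_parents
--          10 |                1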
Cyclic foreign keys
Cyclic foreign keys (a table with a FK on another table that itself references the first one) are not supported. Here is an example:
CREATE TABLE table1
(
id SERIAL PRIMARY KEY,
table2_id INT NOT NULL
);
CREATE TABLE table2
(
id SERIAL PRIMARY KEY,
table1_id INT NOT NULL
);
ALTER TABLE table1
ADD CONSTRAINT table1_table2_id_fk
FOREIGN KEY (table2_id) REFERENCES table2;
ALTER TABLE table2
ADD CONSTRAINT table2_table1_id_fk
FOREIGN KEY (table1_id) REFERENCES table1;
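If you need to sample such a schema anyway, one possible manual workaround (not a padmy feature) is to temporarily drop one of the constraints to break the cycle, sample, then restore it:

-- Break the cycle before sampling (manual workaround, not handled by padmy)
ALTER TABLE table1
    DROP CONSTRAINT table1_table2_id_fk;

-- ... run the sampling ...

-- Restore the constraint on the sampled database afterwards
ALTER TABLE table1
    ADD CONSTRAINT table1_table2_id_fk
        FOREIGN KEY (table2_id) REFERENCES table2;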
You can display cyclic dependencies in a database by running:
uvx padmy -vv analyze --db test --schemas test --show-graph
(Note: you'll need to have installed the network extra.)
Self-referencing foreign keys
Foreign keys referencing another column in the same table are ignored.
CREATE TABLE table1
(
id SERIAL PRIMARY KEY,
parent_id INT REFERENCES table1
);
Annexes
Showing Network in Jupyter
You can display the network visualization in Jupyter.
Start by launching a JupyterLab session with:
uv run --group notebook jupyter lab
Then create a new notebook and run the following code:
from dash import Dash
from padmy.sampling import network, viz, sampling
from padmy.utils import init_connection
import asyncpg

PG_URL = 'postgresql://postgres:postgres@localhost:5432/test'

app = Dash(__name__)

# Explore the database structure (top-level await works in a notebook)
db = sampling.Database(name='test')

async with asyncpg.create_pool(PG_URL, init=init_connection) as pool:
    await db.explore(pool, ['public'])

# Convert the explored database into a graph and display it
g = network.convert_db(db)

app.layout = viz.get_layout(g,
                            style={'width': '100%', 'height': '800px'},
                            layout='klay')

app.run_server(mode='jupyterlab')  # or mode='inline'