Config-driven database subsetting tool that creates representative samples while preserving referential integrity. Modernized fork of Tonic's Condenser with performance and feature improvements.
Database Condenser
A config-driven database subsetting tool for PostgreSQL and MySQL, forked from Tonic's Condenser.
Some changes from the original Condenser:
- Concurrent thread pool usage to speed up subsetting
- Optional temp-table strategy for memory-efficient ID batching
- Optional Postgres COPY protocol for faster transfer
- Automatic sequence reset after subsetting
- Built on psycopg3 and managed with astral's uv
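One of the listed changes is automatic sequence reset. After rows are copied with their original primary keys, a PostgreSQL sequence can lag behind `MAX(id)`, and the next insert then collides. The sketch below builds a common PostgreSQL idiom for this, assuming serial or identity columns. It is not necessarily how db-condenser does it, and the helper names (`quote_ident`, `quote_literal`, `sequence_reset_sql`) are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote one SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def sequence_reset_sql(schema: str, table: str, column: str) -> str:
    """Build a statement that moves the column's owned sequence up to MAX(column).

    pg_get_serial_sequence parses its first argument as a (possibly quoted)
    table name and takes the column name literally. setval's third argument
    (is_called) is false when the table is empty, so the next nextval()
    returns 1. Otherwise it returns MAX(column) + 1.
    """
    qualified = f"{quote_ident(schema)}.{quote_ident(table)}"
    col = quote_ident(column)
    return (
        f"SELECT setval(pg_get_serial_sequence({quote_literal(qualified)}, "
        f"{quote_literal(column)}), COALESCE(MAX({col}), 1), "
        f"MAX({col}) IS NOT NULL) FROM {qualified};"
    )


# Usage: print the statement for a hypothetical public.target_table.id column.
print(sequence_reset_sql("public", "target_table", "id"))
```

With psycopg3 you would run the statement through `conn.execute(...)`. In real code, `psycopg.sql.Identifier` and `psycopg.sql.Literal` are a sturdier way to quote than these hand-rolled helpers.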
Subsetting data is the process of taking a representative sample of your data in a manner that preserves the integrity of your database, e.g., "give me 5% of my users." If you do this naively, e.g., by grabbing 5% of every table in your database, you will most likely break foreign key constraints. At best, you’ll end up with a statistically non-representative data sample.
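The foreign-key point can be shown with a toy, in-memory example using plain Python lists. It is not a real database and not Condenser's algorithm. Sampling each table independently leaves orders pointing at users that were never kept. Sampling the target table first and then keeping only the rows that reference it does not. A real subsetter must also follow foreign keys in the other direction and pull in required parent rows, which this sketch omits.

```python
import random

# Toy in-memory tables: each order row references a user through user_id (a foreign key).
users = [{"id": i} for i in range(1, 101)]                           # 100 users
orders = [{"id": i, "user_id": i % 100 + 1} for i in range(1, 301)]  # 300 orders, 3 per user


def dangling_orders(order_rows, user_rows):
    """Return ids of orders whose user_id is missing from user_rows (a broken FK)."""
    kept = {u["id"] for u in user_rows}
    return [o["id"] for o in order_rows if o["user_id"] not in kept]


def five_percent(rows, rng):
    """Sample 5% of rows without replacement."""
    return rng.sample(rows, k=len(rows) * 5 // 100)


rng = random.Random(0)  # fixed seed so runs are repeatable

# Naive: sample every table independently; typically most sampled orders lose their user.
naive_users = five_percent(users, rng)
naive_orders = five_percent(orders, rng)

# FK-aware: sample the target table, then keep only the orders that reference a kept user.
target_users = five_percent(users, rng)
kept_ids = {u["id"] for u in target_users}
related_orders = [o for o in orders if o["user_id"] in kept_ids]

print("naive dangling orders:", len(dangling_orders(naive_orders, naive_users)))
print("FK-aware dangling orders:", len(dangling_orders(related_orders, target_users)))
```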
One common use-case is to scale down a production database to a more reasonable size so that it can be used in staging, test, and development environments. This can be done to save costs and, when used in tandem with PII removal, can be quite powerful as a productivity enhancer. Another example is copying specific rows from one database and placing them into another while maintaining referential integrity.
You can find more details about the original Condenser here and here.
Installation
Six steps to set up from source, assuming Python 3.10+:
- Install astral-uv.
- Install the Postgres and/or MySQL database tools. For Postgres we need `pg_dump` and `psql`; they must be on your `$PATH`, or you can point to them with `$POSTGRES_PATH`. For MySQL we need `mysqldump` and `mysql`; likewise, they can be on your `$PATH`, or you can point to them with `$MYSQL_PATH`.
- Clone this project locally.
- Install the project with `uv sync --frozen`.
- Set up your configuration and save it in `config.json`. The provided `config.json.example` has the skeleton of what you need to provide: source and destination database connection details, as well as subsetting goals in `initial_targets`. Here's an example that will collect 10% of a table named `public.target_table`:
  "initial_targets": [ { "table": "public.target_table", "percent": 10 } ]
  There may be more required configuration depending on your database, but simple databases should be easy. See CONFIG.md for more details, and `config.json.example_all` for all of the options in a single config file.
- Run!
  $ uv run subset
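Before a first run, it can help to sanity-check the `initial_targets` section of `config.json`. The checker below is a minimal sketch that makes several assumptions. The helper `check_initial_targets` is hypothetical. Every target is assumed to be an object with a non-empty `"table"` string. When `"percent"` is present, it is assumed to be a number in (0, 100]. Entries are not otherwise checked, and the tool itself may accept other target forms. To check your real file, load it first with `json.load(open("config.json"))`.

```python
import json


def check_initial_targets(config: dict) -> list[str]:
    """Return problems found in config["initial_targets"]; an empty list means none found.

    Assumption: each target is an object with a non-empty "table" string and,
    when present, a numeric "percent" greater than 0 and at most 100.
    """
    targets = config.get("initial_targets")
    if not isinstance(targets, list) or not targets:
        return ["initial_targets must be a non-empty list"]
    problems = []
    for i, target in enumerate(targets):
        if not isinstance(target, dict):
            problems.append(f"target {i}: expected an object, got {type(target).__name__}")
            continue
        if not isinstance(target.get("table"), str) or not target["table"]:
            problems.append(f"target {i}: missing table name")
        if "percent" in target:
            pct = target["percent"]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(pct, bool) or not isinstance(pct, (int, float)) or not 0 < pct <= 100:
                problems.append(f"target {i}: percent must be in (0, 100], got {pct!r}")
    return problems


# Usage: validate the fragment from the installation steps.
example = json.loads('{"initial_targets": [{"table": "public.target_table", "percent": 10}]}')
print(check_initial_targets(example))  # → []
```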
Or install the released package from PyPI with pip or pipx:
pip install db-condenser
pipx install db-condenser
Running
Almost all the configuration is in the config.json file, so running it is as simple as
uv run subset
Three command-line arguments are supported:
- `-v`: Verbose output. Useful for performance debugging; lists almost every query made and how long it took.
- `--no-constraints`: For Postgres, do not add the constraints found in the source database to the destination database. This option has no effect for MySQL.
- `-y`: Skip the confirmation prompt when subsetting to a destination host other than localhost or 127.0.0.1.
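The three flags map naturally onto Python's standard `argparse` module. The sketch below mirrors only the documented behavior. `build_parser`, `needs_confirmation` and `LOCAL_HOSTS` are hypothetical names, and the tool's real entry point may be structured differently. It also shows the `-y` rule: a confirmation prompt is needed only when the destination is not local and `-y` was not given.

```python
import argparse

# Destinations that never need confirmation, per the option description above.
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="subset")
    parser.add_argument("-v", action="store_true", help="verbose: log queries and their timings")
    parser.add_argument("--no-constraints", action="store_true",
                        help="Postgres only: do not copy source constraints to the destination")
    parser.add_argument("-y", action="store_true",
                        help="skip confirmation for non-local destination hosts")
    return parser


def needs_confirmation(destination_host: str, assume_yes: bool) -> bool:
    """Prompt only for non-local destinations, unless -y was passed."""
    return destination_host not in LOCAL_HOSTS and not assume_yes


# Usage: argparse turns "--no-constraints" into the attribute no_constraints.
args = build_parser().parse_args(["-v", "--no-constraints"])
print(args.v, args.no_constraints, args.y)  # → True True False
```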
Download files
File details
Details for the file db_condenser-1.0.0.tar.gz.
File metadata
- Download URL: db_condenser-1.0.0.tar.gz
- Upload date:
- Size: 20.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.12 (`uv publish` from CI on Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c81dfc085985c4b2ebd491dc87ce3cbea05b7b38105e74b817760e7d024bdbaa |
| MD5 | 772376c3e6284129ebdea4ea1d4a3940 |
| BLAKE2b-256 | ecf82c5a6213a641ac4a185d240fb423581248138f25ccd4ce67f1417696d4dd |
File details
Details for the file db_condenser-1.0.0-py3-none-any.whl.
File metadata
- Download URL: db_condenser-1.0.0-py3-none-any.whl
- Upload date:
- Size: 28.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.12 (`uv publish` from CI on Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 13845efb05b4d51d91bcb4a05f3c2498f878210c3eaafe3588f3129cc1e80b4f |
| MD5 | 6e6abb5dfedd93f65070cd1a5b5c9cff |
| BLAKE2b-256 | 42bc69b2fdf524489c7cc7762187f0329ff9d5d5942f294baacf36149862279b |
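A downloaded file can be checked against the SHA256 digests listed above using only the standard library. This is a minimal sketch. The helper names and the `PUBLISHED_SHA256` mapping are illustrative, with the digests copied from the file details. The file is hashed in chunks so a large archive is never read into memory at once.

```python
import hashlib
from collections.abc import Iterable, Iterator
from pathlib import Path

# SHA256 digests copied from the file details above.
PUBLISHED_SHA256 = {
    "db_condenser-1.0.0.tar.gz": "c81dfc085985c4b2ebd491dc87ce3cbea05b7b38105e74b817760e7d024bdbaa",
    "db_condenser-1.0.0-py3-none-any.whl": "13845efb05b4d51d91bcb4a05f3c2498f878210c3eaafe3588f3129cc1e80b4f",
}


def sha256_hexdigest(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks incrementally."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: Path, size: int = 1 << 16) -> Iterator[bytes]:
    """Yield the file's contents in fixed-size binary chunks."""
    with path.open("rb") as fh:
        while chunk := fh.read(size):
            yield chunk


def matches_published(path: Path) -> bool:
    """Compare a downloaded file's SHA256 against the published digest for its name."""
    expected = PUBLISHED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published SHA256 for {path.name}")
    return sha256_hexdigest(read_chunks(path)) == expected
```

For example, `matches_published(Path("db_condenser-1.0.0.tar.gz"))` returns `True` only for an unmodified download. pip can also enforce digests itself in its hash-checking mode (`--require-hashes`).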