Database incremental exports, transfers, imports, ETL, creation / management
Project description
A Python/CLI tool for:
- Exporting database tables to compressed CSV files.
- Transferring tables from from one database server to another.
- Loading database data (from both files and Python)
- Creating/Managing Postgresql/TimescaleDB tables, views, materialized views, functions, procedures, continuous aggregates, scheduled tasks.
- Checking for mismatched attributes between SQLAlchemy tables/models and actual tables in a database.
Currently only Postgresql and Postgresql-based databases (e.g. TimescaleDB) are supported.
Install
pip install dbflows
Optional extras: pip install dbgrees[all]
or select specific extras:
transfer
- enable functionality for transferring data from one database to another.drivers
- installpsycopg[c]
andasyncpg
database drivers (If the parent application installs it's own database driver, then you don't need this). Forpsycopg[c]
, make sure you havelibpq-dev
installed (sudo apt install libpq-dev
)
If using the export functionality (export database tables to compressed CSV files), then you will additionally need to have the psql
executable available.
To install psql
:
# enable PostgreSQL package repository
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget -qO- https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo tee /etc/apt/trusted.gpg.d/pgdg.asc &>/dev/null
# replace `15` with the major version of your database
sudo apt update && sudo apt install -y postgresql-client-15
Export
Features:
- File splitting. Create separate export files based on a 'slice column' (an orderable column. e.g. datetime, integer, etc) and/or 'partition column' (a categorical column. e.g. name string).
- Incremental exports (export only data not yet exported). This works for both single file and multiple/split file output.
Examples
from dbflows import export_table
import sqlalchemy as sa
# the table to export data from
my_table = sa.Table(
"my_table",
sa.MetaData(schema="my_schema"),
sa.Column("inserted", sa.DateTime),
sa.Column("category", sa.String),
sa.Column("value", sa.Float),
)
# one or more save locations (2 in this case)
save_locs = ["s3://my-bucket/my_table_exports", "/path/to/local_dir/my_table_exports"]
# database URL
url = "postgres://user:password@hostname:port/database-name"
Export entire table to a single file.
export_table(
table=my_table,
engine=url, # or sa.engine
save_locs=save_locs
)
CLI equivalent:
db export table \
my_table.my_schema \
postgres://user:password@hostname:port/database-name` \
s3://my-bucket/my_table_exports \
/path/to/local_dir/my_table_exports
Export 500 MB CSVs, sorted and sliced on inserted
datetime column.
export_table(
table=my_table,
engine=url, # or sa.engine
save_locs=save_locs,
slice_column=my_table.c.inserted,
file_max_size="500 MB"
)
Create a CSV export for each unique category in table.
export_table(
table=my_table,
engine=url, # or sa.engine
save_locs=save_locs,
partition_column=my_table.c.category
)
CLI equivalent:
db export table \
my_table.my_schema \
postgres://user:password@hostname:port/database-name` \
# save to one or more locations (s3 paths or local)
s3://my-bucket/my_table_exports \
/path/to/local_dir/my_table_exports \
--partition-column category # or "-p category"
export 500 MB CSVs for each unique category, sorted and sliced on inserted
datetime column.
export_table(
table=my_table,
engine=url, # or sa.engine
save_locs=save_locs,
slice_column=my_table.c.inserted,
file_max_size="500 MB",
partition_column=my_table.c.category,
)
Loading/Importing
Loading from Python objects
Install a Postgresql driver in the parent application (e.g. psycopg[c]
or asyncpg
.
Create a Loader
instance for your table and use the load
method to load batches of rows.
Loading from CSV files
Use import_csvs to load CSV with parallel worker threads. This internally uses timescaledb-parallel-copy which can be installed with: go install github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy@latest
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.