Skip to main content

A Python library to dump, load, and mask data between databases in Alembic-managed environments, with SSH tunnel support.

Project description

Alembic Dump

A Python library to dump, load, and mask data between databases (e.g., PostgreSQL, MySQL) in environments managed by Alembic, with SSH tunnel support.

This tool is designed to help developers synchronize database schemas using Alembic revisions and then transfer data, optionally applying masking rules to sensitive information. It's particularly useful for creating staging or development environments from production data, or for migrating data between different database instances while maintaining schema integrity.

Key Features

  • Schema Synchronization: Ensures target database schema matches the source database schema based on Alembic revisions before data transfer.
  • Data Dump & Load: Efficiently transfers data table by table, respecting foreign key constraints by processing tables in a topologically sorted order.
  • Data Masking: Supports various strategies (e.g., hashing, partial masking, using Faker) to anonymize sensitive data during the transfer. Masking rules can be configured per table and per column.
  • SSH Tunneling: Built-in support for connecting to databases via an SSH bastion host.
  • Configuration: Uses Pydantic for clear and validated configuration of database connections, SSH tunnels, and masking rules.
  • Chunking: Processes data in chunks to manage memory usage effectively, especially for large tables.
  • Flexible Table Selection: Allows specifying tables to include or exclude from the dump/load process.

Installation

You can install alembic-dump using pip:

pip install alembic-dump

baisc usage

from alembic_dump.config import AppSettings, DBConfig, MaskingConfig, MaskingRule
from alembic_dump.core import dump_and_load

# 1. Define Source and Target Database Configurations
source_db_config = DBConfig(
    driver="postgresql",
    host="source.db.example.com",
    port=5432,
    username="user",
    password="password", # In real use, manage secrets carefully (e.g., env vars)
    database="sourcedb"
)

target_db_config = DBConfig(
    driver="postgresql",
    host="target.db.example.com",
    port=5432,
    username="user",
    password="password",
    database="targetdb"
)

# 2. (Optional) Define Masking Rules
masking_config = MaskingConfig(
    rules={
        "users": {
            "email": MaskingRule(strategy="hash"),
            "full_name": MaskingRule(strategy="faker", faker_provider="name")
        },
        "sensitive_logs": {
            "ip_address": MaskingRule(strategy="null")
        }
    }
)

# 3. (Optional) Define SSH Tunnel Configuration if needed
# ssh_config = SSHConfig(...)

# 4. Create AppSettings
app_settings = AppSettings(
    source_db=source_db_config,
    target_db=target_db_config,
    # ssh_tunnel=ssh_config, # Uncomment if using SSH tunnel
    masking=masking_config,
    chunk_size=1000,
    tables_to_exclude=["some_large_irrelevant_table"],
    # tables_to_include=["users", "orders"] # Only include these if specified
)

# 5. Specify the path to your Alembic migrations directory
# This directory should contain your alembic.ini and version scripts.
alembic_migrations_directory = "/path/to/your/alembic_migrations" 
# For testing, you might use a dedicated test Alembic environment.

# 6. Run the dump and load process
try:
    dump_and_load(settings=app_settings, alembic_dir=alembic_migrations_directory)
    print("Data dump and load completed successfully!")
except Exception as e:
    print(f"An error occurred: {e}")

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

alembic_dump-0.1.0.tar.gz (110.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

alembic_dump-0.1.0-py3-none-any.whl (12.5 kB view details)

Uploaded Python 3

File details

Details for the file alembic_dump-0.1.0.tar.gz.

File metadata

  • Download URL: alembic_dump-0.1.0.tar.gz
  • Upload date:
  • Size: 110.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for alembic_dump-0.1.0.tar.gz
Algorithm Hash digest
SHA256 4cebb584564c5bf5a0d9716339a094c70b78332e5a980a9c45cbc83c9f6ba45d
MD5 ced460726e6fd77c92e4458b32099a0f
BLAKE2b-256 d30f2eabcc535012b5d83d9fd105032edcf4c16ab965cdf3072243be048087ad

See more details on using hashes here.

File details

Details for the file alembic_dump-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: alembic_dump-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for alembic_dump-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1ef72edee5a713b71e9edd69f2b0616e05ad24fa42bf79d0f0acdba4a5836385
MD5 4785f56e081da47c34ead31865cdf46e
BLAKE2b-256 0c1d53f7e527563ade8f98473b4ad1ed512da0a0e312ef09191e1e6139fcdd71

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page