Skip to main content

A Python library to dump, load, and mask data between databases in Alembic-managed environments, with SSH tunnel support.

Project description

Alembic Dump

A Python library to dump, load, and mask data between databases (e.g., PostgreSQL, MySQL) in environments managed by Alembic, with SSH tunnel support.

This tool is designed to help developers synchronize database schemas using Alembic revisions and then transfer data, optionally applying masking rules to sensitive information. It's particularly useful for creating staging or development environments from production data, or for migrating data between different database instances while maintaining schema integrity.

Key Features

  • Schema Synchronization: Ensures target database schema matches the source database schema based on Alembic revisions before data transfer.
  • Data Dump & Load: Efficiently transfers data table by table, respecting foreign key constraints by processing tables in a topologically sorted order.
  • Data Masking: Supports various strategies (e.g., hashing, partial masking, using Faker) to anonymize sensitive data during the transfer. Masking rules can be configured per table and per column.
  • SSH Tunneling: Built-in support for connecting to databases via an SSH bastion host.
  • Configuration: Uses Pydantic for clear and validated configuration of database connections, SSH tunnels, and masking rules.
  • Chunking: Processes data in chunks to manage memory usage effectively, especially for large tables.
  • Flexible Table Selection: Allows specifying tables to include or exclude from the dump/load process.
  • External Secret Management: Integration with AWS Secrets Manager and HashiCorp Vault for managing sensitive information.

Installation

You can install alembic-dump using pip:

# Basic installation
pip install alembic-dump

# With AWS Secrets Manager support
pip install alembic-dump[aws]

# With HashiCorp Vault support
pip install alembic-dump[vault]

# With both secret management systems
pip install alembic-dump[aws,vault]

# For development (includes all optional dependencies)
pip install alembic-dump[dev]

Quick Start

from alembic_dump.core import dump_and_load
from alembic_dump.config import AppSettings

# Basic configuration
settings = AppSettings(
    source_db={
        "driver": "postgresql",
        "host": "source.example.com",
        "port": 5432,
        "username": "source_user",
        "password": "source_pass",
        "database": "source_db",
    },
    target_db={
        "driver": "postgresql",
        "host": "target.example.com",
        "port": 5432,
        "username": "target_user",
        "password": "target_pass",
        "database": "target_db",
    }
)

# Run the dump and load process
dump_and_load(settings, alembic_dir="path/to/alembic")

Configuration

Configuration can be provided through environment variables, a .env file, or directly in code using the AppSettings class.

Using Secret Management

AWS Secrets Manager

from alembic_dump.config import AppSettings

settings = AppSettings(
    source_db={
        "driver": "postgresql",
        "database": "source_db",
        "secret_provider_config": {
            "provider_type": "aws_secrets_manager",
            "secret_id": "arn:aws:secretsmanager:region:account:secret:db-credentials",
            "region_name": "us-west-2",
            # Optional: role_arn for cross-account access
            # "role_arn": "arn:aws:iam::account:role/role-name",
            # Optional: profile_name for local AWS credentials
            # "profile_name": "my-profile"
        },
        "secret_key_mapping": {
            "host": "db_host",
            "port": "db_port",
            "username": "db_user",
            "password": "db_pass"
        }
    }
)

HashiCorp Vault

from alembic_dump.config import AppSettings

settings = AppSettings(
    source_db={
        "driver": "postgresql",
        "database": "source_db",
        "secret_provider_config": {
            "provider_type": "hashicorp_vault",
            "vault_addr": "https://vault.example.com:8200",
            "secret_path": "secret/data/db-credentials",
            # Either vault_token or role_id/secret_id must be provided
            "vault_token": "s.token",
            # "role_id": "role-id",
            # "secret_id": "secret-id"
        },
        "secret_key_mapping": {
            "host": "db_host",
            "port": "db_port",
            "username": "db_user",
            "password": "db_pass"
        }
    }
)

Environment Variables

All configuration can also be provided through environment variables:

# Basic configuration
ALEMBIC_DUMP_SOURCE_DB__DRIVER=postgresql
ALEMBIC_DUMP_SOURCE_DB__HOST=source.example.com
ALEMBIC_DUMP_SOURCE_DB__PORT=5432
ALEMBIC_DUMP_SOURCE_DB__USERNAME=source_user
ALEMBIC_DUMP_SOURCE_DB__PASSWORD=source_pass
ALEMBIC_DUMP_SOURCE_DB__DATABASE=source_db

# AWS Secrets Manager configuration
ALEMBIC_DUMP_SOURCE_DB__SECRET_PROVIDER_CONFIG__PROVIDER_TYPE=aws_secrets_manager
ALEMBIC_DUMP_SOURCE_DB__SECRET_PROVIDER_CONFIG__SECRET_ID=arn:aws:secretsmanager:region:account:secret:db-credentials
ALEMBIC_DUMP_SOURCE_DB__SECRET_PROVIDER_CONFIG__REGION_NAME=us-west-2
ALEMBIC_DUMP_SOURCE_DB__SECRET_KEY_MAPPING__HOST=db_host
ALEMBIC_DUMP_SOURCE_DB__SECRET_KEY_MAPPING__PORT=db_port
ALEMBIC_DUMP_SOURCE_DB__SECRET_KEY_MAPPING__USERNAME=db_user
ALEMBIC_DUMP_SOURCE_DB__SECRET_KEY_MAPPING__PASSWORD=db_pass

# HashiCorp Vault configuration
ALEMBIC_DUMP_SOURCE_DB__SECRET_PROVIDER_CONFIG__PROVIDER_TYPE=hashicorp_vault
ALEMBIC_DUMP_SOURCE_DB__SECRET_PROVIDER_CONFIG__VAULT_ADDR=https://vault.example.com:8200
ALEMBIC_DUMP_SOURCE_DB__SECRET_PROVIDER_CONFIG__SECRET_PATH=secret/data/db-credentials
ALEMBIC_DUMP_SOURCE_DB__SECRET_PROVIDER_CONFIG__VAULT_TOKEN=s.token

Development

  1. Clone the repository:

    git clone https://github.com/jaeyoung0509/alembic-dump.git
    cd alembic-dump
    
  2. Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
  3. Install development dependencies:

    pip install -e ".[dev]"
    
  4. Run tests:

    pytest
    

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

alembic_dump-0.1.7.tar.gz (121.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

alembic_dump-0.1.7-py3-none-any.whl (16.8 kB view details)

Uploaded Python 3

File details

Details for the file alembic_dump-0.1.7.tar.gz.

File metadata

  • Download URL: alembic_dump-0.1.7.tar.gz
  • Upload date:
  • Size: 121.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.10

File hashes

Hashes for alembic_dump-0.1.7.tar.gz
Algorithm Hash digest
SHA256 c6b063049303bd11aa6896d936f4d3993150ffe34610a5521c6f84acc80070b7
MD5 26625ca7456b700bc1f923c90430e20a
BLAKE2b-256 ddca90dfd8288eac6e3add8ac79c867fee0596ee5106b4e65f2b1615be20e149

See more details on using hashes here.

File details

Details for the file alembic_dump-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: alembic_dump-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 16.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.10

File hashes

Hashes for alembic_dump-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 b1b0bf958ff9afb9f4fd1f735841a75f1798ba8799c873e66c0493cd066fb9f2
MD5 48c74e3fd2211bac83978b768bae0791
BLAKE2b-256 90bedc3a5b977e646a19c2a81e205a25ecf7ffcedbdad6e5622161cf1b0e5a4c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page