
An RDS database factory for rapidly creating database copies

Project description

Background

A "database artifact" is a bundle of metadata and code/automation for creating or recreating another database.

In the case of creating a database, it includes:

  • specifications for the location of the new database - networking, etc.
  • specifications for the hardware of the cluster instances - CPU, memory, etc.
  • specifications for the maintenance of the cluster - backup windows, etc.
  • the schema for a relational database

In the case of re-creating a database, it includes:

  • a locator for the source database
  • specifications for the location of the new database - networking, etc.
  • specifications for the hardware of the cluster instances - CPU, memory, etc.
  • specifications for the maintenance of the cluster - backup windows, etc.

In both cases, a new cluster is created and the automation performs either the copy or the creation.

In the case of Aurora RDS, the copy can be a "copy-on-write" clone, which means even a massive production database can be "copied" in a matter of minutes. The data held in common is shared between the source and the target, while changes made to the target are recorded separately.
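For context, this shared-storage clone is the same mechanism Aurora exposes as "cloning" in the AWS API. Purely as an illustration of the underlying operation (the target cluster identifier below is a placeholder, and this is not an excerpt of the module's automation), an equivalent AWS CLI call looks like:

# Illustrative Aurora copy-on-write clone via the AWS CLI (not part of this tool)
aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier dbartifact-prod-create-cluster-1k326eivoq3i0 \
  --db-cluster-identifier dbartifact-clone-example \
  --restore-type copy-on-write \
  --use-latest-restorable-time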

The "database artifact" produced by this module is a Docker image. The image contains all the necessary metadata and automation to do the necessary. A Docker image was used because the automation (originally) contains a fairly diverse mix of software components.

Installing the Database Artifact Factory

Prerequisites

You will need a Docker engine installed locally. For more information on how to do this, please see: https://docs.docker.com/engine/install/

You will need a Python 3.7 or later environment with pip installed. For more information on installing Python, please see: https://www.python.org/about/gettingstarted/
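A virtual environment keeps the tool's dependencies isolated from the system Python; a minimal sketch (the .venv directory name is arbitrary):

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip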

Install via pip

pip install database-artifact-factory

Build the Database Artifact

Configuration

In order to build the artifact, a configuration INI file needs to be created with all the necessary information for the source and target databases.

An example INI file follows:

[source_db]
source_db_cluster_id = dbartifact-prod-create-cluster-1k326eivoq3i0

[target_db]
subnet_ids = source
instance_type = db.r4.large

The single published artifact contains all the information needed to do both the create and the clone.
This makes for a mix of arguments in the source/target sections: some values are only needed for a create, and some only for a clone. The keys are described below, and a fuller example follows the list.

  • source_db_cluster_id This is the Aurora RDS database cluster id to clone. Even when creating a fresh database cluster, there should be a "source" database that it is re-creating - just without data.

  • subnet_ids This can be a comma-delimited list of two or more subnet ids in which to create the target db, e.g. subnet-1234, subnet-456, or it can be set to source, in which case the subnet ids are discovered from the source db cluster.

    Make sure these subnet_ids are in different Availability Zones.

  • instance_type The instance type of the DB instance created in the cluster, e.g. db.r4.xlarge

  • database_name In the create case, the name of a database to create and apply the Liquibase changelog to. This can be set to dontcare if you only want the artifact to be able to clone.
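Putting the keys above together, a fuller configuration covering both the create and the clone case might look like the following. The cluster id is the one from the earlier example; the subnet ids and database name are placeholders, and placing database_name under [target_db] is an assumption since the original example omits it:

[source_db]
source_db_cluster_id = dbartifact-prod-create-cluster-1k326eivoq3i0

[target_db]
subnet_ids = subnet-1234, subnet-456
instance_type = db.r4.large
database_name = exampledb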

Building

To invoke the build process, first make sure there are ambient AWS credentials.
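One way to check that credentials are in place (assuming the AWS CLI is installed) is to ask STS who you are, and to select a profile explicitly if needed:

# Should print the account and identity the build will run as
aws sts get-caller-identity

# Optionally select a specific profile from ~/.aws/credentials
export AWS_PROFILE=default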

Then run db-artifact-builder against the created INI file:

db-artifact-builder artifact-config.ini

If the build succeeds, progress will be reported on stdout and a Docker image will be created locally with the tag db-artifact
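A quick way to confirm the image landed locally (assuming the default db-artifact tag):

docker images db-artifact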

Publishing

Once the db-artifact image is in the local Docker image store, it can be "published" to Docker Hub or an internal registry of choice (e.g. GitHub, Artifactory, etc.). It is up to the operator to publish the image (this tool doesn't do anything special to support that).
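As an example, pushing to a private registry usually looks something like this (the registry host, repository, and version tag are placeholders):

docker tag db-artifact:latest registry.example.com/team/db-artifact:0.1.0
docker push registry.example.com/team/db-artifact:0.1.0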

Converging a Database Artifact

If running locally after a "build", then "convergence" (i.e. creating or cloning) is just a matter of running the Docker image with a few execution settings. If the Docker image has been published to a registry, then a docker pull will be necessary to make the image accessible.
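For example, pulling a published image and re-tagging it to the db-artifact:latest name that the defaults below assume (the registry path is a placeholder):

docker pull registry.example.com/team/db-artifact:0.1.0
docker tag registry.example.com/team/db-artifact:0.1.0 db-artifact:latest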

Ultimately docker run is being executed, but a wrapper CLI script, db-artifact-converge, is installed as part of database-artifact-factory and provides a nicer user experience (no fussing with filesystem mounts and environment variables).
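Purely to illustrate what the wrapper saves you from, a hand-rolled docker run would look roughly like the sketch below; the mount point, environment variable, and container arguments are hypothetical and not the actual interface of the image - use db-artifact-converge instead:

# Hypothetical equivalent of db-artifact-converge; details are assumptions
docker run --rm \
  -v ~/.aws:/root/.aws \
  -e AWS_PROFILE=default \
  db-artifact:latest --db_password thisisafakepassw0rd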

To clarify: installing database-artifact-factory provides both the build and the converge tools, but an operator versus an end user might use only one or the other.

Clone an Existing Database

The following is the minimal command to clone an existing database. It presumes a few things:

  • AWS credentials are stored in ~/.aws/credentials
  • the AWS profile is "default"
  • the Docker image db-artifact:latest contains the artifact to converge

BEWARE - you must specify the password, but the username follows from the source cluster

db-artifact-converge --db_password thisisafakepassw0rd

If any of those defaults are inappropriate, please invoke db-artifact-converge -h to see the other options.

Future Direction

  • Parameterize support for MySQL vs. PostgreSQL
  • Add support for other operational configuration items for the RDS cluster
  • Better progress reporting and troubleshooting mechanisms
  • Control over stack names

