AWS CDK construct library for Aurora backup and restore using ECS on a schedule, storing backups in S3.

cdk-library-aurora-native-backup

A CDK construct library that creates and manages Docker images for Aurora PostgreSQL native backups using pg_dump. The resulting images are designed for use with Amazon ECS Fargate for scalable, serverless backup operations.

Features

  • Multi-Database Support: Back up multiple databases from the same Aurora cluster in a single service
  • Pre-built Docker Image: Amazon Linux 2023 base with PostgreSQL 17 client tools and AWS CLI v2
  • ECR Repository Management: Automatically creates and manages ECR repositories with security best practices
  • Complete Backup Service: Ready-to-use ECS Fargate service for scheduled Aurora backups
  • EFS and S3 Support: Built-in support for backing up to EFS with S3 sync
  • Comprehensive Backup: Uses pg_dump directory format for efficient storage and simplified restore
  • Production Ready: Includes proper error handling, logging, and cleanup mechanisms
  • Secure Authentication: Uses AWS Secrets Manager for database password management

API Doc

See API

Interface Structure

The library provides two main constructs, each with its own configuration interface:

  • AuroraBackupRepository (AuroraBackupRepositoryProps): Manages the ECR repository and Docker image for backups.

  • AuroraNativeBackupService (AuroraNativeBackupServiceProps): Manages the backup service infrastructure (VPC, Aurora cluster, S3 bucket, compute resources, etc.), and uses:

    • AuroraBackupConnectionProps: For database connection settings (username, database names array, password secret).

This separation allows for cleaner organization of image/repository management, connection credentials, and infrastructure settings.

Multi-Database Support

The library supports backing up multiple databases from the same Aurora PostgreSQL cluster in a single backup service. Simply provide an array of database names in the databaseNames property (defaults to ['postgres'] if not specified). Each database will be backed up separately and stored in its own S3 folder structure.
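
A minimal sketch of that documented defaulting behavior (the helper name is illustrative, not library code):

```typescript
// Sketch of the documented defaulting: when no database names are given,
// the service backs up the default 'postgres' database.
// resolveDatabaseNames is an illustrative helper, not part of the library API.
function resolveDatabaseNames(databaseNames?: string[]): string[] {
  return databaseNames ?? ['postgres'];
}
```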

Database User Setup

Create a dedicated database user with read-only backup permissions on ALL databases to be backed up.

For PostgreSQL 14+ (recommended), use the built-in pg_read_all_data role for comprehensive read access:

-- Connect to each database and grant permissions
\c your_database_1;
GRANT CONNECT ON DATABASE your_database_1 TO backup_user;
GRANT pg_read_all_data TO backup_user;

-- Repeat for each additional database
\c your_database_2;
GRANT CONNECT ON DATABASE your_database_2 TO backup_user;
GRANT pg_read_all_data TO backup_user;

The pg_read_all_data role automatically provides:

  • SELECT on all tables and views
  • USAGE on all schemas
  • SELECT and USAGE on all sequences
  • Access to future objects without requiring additional grants

Note: This library requires PostgreSQL 14 or newer for the pg_read_all_data role.
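
For clusters with many databases, the grant statements above can be generated rather than typed by hand. A small sketch (the helper is illustrative; the emitted SQL mirrors the statements shown above):

```typescript
// Illustrative helper: emit the psql grant script shown above for a list of
// databases. Not part of the library; run the output with psql as a superuser.
function grantScript(databases: string[], user = 'backup_user'): string {
  return databases
    .map((db) =>
      [
        `\\c ${db};`,
        `GRANT CONNECT ON DATABASE ${db} TO ${user};`,
        `GRANT pg_read_all_data TO ${user};`,
      ].join('\n'),
    )
    .join('\n\n');
}
```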

Shortcomings

  • The backup service requires password-based authentication (no IAM database authentication for now)
  • The backup container runs as a scheduled task, not continuously, so it cannot capture incremental changes
  • Custom backup scripts are not currently supported, only the built-in pg_dump functionality
  • When backing up multiple databases, a failure in one database's backup does not fail the overall task: the task continues with the remaining databases, so individual backup failures must be monitored through CloudWatch logs

Examples

Prerequisites

To use this construct, you must have:

  • An AWS CDK stack with a defined environment (account and region)
  • An existing VPC for the backup service
  • An existing Aurora PostgreSQL database cluster
  • An AWS Secrets Manager secret containing database credentials (recommended)
  • A database user with the required backup permissions (see above)

Complete Backup Service (Recommended)

For most use cases, use the AuroraNativeBackupService which provides a complete, ready-to-use backup solution:

TypeScript

import {
  Stack,
  StackProps,
  aws_ec2 as ec2,
  aws_rds as rds,
  aws_scheduler as scheduler,
  aws_secretsmanager as secretsmanager,
} from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { AuroraNativeBackupService, AuroraBackupRepository } from '@renovosolutions/cdk-library-aurora-native-backup';

export class BackupServiceStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Your existing Aurora PostgreSQL database cluster and VPC
    const vpc = ec2.Vpc.fromLookup(this, 'Vpc', { isDefault: true });
    const dbCluster = rds.DatabaseCluster.fromDatabaseClusterAttributes(this, 'DbCluster', {
      clusterIdentifier: 'my-production-cluster',
      clusterEndpointAddress: 'cluster.xyz.region.rds.amazonaws.com',
      port: 5432,
    });

    // First create the backup repository
    const backupRepository = new AuroraBackupRepository(this, 'BackupRepository', {
      repositoryName: 'aurora-postgres-backup',
    });

    // Secret containing the backup user's password
    const backupUserSecret = secretsmanager.Secret.fromSecretAttributes(this, 'BackupUserSecret', {
      secretArn: 'arn:aws:secretsmanager:region:account:secret:backup-user-password-abc123',
    });

    // Create the complete backup service
    const backupService = new AuroraNativeBackupService(this, 'BackupService', {
      cluster: dbCluster,
      vpc,
      backupBucketName: 'my-aurora-production-backups',
      ecrRepository: backupRepository.repository,
      connection: {
        username: 'backup_user',
        databaseNames: ['production', 'analytics', 'reporting'],
        passwordSecret: backupUserSecret,
      },
      retentionDays: 30,
      backupSchedule: scheduler.ScheduleExpression.cron({ minute: '0', hour: '2' }), // Daily at 2 AM UTC
      cpu: 1024, // Override default of 256
      memoryLimitMiB: 2048, // Override default of 512
    });
  }
}

Python

from aws_cdk import (
  Stack,
  aws_ec2 as ec2,
  aws_rds as rds,
  aws_scheduler as scheduler,
  aws_secretsmanager as secretsmanager
)
from constructs import Construct
from cdk_library_aurora_native_backup import AuroraNativeBackupService, AuroraBackupRepository

class BackupServiceStack(Stack):
  def __init__(self, scope: Construct, id: str, **kwargs):
    super().__init__(scope, id, **kwargs)

    # Your existing Aurora PostgreSQL database cluster and VPC
    vpc = ec2.Vpc.from_lookup(self, "Vpc", is_default=True)
    db_cluster = rds.DatabaseCluster.from_database_cluster_attributes(self, "DbCluster",
      cluster_identifier="my-production-cluster",
      cluster_endpoint_address="cluster.xyz.region.rds.amazonaws.com",
      port=5432
    )

    # First create the backup repository
    backup_repository = AuroraBackupRepository(self, "BackupRepository",
      repository_name="aurora-postgres-backup"
    )

    # Secret containing the backup user's password
    backup_user_secret = secretsmanager.Secret.from_secret_attributes(self, "BackupUserSecret",
      secret_arn="arn:aws:secretsmanager:region:account:secret:backup-user-password-abc123"
    )

    # Create the complete backup service
    backup_service = AuroraNativeBackupService(self, "BackupService",
      cluster=db_cluster,
      vpc=vpc,
      backup_bucket_name="my-aurora-production-backups",
      ecr_repository=backup_repository.repository,
      connection={
        "username": "backup_user",
        "database_names": ["production", "analytics", "reporting"],
        "password_secret": backup_user_secret
      },
      retention_days=30,
      backup_schedule=scheduler.ScheduleExpression.cron(minute='0', hour='2'),  # Daily at 2 AM UTC
      cpu=1024,  # Override default of 256
      memory_limit_mi_b=2048  # Override default of 512
    )

Environment Variables

All environment variables used by the backup container are set automatically by the constructs. You do not need to set them manually.

| Environment Variable | Description | CDK Prop / Source |
| --- | --- | --- |
| DB_HOST | Aurora PostgreSQL database cluster endpoint | cluster.clusterEndpoint.hostname |
| DB_NAMES | Array of database names to back up | connection.databaseNames |
| DB_USER | Database username | connection.username |
| DB_PASSWORD | Database password | connection.passwordSecret |
| AWS_REGION | AWS region | Stack.region |
| CLUSTER_IDENTIFIER | Cluster ID used as the S3 path prefix (backups/{CLUSTER_IDENTIFIER}/) | cluster.clusterIdentifier |
| DB_PORT | Database port (default: 5432) | cluster.clusterEndpoint.port |
| BACKUP_ROOT | Backup directory (default: /mnt/aurora-backups) | (internal default) |
| S3_BUCKET | S3 bucket for backup sync | backupBucketName |
| S3_PREFIX | S3 prefix (default: backups) | (internal default) |

Backup Process

  1. Validation: Checks AWS credentials and creates backup directories

  2. Database Backup: For each database in the DB_NAMES array:

    • Uses pg_dump --format=directory with gzip compression (level 9) for each data file
    • Creates separate backup directory per database with date stamp
    • If one database backup fails, continues with remaining databases
  3. Verification: Validates that each backup directory contains a toc.dat file

  4. S3 Sync: Syncs each database backup to S3 bucket under separate database folders

  5. Cleanup: Removes local backups after successful S3 sync
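
The continue-on-failure behavior in step 2 can be sketched as follows (a TypeScript sketch of the control flow; the actual container logic is a shell script, and the names here are illustrative):

```typescript
// Control-flow sketch of step 2: each database is dumped independently, and a
// failure is recorded (and surfaced in CloudWatch logs) without aborting the
// remaining databases. Illustrative only; the real container runs pg_dump
// from a shell script.
interface BackupResult {
  database: string;
  ok: boolean;
  error?: string;
}

function runBackups(databases: string[], dumpOne: (db: string) => void): BackupResult[] {
  const results: BackupResult[] = [];
  for (const db of databases) {
    try {
      dumpOne(db); // pg_dump --format=directory + toc.dat verification
      results.push({ database: db, ok: true });
    } catch (e) {
      // Record and continue: one failed database does not fail the task.
      results.push({ database: db, ok: false, error: String(e) });
    }
  }
  return results;
}
```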

Security Considerations

  • ECR repositories are created with image scanning enabled
  • EFS encryption in transit is supported
  • IAM permissions follow the principle of least privilege
  • Use AWS Secrets Manager for database passwords in production
  • Consider VPC endpoints for S3 to keep backup traffic off the public internet

Backup Storage Structure

Local EFS structure (per database):

/mnt/aurora-backups/
├── production/
│   └── YYYY-MM-DD/
│       ├── toc.dat                # PostgreSQL table of contents
│       ├── ####.dat.gz            # Compressed table data files
│       └── ####.dat.gz            # Additional data files
├── analytics/
│   └── YYYY-MM-DD/
│       ├── toc.dat
│       └── ####.dat.gz
└── reporting/
    └── YYYY-MM-DD/
        ├── toc.dat
        └── ####.dat.gz

S3 structure:

s3://my-backup-bucket/
└── backups/
    └── {CLUSTER_IDENTIFIER}/
        ├── production/
        │   └── YYYY-MM-DD/
        │       ├── toc.dat
        │       └── ####.dat.gz
        ├── analytics/
        │   └── YYYY-MM-DD/
        │       ├── toc.dat
        │       └── ####.dat.gz
        └── reporting/
            └── YYYY-MM-DD/
                ├── toc.dat
                └── ####.dat.gz
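
Given this layout, the object key prefix for a particular backup can be derived mechanically (a sketch; backupPrefix is an illustrative helper, not library API):

```typescript
// Build the S3 key prefix under which one database's backup lands, following
// the layout above: {S3_PREFIX}/{CLUSTER_IDENTIFIER}/{database}/YYYY-MM-DD.
// Illustrative helper, not part of the library.
function backupPrefix(clusterIdentifier: string, database: string, date: Date, s3Prefix = 'backups'): string {
  const stamp = date.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  return `${s3Prefix}/${clusterIdentifier}/${database}/${stamp}`;
}

// e.g. backupPrefix('my-production-cluster', 'production', new Date('2024-06-01T02:00:00Z'))
// → 'backups/my-production-cluster/production/2024-06-01'
```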

Restoration

Interactive Restore CLI (Recommended)

This library includes an interactive TypeScript CLI that simplifies the restore process with auto-discovery and guided prompts:

npx ts-node restore_script/aurora-restore-cli.ts

Features:

  • Auto-discovery: Automatically finds S3 backup buckets using the aurora_native_backup_bucket=true tag
  • Interactive selection: Guided prompts for cluster, database, backup date, and tables
  • Table-level restore: Select specific tables or restore entire database
  • Optimized downloads: Only downloads required backup files
  • Ready-to-run commands: Generates and optionally executes pg_restore commands

Prerequisites:

  • Node.js and TypeScript installed

  • AWS credentials configured (via AWS CLI, environment variables, or IAM role)

  • pg_restore command available in your PATH

  • Network access to target PostgreSQL database

  • Database user with restore permissions on target database:

    • CREATE privilege (for creating tables, indexes, constraints)
    • INSERT privilege (for loading data)
    • USAGE and CREATE on schemas
    • For full database restore: CREATEDB privilege or superuser role

Setup and Execution:

First, install dependencies:

cd restore_script
yarn install

Then run the interactive CLI:

npx ts-node aurora-restore-cli.ts

The CLI will guide you through selecting your backup source, target database, and specific tables to restore.

Workflow:

  1. S3 Configuration: Auto-discovers backup bucket or prompts for manual entry
  2. Source Selection: Choose cluster, database, and backup date
  3. Table Selection: Select specific tables or full database restore
  4. Target Configuration: Enter target database connection details
  5. Execution: Downloads backup files and generates restore command

Manual Restoration

For advanced users or automation, backups are stored in S3 under organized paths:

s3://my-backup-bucket/backups/{CLUSTER_IDENTIFIER}/{DATABASE_NAME}/YYYY-MM-DD/

Download backup files:

aws s3 cp --recursive s3://my-backup-bucket/backups/{CLUSTER_IDENTIFIER}/production/YYYY-MM-DD/ /path/to/backup/directory/

Restore commands:

Full database restore (with -C, the database passed to -d is used only to issue the initial CREATE DATABASE; pg_restore then restores into the database named in the backup, so connect to an existing maintenance database such as postgres):

pg_restore -h target-host -U username -d postgres -v -C /path/to/backup/directory/

List backup contents:

pg_restore --list /path/to/backup/directory/

Selective table restore:

pg_restore -h target-host -U username -d target_db -v -t table_name /path/to/backup/directory/
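
The interactive CLI generates commands like the ones above; the same assembly can be sketched in a few lines (an illustrative helper using standard pg_restore flags, not part of the library):

```typescript
// Assemble a pg_restore invocation like the examples above. The flags are
// standard pg_restore options (-h host, -U user, -d database, -v verbose,
// -t table); the helper itself is illustrative, not part of the library.
function pgRestoreCommand(opts: {
  host: string;
  user: string;
  database: string;
  backupDir: string;
  table?: string;
}): string {
  const parts = ['pg_restore', '-h', opts.host, '-U', opts.user, '-d', opts.database, '-v'];
  if (opts.table) {
    parts.push('-t', opts.table);
  }
  parts.push(opts.backupDir);
  return parts.join(' ');
}
```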

Contributing

Contributions are welcome! Please follow these guidelines to help us maintain and improve the project:

Code Structure and Interfaces

  • The main user-facing interfaces are:

    • AuroraBackupRepositoryProps in src/aurora-backup-repository.ts
    • AuroraNativeBackupServiceProps and AuroraBackupConnectionProps in src/aurora-native-backup-service.ts
  • All constructs and their configuration interfaces are defined in the src/ directory.

Code Generation and Project Tasks

  • This project uses projen for project management and code generation.

  • If you make changes to the project configuration (.projenrc.ts), run:

    npx projen
    

    This will regenerate all managed files, including package.json and other configuration files.

Building and Testing

  • To build the project and run all tests, use:

    yarn build
    

    This will compile the code, run unit tests, and ensure everything is up to date.

License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.
