A CDK construct for implementing multi-AZ observability to detect single AZ impairments

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

cdklabs-automation

These details have not been verified by PyPI

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved
Operating System
- OS Independent
Programming Language
Typing
- Typed

Project description

multi-az-observability

This is a CDK construct for multi-AZ observability to help detect single-AZ impairments. This is currently an alpha version, but is being used in the AWS Advanced Multi-AZ Resilience Patterns workshop.

There is a lot of available information to think through and combine to provide signals about single-AZ impact. To simplify the setup and use reasonable defaults, this construct (available in TypeScript, Go, Python, .NET, and Java) sets up the necessary observability. To use the CDK construct, you first define your service like this:

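from aws_cdk import Duration
from aws_cdk.aws_cloudwatch import Unit
from aws_cdk.aws_ec2 import SubnetSelection, SubnetType
from cdklabs.multi_az_observability import (
    AddCanaryTestProps,
    ContributorInsightRuleDetails,
    InstrumentedServiceMultiAZObservability,
    MetricDimensions,
    MinimumUnhealthyTargets,
    NetworkConfigurationProps,
    OperationAvailabilityMetricDetails,
    OperationAvailabilityMetricDetailsProps,
    OperationLatencyMetricDetails,
    OperationLatencyMetricDetailsProps,
    Service,
    ServiceAvailabilityMetricDetails,
    ServiceLatencyMetricDetails,
)

# Assumes `vpc`, `load_balancer`, `target_group1`, `target_group2`, and
# `log_group` are already defined elsewhere in your stack.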
service = Service(
    service_name="test",
    availability_zone_names=vpc.availability_zones,
    base_url="http://www.example.com",
    fault_count_threshold=25,
    period=Duration.seconds(60),
    load_balancer=load_balancer,
    target_groups=[target_group1, target_group2],
    default_availability_metric_details=ServiceAvailabilityMetricDetails(
        metric_namespace="front-end/metrics",
        success_metric_names=["Success"],
        fault_metric_names=["Fault", "Error"],
        alarm_statistic="Sum",
        unit=Unit.COUNT,
        period=Duration.seconds(60),
        evaluation_periods=5,
        datapoints_to_alarm=3,
        success_alarm_threshold=99.9,
        fault_alarm_threshold=0.1,
        graphed_fault_statistics=["Sum"],
        graphed_success_statistics=["Sum"]
    ),
    default_latency_metric_details=ServiceLatencyMetricDetails(
        metric_namespace="front-end/metrics",
        success_metric_names=["SuccessLatency"],
        fault_metric_names=["FaultLatency"],
        alarm_statistic="p99",
        unit=Unit.MILLISECONDS,
        period=Duration.seconds(60),
        evaluation_periods=5,
        datapoints_to_alarm=3,
        success_alarm_threshold=Duration.millis(150),
        graphed_fault_statistics=["p99"],
        graphed_success_statistics=["p50", "p99", "tm99"]
    ),
    default_contributor_insight_rule_details=ContributorInsightRuleDetails(
        success_latency_metric_json_path="$.SuccessLatency",
        fault_metric_json_path="$.Faults",
        operation_name_json_path="$.Operation",
        instance_id_json_path="$.InstanceId",
        availability_zone_id_json_path="$.AZ-ID",
        log_groups=[log_group]
    ),
    canary_test_props=AddCanaryTestProps(
        request_count=10,
        schedule="rate(1 minute)",
        load_balancer=load_balancer,
        network_configuration=NetworkConfigurationProps(
            vpc=vpc,
            subnet_selection=SubnetSelection(subnet_type=SubnetType.PRIVATE_ISOLATED)
        )
    ),
    minimum_unhealthy_targets=MinimumUnhealthyTargets(
        percentage=0.1
    )
)

ride_operation = {
    "operation_name": "ride",
    "service": service,
    "path": "/ride",
    "critical": True,
    "http_methods": ["GET"],
    "server_side_contributor_insight_rule_details": ContributorInsightRuleDetails(
        log_groups=[log_group],
        success_latency_metric_json_path="$.SuccessLatency",
        fault_metric_json_path="$.Faults",
        operation_name_json_path="$.Operation",
        instance_id_json_path="$.InstanceId",
        availability_zone_id_json_path="$.AZ-ID"
    ),
    "server_side_availability_metric_details": OperationAvailabilityMetricDetails(OperationAvailabilityMetricDetailsProps(
        operation_name="ride",
        metric_dimensions=MetricDimensions({"Operation": "ride"}, "AZ-ID", "Region")
    ), service.default_availability_metric_details),
    "server_side_latency_metric_details": OperationLatencyMetricDetails(OperationLatencyMetricDetailsProps(
        operation_name="ride",
        metric_dimensions=MetricDimensions({"Operation": "ride"}, "AZ-ID", "Region")
    ), service.default_latency_metric_details)
}

pay_operation = {
    "operation_name": "pay",
    "service": service,
    "path": "/pay",
    "critical": True,
    "http_methods": ["GET"],
    "server_side_contributor_insight_rule_details": ContributorInsightRuleDetails(
        log_groups=[log_group],
        success_latency_metric_json_path="$.SuccessLatency",
        fault_metric_json_path="$.Faults",
        operation_name_json_path="$.Operation",
        instance_id_json_path="$.InstanceId",
        availability_zone_id_json_path="$.AZ-ID"
    ),
    "server_side_availability_metric_details": OperationAvailabilityMetricDetails(OperationAvailabilityMetricDetailsProps(
        operation_name="pay",
        metric_dimensions=MetricDimensions({"Operation": "pay"}, "AZ-ID", "Region")
    ), service.default_availability_metric_details),
    "server_side_latency_metric_details": OperationLatencyMetricDetails(OperationLatencyMetricDetailsProps(
        operation_name="pay",
        metric_dimensions=MetricDimensions({"Operation": "pay"}, "AZ-ID", "Region")
    ), service.default_latency_metric_details)
}

service.add_operation(ride_operation)
service.add_operation(pay_operation)
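
The JSON paths passed to ContributorInsightRuleDetails above ("$.SuccessLatency", "$.Faults", "$.Operation", "$.InstanceId", "$.AZ-ID") imply that the service writes structured log records containing those fields. As a rough illustration only, a record of that shape could be produced as follows. The helper name `build_log_record`, the sample values, and the choice to record `Faults` as 0 or 1 and to omit `SuccessLatency` on faulted requests are assumptions, not something the construct prescribes.

```python
import json


def build_log_record(operation, instance_id, az_id, latency_ms, fault):
    """Return one JSON log line with the fields the JSON paths point at."""
    record = {
        "Operation": operation,    # matched by $.Operation
        "InstanceId": instance_id,  # matched by $.InstanceId
        "AZ-ID": az_id,             # matched by $.AZ-ID
        "Faults": 1 if fault else 0,  # matched by $.Faults (assumed 0/1)
    }
    if not fault:
        # Assumption: latency is only recorded for successful requests.
        record["SuccessLatency"] = latency_ms  # matched by $.SuccessLatency
    return json.dumps(record)


# Hypothetical request handled by an instance in AZ ID use1-az1.
print(build_log_record("ride", "i-0123456789abcdef0", "use1-az1", 42.5, fault=False))
```

Whatever shape your logs actually take, the paths given to the construct need to match the field names your service emits.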

Then you provide that service definition to the CDK construct.

InstrumentedServiceMultiAZObservability(stack, "MAZObservability",
    create_dashboards=True,
    service=service,
    interval=Duration.minutes(60)
)

You define some characteristics of the service and default values for metrics and alarms, then add operations along with any overrides of those defaults that you need. The construct can also automatically create synthetic canaries that test each operation with a very simple HTTP check, or you can configure your own synthetics and tell the construct about their metric details and, optionally, log files. The result is a set of metrics, alarms, and dashboards that can be used to detect single-AZ impact. You can access these alarms from the multiAvailabilityZoneObservability object and use them in your CDK project to start automation, send SNS notifications, or incorporate them into your own dashboards.
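
To make the alarm defaults in the service definition more concrete, the sketch below shows one way to read `success_alarm_threshold=99.9`, `evaluation_periods=5`, and `datapoints_to_alarm=3` together: availability is computed per period from success and fault counts, and the alarm condition holds when at least 3 of the last 5 periods fall below 99.9%. This is a plain-Python approximation of CloudWatch's "M out of N" evaluation, not code from the construct. The function names, the sample counts, and the treatment of zero-traffic periods are assumptions.

```python
from typing import Sequence


def availability_percent(success_count: float, fault_count: float) -> float:
    """Availability as the percentage of requests that succeeded."""
    total = success_count + fault_count
    if total == 0:
        # Assumption: a period with no traffic counts as fully available.
        return 100.0
    return 100.0 * success_count / total


def m_of_n_breached(
    values: Sequence[float],
    threshold: float,
    evaluation_periods: int = 5,
    datapoints_to_alarm: int = 3,
) -> bool:
    """True if enough of the most recent periods fall below the threshold."""
    recent = values[-evaluation_periods:]
    breaching = sum(1 for value in recent if value < threshold)
    return breaching >= datapoints_to_alarm


# Hypothetical per-minute (success, fault) counts for a single AZ.
samples = [(1000, 0), (998, 2), (1000, 0), (990, 10), (995, 5)]
series = [availability_percent(s, f) for s, f in samples]
print(m_of_n_breached(series, threshold=99.9))  # prints True
```

Requiring several breaching datapoints rather than one makes the alarm less sensitive to a single noisy period, at the cost of a slower reaction.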

If you don't have service-specific logs and custom metrics with per-AZ dimensions, you can still use the construct to evaluate ALB and/or NAT Gateway metrics to find single-AZ impairments.

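from aws_cdk import Duration
from aws_cdk.aws_cloudwatch import Stats
from aws_cdk.aws_elasticloadbalancingv2 import ApplicationLoadBalancer
from cdklabs.multi_az_observability import (
    AlbTargetGroupMap,
    ApplicationLoadBalancerDetectionProps,
    ApplicationLoadBalancerLatencyOutlierAlgorithm,
    BasicServiceMultiAZObservability,
    NatGatewayDetectionProps,
)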
BasicServiceMultiAZObservability(stack, "MAZObservability",
    application_load_balancer_props=ApplicationLoadBalancerDetectionProps(
        alb_target_group_map=[AlbTargetGroupMap(
            application_load_balancer=ApplicationLoadBalancer(stack, "alb",
                vpc=vpc,
                cross_zone_enabled=True
            ),
            target_groups=[target_group1, target_group2]
        )],
        fault_count_percent_threshold=1,
        latency_statistic=Stats.percentile(99),
        latency_threshold=Duration.millis(200),
        latency_outlier_algorithm=ApplicationLoadBalancerLatencyOutlierAlgorithm.STATIC,
        latency_outlier_threshold=45
    ),
    nat_gateway_props=NatGatewayDetectionProps(
        nat_gateways={
            "us-east-1a": [nat_gateway1],
            "us-east-1b": [nat_gateway2],
            "us-east-1c": [nat_gateway3]
        },
        packet_loss_percent_threshold=0.01
    ),
    service_name="test",
    period=Duration.seconds(60),
    create_dashboard=True,
    evaluation_periods=5,
    datapoints_to_alarm=3
)

If you provide a load balancer, the construct assumes it is deployed in each AZ of the VPC the load balancer is associated with and will look for HTTP metrics using those AZs as dimensions.
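
The example above sets `latency_outlier_algorithm` to `STATIC` with `latency_outlier_threshold=45`. One plausible reading, stated here as an assumption rather than as the construct's documented behavior, is that an AZ is treated as an outlier when it accounts for more than 45% of the impact (for example, high-latency requests) summed across all AZs. The sketch below illustrates that idea in plain Python. The function name and the sample counts are hypothetical.

```python
from typing import Dict, List


def static_outlier_azs(
    impact_by_az: Dict[str, float], outlier_threshold_percent: float
) -> List[str]:
    """Return the AZs whose share of total impact exceeds the threshold."""
    total = sum(impact_by_az.values())
    if total == 0:
        # No impact anywhere, so no AZ can be an outlier.
        return []
    return sorted(
        az
        for az, count in impact_by_az.items()
        if 100.0 * count / total > outlier_threshold_percent
    )


# Hypothetical counts of high-latency requests per AZ in one period.
slow_requests = {"us-east-1a": 80, "us-east-1b": 12, "us-east-1c": 8}
print(static_outlier_azs(slow_requests, outlier_threshold_percent=45))  # prints ['us-east-1a']
```

With three AZs an even split gives each about 33% of the impact, so a threshold of 45 flags an AZ only when it carries clearly more than its share.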

Both options support running workloads on EC2, ECS, Lambda, and EKS.

Release history

This version

0.0.1a67 pre-release

May 19, 2026

0.0.1a66 pre-release

Apr 21, 2026

0.0.1a65 pre-release

Apr 21, 2026

0.0.1a64 pre-release

Apr 3, 2026

0.0.1a63 pre-release

Mar 11, 2026

0.0.1a62 pre-release

Mar 11, 2026

0.0.1a61 pre-release

Mar 11, 2026

0.0.1a60 pre-release

Oct 7, 2025

0.0.1a59 pre-release

Oct 2, 2025

0.0.1a58 pre-release

Oct 1, 2025

0.0.1a57 pre-release

Oct 1, 2025

0.0.1a56 pre-release

Oct 1, 2025

0.0.1a55 pre-release

Sep 30, 2025

0.0.1a54 pre-release

Sep 29, 2025

0.0.1a53 pre-release

Sep 29, 2025

0.0.1a52 pre-release

Sep 29, 2025

0.0.1a51 pre-release

Sep 29, 2025

0.0.1a50 pre-release

Sep 29, 2025

0.0.1a49 pre-release

Sep 29, 2025

0.0.1a48 pre-release

Sep 26, 2025

0.0.1a47 pre-release

Apr 7, 2025

0.0.1a46 pre-release

Apr 4, 2025

0.0.1a45 pre-release

Apr 4, 2025

0.0.1a44 pre-release

Apr 4, 2025

0.0.1a43 pre-release

Apr 3, 2025

0.0.1a42 pre-release

Mar 27, 2025

0.0.1a41 pre-release

Mar 27, 2025

0.0.1a40 pre-release

Mar 26, 2025

0.0.1a39 pre-release

Mar 22, 2025

0.0.1a38 pre-release

Mar 22, 2025

0.0.1a37 pre-release

Mar 21, 2025

0.0.1a36 pre-release

Mar 21, 2025

0.0.1a35 pre-release

Mar 21, 2025

0.0.1a34 pre-release

Mar 21, 2025

0.0.1a33 pre-release

Mar 21, 2025

0.0.1a32 pre-release

Mar 21, 2025

0.0.1a31 pre-release

Mar 20, 2025

0.0.1a30 pre-release

Mar 20, 2025

0.0.1a29 pre-release

Mar 20, 2025

0.0.1a28 pre-release

Mar 20, 2025

0.0.1a27 pre-release

Mar 19, 2025

0.0.1a26 pre-release

Mar 19, 2025

0.0.1a25 pre-release

Mar 19, 2025

0.0.1a24 pre-release

Mar 19, 2025

0.0.1a23 pre-release

Mar 18, 2025

0.0.1a22 pre-release

Mar 18, 2025

0.0.1a21 pre-release

Mar 18, 2025

0.0.1a20 pre-release

Mar 18, 2025

0.0.1a19 pre-release

Mar 18, 2025

0.0.1a18 pre-release

Mar 17, 2025

0.0.1a17 pre-release

Mar 17, 2025

0.0.1a16 pre-release

Mar 17, 2025

0.0.1a15 pre-release

Mar 17, 2025

0.0.1a14 pre-release

Mar 15, 2025

0.0.1a13 pre-release

Mar 15, 2025

0.0.1a12 pre-release

Mar 15, 2025

0.0.1a11 pre-release

Mar 14, 2025

0.0.1a10 pre-release

Feb 17, 2025

0.0.1a9 pre-release

Feb 17, 2025

0.0.1a8 pre-release

Feb 17, 2025

0.0.1a7 pre-release

Feb 17, 2025

0.0.1a6 pre-release

Feb 16, 2025

0.0.1a5 pre-release

Feb 16, 2025

0.0.1a4 pre-release

Feb 15, 2025

0.0.1a3 pre-release

Feb 14, 2025

0.0.1a2 pre-release

Feb 14, 2025

0.0.1a1 pre-release

Feb 10, 2025

0.0.1a0 pre-release

Feb 10, 2025

0.0.0a0 pre-release

Dec 16, 2024

Download files

Source Distribution

cdklabs_multi_az_observability-0.0.1a67.tar.gz (21.5 MB)

Uploaded May 19, 2026 Source

Built Distribution

cdklabs_multi_az_observability-0.0.1a67-py3-none-any.whl (21.5 MB)

Uploaded May 19, 2026 Python 3

File details

Details for the file cdklabs_multi_az_observability-0.0.1a67.tar.gz.

File metadata

Download URL: cdklabs_multi_az_observability-0.0.1a67.tar.gz
Upload date: May 19, 2026
Size: 21.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.14.5

File hashes

Hashes for cdklabs_multi_az_observability-0.0.1a67.tar.gz
Algorithm	Hash digest
SHA256	`6253871df794b086b514bc4b93523e83a30eba223dcf94945a531f0183f06d7c`
MD5	`2716ec2862c5ed97c9de6c048fcb4b90`
BLAKE2b-256	`2b75f931f194766f4704e0803a98faec4896787a958b185fe77fe173e9025649`
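
A downloaded file can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the local path assumes the archive was saved to the current directory under its published name.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare the computed digest with the published one."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# SHA256 digest copied from the table above.
EXPECTED = "6253871df794b086b514bc4b93523e83a30eba223dcf94945a531f0183f06d7c"
archive = Path("cdklabs_multi_az_observability-0.0.1a67.tar.gz")  # assumed local path
if archive.exists():
    print(matches_published_digest(archive, EXPECTED))
```

pip can perform the same check itself when a requirements file pins hashes and is installed with `--require-hashes`.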

Provenance

The following attestation bundles were made for cdklabs_multi_az_observability-0.0.1a67.tar.gz:

Publisher: release.yml on cdklabs/cdk-multi-az-observability

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cdklabs_multi_az_observability-0.0.1a67.tar.gz
- Subject digest: 6253871df794b086b514bc4b93523e83a30eba223dcf94945a531f0183f06d7c
- Sigstore transparency entry: 1574020801
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: cdklabs/cdk-multi-az-observability@67a258ef5ba606ea4d88a694c0abeade4172e6d9
- Branch / Tag: refs/heads/main
- Owner: https://github.com/cdklabs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@67a258ef5ba606ea4d88a694c0abeade4172e6d9
- Trigger Event: push

File details

Details for the file cdklabs_multi_az_observability-0.0.1a67-py3-none-any.whl.

File metadata

Download URL: cdklabs_multi_az_observability-0.0.1a67-py3-none-any.whl
Upload date: May 19, 2026
Size: 21.5 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.14.5

File hashes

Hashes for cdklabs_multi_az_observability-0.0.1a67-py3-none-any.whl
Algorithm	Hash digest
SHA256	`713c0425dacd7a932b2e194cbeacf88c351b1f72a13fe9d759b867cf2f987707`
MD5	`776ead3613ae2babc9d684b70c9f29a8`
BLAKE2b-256	`159a69c3a9625d90472457135be9a4f4c42d4be438544796f8c4ad1404731220`

Provenance

The following attestation bundles were made for cdklabs_multi_az_observability-0.0.1a67-py3-none-any.whl:

Publisher: release.yml on cdklabs/cdk-multi-az-observability

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cdklabs_multi_az_observability-0.0.1a67-py3-none-any.whl
- Subject digest: 713c0425dacd7a932b2e194cbeacf88c351b1f72a13fe9d759b867cf2f987707
- Sigstore transparency entry: 1574020674
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: cdklabs/cdk-multi-az-observability@67a258ef5ba606ea4d88a694c0abeade4172e6d9
- Branch / Tag: refs/heads/main
- Owner: https://github.com/cdklabs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@67a258ef5ba606ea4d88a694c0abeade4172e6d9
- Trigger Event: push
