CDK Monitoring Constructs

Easy-to-use CDK constructs for monitoring your AWS infrastructure with Amazon CloudWatch.

  • Easily add commonly-used alarms using predefined properties
  • Generate concise CloudWatch dashboards that indicate your alarms
  • Extend the library with your own extensions or custom metrics
  • Consume the library in multiple supported languages

Installation

TypeScript

https://www.npmjs.com/package/cdk-monitoring-constructs

In your package.json:

{
  "dependencies": {
    "cdk-monitoring-constructs": "^9.0.0",

    // peer dependencies of cdk-monitoring-constructs
    "aws-cdk-lib": "^2.160.0",
    "constructs": "^10.0.5"

    // ...your other dependencies...
  }
}

Java

See https://mvnrepository.com/artifact/io.github.cdklabs/cdkmonitoringconstructs

Python

See https://pypi.org/project/cdk-monitoring-constructs/

C#

See https://www.nuget.org/packages/Cdklabs.CdkMonitoringConstructs/

Features

You can browse the documentation at https://constructs.dev/packages/cdk-monitoring-constructs/

| Item | Monitoring | Alarms | Notes |
| --- | --- | --- | --- |
| AWS API Gateway (REST API) (.monitorApiGateway()) | TPS, latency, errors | Latency, error count/rate, low/high TPS | To see metrics, you have to enable Advanced Monitoring |
| AWS API Gateway V2 (HTTP API) (.monitorApiGatewayV2HttpApi()) | TPS, latency, errors | Latency, error count/rate, low/high TPS | To see route level metrics, you have to enable Advanced Monitoring |
| AWS AppSync (GraphQL API) (.monitorAppSyncApi()) | TPS, latency, errors | Latency, error count/rate, low/high TPS | |
| Amazon Aurora (.monitorAuroraCluster()) | Query duration, connections, latency, CPU usage, Serverless Database Capacity | Connections, Serverless Database Capacity and CPU usage | |
| AWS Billing (.monitorBilling()) | AWS account cost | Total cost (anomaly) | Requires enabling the Receive Billing Alerts option in AWS Console / Billing Preferences |
| AWS Certificate Manager (.monitorCertificate()) | Certificate expiration | Days until expiration | |
| AWS CloudFront (.monitorCloudFrontDistribution()) | TPS, traffic, latency, errors | Error rate, low/high TPS | |
| AWS CloudWatch Logs (.monitorLog()) | Patterns present in the log group | Minimum incoming logs | |
| AWS CloudWatch Synthetics Canary (.monitorSyntheticsCanary()) | Latency, error count/rate | Error count/rate, latency | |
| AWS CodeBuild (.monitorCodeBuildProject()) | Build counts (total, successful, failed), failed rate, duration | Failed build count/rate, duration | |
| AWS DocumentDB (.monitorDocumentDbCluster()) | CPU, throttling, read/write latency, transactions, cursors | CPU | |
| AWS DynamoDB (.monitorDynamoTable()) | Read and write capacity provisioned / used | Consumed capacity, throttling, latency, errors | |
| AWS DynamoDB Global Secondary Index (.monitorDynamoTableGlobalSecondaryIndex()) | Read and write capacity, indexing progress, throttled events | | |
| AWS EC2 (.monitorEC2Instances()) | CPU, disk operations, network | | |
| AWS EC2 Auto Scaling Groups (.monitorAutoScalingGroup()) | Group size, instance status | | |
| AWS ECS (.monitorFargateService(), .monitorEc2Service(), .monitorSimpleFargateService(), .monitorSimpleEc2Service(), .monitorQueueProcessingFargateService(), .monitorQueueProcessingEc2Service()) | System resources and task health | Unhealthy task count, running tasks count, CPU/memory usage, and bytes processed by load balancer (if any) | Use for ecs-patterns load balanced ec2/fargate constructs (NetworkLoadBalancedEc2Service, NetworkLoadBalancedFargateService, ApplicationLoadBalancedEc2Service, ApplicationLoadBalancedFargateService) |
| AWS ElastiCache (.monitorElastiCacheCluster()) | CPU/memory usage, evictions and connections | CPU, memory, items count | |
| AWS Glue (.monitorGlueJob()) | Traffic, job status, memory/CPU usage | Failed/killed task count/rate | |
| AWS Kinesis Data Analytics (.monitorKinesisDataAnalytics()) | Up/Downtime, CPU/memory usage, KPU usage, checkpoint metrics, and garbage collection metrics | Downtime, full restart count | |
| AWS Kinesis Data Stream (.monitorKinesisDataStream()) | Put/Get/Incoming Record/s and Throttling | Throttling, throughput, iterator max age | |
| AWS Kinesis Firehose (.monitorKinesisFirehose()) | Number of records, requests, latency, throttling | Throttling | |
| AWS Lambda (.monitorLambdaFunction()) | Latency, errors, iterator max age | Latency, errors, throttles, iterator max age | Optional Lambda Insights metrics (opt-in) support |
| AWS Load Balancing (.monitorNetworkLoadBalancer(), .monitorFargateApplicationLoadBalancer(), .monitorFargateNetworkLoadBalancer(), .monitorEc2ApplicationLoadBalancer(), .monitorEc2NetworkLoadBalancer()) | System resources and task health | Unhealthy task count, running tasks count, (for Fargate/Ec2 apps) CPU/memory usage | Use for FargateService or Ec2Service backed by a NetworkLoadBalancer or ApplicationLoadBalancer |
| AWS OpenSearch/Elasticsearch (.monitorOpenSearchCluster(), .monitorElasticsearchCluster()) | Indexing and search latency, disk/memory/CPU usage | Indexing and search latency, disk/memory/CPU usage, cluster status, KMS keys | |
| AWS OpenSearch Ingestion (.monitorOpenSearchIngestionPipeline()) | Latency, incoming data, DLQ records count | DLQ records count | |
| AWS OpenSearch Serverless (.monitorOpenSearchServerlessCollection()) | Search latency, errors, ingestion requests/latency | Search latency, errors | |
| AWS OpenSearch Serverless (.monitorOpenSearchServerlessIndex()) | Documents count | | |
| AWS RDS (.monitorRdsCluster()) | Query duration, connections, latency, disk/CPU usage | Connections, disk and CPU usage | |
| AWS RDS (.monitorRdsInstance()) | Query duration, connections, latency, disk/CPU usage | Connections, disk and CPU usage | |
| AWS Redshift (.monitorRedshiftCluster()) | Query duration, connections, latency, disk/CPU usage | Query duration, connections, disk and CPU usage | |
| AWS S3 Bucket (.monitorS3Bucket()) | Bucket size and number of objects | | |
| AWS SecretsManager (.monitorSecretsManager()) | Max secret count, min secret count, secret count change | Min/max secret count or change in secret count | |
| AWS SecretsManager Secret (.monitorSecretsManagerSecret()) | Days since last rotation | Days since last change or rotation | |
| AWS SNS Topic (.monitorSnsTopic()) | Message count, size, failed notifications | Failed notifications, min/max published messages | |
| AWS SQS Queue (.monitorSqsQueue(), .monitorSqsQueueWithDlq()) | Message count, age, size | Message count, age, DLQ incoming messages | |
| AWS Step Functions (.monitorStepFunction(), .monitorStepFunctionActivity(), .monitorStepFunctionLambdaIntegration(), .monitorStepFunctionServiceIntegration()) | Execution count and breakdown per state | Duration, failed, failed rate, aborted, throttled, timed out executions | |
| AWS Web Application Firewall (.monitorWebApplicationFirewallAclV2()) | Allowed/blocked requests | Blocked requests count/rate | |
| FluentBit (.monitorFluentBit()) | Num of input records, Output failures & retries, Filter metrics, Storage metrics | | FluentBit needs proper configuration with metrics enabled: Official sample configuration. This function creates MetricFilters to publish all FluentBit metrics. |
| Custom metrics (.monitorCustom()) | Addition of custom metrics into the dashboard (each group is a widget) | | Supports anomaly detection |

Getting started

Create a facade

Important note: Please do NOT import anything from the /dist/lib package. This is unsupported and might break at any time.

  1. Create an instance of MonitoringFacade, which is the main entrypoint.
  2. Call methods on the facade like .monitorLambdaFunction() and chain them together to define your monitors. You can also use methods to add your own widgets, headers of various sizes, and more.

For examples of monitoring different resources, refer to the unit tests.

# Example automatically generated from non-compiling source. May contain errors.
# This could be in the same stack as your resources, as a nested stack, or a separate stack as you see fit
class MonitoringStack(DeploymentStack):
    def __init__(self, parent, name, **kwargs):
        super().__init__(parent, name, **kwargs)

        monitoring = MonitoringFacade(self, "Monitoring",
            # Defaults are provided for these, but they can be customized as desired
            metric_factory_defaults={...},
            alarm_factory_defaults={...},
            dashboard_factory={...}
        )

        # Monitor your resources
        monitoring.add_large_header("Storage").monitor_dynamo_table().monitor_dynamo_table().monitor_lambda_function().monitor_custom()
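
The chaining shown above works because each facade method hands back the facade itself. The following is a minimal, library-free sketch of that pattern; FacadeSketch and its methods are hypothetical stand-ins that only record what was requested, not the real MonitoringFacade.

```python
class FacadeSketch:
    """Hypothetical stand-in for a facade; it only records what was requested."""

    def __init__(self):
        self.segments = []

    def add_large_header(self, text):
        self.segments.append(("header", text))
        return self  # returning self is what makes chaining possible

    def monitor_dynamo_table(self, table_name):
        self.segments.append(("dynamo", table_name))
        return self

    def monitor_lambda_function(self, function_name):
        self.segments.append(("lambda", function_name))
        return self


# Segments end up in the order in which the calls were chained
facade = (
    FacadeSketch()
    .add_large_header("Storage")
    .monitor_dynamo_table("orders")
    .monitor_lambda_function("order-handler")
)
print(facade.segments)
```

Because the order of the calls is preserved, the order of the chain is a natural way to think about the layout of the resulting dashboard.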

Customize actions

Alarms should have actions set up; otherwise they are not very useful.

Example of notifying an SNS topic:

# Example automatically generated from non-compiling source. May contain errors.
# on_alarm_topic: ITopic


monitoring = MonitoringFacade(self, "Monitoring",
    # ...other props
    alarm_factory_defaults={
        # ....other props
        "action": SnsAlarmActionStrategy(on_alarm_topic=on_alarm_topic)
    }
)

You can override the default topic for any alarm like this:

# Example automatically generated from non-compiling source. May contain errors.
monitoring.monitor_something(something,
    add_some_alarm={
        "Warning": {
            # ...other props
            "threshold": 42,
            "action_override": SnsAlarmActionStrategy(on_alarm_topic=on_alarm_topic)
        }
    }
)

Supported actions can be found here, including SNS and Lambda.

You can also compose multiple actions using multipleActions:

# Example automatically generated from non-compiling source. May contain errors.
# on_alarm_topic: ITopic
# on_alarm_function: IFunction


action = multiple_actions(notify_sns(on_alarm_topic), trigger_lambda(on_alarm_function))

Custom metrics

For simply adding some custom metrics, you can use .monitorCustom() and specify your own title and metric groups. Each metric group will be rendered as a single graph widget, and all widgets will be placed next to each other. All the widgets will have the same size, which is chosen based on the number of groups to maximize dashboard space usage.

Custom metric monitoring can be created for simple metrics, simple metrics with anomaly detection and search metrics. The first two also support alarming.

Below are a couple of examples. Let us assume that there are three existing metric variables: m1, m2, m3. They can either be created by hand (new Metric({...})) or (preferably) by using metricFactory (which can be obtained from the facade). The advantage of using the shared metricFactory is that you do not need to worry about period, etc.

# Example automatically generated from non-compiling source. May contain errors.
# create metrics manually
m1 = Metric(...)

# Example automatically generated from non-compiling source. May contain errors.
metric_factory = monitoring_facade.create_metric_factory()

# create metrics using metric factory
m1 = metric_factory.create_metric(...)

Example: metric with anomaly detection

In this case, only one metric is supported. Multiple metrics cannot be rendered with anomaly detection in a single widget due to a CloudWatch limitation.

# Example automatically generated from non-compiling source. May contain errors.
monitor_custom(
    title="Metric with anomaly detection",
    metric_groups=[{
        "metric": m1,
        "anomaly_detection_standard_deviation_to_render": 3
    }
    ]
)

Adding an alarm:

# Example automatically generated from non-compiling source. May contain errors.
monitor_custom(
    title="Metric with anomaly detection and alarm",
    metric_groups=[{
        "metric": m1,
        "alarm_friendly_name": "MetricWithAnomalyDetectionAlarm",
        "anomaly_detection_standard_deviation_to_render": 3,
        "add_alarm_on_anomaly": {
            "Warning": {
                "standard_deviation_for_alarm": 4,
                "alarm_when_above_the_band": True,
                "alarm_when_below_the_band": True
            }
        }
    }
    ]
)
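
To make the two standard deviation settings more concrete: the band drawn on the widget (3 in the example) can be narrower than the band that triggers the alarm (4). CloudWatch computes the expected values with its own model, so the sketch below simply assumes a fixed expected value and deviation to show how the band width and the above/below flags interact. The function name and all numbers are hypothetical.

```python
def breaches_band(value, expected, std_dev, band_width,
                  alarm_when_above=True, alarm_when_below=True):
    """Return True if value lies outside expected +/- band_width * std_dev
    on a side that is configured to alarm."""
    upper = expected + band_width * std_dev
    lower = expected - band_width * std_dev
    if alarm_when_above and value > upper:
        return True
    if alarm_when_below and value < lower:
        return True
    return False


# Assumed numbers, for illustration only
expected, std_dev = 100.0, 5.0

# 118 lies outside the rendered band (3 deviations: 85 to 115) ...
outside_rendered_band = breaches_band(118, expected, std_dev, band_width=3)
# ... but still inside the alarm band (4 deviations: 80 to 120)
outside_alarm_band = breaches_band(118, expected, std_dev, band_width=4)
# 75 is below the alarm band, but is ignored when only the upper side alarms
ignored_below = breaches_band(75, expected, std_dev, band_width=4, alarm_when_below=False)
```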

Example: search metrics

# Example automatically generated from non-compiling source. May contain errors.
monitor_custom(
    title="Metric search",
    metric_groups=[{
        "search_query": "My.Prefix.",
        "dimensions_map": {
            "FirstDimension": "FirstDimensionValue",
            # Allow any value for the given dimension (None is the Python counterpart of undefined)
            "SecondDimension": None
        },
        "statistic": MetricStatistic.SUM
    }
    ]
)

Search metrics do not support setting an alarm, which is a CloudWatch limitation.

Route53 Health Checks

Route53 has strict requirements as to which alarms are allowed to be referenced in Health Checks. You can adjust the metric for an alarm so that it can be used in a Route53 Health Check as follows:

# Example automatically generated from non-compiling source. May contain errors.
monitoring.monitor_something(something,
    add_some_alarm={
        "Warning": {
            # ...other props
            "metric_adjuster": Route53HealthCheckMetricAdjuster.INSTANCE
        }
    }
)

This will ensure the alarm can be used on a Route53 Health Check or otherwise throw an Error indicating why the alarm can't be used. In order to easily find your Route53 Health Check alarms later on, you can apply a custom tag to them as follows:

# Example automatically generated from non-compiling source. May contain errors.
from aws_cdk.aws_route53 import CfnHealthCheck


monitoring.monitor_something(something,
    add_some_alarm={
        "Warning": {
            # ...other props
            "custom_tags": ["route53-health-check"],
            "metric_adjuster": Route53HealthCheckMetricAdjuster.INSTANCE
        }
    }
)

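alarms = monitoring.created_alarms_with_tag("route53-health-check")

health_checks = []
for alarm_with_annotation in alarms:
    alarm = alarm_with_annotation.alarm
    # get_health_check_construct_id is your own helper for deriving a construct ID
    id = get_health_check_construct_id(alarm)
    health_checks.append(CfnHealthCheck(scope, id,
        health_check_config=CfnHealthCheck.HealthCheckConfigProperty(
            # ...other props
            type="CLOUDWATCH_METRIC",
            alarm_identifier=CfnHealthCheck.AlarmIdentifierProperty(
                name=alarm.alarm_name,
                region=alarm.stack.region
            )
        )
    ))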

Custom monitoring segments

If you want even more flexibility, you can create your own segment.

This is a general procedure on how to do it:

  1. Extend the Monitoring class
  2. Override the widgets() method (and/or similar ones)
  3. Leverage the metric factory and alarm factory provided by the base class (you can create additional factories if you wish)
  4. Register all alarms via .addAlarm() so they are visible to the user and placed on the alarm summary dashboard

Monitoring subclasses are dashboard segments, so you can add yours to your monitoring by calling .addSegment() on the MonitoringFacade.

Modifying or omitting widgets from default dashboard segments

While the dashboard widgets defined in the library are meant to cover most use cases, they might not be what you're looking for.

To modify the widgets:

  1. Extend the appropriate Monitoring class (e.g., LambdaFunctionMonitoring for monitorLambdaFunction) and override the relevant methods (e.g., widgets):

    # Example automatically generated from non-compiling source. May contain errors.
    class MyCustomizedLambdaFunctionMonitoring(LambdaFunctionMonitoring):
        def widgets(self):
            return []
    
  2. Use the facade's addSegment method with your custom class:

    # Example automatically generated from non-compiling source. May contain errors.
    # facade: MonitoringFacade
    
    
    facade.add_segment(MyCustomizedLambdaFunctionMonitoring(facade))
    

Custom dashboards

If you want even more flexibility, you can take complete control over dashboard generation by leveraging dynamic dashboarding features. This allows you to create an arbitrary number of dashboards while configuring each of them separately. You can do this in three simple steps:

  1. Create a dynamic dashboard factory
  2. Create IDynamicDashboardSegment implementations
  3. Add Dynamic Segments to your MonitoringFacade

Create a dynamic dashboard factory

The below code sample will generate two dashboards with the following names:

  • ExampleDashboards-HostedService
  • ExampleDashboards-Infrastructure

# Example automatically generated from non-compiling source. May contain errors.
# create the dynamic dashboard factory.
factory = DynamicDashboardFactory(stack, "DynamicDashboards",
    dashboard_name_prefix="ExampleDashboards",
    dashboard_configs=[{"name": "HostedService"}, {
        "name": "Infrastructure",
        "range": Duration.hours(3),
        "period_override": PeriodOverride.AUTO,
        "rendering_preference": DashboardRenderingPreference.BITMAP_ONLY
    }
    ]
)

Create IDynamicDashboardSegment implementations

For each construct you want monitored, you will need to create an implementation of an IDynamicDashboardSegment. The following is a basic reference implementation as an example:

# Example automatically generated from non-compiling source. May contain errors.
import jsii
from enum import Enum

class DashboardTypes(str, Enum):
    HOSTED_SERVICE = "HostedService"
    INFRASTRUCTURE = "Infrastructure"

@jsii.implements(IDynamicDashboardSegment)
class ExampleSegment:
    def widgets_for_dashboard(self, name):
        if name == DashboardTypes.HOSTED_SERVICE:
            return [TextWidget(markdown="This shows metrics for your service hosted on AWS Infrastructure")]
        if name == DashboardTypes.INFRASTRUCTURE:
            return [TextWidget(markdown="This shows metrics for the AWS Infrastructure supporting your hosted service")]
        raise ValueError("Unexpected dashboard name!")

Add Dynamic Segments to MonitoringFacade

When you have instances of an IDynamicDashboardSegment to use, they can be added to your dashboard like this:

# Example automatically generated from non-compiling source. May contain errors.
monitoring.add_dynamic_segment(ExampleSegment())

Now, this widget will be added to both dashboards and will show different content depending on the dashboard. Using the above example code, two dashboards will be generated with the following content:

  • Dashboard Name: "ExampleDashboards-HostedService"

    • Content: "This shows metrics for your service hosted on AWS Infrastructure"
  • Dashboard Name: "ExampleDashboards-Infrastructure"

    • Content: "This shows metrics for the AWS Infrastructure supporting your hosted service"
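
The way one segment yields different content per dashboard can be pictured with a small, library-free sketch. It assumes that full dashboard names are formed as prefix-name (as in the names listed above) and uses plain strings in place of widgets; build_dashboards and example_segment are hypothetical helpers, not part of the library.

```python
def build_dashboards(prefix, dashboard_names, segments):
    """Map each full dashboard name to the widgets contributed by all segments."""
    dashboards = {}
    for name in dashboard_names:
        widgets = []
        for segment in segments:
            # every segment is asked once per dashboard and decides what to show
            widgets.extend(segment(name))
        dashboards[f"{prefix}-{name}"] = widgets
    return dashboards


def example_segment(name):
    # plain strings stand in for TextWidget instances
    if name == "HostedService":
        return ["This shows metrics for your service hosted on AWS Infrastructure"]
    if name == "Infrastructure":
        return ["This shows metrics for the AWS Infrastructure supporting your hosted service"]
    raise ValueError("Unexpected dashboard name!")


dashboards = build_dashboards(
    "ExampleDashboards", ["HostedService", "Infrastructure"], [example_segment]
)
for full_name, widgets in dashboards.items():
    print(full_name, widgets)
```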

Cross-account cross-Region Dashboards

Facades can be configured for different regions/accounts as a whole:

# Example automatically generated from non-compiling source. May contain errors.
MonitoringFacade(stack, "Monitoring",
    metric_factory_defaults={
        # Different region/account than what you're deploying to
        "region": "us-west-2",
        "account": "012345678901"
    }
)

Or at a more granular level:

# Example automatically generated from non-compiling source. May contain errors.
monitoring.monitor_dynamo_table(
    # Table from the same account/region
    table=Table.from_table_name(stack, "ImportedTable", "MyTableName")
).monitor_dynamo_table(
    # Table from another account/region
    table=Table.from_table_arn(stack, "XaXrImportedTable", "arn:aws:dynamodb:us-west-2:012345678901:table/my-other-table"),
    region="us-west-2",
    account="012345678901"
)

The order of precedence of the region/account values is:

  1. The individual metric factory's props (e.g. via the monitorDynamoTable props).
  2. The facade's metricFactoryDefaults props.
  3. The region/account that the stack is deployed to.
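
The precedence above amounts to a "first value found wins" lookup. Here is a minimal sketch of that rule using plain dictionaries; the resolve helper and the sample values are hypothetical and only illustrate the ordering, not how the library implements it.

```python
def resolve(setting, monitor_props, facade_defaults, stack_env):
    """Return the first value found for setting, in order of precedence."""
    for source in (monitor_props, facade_defaults, stack_env):
        value = source.get(setting)
        if value is not None:
            return value
    return None


# Assumed sample values, for illustration only
stack_env = {"region": "eu-west-1", "account": "111111111111"}
facade_defaults = {"region": "us-west-2"}
monitor_props = {"account": "222222222222"}

# region is not set on the monitor call, so the facade default wins over the stack
region = resolve("region", monitor_props, facade_defaults, stack_env)
# account is set on the monitor call, so it wins over everything else
account = resolve("account", monitor_props, facade_defaults, stack_env)
print(region, account)
```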

Note that while this allows for cross-account cross-Region dashboarding, cross-Region alarming is not supported by CloudWatch.

Monitoring scopes

You can monitor complete CDK construct scopes using an aspect. It will automatically discover all monitorable resources within the scope recursively and add them to your dashboard.

# Example automatically generated from non-compiling source. May contain errors.
monitoring.monitor_scope(stack,
    # With optional configuration
    lambda_={
        "props": {
            "add_latency_p50_alarm": {
                "Critical": {"max_latency": Duration.seconds(10)}
            }
        }
    },

    # Some resources that aren't dependent on nodes (e.g. general metrics across instances/account) may be included
    # by default, which can be explicitly disabled.
    billing={"enabled": False},
    ec2={"enabled": False},
    elastic_cache={"enabled": False}
)
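
Conceptually, the recursive discovery is a walk over the construct tree that picks up every node of a monitorable kind unless that kind has been disabled. The sketch below shows the idea with a hypothetical Node class and a discover function; it does not use CDK aspects or the library itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a construct in the tree."""
    name: str
    kind: str = "other"  # "other" marks nodes that are not monitorable
    children: list = field(default_factory=list)


def discover(node, config):
    """Recursively collect the names of monitorable nodes that are not disabled."""
    found = []
    enabled = config.get(node.kind, {}).get("enabled", True)
    if node.kind != "other" and enabled:
        found.append(node.name)
    for child in node.children:
        found.extend(discover(child, config))
    return found


stack = Node("stack", children=[
    Node("api", children=[Node("handler", kind="lambda")]),
    Node("orders", kind="dynamo"),
    Node("worker", kind="ec2"),
])

# The nested Lambda function is found; the EC2 node is skipped because it is disabled
monitored = discover(stack, {"ec2": {"enabled": False}})
print(monitored)
```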

Cloning alarms

You can also create alarms by cloning other alarms and applying a modification function. When given a list of alarms created using MonitoringFacade, the facade can apply a user-supplied function on each, generating new alarms with customizations from the function.

// TypeScript example
// Clone alarms using a cloning function
const criticalAlarms = monitoring.createdAlarmsWithDisambiguator("Critical");
const clones = monitoring.cloneAlarms(criticalAlarms, (a) => {
  // Define a new alarm that has values inspired by the original alarm
  // Adjust some of those values using arbitrary, user-provided logic
  return {
    ...a.alarmDefinition.addAlarmProps,
    actionsEnabled: false,
    disambiguator: "ClonedCritical",
    alarmDescription: "Cloned alarm of " + a.alarmDescription,
    // Bump the threshold a bit
    threshold: a.alarmDefinition.addAlarmProps.threshold * 1.1,
    // Tighten the number of datapoints a bit
    datapointsToAlarm: a.alarmDefinition.datapointsToAlarm - 1,
    // Keep the same number of evaluation periods
    evaluationPeriods: a.alarmDefinition.evaluationPeriods,
  };
});

This technique is particularly useful when you are using alarms for multiple purposes. For instance, you may want to ensure regressions that result in an SLA-breach are automatically rolled back before a ticketing action takes effect. This scheme uses pairs of alarms for each metric: a conservative ticketing alarm and an aggressive rollback alarm.

Rather than specifying both alarms throughout your application, you can automatically create the companion alarms by cloning with a scaling function. This library provides a ScaleFunction implementation that can be configured with multiplication factors for threshold, datapointsToAlarm, and evaluationPeriods; scaling factors between 0.0 and 1.0 will generate more aggressive alarms.

# Example automatically generated from non-compiling source. May contain errors.
# Clone critical alarms using a tightening scaling function
critical_alarms = monitoring.created_alarms_with_disambiguator("Critical")
rollback_alarms = monitoring.clone_alarms(critical_alarms, ScaleAlarms(
    disambiguator="Rollback",
    threshold_multiplier=0.8,
    datapoints_to_alarm_multiplier=0.3,
    evaluation_periods_multiplier=0.5
))
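
To see what such multipliers do to an alarm, here is a small, library-free sketch that scales an alarm described as a plain dictionary. Rounding to the nearest whole number, the minimum of 1, and capping the datapoints at the number of evaluation periods are assumptions made for this illustration; the library's own rounding rules may differ. The scale_alarm helper is hypothetical.

```python
def scale_alarm(alarm, threshold_multiplier=1.0,
                datapoints_to_alarm_multiplier=1.0,
                evaluation_periods_multiplier=1.0):
    """Return a scaled copy of an alarm given as a plain dict (original untouched)."""
    # Assumption: counts are rounded to whole numbers and never drop below 1
    periods = max(1, round(alarm["evaluation_periods"] * evaluation_periods_multiplier))
    datapoints = max(1, round(alarm["datapoints_to_alarm"] * datapoints_to_alarm_multiplier))
    return {
        **alarm,
        "threshold": alarm["threshold"] * threshold_multiplier,
        "evaluation_periods": periods,
        # the datapoints needed to alarm cannot exceed the evaluation periods
        "datapoints_to_alarm": min(datapoints, periods),
    }


critical = {"threshold": 100, "datapoints_to_alarm": 10, "evaluation_periods": 10}

# Same multipliers as in the example above: every value shrinks,
# so the clone fires earlier than the original alarm
rollback = scale_alarm(critical, 0.8, 0.3, 0.5)
print(rollback)
```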

Contributing

See CONTRIBUTING for more information.

Security policy

See SECURITY for more information.

License

This project is licensed under the Apache-2.0 License.
