Skip to main content

A constrcut for PII and redaction scenarios with Amazon Comprehend and S3 Object Lambda

Project description

cdk-comprehend-s3olap

License Release npm downloads pypi downloads NuGet downloads repo languages

npm (JS/TS) PyPI (Python) Maven (Java) Go NuGet
Link Link Link Link Link

This construct creates the foundation for developers to explore the combination of Amazon S3 Object Lambda and Amazon Comprehend for PII scenarios and it is designed with flexibility, i.e, the developers could tweak arguments via CDK to see how AWS services work and behave.

Table of Contents

Serverless Architecture

Access Control

Data Flow image Ram R. and Austin Q., 2021 Arhictecture image Ram R. and Austin Q., 2021

Redaction

image Ram R. and Austin Q., 2021 image Ram R. and Austin Q., 2021

Introduction

The architecture was introduced by Ram Ramani and Austin Quam and was posted on the AWS Blog as Protect PII using Amazon S3 Object Lambda to process and modify data during retrieval. I converted the architecture into a CDK constrcut for 4 programming languages. With this construct, you could manage the properties of IAM roles, the Lambda functions with Amazon Comprehend, and few for the constrcut. Before deploying the construct via the CDK, you could either places the text files, i.e., those for the access control case and redaction case, under a directory with a specific name as the following or just deploying directly yet you need to upload the text files onto the S3 buckets manually yourself. It's all your choie.

# For the access control case.
$ cd ${ROOT_DIRECTORY_CDK_APPLICATION}
$ mkdir -p files/access_control
$ curl -o survey-results.txt https://raw.githubusercontent.com/aws-samples/amazon-comprehend-examples/master/s3_object_lambda_pii_protection_blog/access-control/survey-results.txt
$ curl -o innocuous.txt https://raw.githubusercontent.com/aws-samples/amazon-comprehend-examples/master/s3_object_lambda_pii_protection_blog/access-control/innocuous.txt
# For the redaction case.
$ cd ${ROOT_DIRECTORY_CDK_APPLICATION}
$ mkdir -p files/redaction
$ curl -o transcript.txt https://raw.githubusercontent.com/aws-samples/amazon-comprehend-examples/master/s3_object_lambda_pii_protection_blog/redaction/transcript.txt

Example

Typescript

You could also refer to here.

$ cdk --init language typescript
$ yarn add cdk-comprehend-s3olap
import * as cdk from '@aws-cdk/core';
import { ComprehendS3olab } from 'cdk-comprehend-s3olap';

class TypescriptStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    const s3olab = new ComprehendS3olab(this, 'PiiDemo', {
      adminRedactionLambdaConfig: {
        maskCharacter: ' ',
        unsupportedFileHandling: 'PASS',
      },
      billingRedactionLambdaConfig: {
        maskMode: 'REPLACE_WITH_PII_ENTITY_TYPE',
        piiEntityTypes: 'AGE,DRIVER_ID,IP_ADDRESS,MAC_ADDRESS,PASSPORT_NUMBER,PASSWORD,SSN',
      },
      cusrtSupportRedactionLambdaConfig: {
        maskMode: 'REPLACE_WITH_PII_ENTITY_TYPE',
        piiEntityTypes: ' BANK_ACCOUNT_NUMBER,BANK_ROUTING,CREDIT_DEBIT_CVV,CREDIT_DEBIT_EXPIRY,CREDIT_DEBIT_NUMBER,SSN',
      },
    });

    new cdk.CfnOutput(this, 'OPiiAccessControlLambdaArn', { value: s3olab.piiAccessConrtolLambdaArn });
    new cdk.CfnOutput(this, 'OAdminLambdaArn', { value: s3olab.adminLambdaArn });
    new cdk.CfnOutput(this, 'OBillingLambdaArn', { value: s3olab.billingLambdaArn });
    new cdk.CfnOutput(this, 'OCustomerSupportLambdaArn', { value: s3olab.customerSupportLambdaArn });
    new cdk.CfnOutput(this, 'OS3ObjectLambdaGeneralArn', { value: s3olab.s3objectLambdaAccessControlArn });
    new cdk.CfnOutput(this, 'OS3ObjectLambdaAdminArn', { value: s3olab.s3objectLambdaAdminArn });
    new cdk.CfnOutput(this, 'OS3ObjectLambdaBillingArn', { value: s3olab.s3objectLambdaBillingArn });
    new cdk.CfnOutput(this, 'OS3ObjectLambdaCustomerSupportArn', { value: s3olab.customerSupportLambdaArn });
  }
}

const app = new cdk.App();
new TypescriptStack(app, 'TypescriptStack', {
  stackName: 'Comprehend-S3olap',
});

Python

You could also refer to here.

# upgrading related Python packages
$ python -m ensurepip --upgrade
$ python -m pip install --upgrade pip
$ python -m pip install --upgrade virtualenv
# initialize a CDK Python project
$ cdk init --language python
# make packages installed locally instead of globally
$ source .venv/bin/activate
$ # add "cdk-comprehend-s3olap==2.0.113" into `setup.py`
$ python -m pip install --upgrade -r requirements.txt

The demonstration sample code of Python can be viewed via the Python tab of this package on the Constrcut Hub.

Java

You could also refer to here.

$ cdk init --language java
$ mvn package # If you include the construct, you need to tweak the test case for Java in order to package with success via Maven.
```xml
.
.
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <cdk.version>2.72.1</cdk.version>
    <constrcut.verion>2.0.113</constrcut.verion>
    <junit.version>5.7.1</junit.version>
</properties>
.
.
<dependencies>
    <!-- AWS Cloud Development Kit -->
    <dependency>
        <groupId>software.amazon.awscdk</groupId>
        <artifactId>core</artifactId>
        <version>${cdk.version}</version>
    </dependency>
    <dependency>
        <groupId>io.github.hsiehshujeng</groupId>
        <artifactId>cdk-comprehend-s3olap</artifactId>
        <version>${constrcut.verion}</version>
    </dependency>
    .
    .
    .
</dependencies>
package com.myorg;

import software.amazon.awscdk.core.CfnOutput;
import software.amazon.awscdk.core.CfnOutputProps;
import software.amazon.awscdk.core.Construct;
import software.amazon.awscdk.core.Stack;
import software.amazon.awscdk.core.StackProps;
import io.github.hsiehshujeng.cdk.comprehend.s3olap.RedactionLambdaProps;
import io.github.hsiehshujeng.cdk.comprehend.s3olap.ComprehendS3olab;
import io.github.hsiehshujeng.cdk.comprehend.s3olap.ComprehendS3olabProps;

public class JavaStack extends Stack {
    public JavaStack(final Construct scope, final String id) {
        this(scope, id, null);
    }

    public JavaStack(final Construct scope, final String id, final StackProps props) {
        super(scope, id, props);

        ComprehendS3olab s3olab = new ComprehendS3olab(this, "PiiDemo", ComprehendS3olabProps.builder()
            .adminRedactionLambdaConfig(
                RedactionLambdaProps.builder()
                    .maskCharacter(" ")
                    .unsupportedFileHandling("PASS").build())
            .billingRedactionLambdaConfig(
                RedactionLambdaProps.builder()
                    .maskMode("REPLACE_WITH_PII_ENTITY_TYPE")
                    .piiEntityTypes("AGE,DRIVER_ID,IP_ADDRESS,MAC_ADDRESS,PASSPORT_NUMBER,PASSWORD,SSN")
                    .build())
            .cusrtSupportRedactionLambdaConfig(
                RedactionLambdaProps.builder()
                .maskMode("REPLACE_WITH_PII_ENTITY_TYPE")
                .piiEntityTypes("BANK_ACCOUNT_NUMBER,BANK_ROUTING,CREDIT_DEBIT_CVV,CREDIT_DEBIT_EXPIRY,CREDIT_DEBIT_NUMBER,SSN")
                .build())
            .exampleFileDir("/opt/learning/cdk-comprehend-s3olap/src/demo/java")
            .build()
            );

          new CfnOutput(this, "OPiiAccessControlLambdaArn", CfnOutputProps.builder().value(s3olab.getPiiAccessConrtolLambdaArn()).build());
          new CfnOutput(this, "OAdminLambdaArn", CfnOutputProps.builder().value(s3olab.getAdminLambdaArn()).build());
          new CfnOutput(this, "OBillingLambdaArn", CfnOutputProps.builder().value(s3olab.getBillingLambdaArn()).build());
          new CfnOutput(this, "OCustomerSupportLambdaArn", CfnOutputProps.builder().value(s3olab.getCustomerSupportLambdaArn()).build());
          new CfnOutput(this, "OS3ObjectLambdaGeneralArn", CfnOutputProps.builder().value(s3olab.getS3objectLambdaAccessControlArn()).build());
          new CfnOutput(this, "OS3ObjectLambdaAdminArn", CfnOutputProps.builder().value(s3olab.getS3objectLambdaAdminArn()).build());
          new CfnOutput(this, "OS3ObjectLambdaBillingArn", CfnOutputProps.builder().value(s3olab.getS3objectLambdaBillingArn()).build());
          new CfnOutput(this, "OS3ObjectLambdaCustomerSupportArn", CfnOutputProps.builder().value(s3olab.getCustomerSupportLambdaArn()).build());
    }
}

C#

You could also refer to here.

$ cdk init --language csharp
$ dotnet add src/Csharp package Comprehend.S3olap --version 2.0.113
using Amazon.CDK;
using ScottHsieh.Cdk;

namespace Csharp
{
    public class CsharpStack : Stack
    {
        internal CsharpStack(Construct scope, string id, IStackProps props = null) : base(scope, id, props)
        {
            var S3olab = new ComprehendS3olab(this, "PiiDemo", new ComprehendS3olabProps
            {
                AdminRedactionLambdaConfig = new RedactionLambdaProps
                {
                    MaskCharacter = " ",
                    UnsupportedFileHandling = "PASS"
                },
                BillingRedactionLambdaConfig = new RedactionLambdaProps
                {
                    MaskMode = "REPLACE_WITH_PII_ENTITY_TYPE",
                    PiiEntityTypes = "AGE,DRIVER_ID,IP_ADDRESS,MAC_ADDRESS,PASSPORT_NUMBER,PASSWORD,SSN"
                },
                CusrtSupportRedactionLambdaConfig = new RedactionLambdaProps
                {
                    MaskMode = "REPLACE_WITH_PII_ENTITY_TYPE",
                    PiiEntityTypes = "BANK_ACCOUNT_NUMBER,BANK_ROUTING,CREDIT_DEBIT_CVV,CREDIT_DEBIT_EXPIRY,CREDIT_DEBIT_NUMBER,SSN"
                },
                ExampleFileDir = "/opt/learning/cdk-comprehend-s3olap/src/demo/csharp"
            });

            new CfnOutput(this, "OPiiAccessControlLambdaArn", new CfnOutputProps { Value = S3olab.PiiAccessConrtolLambdaArn });
            new CfnOutput(this, "OAdminLambdaArn", new CfnOutputProps { Value = S3olab.AdminLambdaArn });
            new CfnOutput(this, "OBillingLambdaArn", new CfnOutputProps { Value = S3olab.BillingLambdaArn });
            new CfnOutput(this, "OCustomerSupportLambdaArn", new CfnOutputProps { Value = S3olab.CustomerSupportLambdaArn });
            new CfnOutput(this, "OS3ObjectLambdaGeneralArn", new CfnOutputProps { Value = S3olab.S3objectLambdaAccessControlArn });
            new CfnOutput(this, "OS3ObjectLambdaAdminArn", new CfnOutputProps { Value = S3olab.S3objectLambdaAdminArn });
            new CfnOutput(this, "OS3ObjectLambdaBillingArn", new CfnOutputProps { Value = S3olab.S3objectLambdaBillingArn });
            new CfnOutput(this, "OS3ObjectLambdaCustomerSupportArn", new CfnOutputProps { Value = S3olab.CustomerSupportLambdaArn });
        }
    }
}

Some Notes

  1. You should see similar items as the following diagram displays after deploying the constrcut. image
  2. After creating the foundation with success, you could switch roles that the consrtcut creates for you and see how Amazon S3 Object Lambda works. For what switching roles is, please refer to here for the detail. image
  3. You explore Amazon S3 Object Lambda through the Object Lambda access points on the AWS Console and open or download the text files via one of the IAM roles.
  4. Lambda code that incorporates with Amazon Comprehend could be see here.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cdk_comprehend_s3olap-2.0.275.tar.gz (15.4 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cdk_comprehend_s3olap-2.0.275-py3-none-any.whl (15.4 MB view details)

Uploaded Python 3

File details

Details for the file cdk_comprehend_s3olap-2.0.275.tar.gz.

File metadata

  • Download URL: cdk_comprehend_s3olap-2.0.275.tar.gz
  • Upload date:
  • Size: 15.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for cdk_comprehend_s3olap-2.0.275.tar.gz
Algorithm Hash digest
SHA256 b72aa2fdbdabdca27d61c86ca58e95d2b31667894666b2d61d9f85818b81bc23
MD5 57665cd7a6cb4327710b873f07076545
BLAKE2b-256 fccc7ed912df5ca8386c3787b940be0beae169ecab5bbf04d4a239549ebb37b1

See more details on using hashes here.

File details

Details for the file cdk_comprehend_s3olap-2.0.275-py3-none-any.whl.

File metadata

File hashes

Hashes for cdk_comprehend_s3olap-2.0.275-py3-none-any.whl
Algorithm Hash digest
SHA256 b7ac0b17302f9e3dee7227c6e7a1a2bc0847c9c7952022474f3fcb77e92e4adf
MD5 29d1a14ff8ff9b09305684d819db1a0a
BLAKE2b-256 fdfca5b3c9cb2e5461dedcb4c7a158ea8cd7593751d5e4fd246818d42b899b5e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page