Validation of AWS resources used to create Cloudera Data Platform environments

Project description

cdp_validator_for_aws

Overview

This tool validates that AWS resources have been set up correctly for use by Cloudera Data Platform (CDP), so that CDP can use those resources to create an environment, as defined in the Cloudera documentation.

Running the tool

The resources to be validated are recorded in a JSON file (called my_cdp_file.json below).

The validation uses AWS services and so needs a role with sufficient permissions. We set up such a role and use it through an AWS CLI profile called validator in the example below.

Both of the above are described in detail later in this document.

Once you've met these prerequisites, execution is simple:

python -m cdp_validator_for_aws -c my_cdp_file.json --profile validator

Setup

Python Package

We recommend creating a Python virtual environment and installing this package into it. This helps eliminate environmental issues when running the tool.

CDP JSON File

This tool uses a JSON file (we called it my_cdp_file.json in the example above, but its name doesn't matter) to feed in the information about the resources to be checked.

The format of this file is shown below (there could be extra elements; the ones we're displaying are the critical ones). Most of it is generated from the CDP GUI. However, two elements are not generated by the GUI and must be added by hand. They are:

  • idBrokerInstanceProfileArn
  • storageLocationBase
{
  "aws": {
    "s3guard": {
      "dynamoDbTableName": "dynamo"
    }
  },
  "idBrokerInstanceProfileArn": "arn:aws:iam::007856030109:instance-profile/idbroker_instance_profile_workable-bird",
  "idBrokerMappings": {
    "baselineRole": "arn:aws:iam::007856030109:role/datalake_admin_role_workable-bird",
    "dataAccessRole": "arn:aws:iam::007856030109:role/ranger_audit_role_workable-bird"
  },
  "location": {
    "name": "us-east-1"
  },
  "network": {
    "aws": {
      "vpcId": "vpc-0bd760316679db5cb"
    },
    "subnetIds": [
      "subnet-0aaea807fb0bd7324",
      "subnet-0cf3890ddf5418adb",
      "subnet-019052b500b0ec751"
    ]
  },
  "securityAccess": {
    "defaultSecurityGroupId": "sg-0614ae4bc34aab00a",
    "securityGroupIdForKnox": "sg-0881e000a25678273"
  },
  "storageLocationBase": "s3a://terraform-20191004154753079000000001/base",
  "telemetry": {
    "logging": {
      "s3": {
        "instanceProfile": "arn:aws:iam::007856030109:instance-profile/logger_instance_profile_workable-bird"
      },
      "storageLocation": "s3a://terraform-20191004154753079700000002/logs"
    }
  }
}
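
Because two of the elements are added by hand, it can be worth checking the file before running the validator. The following is a minimal sketch, not part of this package: the function names are hypothetical, and the list of required paths is simply our reading of the example above rather than the tool's own schema. Note that json.load is strict, so a stray trailing comma in the file is reported as a parse error.

```python
import json

# Dotted paths taken from the example file above. Treating exactly these
# as required is an assumption, not the package's own schema.
REQUIRED_PATHS = [
    "aws.s3guard.dynamoDbTableName",
    "idBrokerInstanceProfileArn",
    "idBrokerMappings.baselineRole",
    "idBrokerMappings.dataAccessRole",
    "location.name",
    "network.aws.vpcId",
    "network.subnetIds",
    "securityAccess.defaultSecurityGroupId",
    "securityAccess.securityGroupIdForKnox",
    "storageLocationBase",
    "telemetry.logging.s3.instanceProfile",
    "telemetry.logging.storageLocation",
]


def missing_paths(config, required=REQUIRED_PATHS):
    """Return the dotted paths from `required` that are absent from `config`."""
    missing = []
    for path in required:
        node = config
        for key in path.split("."):
            # Stop descending as soon as a level is missing or not an object.
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


def load_and_check(filename):
    """Parse `filename` as strict JSON and return its missing paths."""
    with open(filename) as handle:
        return missing_paths(json.load(handle))
```

Calling load_and_check("my_cdp_file.json") returns an empty list when every path in the example is present.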

The meanings of these fields are given below, using JSONPath-style dotted names to denote the fields:

  • aws.s3guard.dynamoDbTableName: The name of the DynamoDB table to be created
  • idBrokerInstanceProfileArn: The ARN of the IDBroker instance profile used to run the IDBroker EC2 instance
  • idBrokerMappings.baselineRole: The ARN of the administrator role that is used to manage data in the CDP data lake
  • idBrokerMappings.dataAccessRole: The ARN of the Ranger audit role
  • location.name: The AWS region for these resources
  • network.aws.vpcId: The VPC id
  • network.subnetIds: An array of subnet ids that will be used by CDP
  • securityAccess.defaultSecurityGroupId: The id of the default security group
  • securityAccess.securityGroupIdForKnox: The id of the security group for Knox
  • storageLocationBase: The s3a:// URL of the bucket and path where data will be stored in the data lake
  • telemetry.logging.s3.instanceProfile: The ARN of the instance profile that will be running the logging system
  • telemetry.logging.storageLocation: The s3a:// URL where logs will be placed
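
Both storage fields are s3a:// URLs that combine a bucket name with a path. As an illustration only (split_s3a_url is a hypothetical helper, not something this package exposes), such a URL can be split with the standard library:

```python
from urllib.parse import urlsplit


def split_s3a_url(url):
    """Split an s3a:// URL into (bucket, key_prefix)."""
    parts = urlsplit(url)
    # The bucket is the "host" part of the URL; reject anything else.
    if parts.scheme != "s3a" or not parts.netloc:
        raise ValueError(f"not an s3a:// URL: {url!r}")
    return parts.netloc, parts.path.lstrip("/")


bucket, prefix = split_s3a_url("s3a://terraform-20191004154753079000000001/base")
```

Here bucket holds the bucket name from the example file and prefix holds the path inside it.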

AWS Setup

AWS needs to be set up properly for this tool to work.

CLI

We assume you have installed and configured the AWS CLI as per the AWS CLI documentation.

Permissions

The minimum permissions needed to run cdp_validator_for_aws are:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:DescribeVpcAttribute",
                "eks:ListClusters",
                "iam:GetContextKeysForPrincipalPolicy",
                "iam:GetInstanceProfile",
                "iam:GetRole",
                "iam:SimulatePrincipalPolicy",
                "s3:GetBucketLocation",
                "s3:HeadBucket"
            ],
            "Resource": "*"
        }
    ]
}

The permissions that have the deepest security impact are those required to simulate the various roles (iam:GetContextKeysForPrincipalPolicy and iam:SimulatePrincipalPolicy), as documented by AWS. cdp_validator_for_aws will do as much as it can with whatever permissions you can give it.
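
To give a feel for what simulating a role involves, here is a minimal sketch built on the IAM SimulatePrincipalPolicy call. It is our own illustration, not the package's code: denied_actions is a hypothetical helper, it takes any object with a boto3-style simulate_principal_policy method, and it ignores pagination of the results.

```python
def denied_actions(iam_client, role_arn, actions, resources):
    """Return (action, resource, decision) tuples that were not allowed.

    `iam_client` is assumed to behave like a boto3 IAM client. Pagination
    (IsTruncated / Marker) is deliberately ignored in this sketch.
    """
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=list(actions),
        ResourceArns=list(resources),
    )
    return [
        (result["EvalActionName"], result.get("EvalResourceName"), result["EvalDecision"])
        for result in response["EvaluationResults"]
        if result["EvalDecision"] != "allowed"
    ]


# Possible usage with boto3 (the bucket ARN is a made-up placeholder):
#   import boto3
#   iam = boto3.Session(profile_name="validator").client("iam")
#   denied_actions(
#       iam,
#       "arn:aws:iam::007856030109:role/datalake_admin_role_workable-bird",
#       ["s3:GetObject"],
#       ["arn:aws:s3:::my-bucket/*"],
#   )
```

An empty result means every simulated action was allowed for the given resources.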

cdp_validator_for_aws takes a --profile profile_name argument, as per the usual AWS CLI, and all calls are handed off to Amazon's boto3 package to do the actual work.

Setting up the permissions structure

Let's assume you've set up the AWS CLI to execute commands with the default profile, with whatever permissions you normally get.

  1. Create a role (let's call it cdp_validation) that:
    1. Trusts your default role
    2. Has the above permissions (or most of them)
  2. In ${HOME}/.aws/credentials put the following:
[validator]
role_arn = arn:aws:iam::YOUR_AWS_ACCOUNT_ID:role/cdp_validation
source_profile = default
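
If the profile does not seem to work, a quick sanity check of the file contents can help. This is a minimal sketch using only the standard library; role_profile_problems is a hypothetical helper, and it only checks that the two keys shown above are present, not that the role can actually be assumed.

```python
import configparser


def role_profile_problems(credentials_text, profile="validator"):
    """Return a list of problems found with an assume-role profile."""
    # Disable interpolation so a literal '%' in a value is not misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return [f"profile [{profile}] not found"]
    problems = []
    for key in ("role_arn", "source_profile"):
        if not parser.get(profile, key, fallback="").strip():
            problems.append(f"[{profile}] is missing {key}")
    return problems
```

Pass it the text of ${HOME}/.aws/credentials, for example via pathlib.Path.home() / ".aws" / "credentials" and read_text(); an empty list means both keys were found.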

Now you can run the validator thus:

python -m cdp_validator_for_aws -c my_cdp_file.json --profile validator

Configuration

No configuration is needed. The information below is included only for completeness.

Policy Management

Cloudera's documentation shows the various policy files that are combined to give each of the four roles their necessary permissions for various resources.

These files are in the policies directory of the package and are named according to Cloudera's naming conventions, as defined in the Minimal setup for cloud storage. They dictate the actions and resources that are simulated for each role. If the actions change in the future, these files can simply be updated. If the variables in the resources change, then I'm afraid you'll have to change the code (look in policy_manager.py to start).
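
To illustrate what "variables in the resources" means, here is a small sketch of filling placeholders in a policy document. It is not how policy_manager.py works; render_policy, the ${DATALAKE_BUCKET} placeholder name, and the bucket name are all assumptions made for the example.

```python
import json
from string import Template


def render_policy(policy_text, variables):
    """Substitute ${NAME} placeholders, then parse the result as JSON.

    Template.substitute raises KeyError when a placeholder has no value,
    which is how an unexpected new variable would show up in this sketch.
    """
    rendered = Template(policy_text).substitute(variables)
    return json.loads(rendered)


# Hypothetical policy fragment with one placeholder in its resource.
example_policy = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::${DATALAKE_BUCKET}/*"
    }
  ]
}
"""

policy = render_policy(example_policy, {"DATALAKE_BUCKET": "my-bucket"})
```

After this runs, the statement's Resource has the bucket name filled in, and the Action and Resource values could then be fed to a policy simulation.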

Download files

Source Distribution

cdp_validator_for_aws-0.0.5.tar.gz (20.0 kB)

Uploaded Source

Built Distribution

cdp_validator_for_aws-0.0.5-py3-none-any.whl (28.1 kB)

Uploaded Python 3

File details

Details for the file cdp_validator_for_aws-0.0.5.tar.gz.

File metadata

  • Download URL: cdp_validator_for_aws-0.0.5.tar.gz
  • Upload date:
  • Size: 20.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.4

File hashes

Hashes for cdp_validator_for_aws-0.0.5.tar.gz
Algorithm Hash digest
SHA256 dd6ad41a1f6d711b00455de2518f5943c5b4508e1a79264edc6175bf9b3ea31b
MD5 83c965d82aa9f249649679df17ba6125
BLAKE2b-256 5f662abcb1a120f40ea20b48bb22ed10b9660287b1e34f016f55526f2ee04565

File details

Details for the file cdp_validator_for_aws-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: cdp_validator_for_aws-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 28.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.4

File hashes

Hashes for cdp_validator_for_aws-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 77c61273fa69d121deedf93c3621121568d05360c540a86556dbab2adbb783ef
MD5 57cf3c04eb54c61192230504b01d60eb
BLAKE2b-256 f9e3455898801ea013f40cb32f9729d5f8e9a5dbf548656c8a4dade4994a802d
