Validation of AWS resources used to create Cloudera Data Platform environments
cdp_validator_for_aws
Overview
This tool validates that AWS resources have been set up correctly for use by Cloudera Data Platform (CDP), so that CDP can use those resources to create an environment, as defined in the Cloudera documentation.
Running the tool
The resources to be validated are recorded in a JSON file (called my_cdp_file.json below).
The validation uses AWS services and so needs a role with sufficient permissions. We set up and use a role called validator in the example below.
Both of the above are described in detail later in this document.
Once you've met these prerequisites, running the tool is simple:
python -m cdp_validator_for_aws -c my_cdp_file.json --profile validator
Setup
Python Package
We recommend using a Python virtual environment and installing this package into it. This helps eliminate environmental issues when running the tool.
CDP JSON File
This tool uses a JSON file (we called it my_cdp_file.json in the example above, but its name doesn't matter) to feed in the information about the resources to be checked.
The format of this file is shown below (there could be extra elements; the ones displayed here are the critical ones). Most of it is generated from the CDP GUI. However, two elements are not generated by the GUI and must be added by hand. They are:
- idBrokerInstanceProfileArn
- storageLocationBase
{
    "aws": {
        "s3guard": {
            "dynamoDbTableName": "dynamo"
        }
    },
    "idBrokerInstanceProfileArn": "arn:aws:iam::007856030109:instance-profile/idbroker_instance_profile_workable-bird",
    "idBrokerMappings": {
        "baselineRole": "arn:aws:iam::007856030109:role/datalake_admin_role_workable-bird",
        "dataAccessRole": "arn:aws:iam::007856030109:role/ranger_audit_role_workable-bird"
    },
    "location": {
        "name": "us-east-1"
    },
    "network": {
        "aws": {
            "vpcId": "vpc-0bd760316679db5cb"
        },
        "subnetIds": [
            "subnet-0aaea807fb0bd7324",
            "subnet-0cf3890ddf5418adb",
            "subnet-019052b500b0ec751"
        ]
    },
    "securityAccess": {
        "defaultSecurityGroupId": "sg-0614ae4bc34aab00a",
        "securityGroupIdForKnox": "sg-0881e000a25678273"
    },
    "storageLocationBase": "s3a://terraform-20191004154753079000000001/base",
    "telemetry": {
        "logging": {
            "s3": {
                "instanceProfile": "arn:aws:iam::007856030109:instance-profile/logger_instance_profile_workable-bird"
            },
            "storageLocation": "s3a://terraform-20191004154753079700000002/logs"
        }
    }
}
The meaning of each field is given below, using jsonpath to denote the fields:
- aws.s3guard.dynamoDbTableName: The name of the DynamoDB table to be created
- idBrokerInstanceProfileArn: The ARN of the idbroker instance profile used to run the idbroker EC2 instance
- idBrokerMappings.baselineRole: The ARN of the administrator role that is used to manage data in the CDP data lake
- idBrokerMappings.dataAccessRole: The ARN of the ranger audit role
- location.name: The AWS region for these resources
- network.aws.vpcId: The VPC ID
- network.subnetIds: An array of subnet IDs that will be used by CDP
- securityAccess.defaultSecurityGroupId: ID of the default security group
- securityAccess.securityGroupIdForKnox: ID of the security group for Knox
- storageLocationBase: The s3a:// URL to the bucket and path where data will be stored in the data lake
- telemetry.logging.s3.instanceProfile: The ARN of the instance profile that will be running the logging system
- telemetry.logging.storageLocation: The s3a:// URL where logs will be placed
AWS Setup
AWS needs to be set up properly for this tool to work.
CLI
We assume you have installed and configured the AWS CLI as per the [AWS CLI Documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html).
Permissions
The minimum permissions needed to run cdp_validator_for_aws are:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:DescribeVpcAttribute",
                "eks:ListClusters",
                "iam:GetContextKeysForPrincipalPolicy",
                "iam:GetInstanceProfile",
                "iam:GetRole",
                "iam:SimulatePrincipalPolicy",
                "s3:GetBucketLocation",
                "s3:HeadBucket"
            ],
            "Resource": "*"
        }
    ]
}
The permissions that have the deepest security impact are those required to simulate the various roles (iam:GetContextKeysForPrincipalPolicy and iam:SimulatePrincipalPolicy), as documented by AWS. cdp_validator_for_aws will do as much as it can with whatever permissions you can give it.
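To make the role simulation concrete, here is a hedged sketch of how a policy simulation call can be used to find actions a role is not allowed to perform. It is not the tool's own implementation. The function name denied_actions is hypothetical; simulate_principal_policy and the EvaluationResults, EvalActionName and EvalDecision response fields are part of boto3's IAM client. The client is passed in so the function can be exercised without AWS access.

```python
def denied_actions(iam_client, role_arn, action_names, resource_arns=("*",)):
    """Return the actions that the simulation does not allow for role_arn.

    iam_client is expected to behave like boto3's IAM client.
    Simplification: only the first page of results is read; a complete
    version would follow IsTruncated/Marker to fetch further pages.
    """
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=list(action_names),
        ResourceArns=list(resource_arns),
    )
    # EvalDecision is "allowed", "explicitDeny" or "implicitDeny".
    return [
        result["EvalActionName"]
        for result in response["EvaluationResults"]
        if result["EvalDecision"] != "allowed"
    ]
```

With boto3 installed, a client for the validator profile can be created with boto3.Session(profile_name="validator").client("iam") and passed as iam_client. The call needs the iam:SimulatePrincipalPolicy permission discussed above.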
cdp_validator_for_aws takes a --profile profile_name argument, as per the usual AWS CLI, and all calls are handed off to Amazon's boto3 package to do the actual work.
Setting up the permissions structure
Let's assume you've set up the AWS CLI to execute commands with the default profile, with whatever permissions you normally get.
- Create a role (let's call it cdp_validation) that:
  - Trusts your default role
  - Has the above permissions (or most of them)
- In ${HOME}/.aws/credentials put the following:
[validator]
role_arn = arn:aws:iam::YOUR_AWS_ACCOUNT_ID:role/cdp_validation
source_profile = default
Now you can run the validator thus:
python -m cdp_validator_for_aws -c my_cdp_file.json --profile validator
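If the run fails with a profile error, a quick check of the credentials file can help. The sketch below is an illustration only, assuming the [validator] section layout shown above; the function names profile_problems and check_credentials_file are hypothetical and not part of the package.

```python
import configparser
from pathlib import Path


def profile_problems(credentials_text, profile="validator"):
    """Return a list of human-readable problems with the named profile."""
    # interpolation=None so that '%' characters in secrets are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return [f"profile [{profile}] not found"]
    problems = []
    for key in ("role_arn", "source_profile"):
        if not parser.get(profile, key, fallback="").strip():
            problems.append(f"[{profile}] is missing {key}")
    return problems


def check_credentials_file(path=None, profile="validator"):
    """Check ~/.aws/credentials (or the given path) for the profile."""
    if path is None:
        path = Path.home() / ".aws" / "credentials"
    return profile_problems(Path(path).read_text(), profile)
```

An empty list from check_credentials_file() means the section exists and has both keys; it does not confirm that the role can actually be assumed.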
Configuration
No configuration is needed. The information below is provided only for completeness.
Policy Management
Cloudera's documentation shows the various policy files that are combined to give each of the four roles their necessary permissions for various resources.
These files are in the policies directory of the package and are named according to Cloudera's naming conventions, as defined in Minimal setup for cloud storage.
They dictate the actions and resources that are simulated for each role. If the actions change in the future, these files can simply be updated. If the variables in the resources change, then I'm afraid you'll have to change the code (start by looking in policy_manager.py).
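As a rough illustration of how a policy file can yield the actions and resources to simulate, the sketch below substitutes placeholder variables and then collects the Action and Resource entries of each Allow statement. This is an assumption about the general approach, not a description of policy_manager.py. The function names and the ${EXAMPLE_BUCKET} placeholder in the usage note are hypothetical.

```python
import json
from string import Template


def _as_list(value):
    """IAM allows a single string or a list; normalise to a list."""
    return [value] if isinstance(value, str) else list(value)


def actions_and_resources(policy_text, variables):
    """Substitute ${NAME} placeholders, then collect the actions and
    resources from every Allow statement of the policy document.

    Limitation: string.Template raises KeyError for an unknown
    placeholder and ValueError for IAM policy variables such as
    ${aws:username}, which are not valid Template identifiers.
    """
    document = json.loads(Template(policy_text).substitute(variables))
    statements = document["Statement"]
    if isinstance(statements, dict):
        statements = [statements]
    actions, resources = [], []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions.extend(_as_list(statement.get("Action", [])))
        resources.extend(_as_list(statement.get("Resource", [])))
    return sorted(set(actions)), sorted(set(resources))
```

For example, a policy whose resource is arn:aws:s3:::${EXAMPLE_BUCKET}/* can be resolved with actions_and_resources(text, {"EXAMPLE_BUCKET": "my-bucket"}), and the resulting lists can then be fed to a policy simulation.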