Create an enterprise-grade Hadoop cluster in AWS
===============================
author: Rakesh Varma
Overview
--------
Create an enterprise-grade Hadoop cluster in AWS in minutes.
Installation / Usage
--------------------
Make sure [Terraform](https://www.terraform.io/intro/getting-started/install.html) is installed; it is required to run this solution.
Make sure AWS credentials exist in your local `~/.aws/credentials` file.
If you are using an `AWS_PROFILE` called `test`, then your `credentials` file should look like this:
```sh
[test]
aws_access_key_id = SOMEAWSACCESSKEYID
aws_secret_access_key = SOMEAWSSECRETACCESSKEY
```
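Before creating anything, it can help to confirm that the profile actually resolves to valid credentials. Below is a minimal sketch, assuming `boto3` is available (it is not necessarily a dependency of this package); the profile name `test` matches the example above.
```python
import boto3

# Verify that the 'test' profile in ~/.aws/credentials works by asking STS
# who we are. An exception here usually means the credentials are missing
# or invalid.
session = boto3.session.Session(profile_name='test')
identity = session.client('sts').get_caller_identity()
print(identity['Account'], identity['Arn'])
```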
Create a `config.ini` with the appropriate settings.
```sh
[default]
# AWS settings
aws_region = us-east-1
aws_profile = test
terraform_s3_bucket = hadoop-terraform-state
ssh_private_key = key.pem
vpc_id = vpc-883883883
vpc_subnets = [
'subnet-89dad652',
'subnet-7887z892',
'subnet-f300b8z8'
]
hadoop_namenode_instance_type = t2.micro
hadoop_secondarynamenode_instance_type = t2.micro
hadoop_datanodes_instance_type = t2.micro
hadoop_datanodes_count = 2
# Hadoop settings
hadoop_replication_factor = 2
```
Once the `config.ini` file is ready, install the library and run it. Using a virtualenv is recommended.
```sh
pip install aws-hadoop
```
Run this in Python to create a Hadoop cluster.
```python
from aws_hadoop.install import Install
Install().create()
```
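Cluster creation can take a while. If you want to see progress on the console in addition to the log file, the sketch below may help; it assumes the package logs through Python's standard `logging` module (see the Logging section below).
```python
import logging

from aws_hadoop.install import Install

# Echo log records to the console as well as the hadoop-cluster.log file.
logging.basicConfig(level=logging.INFO)

Install().create()
```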
To run from the source directly, install the requirements:
```sh
pip install -r requirements.txt
```
```python
from aws_hadoop.install import Install
Install().create()
```
### Configuration Settings
This section describes each of the settings that go into the config file. Note that some of the settings are optional.
##### aws_region
The AWS region where your Terraform state bucket and your Hadoop resources get created (e.g. `us-east-1`).
##### aws_profile
The aws_profile that is used in your local `~/.aws/credentials` file.
##### terraform_s3_bucket
The Terraform state will be maintained in the specified S3 bucket. Make sure the aws_profile has write access to the S3 bucket.
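If the bucket does not already exist, you may need to create it first. A hedged sketch using `boto3` (an assumption, not a documented dependency of this package), with the profile, region, and bucket name taken from the example config:
```python
import boto3

# Create the Terraform state bucket under the 'test' profile.
# Note: outside us-east-1 you must also pass a CreateBucketConfiguration
# with a LocationConstraint for your region.
session = boto3.session.Session(profile_name='test', region_name='us-east-1')
s3 = session.client('s3')
s3.create_bucket(Bucket='hadoop-terraform-state')
```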
##### ssh_key_pair
For Hadoop provisioning, aws_hadoop needs to connect to the Hadoop nodes over SSH. The specified `ssh_key_pair` allows the Hadoop EC2 instances to be created with the corresponding public key.
Make sure your machine has the private key in your `~/.ssh/` directory.
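If you do not yet have a key pair in the target region, one way to create one and store the private key locally is sketched below; `boto3` is an assumption, and the key pair name `hadoop-key` is purely illustrative.
```python
import os
import boto3

# Create an EC2 key pair and store the private key under ~/.ssh/.
# The name 'hadoop-key' is only an example.
session = boto3.session.Session(profile_name='test', region_name='us-east-1')
ec2 = session.client('ec2')
key = ec2.create_key_pair(KeyName='hadoop-key')

key_path = os.path.expanduser('~/.ssh/hadoop-key.pem')
with open(key_path, 'w') as f:
    f.write(key['KeyMaterial'])
os.chmod(key_path, 0o600)  # SSH requires restrictive permissions
```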
##### vpc_id
Specify the VPC ID in your AWS region in which the Terraform resources should be created.
##### vpc_subnets
`vpc_subnets` is a list containing one or more subnet IDs. You can specify as many subnet IDs as you want; the Hadoop EC2 instances will be spread across the specified subnets.
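If you are unsure which subnet IDs belong to your VPC, they can be listed with `boto3` (again an assumption; the profile, region, and VPC ID below are the placeholders from the example config):
```python
import boto3

# List the subnets of the VPC so their IDs can be copied into vpc_subnets.
session = boto3.session.Session(profile_name='test', region_name='us-east-1')
ec2 = session.client('ec2')
response = ec2.describe_subnets(
    Filters=[{'Name': 'vpc-id', 'Values': ['vpc-883883883']}]
)
for subnet in response['Subnets']:
    print(subnet['SubnetId'], subnet['AvailabilityZone'])
```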
##### hadoop_namenode_instance_type (optional)
Specify the instance type of the Hadoop NameNode. If not specified, the default instance type is `t2.micro`.
##### hadoop_secondarynamenode_instance_type (optional)
Specify the instance type of the Hadoop SecondaryNameNode. If not specified, the default instance type is `t2.micro`.
##### hadoop_datanodes_instance_type (optional)
Specify the instance type of the Hadoop DataNodes. If not specified, the default instance type is `t2.micro`.
##### hadoop_datanodes_count (optional)
Specify the number of Hadoop DataNodes that should be created. If not specified, the default value is 2.
##### hadoop_replication_factor (optional)
Specify the HDFS replication factor. If not specified, the default value is 2.
The following are SSH settings, used to SSH into the nodes.
##### ssh_user (optional)
The SSH user, e.g. `ubuntu`.
##### ssh_use_ssh_config (optional)
Set it to `True` if you want to use the settings in your `~/.ssh/config`.
##### ssh_key_file (optional)
This is the key file location. SSH login is done through a public/private key pair.
##### ssh_proxy (optional)
Use this setting if you are using an SSH proxy server (such as a bastion host).
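As an illustration of how these optional SSH settings could be added to the `config.ini` created earlier, here is a small sketch; every value shown is a hypothetical example, not a recommended default.
```python
# Illustrative only: append the optional SSH settings to config.ini.
# All values below are hypothetical examples.
ssh_settings = """
ssh_user = ubuntu
ssh_use_ssh_config = False
ssh_key_file = ~/.ssh/hadoop-key.pem
ssh_proxy = bastion.example.com
"""

with open('config.ini', 'a') as config_file:
    config_file.write(ssh_settings)
```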
Logging
------
A log file `hadoop-cluster.log` is created in the local directory.
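To take a quick look at the most recent entries, a small sketch (the file name is the one noted above):
```python
from pathlib import Path

# Print the last 20 lines of the log file written by the installer.
log_path = Path('hadoop-cluster.log')
if log_path.exists():
    print('\n'.join(log_path.read_text().splitlines()[-20:]))
else:
    print('hadoop-cluster.log not found yet')
```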