Launches an AWS Elastic MapReduce cluster using templated configuration files written in JSON. Meant to make deployments consistent and reproducible.

# EMR Launcher

Launches EMR clusters using config files for consistent run-time behavior when setting up a cluster.

## Installing

pip install emr_launcher

## Usage

Starting a new cluster:
emr_launcher launch /path/to/config/<my_config>.json

Adding steps to an existing cluster:
emr_launcher launch /path/to/config/<my_config>.json --job-id <job_id_of_existing_cluster>
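
For orientation, boto3 exposes `add_job_flow_steps` for appending steps to a running cluster. The sketch below assumes that `--job-id` results in a similar request built from the config's `Steps` list; that is an assumption, not something this README confirms. The helper names `build_add_steps_request` and `add_steps` are hypothetical.

```python
def build_add_steps_request(config, job_id):
    """Build keyword arguments for boto3's EMR add_job_flow_steps call.

    Assumption: only the "Steps" section of the config is relevant when
    targeting an existing cluster.
    """
    return {"JobFlowId": job_id, "Steps": config["Steps"]}


def add_steps(config, job_id):
    """Submit the steps to an existing cluster and return the new step ids."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    client = boto3.client("emr")
    response = client.add_job_flow_steps(**build_add_steps_request(config, job_id))
    return response["StepIds"]


# Placeholder config and job id, for illustration only.
example_config = {
    "Steps": [
        {
            "Name": "example-step",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {"Jar": "command-runner.jar", "Args": ["spark-submit", "job.py"]},
        }
    ]
}
request = build_add_steps_request(example_config, "j-EXAMPLE")
```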

## Creating configs

The JSON file maps directly to the arguments of boto3's `run_job_flow` function. You can use the boto3 documentation as a guide to build your config, or build off the Example Config.
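
To illustrate the mapping, here is a minimal sketch of a config expressed as a Python dict and serialized to JSON. The top-level keys are `run_job_flow` keyword arguments; every value (cluster name, instance types, bucket, roles) is a placeholder, and the `launch` helper is hypothetical. It shows the underlying boto3 call rather than how emr_launcher itself is implemented.

```python
import json

# Keys mirror boto3's run_job_flow keyword arguments; values are placeholders.
config = {
    "Name": "example-cluster",
    "ReleaseLabel": "emr-5.36.0",
    "Instances": {
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    "Steps": [
        {
            "Name": "example-spark-step",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://mybucket/jobs/job.py"],
            },
        }
    ],
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
}

# This text is what would be saved as /path/to/config/<my_config>.json.
config_json = json.dumps(config, indent=2)


def launch(config_text):
    """Pass a (template-free) JSON config straight to run_job_flow."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    client = boto3.client("emr")
    return client.run_job_flow(**json.loads(config_text))["JobFlowId"]
```

Because the file is plain keyword arguments, any option documented for `run_job_flow` can be added to the config without further translation.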

## Template functions

emr_launcher uses templating within the JSON configuration to call useful functions, for example to generate a unique, randomly named output location:

"--conf", "spark.output=s3://mybucket/output/{{ emr_launcher.uuid() }}/"
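
The effect of such a template call can be sketched with the standard library alone. This is only an illustration of the idea: the real templating engine used by emr_launcher is not stated here, and the `TEMPLATE_FUNCTIONS` registry and `render` helper are hypothetical names that handle zero-argument calls only.

```python
import re
import uuid

# Hypothetical registry of template functions, keyed by the name used in the template.
TEMPLATE_FUNCTIONS = {"emr_launcher.uuid": lambda: uuid.uuid4().hex}

# Matches zero-argument calls such as "{{ emr_launcher.uuid() }}".
_CALL = re.compile(r"\{\{\s*([\w.]+)\(\)\s*\}\}")


def render(text, functions=TEMPLATE_FUNCTIONS):
    """Replace each `{{ name() }}` with the string result of calling `name`."""
    return _CALL.sub(lambda match: str(functions[match.group(1)]()), text)


argument = "spark.output=s3://mybucket/output/{{ emr_launcher.uuid() }}/"
rendered = render(argument)
# `rendered` now ends in a fresh 32-character hex string followed by "/".
```

Each launch therefore writes to a different prefix, so repeated runs of the same config do not overwrite one another's output.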

A full set of usable template functions can be found by running:

emr_launcher list-template-functions

Returns the environment variables dictionary.
Example: {{ get_environ()['USER'] }}
A parent Python program can set `os.environ[key] = value` before calling emr_launcher.
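
A minimal sketch of that parent-program pattern follows. It assumes emr_launcher is invoked as a subprocess through the CLI shown under Usage; the helper name `launch_with_env` and the variable `OUTPUT_PREFIX` are hypothetical.

```python
import os
import subprocess


def launch_with_env(config_path, variables, run=subprocess.run):
    """Export `variables` into this process's environment, then call the CLI.

    Child processes started by subprocess inherit os.environ by default, so
    the values become visible to the launcher's environment lookup.
    `run` is injectable so the command can be inspected without executing it.
    """
    os.environ.update(variables)
    return run(["emr_launcher", "launch", config_path], check=True)
```

A config could then reference the value as `{{ get_environ()['OUTPUT_PREFIX'] }}` after calling `launch_with_env("/path/to/config.json", {"OUTPUT_PREFIX": "nightly"})`.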

Returns a formatted datetime string relative to the current time, as adjusted by the timedelta arguments.
Example: {{ emr_launcher.get_relative_date(format='%Y-%m-01 00:00:00', timedelta_args=dict(days=-2)) }}
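
The behaviour described can be approximated with the standard library. This sketch assumes `timedelta_args` is passed to `datetime.timedelta` and that local time is used; both are assumptions, and the `now` parameter is a hypothetical addition so the result can be shown deterministically.

```python
from datetime import datetime, timedelta


def get_relative_date(format, timedelta_args, now=None):
    """Format the current time shifted by timedelta(**timedelta_args)."""
    base = now if now is not None else datetime.now()
    return (base + timedelta(**timedelta_args)).strftime(format)


# Two days before 1 March 2024 is 28 February 2024; the literal "01" in the
# format then pins the day to the first of that month.
example = get_relative_date(
    "%Y-%m-01 00:00:00",
    dict(days=-2),
    now=datetime(2024, 3, 1, 12, 0),
)
# example == "2024-02-01 00:00:00"
```

The format in the example is a way to get "the start of the month that contained the day two days ago", which is useful for jobs that run shortly after a month boundary.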

Converts a given number of milliseconds since the epoch into an ISO date string.
Argument: ms_epoch (int)
Returns: string, the formatted date string
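
A standard-library equivalent of this conversion might look like the following. The function name `ms_epoch_to_iso` is hypothetical, and the use of UTC and of `isoformat()` output is an assumption; the exact string produced by emr_launcher may differ.

```python
from datetime import datetime, timezone


def ms_epoch_to_iso(ms_epoch):
    """Convert milliseconds since the Unix epoch to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ms_epoch / 1000, tz=timezone.utc).isoformat()


start_of_epoch = ms_epoch_to_iso(0)
# start_of_epoch == "1970-01-01T00:00:00+00:00"
```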

Returns a UUID4 hex string.

## Plugins

Plugins are discovered by the naming convention `emr_launcher_<plugin-name>` (e.g. `emr_launcher_consul`). To install a plugin, run:
pip install emr_launcher_<plugin-name>
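
Discovery by naming convention is a common Python plugin pattern, and it can be sketched with `pkgutil` and `importlib`. This shows the general technique rather than emr_launcher's own loader; `find_plugin_names` and `load_plugins` are hypothetical names.

```python
import importlib
import pkgutil

PLUGIN_PREFIX = "emr_launcher_"


def find_plugin_names(module_names=None):
    """Return the sorted names of importable modules that match the prefix.

    By default, all top-level modules on sys.path are scanned.
    """
    if module_names is None:
        module_names = (module.name for module in pkgutil.iter_modules())
    return sorted(name for name in module_names if name.startswith(PLUGIN_PREFIX))


def load_plugins():
    """Import every discovered plugin and return a mapping of name to module."""
    return {name: importlib.import_module(name) for name in find_plugin_names()}


found = find_plugin_names(["emr_launcher", "emr_launcher_consul", "requests"])
# found == ["emr_launcher_consul"]
```

With this approach, installing a package whose module name starts with the prefix is enough to make it discoverable; no registration step is needed.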

Available plugins:




