
Use SAGA to launch a Hadoop cluster as a normal batch job on Torque/PBS/SLURM clusters


# SAGA Hadoop

Last Updated: 10/01/2016

# Overview:

Use [SAGA](http://saga-project.github.io/saga-python/) to spawn a Hadoop cluster within an HPC batch job.

Currently supported SAGA adaptors:

  • Fork

  • Torque

Requirements:

  • PBS/Torque cluster

  • Working directory should be on a shared filesystem

By default, SAGA-Hadoop deploys a Hadoop 2.2.0 YARN cluster. The cluster can be customized by adjusting the templates for the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml) in hadoop2/bootstrap_hadoop2.py.
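To illustrate what adjusting such a template involves, here is a minimal sketch in Python. It is not the code from hadoop2/bootstrap_hadoop2.py: the template text, the function names `render_core_site` and `read_properties`, and the default port and temp directory are assumptions made for this example. Only the property names `fs.defaultFS` and `hadoop.tmp.dir` are standard Hadoop 2 settings.

```python
from string import Template
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml template. The real templates live in
# hadoop2/bootstrap_hadoop2.py and may look different.
CORE_SITE_TEMPLATE = Template("""<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$namenode:$port</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$tmp_dir</value>
  </property>
</configuration>
""")


def render_core_site(namenode, port=9000, tmp_dir="/tmp/hadoop"):
    """Fill the template with the values for one cluster (assumed defaults)."""
    return CORE_SITE_TEMPLATE.substitute(
        namenode=namenode, port=port, tmp_dir=tmp_dir
    )


def read_properties(xml_text):
    """Parse a Hadoop-style configuration document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


if __name__ == "__main__":
    print(render_core_site("node01"))
```

Parsing the rendered text back with `read_properties` is a cheap way to check that an edited template is still well-formed XML before submitting a job.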

# Usage

Run a local Hadoop cluster (e.g. for development and testing):

    easy_install saga-hadoop
    saga-hadoop --resource fork://localhost

Run a Hadoop cluster inside a PBS/Torque job:

    saga-hadoop --resource pbs+ssh://india.futuregrid.org --number_cores 8
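The resource argument is a URL. As a rough illustration of how such a URL breaks down, the sketch below splits it with Python's standard `urllib.parse`. The function name `split_resource` and the labels "adaptor" and "transport" are choices made for this example; this is not how SAGA itself parses the value.

```python
from urllib.parse import urlparse


def split_resource(resource):
    """Split a resource URL such as pbs+ssh://host into its visible parts.

    The part of the scheme before '+' is labelled 'adaptor' and the part
    after it 'transport'; both labels are assumptions of this sketch.
    """
    parsed = urlparse(resource)
    scheme_parts = parsed.scheme.split("+", 1)
    return {
        "adaptor": scheme_parts[0],
        "transport": scheme_parts[1] if len(scheme_parts) > 1 else None,
        "host": parsed.hostname,
    }


if __name__ == "__main__":
    for url in ("fork://localhost", "pbs+ssh://india.futuregrid.org"):
        print(url, split_resource(url))
```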

Some Blog Posts about SAGA-Hadoop:

# Packages:

  • See hadoop1 for setting up a Hadoop 1.x.x cluster

  • See hadoop2 for setting up a Hadoop 2.7.x cluster

  • See spark for setting up a Spark 2.0.x cluster

  • See kafka for setting up a Kafka 0.10.x cluster

# Examples:

*Stampede:*

    saga-hadoop --resource=slurm://localhost --queue=normal --walltime=239 --number_cores=256 --project=xxx

*Gordon:*

    saga-hadoop --resource=pbs://localhost --walltime=59 --number_cores=16 --project=TG-CCR140028 --framework=spark

*Wrangler:*

    export JAVA_HOME=/usr/java/jdk1.8.0_45/
    saga-hadoop --resource=slurm://localhost --queue=normal --walltime=59 --number_cores=24 --project=xxx
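The per-machine invocations above differ only in a few option values, so they can be generated from a small helper when scripting submissions. The following is a sketch, assuming only the options that appear in the examples (resource, queue, walltime, number_cores, project, framework). The function name `build_command` is hypothetical and not part of SAGA-Hadoop.

```python
import shlex


def build_command(resource, queue=None, walltime=None, number_cores=None,
                  project=None, framework=None):
    """Build a saga-hadoop argument list, skipping options that are not set."""
    args = ["saga-hadoop", f"--resource={resource}"]
    options = [
        ("queue", queue),
        ("walltime", walltime),
        ("number_cores", number_cores),
        ("project", project),
        ("framework", framework),
    ]
    for name, value in options:
        if value is not None:
            args.append(f"--{name}={value}")
    return args


if __name__ == "__main__":
    # Reproduces the Stampede example as a single shell-safe string.
    stampede = build_command("slurm://localhost", queue="normal",
                             walltime=239, number_cores=256, project="xxx")
    print(shlex.join(stampede))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting issues. An environment variable such as JAVA_HOME would be supplied separately through the `env` argument.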

