Optimize AWS EMR spark settings (spark-config-cheatsheet)

# Spark-optimizer

[![Build Status](https://api.travis-ci.org/delijati/spark-optimizer.svg?branch=master)](https://travis-ci.org/delijati/spark-optimizer)

Optimize Spark settings for cluster runs (i.e. jobs submitted to YARN).

Original source: http://c2fo.io/c2fo/spark/aws/emr/2016/07/06/apache-spark-config-cheatsheet/

## Usage

Install:

    $ virtualenv env
    $ env/bin/pip install spark-optimizer

Dev install:

    $ virtualenv env
    $ env/bin/pip install -e .


Generate settings for `c4.4xlarge` with `4` nodes:

    $ env/bin/spark-optimizer c4.4xlarge 4
    {'spark.default.parallelism': '108',
     'spark.driver.cores': '2',
     'spark.driver.maxResultSize': '3481m',
     'spark.driver.memory': '3481m',
     'spark.driver.memoryOverhead': '614m',
     'spark.executor.cores': '2',
     'spark.executor.instances': '27',
     'spark.executor.memory': '3481m',
     'spark.executor.memoryOverhead': '614m'}
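
The generated dictionary can be passed to Spark in several ways. As a minimal sketch (not part of this package), the snippet below turns the sample output into `--conf key=value` arguments for `spark-submit`. The helper name `to_spark_submit_args` and the script name `my_job.py` are hypothetical and used only for illustration.

```python
import shlex

# Sample output copied from the c4.4xlarge / 4 nodes run shown in this README.
settings = {
    'spark.default.parallelism': '108',
    'spark.driver.cores': '2',
    'spark.driver.maxResultSize': '3481m',
    'spark.driver.memory': '3481m',
    'spark.driver.memoryOverhead': '614m',
    'spark.executor.cores': '2',
    'spark.executor.instances': '27',
    'spark.executor.memory': '3481m',
    'spark.executor.memoryOverhead': '614m',
}


def to_spark_submit_args(conf):
    """Hypothetical helper: flatten a settings dict into spark-submit arguments."""
    args = []
    for key in sorted(conf):
        # spark-submit accepts arbitrary properties as: --conf key=value
        args.extend(["--conf", "{}={}".format(key, conf[key])])
    return args


args = to_spark_submit_args(settings)

# Build a shell-safe command line; my_job.py is a placeholder script name.
command = "spark-submit --master yarn {} my_job.py".format(
    " ".join(shlex.quote(a) for a in args)
)
print(command)
```

Sorting the keys keeps the generated command stable between runs, which makes it easier to diff when the instance type or node count changes.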

Update instance info:

    $ env/bin/python spark_optimizer/emr_update.py


# CHANGES

0.1.1 (2018-09-12)
------------------

- fix email


0.1.0 (2018-09-12)
------------------

- initial release
