Optimize AWS EMR Spark settings (spark-config-cheatsheet)
Spark-optimizer
Optimize Spark settings for jobs that run on a cluster (i.e. on YARN).
Original source: http://c2fo.io/c2fo/spark/aws/emr/2016/07/06/apache-spark-config-cheatsheet/
Usage
Install:
$ virtualenv env
$ env/bin/pip install spark-optimizer
Dev install:
$ virtualenv env
$ env/bin/pip install -e .
Generate settings for c4.4xlarge with 4 nodes:
$ env/bin/spark-optimizer c4.4xlarge 4
Optimal numPartitions: 162
{'spark.default.parallelism': '108',
'spark.driver.cores': '2',
'spark.driver.maxResultSize': '3481m',
'spark.driver.memory': '3481m',
'spark.driver.memoryOverhead': '614m',
'spark.executor.cores': '2',
'spark.executor.instances': '27',
'spark.executor.memory': '3481m',
'spark.executor.memoryOverhead': '614m'}
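One way to apply the generated settings is to pass each entry to spark-submit as a `--conf key=value` flag. The sketch below is not part of spark-optimizer: the helper `to_spark_submit_args` and the application name `my_job.py` are hypothetical, and the dict is simply copied from the sample output above.

```python
import shlex

# Settings copied from the sample output above (c4.4xlarge, 4 nodes).
settings = {
    'spark.default.parallelism': '108',
    'spark.driver.cores': '2',
    'spark.driver.maxResultSize': '3481m',
    'spark.driver.memory': '3481m',
    'spark.driver.memoryOverhead': '614m',
    'spark.executor.cores': '2',
    'spark.executor.instances': '27',
    'spark.executor.memory': '3481m',
    'spark.executor.memoryOverhead': '614m',
}


def to_spark_submit_args(conf):
    """Hypothetical helper: flatten a settings dict into --conf arguments."""
    args = []
    for key in sorted(conf):
        args.extend(['--conf', '{}={}'.format(key, conf[key])])
    return args


if __name__ == '__main__':
    # my_job.py is a placeholder for your own Spark application.
    command = (['spark-submit', '--master', 'yarn']
               + to_spark_submit_args(settings)
               + ['my_job.py'])
    # Quote each part so the line can be pasted into a shell.
    print(' '.join(shlex.quote(part) for part in command))
```

Sorting the keys keeps the generated command stable between runs, which makes it easier to diff when the instance type or node count changes.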
Update instance info:
$ env/bin/python spark_optimizer/emr_update.py
CHANGES
0.1.8 (2020-02-14)
- add calculation for numPartitions
- dropping pypy and python3.4
0.1.7 (2019-12-09)
- set long_description_content_type="text/markdown"
0.1.6 (2019-12-09)
- fix docs
0.1.5 (2019-12-09)
- update emr_instance.yaml
0.1.4 (2019-03-10)
- add ec2 and emr cost to yaml
0.1.3 (2019-03-08)
- add emr cost to yaml
- export load yaml file
- make memory_overhead_coefficient editable
0.1.2 (2019-02-20)
- unpin the versions
- rename cli from _ to -
0.1.1 (2018-09-12)
- fix email
0.1.0 (2018-09-12)
- initial release