Command-line tool created to ease the use of Apache Spark, allowing users to deploy or delete a Spark cluster and submit batch jobs easily through different options.
EasySpark
EasySpark aims to make it easier for users to run batch jobs with the Apache Spark framework. It provides a user-friendly command-line tool to create, set up, and manage on-premise Spark clusters and to run batch jobs against those clusters.
Its main objective is to simplify the whole process of running batch jobs in Spark by providing subcommands to deploy or delete the necessary infrastructure and to submit jobs to that infrastructure.
The EasySpark project supports multiple operating systems (Windows, macOS and Linux) and is open source, licensed under the GNU General Public License v3.
Features
The EasySpark CLI receives its parameters via .INI configuration files and offers the following functionality:
- Set up and manage on-premise Spark clusters.
- Easy batch job submission to Spark clusters.
To make these features easier to use, the CLI tool has subcommands for creating and validating .INI configuration files with the accepted parameters.
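As a rough illustration of what an .INI file of this kind looks like and how it can be inspected before handing it to the tool, the sketch below parses one with Python's standard `configparser` module. The section and option names (`cluster`, `manager`, `workers`) are hypothetical placeholders, not EasySpark's actual schema; the real set of accepted options is the one written out by the `template` subcommand.

```python
import configparser

# Hypothetical example content: the section and option names below are
# assumptions made for illustration, not EasySpark's real schema.
EXAMPLE_INI = """
[cluster]
manager = kubernetes
workers = 2
"""


def load_options(text):
    """Parse INI text and return a {section: {option: value}} mapping."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # configparser keeps every value as a string; convert types as needed.
    return {section: dict(parser[section]) for section in parser.sections()}


options = load_options(EXAMPLE_INI)
print(options)
```

To inspect a file on disk instead of a string, `parser.read(path)` can be used in place of `parser.read_string(text)`.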
Software Requirements
It is mandatory to have Apache Spark downloaded on the workstation and the SPARK_HOME environment variable correctly configured.
Additionally, depending on the cluster manager requested for the Spark cluster, the following are required:
- Standalone Spark cluster:
  - Minikube
  - Docker
- Kubernetes Spark cluster:
  - VirtualBox
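The requirements above can be checked before running the tool. The following is a minimal sketch, not part of EasySpark: the helper name `missing_prerequisites` is made up for this example, and the executable names passed to it (`docker`, `minikube`) are the usual command names of those products, assumed here rather than taken from the EasySpark documentation.

```python
import os
import shutil


def missing_prerequisites(tools, environ=None):
    """Return a list of problems found; an empty list means nothing is missing.

    `tools` is a list of executable names expected on PATH. `environ`
    defaults to the current process environment and is only used to look
    up SPARK_HOME (executables are always searched on the real PATH).
    """
    environ = os.environ if environ is None else environ
    problems = []

    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append(f"SPARK_HOME is not a directory: {spark_home}")

    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    # Assumed executable names; adjust the list to the cluster manager in use.
    for problem in missing_prerequisites(["docker", "minikube"]):
        print(problem)
```

The check only confirms that SPARK_HOME points to an existing directory and that the executables can be found; it does not verify versions or that the services are running.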
Installation
Run the following to install:
pip install easySparkTool
Installation from source
Clone EasySpark's repository from GitHub. Next, open a terminal, cd into that path and run:
pip install -e .
Quick Start
Usage:
easysparkcli COMMAND [ARGS]
Once installed, you can check the different options of the tool through the command:
easysparkcli --help
EasySpark has 5 subcommands:
- template
  - Creates, as a template, a .INI configuration file with all the available options for the tool.
- validate
  - Validates the provided .INI configuration file, checking whether it meets the requirements.
- clusterinit
  - Creates and sets up an on-premise Spark cluster, using Kubernetes or Spark Standalone as the cluster manager, so that batch jobs can be executed on this infrastructure.
- clusterdelete
  - Deletes Spark clusters previously deployed with EasySpark, along with their associated files.
- submit
  - Allows users to submit batch jobs to the Spark cluster in a more user-friendly manner.
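When the tool is driven from a script, the `easysparkcli COMMAND [ARGS]` usage pattern and the five subcommand names listed above can be wrapped in a small helper. This is only a sketch: `build_command` is a hypothetical helper, not part of EasySpark, and it deliberately passes no arguments because the options each subcommand accepts are not described here (see `easysparkcli --help`).

```python
import shlex

# The five subcommands listed in this README.
SUBCOMMANDS = ("template", "validate", "clusterinit", "clusterdelete", "submit")


def build_command(subcommand, *args):
    """Return the argument list for `easysparkcli COMMAND [ARGS]`.

    Raises ValueError for a subcommand that is not one of the five
    documented ones, so typos are caught before anything is executed.
    """
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    return ["easysparkcli", subcommand, *args]


cmd = build_command("validate")
print(shlex.join(cmd))  # prints: easysparkcli validate
```

The returned list can be handed to `subprocess.run(cmd, check=True)` to actually invoke the tool once it is installed.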
Hashes for easysparkcli-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8fc7e8316b7027c0497d465c407c78872ebd3c812191f6d74f81c37b1c9d4203
MD5 | 53133fb5d5b487cc0de2cfc00eb9db03
BLAKE2b-256 | e13c6c1281b735d755ea054a3e20549deb277042c14ac921d5c70f947c73a952