
A command line tool built to ease the use of Apache Spark, allowing users to deploy or delete a Spark cluster and submit batch jobs through a range of options.


EasySpark

EasySpark aims to make it easier for users to run batch jobs with the Apache Spark framework. It provides a user-friendly command line tool to create, set up and manage on-premise Spark clusters, and simplifies the execution of batch jobs against those clusters.

Its main objective is to streamline the whole process of running batch jobs in Spark, providing subcommands to deploy or delete the necessary infrastructure and to submit jobs to it.

The EasySpark project supports multiple operating systems (Windows, macOS and Linux) and is open source, licensed under the GNU General Public License v3.

Features

The EasySpark CLI receives its parameters via .INI configuration files and offers the following features:

  • Setup and manage on-premise Spark clusters.
  • Easy batch job submission to Spark clusters.

To facilitate the use of these features, the CLI tool provides subcommands for validating and creating .INI configuration files with the accepted parameters.
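
For illustration, a minimal configuration file might look like the sketch below. The section and option names here are assumptions made for the example; the authoritative list of options is the one produced by the template subcommand.

; cluster.ini -- illustrative sketch only; generate the real template
; with `easysparkcli template` (section/option names are assumptions)
[cluster]
manager = standalone            ; assumed option: standalone or kubernetes
workers = 2                     ; assumed option: number of worker nodes

[submit]
application = /path/to/job.py   ; assumed option: the batch job to run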

Software Requirements

Apache Spark must be downloaded on the workstation, with the SPARK_HOME environment variable correctly configured.
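
For example, on Linux or macOS you could add the following to your shell profile (the install path shown is only an example):

export SPARK_HOME=/opt/spark
export PATH="$SPARK_HOME/bin:$PATH"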

Additionally, depending on the cluster manager requested for the Spark cluster, you will also need:

  • Standalone Spark cluster:
    • VirtualBox
    • Vagrant
  • Kubernetes Spark cluster:
    • Minikube
    • Docker
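
A quick way to confirm that the prerequisites for your chosen cluster manager are in place is to check their versions, for example:

vagrant --version
VBoxManage --version
minikube version
docker --version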

Installation

Run the following to install:

pip install easysparkcli

Installation from source

Clone EasySpark's repository from GitHub, then open a terminal, cd into the cloned directory and run:

pip install -e .
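
Put together, the full sequence looks like this (the repository URL and directory name below are placeholders, not the actual values):

git clone <EasySpark-repository-URL>
cd <cloned-directory>
pip install -e .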

Quick Start

Usage: easysparkcli COMMAND [ARGS]

Once installed, you can check the tool's different options with the following command:

easysparkcli --help

EasySpark provides 5 subcommands (an example session follows the list):

  1. template
    • Creates, as a template, a .INI configuration file with all the available options for the tool.
  2. validate
    • Validates the provided .INI configuration file, checking that it meets the requirements.
  3. clusterinit
    • Creates and sets up an on-premise Spark cluster, using Kubernetes or Spark Standalone as the cluster manager, so that batch jobs can be executed on this infrastructure.
  4. clusterdelete
    • Deletes Spark clusters previously deployed with EasySpark, along with their associated files.
  5. submit
    • Allows users to submit batch jobs to the Spark cluster in a more user-friendly manner.
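
A typical end-to-end session might look like the sketch below. The argument and file names are assumptions made for illustration; consult easysparkcli <subcommand> --help for the actual options.

easysparkcli template                    # generate a starter .INI file (assumed usage)
easysparkcli validate myconfig.ini       # check the file against the accepted parameters (assumed usage)
easysparkcli clusterinit myconfig.ini    # deploy the Spark cluster described in the file (assumed usage)
easysparkcli submit myconfig.ini         # run a batch job on the deployed cluster (assumed usage)
easysparkcli clusterdelete myconfig.ini  # tear the cluster down when finished (assumed usage)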

