
Command-line tool created to ease the use of Apache Spark, allowing users to deploy or delete a Spark cluster and submit batch jobs through a range of options.

Project description

EasySpark

EasySpark aims to make it easier for users to run batch jobs with the Apache Spark framework. It provides a user-friendly command-line tool to create, set up, and manage on-premise Spark clusters, and simplifies the execution of batch jobs against those clusters.

Its main objective is to facilitate the whole process of running batch jobs in Spark, providing subcommands to deploy or delete the necessary infrastructure and to submit the desired jobs to it.

The EasySpark project supports multiple operating systems (Windows, macOS, and Linux) and is open source, licensed under the GNU General Public License v3.

Features

The EasySpark CLI receives its parameters via .INI configuration files and offers the following functionality:

  • Setup and manage on-premise Spark clusters.
  • Easy batch job submission to Spark clusters.

To make these features easier to use, the CLI also provides subcommands for creating and validating .INI configuration files with the accepted parameters.
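As an illustration, a configuration file of the kind the `template` subcommand generates might look like the sketch below. The section and option names here are hypothetical, not taken from the tool itself; generate a real template with `easysparkcli template` and check `easysparkcli --help` for the actual schema.

```ini
; Hypothetical example only -- the real section and option names
; come from the file produced by `easysparkcli template`.
[cluster]
manager = standalone        ; or: kubernetes
workers = 2

[submit]
application = /path/to/job.py
```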

Software Requirements

It is mandatory to have Apache Spark installed on the workstation and the SPARK_HOME environment variable correctly configured.
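This prerequisite is easy to verify up front. The snippet below is a minimal sketch (not part of EasySpark) that checks whether SPARK_HOME is set and points at a directory containing the `bin/spark-submit` script that ships with every Spark distribution:

```python
import os
from pathlib import Path


def check_spark_home(env=os.environ):
    """Return the Spark home path if SPARK_HOME looks valid, else None."""
    home = env.get("SPARK_HOME")
    if not home:
        return None
    # Every Spark distribution ships bin/spark-submit; use it as a sanity check.
    submit = Path(home) / "bin" / "spark-submit"
    return home if submit.exists() else None


if __name__ == "__main__":
    print(check_spark_home() or "SPARK_HOME is not configured correctly")
```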

Additionally, depending on the cluster manager requested for the Spark cluster, the following are also required:

  • Standalone Spark cluster:
    • VirtualBox
    • Vagrant
  • Kubernetes Spark cluster:
    • Minikube
    • Docker

Installation

Run the following to install:

pip install easysparkcli

Installation from source

Clone EasySpark's repository from GitHub. Then open a terminal, cd into the cloned directory, and run:

pip install -e .

Quick Start

Usage: easysparkcli COMMAND [ARGS]

Once installed, you can review the tool's available options with:

easysparkcli --help

EasySpark provides 5 subcommands:

  1. template
    • Creates a template .INI configuration file containing all the options available for the tool.
  2. validate
    • Validates the provided .INI configuration file, checking that it meets the requirements.
  3. clusterinit
    • Creates and sets up an on-premise Spark cluster, using Kubernetes or Spark Standalone as the cluster manager, so that batch jobs can be executed on this infrastructure.
  4. clusterdelete
    • Deletes Spark clusters previously deployed with EasySpark, along with their associated files.
  5. submit
    • Allows users to submit batch jobs to the Spark cluster in a more user-friendly manner.

Download files

Download the file for your platform.

Source Distribution

easysparkcli-1.0.0.tar.gz (35.7 kB)

Uploaded Source

Built Distribution

easysparkcli-1.0.0-py3-none-any.whl (39.8 kB)

Uploaded Python 3

File details

Details for the file easysparkcli-1.0.0.tar.gz.

File metadata

  • Download URL: easysparkcli-1.0.0.tar.gz
  • Upload date:
  • Size: 35.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.25.1 requests-toolbelt/0.9.1 urllib3/1.26.6 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.4

File hashes

Hashes for easysparkcli-1.0.0.tar.gz:

  • SHA256: 317940ac919fbf395ef1506c94c4364832c993ed8203e8f9ef4f231e0306b58c
  • MD5: c4865c0dfc4ac8c4767b5f1cad681ffe
  • BLAKE2b-256: e2fdc019b633adbbaf361d1ce717e365ac1af31e29ee052ad7cada76135115c1


File details

Details for the file easysparkcli-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: easysparkcli-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 39.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.25.1 requests-toolbelt/0.9.1 urllib3/1.26.6 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.4

File hashes

Hashes for easysparkcli-1.0.0-py3-none-any.whl:

  • SHA256: 228599884f31d89984ca772865cbde0a05e9b6517d35d057d5311364b7e72312
  • MD5: bb3644ef496b194ca76447183b9fc2f2
  • BLAKE2b-256: e21ebd5d957d50e9c1533c337512eb2d71af9689b5b25484302f66dd45daad5d

