This project provides utility functions and CLI commands to run Spark on K8s.

spark8t toolkit

A set of Python scripts facilitating Spark interactions over Kubernetes, using an OCI image.

Description

The main purpose of the spark8t toolkit is to provide a seamless, user-friendly interface to Spark functionalities over Kubernetes, both for administrator tasks (such as account registration) and for data scientist functions (such as job submission or Spark interactive shell access). Various wrapper scripts allow for persistent (and user-friendly) configuration and execution of related tools.

Dependencies and Requirements

  • Kubernetes
  • Apache Spark

Installation

Below we describe the essential steps to set up a Spark cluster together with the spark8t tool.

(Note, however, that most of the setup described below can be avoided if you use the canonical/spark-client-snap Snap installation, which both installs the dependencies and prepares critical parts of the environment for you.)

Kubernetes

To run Spark on Kubernetes, you'll need a Kubernetes cluster installed.

A simple installation of a lightweight Kubernetes implementation (Canonical's microk8s) can be found in our Discourse Spark Tutorial.

Remember to set the following environment variable:

  • KUBECONFIG: the location of the Kubernetes cluster configuration (typically: /home/$USER/.kube/config)
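As a minimal sketch, the variable can be exported in the shell before running any Spark or spark8t command. The fallback path is the typical location named above; the existence check is only an illustrative convenience, not something the toolkit requires.

```shell
# Point KUBECONFIG at the cluster configuration, keeping any value already set.
# The default is the typical location; adjust it for your environment.
export KUBECONFIG="${KUBECONFIG:-/home/$USER/.kube/config}"

# Warn early if the file is missing, since Kubernetes clients would fail later.
if [ ! -f "$KUBECONFIG" ]; then
    echo "warning: no Kubernetes configuration found at $KUBECONFIG" >&2
fi
```

Adding the export line to your shell profile makes the setting persistent across sessions.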

Spark

You will need to install Spark as instructed at the official Apache Spark pages.

Related settings:

  • SPARK_HOME: location of your Spark installation
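For illustration, SPARK_HOME can be set in the same way. The path /opt/spark below is a hypothetical install location, not one prescribed by the toolkit; the check relies only on the fact that a Spark distribution ships spark-submit under its bin directory.

```shell
# Hypothetical install location: replace /opt/spark with the directory
# where your Spark distribution was unpacked.
export SPARK_HOME="${SPARK_HOME:-/opt/spark}"

# A Spark distribution ships its client scripts under bin/.
if [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "warning: spark-submit not found under $SPARK_HOME/bin" >&2
fi
```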

spark8t

You can install the contents of this repository either by direct checkout or using pip, for example:

pip install git+https://github.com/canonical/spark-k8s-toolkit-py.git

You'll need to add a mandatory configuration for the tool that points to the OCI image to be used for the Spark workers. The configuration file must be called spark-defaults.conf and may contain any properties that Spark accepts as command-line parameters. However, the following property must be defined:

spark.kubernetes.container.image=ghcr.io/canonical/charmed-spark:<version>

(See the Spark rock releases GitHub page for available versions)
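One possible way to create the file is sketched below. The directory used here is a hypothetical choice, and the `<version>` placeholder is left as-is on purpose: replace it with a tag from the releases page before using the file.

```shell
# Hypothetical directory for the configuration; any location works as long
# as the tool is later pointed at the file.
conf_dir="${TMPDIR:-/tmp}/spark8t-conf"
mkdir -p "$conf_dir"

# The quoted 'EOF' keeps the text literal. Replace <version> with a real
# image tag before using the file.
cat > "$conf_dir/spark-defaults.conf" <<'EOF'
spark.kubernetes.container.image=ghcr.io/canonical/charmed-spark:<version>
EOF

# Show the result for a quick visual check.
cat "$conf_dir/spark-defaults.conf"
```

Further Spark properties can be appended to the same file, one `key=value` pair per line.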

Then assign the correct values to the following spark8t environment variables:

  • SPARK_CONFS: location of the spark8t configuration file
  • HOME: the home of the Spark user (typically: /home/spark)
  • SPARK_USER_DATA: the location of Spark user data, such as interactive shell history (typically: same as HOME)
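The three variables could be exported as in the following sketch. The location of spark-defaults.conf under HOME is an assumption made for illustration; each variable keeps its existing value if one is already set.

```shell
# HOME is normally set by the login shell; the fallback is the typical
# value for a dedicated Spark user.
export HOME="${HOME:-/home/spark}"

# Assumed location of the configuration file; point this at wherever your
# spark-defaults.conf actually lives.
export SPARK_CONFS="${SPARK_CONFS:-$HOME/spark-defaults.conf}"

# User data (such as interactive shell history) typically lives in HOME.
export SPARK_USER_DATA="${SPARK_USER_DATA:-$HOME}"

printf 'HOME=%s\nSPARK_CONFS=%s\nSPARK_USER_DATA=%s\n' \
    "$HOME" "$SPARK_CONFS" "$SPARK_USER_DATA"
```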

Basic Usage

spark8t is built around Spark itself, so its usage is very similar to that of the familiar Spark client tools.

The toolkit offers access to Spark functionalities via two interfaces:

  • interactive CLI
  • programmatic access via the underlying Python library

We provide the following functionalities (see related documentation on Discourse):

Contributing

Canonical welcomes contributions to the spark8t toolkit. Please check out our guidelines if you're interested in contributing to the solution. Also, if you truly enjoy working on open-source projects like this one and you would like to be part of the OSS revolution, please don't forget to check out the open positions we have at Canonical.

License

The spark8t toolkit is free software, distributed under the Apache Software License, version 2.0. See LICENSE for more information.

