
Orchestrates Spark standalone clusters on HPCs.

Project description

sparkctl

This package configures and orchestrates Spark clusters that use Spark's standalone cluster manager. This is useful in HPC environments, where the managed Spark infrastructure offered by cloud providers like AWS is not available. It is particularly helpful when users want to deploy Spark but do not have administrative control of the servers.

Example usage

There are two main ways to use this package. Both start by allocating compute nodes, for example with Slurm (1 compute node for the Spark master and 4 compute nodes for Spark workers):

$ salloc -t 01:00:00 -n4 --partition=shared --mem=30G : -N4 --account=<your-account> --mem=240G
  1. Configure a Spark cluster and run Spark jobs with spark-submit or pyspark.
$ sparkctl configure
$ sparkctl start
$ spark-submit --master spark://$(hostname):7077 my-job.py
$ sparkctl stop
  2. Run Spark jobs in a Python script using the sparkctl library to manage the cluster.
from sparkctl import ClusterManager, make_default_spark_config

config = make_default_spark_config()
mgr = ClusterManager(config)
with mgr.managed_cluster() as spark:
    df = spark.createDataFrame([(x, x + 1) for x in range(1000)], ["a", "b"])
    df.show()
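If the shell steps of the first approach are put in a script with `set -e`, a failing `spark-submit` exits before `sparkctl stop` runs and leaves the cluster up. The following is a minimal sketch, not part of sparkctl, that drives the same commands from Python with `subprocess` and stops the cluster in a `finally` block. The names `run_job` and `run_checked` are hypothetical, `socket.gethostname()` stands in for `$(hostname)`, and the `run` parameter exists only so the command sequence can be exercised without a cluster.

```python
"""Drive the sparkctl shell workflow from Python (hypothetical sketch, not part of sparkctl)."""
import socket
import subprocess
from typing import Callable, Sequence

# A runner executes one command; it should raise if the command fails.
Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


def run_job(script: str, run: Runner = run_checked, port: int = 7077) -> None:
    """Configure and start the cluster, submit `script`, then always stop the cluster."""
    # Roughly equivalent to spark://$(hostname):7077 in the shell example.
    master = f"spark://{socket.gethostname()}:{port}"
    run(["sparkctl", "configure"])
    run(["sparkctl", "start"])
    try:
        run(["spark-submit", "--master", master, script])
    finally:
        # Reached whether spark-submit succeeds or raises.
        run(["sparkctl", "stop"])
```

On the allocated master node, `run_job("my-job.py")` would run the job. Because `check=True` raises on failure, a failing `sparkctl configure` or `sparkctl start` aborts before submission, and in that case `sparkctl stop` is not called.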

Refer to the user documentation for a description of features and detailed usage instructions.

Project Status

The package is actively maintained and used at the National Renewable Energy Laboratory (NREL). The software is primarily geared toward HPCs that use Slurm. It also supports a generic list of servers, provided that the servers share a filesystem and are reachable via passwordless SSH.
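Before using a generic list of servers, it can help to confirm the passwordless-SSH requirement. The sketch below is a hypothetical pre-flight helper, not part of sparkctl: it runs `ssh -o BatchMode=yes <host> true` against each server (OpenSSH's `BatchMode=yes` makes `ssh` fail instead of prompting for a password) and reports the hosts that did not succeed. It does not check the shared filesystem.

```python
"""Pre-flight check for passwordless SSH to each server (hypothetical helper)."""
import subprocess
from typing import Callable, Iterable, List, Sequence


def ssh_check_command(host: str, timeout_s: int = 5) -> List[str]:
    """Build a non-interactive `ssh host true` command."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # fail rather than prompt for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up on unreachable hosts quickly
        host,
        "true",                              # trivial remote command; exits 0 on success
    ]


def default_runner(cmd: Sequence[str]) -> int:
    """Run cmd with output suppressed and return its exit status."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode


def unreachable_hosts(
    hosts: Iterable[str],
    run: Callable[[Sequence[str]], int] = default_runner,
) -> List[str]:
    """Return the hosts where the non-interactive SSH check did not exit with 0."""
    return [h for h in hosts if run(ssh_check_command(h)) != 0]
```

For example, `unreachable_hosts(["node1", "node2"])` returns an empty list when both hosts accept key-based login. `ssh` exits with status 255 when a connection or authentication error occurs, and any non-zero status is treated as a failure here.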

It would be straightforward to extend the functionality to support other HPC resource managers. If you are interested in this package but need that support, please open an issue or start a discussion.

Contributions are welcome.

License

sparkctl is released under a BSD 3-Clause license.

Software Record

This package is developed under NREL Software Record SWR-25-109.

Download files


Source Distribution

sparkctl-0.3.1.tar.gz (29.4 kB)

Uploaded Source

Built Distribution


sparkctl-0.3.1-py3-none-any.whl (32.6 kB)

Uploaded Python 3

File details

Details for the file sparkctl-0.3.1.tar.gz.

File metadata

  • Download URL: sparkctl-0.3.1.tar.gz
  • Size: 29.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sparkctl-0.3.1.tar.gz
Algorithm Hash digest
SHA256 486e865f4048a164b5da230141c4c689f3e2c505bcbf2e548222668cbf49d5ec
MD5 4e08e106b67b6cd6e2d1ec8c6e9e6fa7
BLAKE2b-256 83d29e3f38434cd7a8e63bfa1c16dad4eeb3c1c0b514632730f0bea26f82bcd2
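The digests listed here let you check a download before installing it. A minimal sketch using Python's standard `hashlib` module streams the file in chunks and compares the result with the published SHA256 value for the source distribution. The helper names `sha256_of` and `verify` are illustrative; for the wheel, pass its SHA256 digest as `expected` instead.

```python
"""Verify a downloaded sparkctl archive against its published SHA256 digest (sketch)."""
import hashlib
from pathlib import Path

# SHA256 digest for sparkctl-0.3.1.tar.gz, copied from the hashes listed above.
EXPECTED_SHA256 = "486e865f4048a164b5da230141c4c689f3e2c505bcbf2e548222668cbf49d5ec"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 hex digest matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

pip's hash-checking mode performs the same comparison at install time when a requirements file pins `sparkctl==0.3.1` with `--hash=sha256:<digest>` entries for the files pip may download.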


Provenance

The following attestation bundles were made for sparkctl-0.3.1.tar.gz:

Publisher: publish_to_pypi.yml on NREL/sparkctl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sparkctl-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: sparkctl-0.3.1-py3-none-any.whl
  • Size: 32.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sparkctl-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 fef16f8b5d71e7cf82b6e01c3036f50c53778e05c4b15be9d39cc921044ebf6d
MD5 ea71c7ac912e73fa886cef86f6927d1e
BLAKE2b-256 443c8d1645cd58806f7d98c07e9845951a4e54a2bbd26ebd1cc232621bc5f9f2


Provenance

The following attestation bundles were made for sparkctl-0.3.1-py3-none-any.whl:

Publisher: publish_to_pypi.yml on NREL/sparkctl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
