
Spark Helper

A Python utility library for creating and configuring optimized Apache Spark sessions with Delta Lake support.

Features

  • Optimized Spark Configuration: Pre-configured settings for optimal performance
  • Delta Lake Support: Built-in support for Delta Lake operations
  • Memory Management: Intelligent memory allocation for driver and executors
  • Customizable: Extensive configuration options via parameters or environment variables
  • Timezone Support: Configure session timezone
  • Resource Monitoring: Built-in configuration verification

Installation

pip install kalyfo-spark-helper

Quick Start

from kalyfo_spark_helper import SparkHelper

# Create a Spark helper instance
spark_helper = SparkHelper(
    available_memory_gb=64,
    driver_memory_gb=40,
    logical_cores=16
)

# Create a Spark session
spark = spark_helper.create_spark_session("My Spark App")

try:
    # Your Spark code here
    df = spark.read.parquet("data.parquet")
    df.show()
finally:
    # Always stop the session when done
    spark_helper.stop_spark_session()
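
The try/finally pattern above can also be packaged as a small context manager so the session is always stopped, even on errors. This is a sketch layered on the two public `SparkHelper` methods shown above; the library is not documented as providing such a wrapper itself:

```python
from contextlib import contextmanager

@contextmanager
def managed_session(helper, app_name):
    """Yield a Spark session and guarantee it is stopped afterwards."""
    spark = helper.create_spark_session(app_name)
    try:
        yield spark
    finally:
        helper.stop_spark_session()
```

With this helper, the Quick Start body becomes `with managed_session(spark_helper, "My Spark App") as spark: ...` and no explicit cleanup is needed.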

Configuration

Basic Parameters

  • available_memory_gb: Total memory available for Spark (required)
  • driver_memory_gb: Memory allocated to the driver (default: 40GB)
  • logical_cores: Number of logical CPU cores to use (default: all available cores)
  • logical_cores_per_executor: Logical cores per executor (default: 5; values of 2–5 are recommended)
  • enable_delta_lake: Enable Delta Lake support (default: True)
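
To see how these knobs interact, here is back-of-the-envelope arithmetic for splitting the non-driver memory across executors. It mirrors common Spark sizing practice and is only an illustration; the exact formula `SparkHelper` applies internally is not documented here:

```python
def size_executors(available_memory_gb, driver_memory_gb,
                   logical_cores, logical_cores_per_executor=5):
    """Illustrative sizing: divide the memory left after the driver
    across as many executors as the core budget allows."""
    num_executors = logical_cores // logical_cores_per_executor
    executor_memory_gb = (available_memory_gb - driver_memory_gb) // num_executors
    return num_executors, executor_memory_gb

# For the Quick Start values: 64 GB total, 40 GB driver, 16 logical cores
print(size_executors(64, 40, 16))  # → (3, 8): 3 executors with ~8 GB each
```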

Environment Variables

You can also configure Spark Helper using environment variables:

export AVAILABLE_MEMORY_GB=64
export AVAILABLE_LOGICAL_CORES=16
export SPARK_MEMORY_FRACTION=0.8
export SPARK_MEMORY_STORAGE_FRACTION=0.6
export TIMEZONE="Europe/Athens"

Alternatively, create a .env file:

AVAILABLE_MEMORY_GB=64
AVAILABLE_LOGICAL_CORES=16
SPARK_MEMORY_FRACTION=0.8
SPARK_MEMORY_STORAGE_FRACTION=0.6
TIMEZONE="Europe/Athens"
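
The two fraction variables correspond to Spark's unified memory model: spark.memory.fraction reserves a share of the JVM heap (after a fixed 300 MB reserve) for execution and storage combined, and spark.memory.storageFraction protects part of that region from eviction. The quick calculation below assumes the env vars feed those Spark properties directly, which the docs above do not explicitly state:

```python
def unified_memory_mb(heap_mb, memory_fraction=0.8, storage_fraction=0.6):
    """Spark's unified memory model: usable = (heap - 300 MB) * fraction."""
    reserved_mb = 300  # fixed reserve Spark keeps for internal objects
    usable = (heap_mb - reserved_mb) * memory_fraction
    storage = usable * storage_fraction  # eviction-protected storage region
    return usable, storage

# An 8 GB executor heap with the values from the .env example:
usable, storage = unified_memory_mb(8 * 1024, 0.8, 0.6)
# roughly 6314 MB usable, of which ~3788 MB is the protected storage region
```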

Advanced Usage

Custom Memory Settings

from kalyfo_spark_helper import SparkHelper

spark_helper = SparkHelper(
    available_memory_gb=128,
    driver_memory_gb=40,
    logical_cores=32,
    spark_memory_fraction=0.8,
    spark_memory_storage_fraction=0.6,
    logical_cores_per_executor=5
)

Requirements

  • Python >= 3.9
  • PySpark == 4.0.1
  • delta-spark == 4.0.0
  • findspark == 2.0.1
  • python-dotenv == 1.1.1

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions, please open an issue on the GitHub repository.
