
Spark Helper

A Python utility library for creating and configuring optimized Apache Spark sessions with Delta Lake support.

Features

  • Optimized Spark Configuration: Sensible defaults pre-configured for performance
  • Delta Lake Support: Built-in support for Delta Lake operations
  • Memory Management: Automatic memory allocation for the driver and executors
  • Customizable: Extensive configuration options via parameters or environment variables
  • Timezone Support: Configure the session timezone
  • Configuration Verification: Built-in checks that the session applied the expected settings

Installation

pip install kalyfo-spark-helper

Quick Start

from kalyfo_spark_helper import SparkHelper

# Create a Spark helper instance
spark_helper = SparkHelper(
    available_memory_gb=64,
    driver_memory_gb=40,
    logical_cores=16
)

# Create a Spark session
spark = spark_helper.create_spark_session("My Spark App")

try:
    # Your Spark code here
    df = spark.read.parquet("data.parquet")
    df.show()
finally:
    # Always stop the session when done
    spark_helper.stop_spark_session()
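
With enable_delta_lake=True (the default), the session can read and write Delta tables. The helper's internals aren't shown here, but enabling Delta Lake on a Spark session conventionally corresponds to the standard delta-spark session properties, roughly equivalent to:

```properties
# Standard delta-spark session settings (assumed; the helper may set more)
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
```

Once configured, Delta tables are used like any other format, e.g. `df.write.format("delta").save("/tmp/my_table")`.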

Configuration

Basic Parameters

  • available_memory_gb: Total memory available to Spark, in GB (required)
  • driver_memory_gb: Memory allocated to the driver (default: 40 GB)
  • logical_cores: Number of logical CPU cores to use (default: all available cores)
  • logical_cores_per_executor: Logical cores per executor (default: 5; values of 2-5 are recommended)
  • enable_delta_lake: Enable Delta Lake support (default: True)
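
SparkHelper's exact sizing logic isn't documented here, but the parameters above suggest a common heuristic: derive the executor count from the core budget, then split the remaining memory across executors. A hypothetical sketch (function name and logic are illustrative, not the library's actual implementation):

```python
def size_executors(available_memory_gb, driver_memory_gb, logical_cores,
                   logical_cores_per_executor=5):
    # One executor per group of logical_cores_per_executor cores,
    # but never fewer than one executor.
    num_executors = max(1, logical_cores // logical_cores_per_executor)
    # Memory left after the driver's share, split evenly across executors.
    executor_memory_gb = (available_memory_gb - driver_memory_gb) // num_executors
    return num_executors, executor_memory_gb

# With the Quick Start values: 16 cores / 5 per executor -> 3 executors,
# (64 - 40) GB / 3 -> 8 GB each.
print(size_executors(64, 40, 16))  # → (3, 8)
```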

Environment Variables

You can also configure Spark Helper using environment variables:

export AVAILABLE_MEMORY_GB=64
export AVAILABLE_LOGICAL_CORES=16
export SPARK_MEMORY_FRACTION=0.8
export SPARK_MEMORY_STORAGE_FRACTION=0.6
export TIMEZONE="Europe/Athens"

Alternatively, create a .env file:

AVAILABLE_MEMORY_GB=64
AVAILABLE_LOGICAL_CORES=16
SPARK_MEMORY_FRACTION=0.8
SPARK_MEMORY_STORAGE_FRACTION=0.6
TIMEZONE="Europe/Athens"
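
How SparkHelper maps these variables onto its constructor isn't shown; a plausible reading pattern looks like the sketch below. The reader function is hypothetical; the fraction fallbacks (0.6 and 0.5) are Spark's own documented defaults for spark.memory.fraction and spark.memory.storageFraction, and the library's actual parsing may differ.

```python
import os

def read_spark_env():
    # Read the documented environment variables with explicit fallbacks.
    return {
        "available_memory_gb": int(os.environ.get("AVAILABLE_MEMORY_GB", "0")),
        "logical_cores": int(os.environ.get("AVAILABLE_LOGICAL_CORES", "0")),
        "spark_memory_fraction": float(os.environ.get("SPARK_MEMORY_FRACTION", "0.6")),
        "spark_memory_storage_fraction": float(os.environ.get("SPARK_MEMORY_STORAGE_FRACTION", "0.5")),
        "timezone": os.environ.get("TIMEZONE", "UTC"),
    }

os.environ["AVAILABLE_MEMORY_GB"] = "64"
os.environ["SPARK_MEMORY_FRACTION"] = "0.8"
cfg = read_spark_env()
print(cfg["available_memory_gb"], cfg["spark_memory_fraction"])  # 64 0.8
```

Since python-dotenv is listed as a dependency, the .env variant is presumably loaded into the environment first (e.g. via `dotenv.load_dotenv()`) before the variables are read.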

Advanced Usage

Custom Memory Settings

from kalyfo_spark_helper import SparkHelper

spark_helper = SparkHelper(
    available_memory_gb=128,
    driver_memory_gb=40,
    logical_cores=32,
    spark_memory_fraction=0.8,
    spark_memory_storage_fraction=0.6,
    logical_cores_per_executor=5
)
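
The two fraction parameters appear to map onto Spark's documented unified memory model (spark.memory.fraction and spark.memory.storageFraction): usable unified memory is (heap minus a 300 MB reserved region) times the memory fraction, of which the storage fraction is protected for caching. A quick arithmetic check of the settings above (the helper itself is not involved; this only restates Spark's documented formula):

```python
RESERVED_MB = 300  # Spark's fixed reserved memory

def unified_memory_mb(heap_gb, memory_fraction=0.6, storage_fraction=0.5):
    # Unified (execution + storage) memory available to Spark, in MB.
    usable = (heap_gb * 1024 - RESERVED_MB) * memory_fraction
    # Portion of unified memory protected for cached data.
    storage = usable * storage_fraction
    return round(usable), round(storage)

# Driver heap of 40 GB with the fractions from the example above.
print(unified_memory_mb(40, memory_fraction=0.8, storage_fraction=0.6))  # → (32528, 19517)
```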

Requirements

  • Python >= 3.9
  • PySpark == 4.0.1
  • delta-spark == 4.0.0
  • findspark == 2.0.1
  • python-dotenv == 1.1.1

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions, please open an issue on the GitHub repository.

Download files

Source Distribution

kalyfo_spark_helper-0.0.2.tar.gz (3.6 kB)

Built Distribution

kalyfo_spark_helper-0.0.2-py3-none-any.whl (3.4 kB)

File details

Details for the file kalyfo_spark_helper-0.0.2.tar.gz.

File metadata

  • Size: 3.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for kalyfo_spark_helper-0.0.2.tar.gz

Algorithm    Hash digest
SHA256       56e128ba1cf07b9c1d04d2d1318255849295d5706945d7961fbb6d799f56ab0d
MD5          6c14454a81b607316b7b5f51d023d8c8
BLAKE2b-256  706439d9533eac94fd9389c77a0d63baac9228fe6f6a4a8d03e2b429c698cac3

File details

Details for the file kalyfo_spark_helper-0.0.2-py3-none-any.whl.

File hashes

Hashes for kalyfo_spark_helper-0.0.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       29ea71c6a1e9fd686963841cf05b3354ef002aecda241971e365155943062753
MD5          ae17e53549bb42916d086525600139a0
BLAKE2b-256  245c51d2e75cf173b925d06e044dd51bf45d6be11874a8aaa9ad2e2fd0ca62e8
