
Spark Helper

A Python utility library for creating and configuring optimized Apache Spark sessions with Delta Lake support.

Features

  • Optimized Spark Configuration: Pre-configured settings for optimal performance
  • Delta Lake Support: Built-in support for Delta Lake operations
  • Memory Management: Intelligent memory allocation for driver and executors
  • Customizable: Extensive configuration options via parameters or environment variables
  • Timezone Support: Configure session timezone
  • Resource Monitoring: Built-in configuration verification

Installation

pip install kalyfo-spark-helper

Quick Start

from kalyfo_spark_helper import SparkHelper

# Create a Spark helper instance
spark_helper = SparkHelper(
    available_memory_gb=64,
    driver_memory_gb=40,
    logical_cores=16
)

# Create a Spark session
spark = spark_helper.create_spark_session("My Spark App")

try:
    # Your Spark code here
    df = spark.read.parquet("data.parquet")
    df.show()
finally:
    # Always stop the session when done
    spark_helper.stop_spark_session()

Configuration

Basic Parameters

  • available_memory_gb: Total memory available to Spark, in GB (required)
  • driver_memory_gb: Memory allocated to the driver, in GB (default: 40)
  • logical_cores: Number of logical CPU cores to use (default: all available cores)
  • logical_cores_per_executor: Logical cores assigned to each executor (default: 5; values of 2-5 are generally recommended)
  • enable_delta_lake: Enable Delta Lake support (default: True)
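To see how these parameters interact, here is an illustrative back-of-the-envelope sizing calculation. This follows common Spark tuning heuristics and is an assumption for illustration, not necessarily the exact formula the library uses:

```python
def size_executors(available_memory_gb: int,
                   driver_memory_gb: int,
                   logical_cores: int,
                   cores_per_executor: int = 5) -> dict:
    """Illustrative executor sizing, following common Spark tuning heuristics."""
    # Leave one core free for the OS / driver host, then pack executors.
    executor_cores = logical_cores - 1
    num_executors = max(1, executor_cores // cores_per_executor)
    # Split the memory left after the driver's share evenly across executors.
    executor_memory_gb = (available_memory_gb - driver_memory_gb) // num_executors
    return {
        "num_executors": num_executors,
        "executor_memory_gb": executor_memory_gb,
    }

# With the Quick Start values: 64 GB total, 40 GB driver, 16 cores
print(size_executors(64, 40, 16))
# → {'num_executors': 3, 'executor_memory_gb': 8}
```

The 2-5 cores-per-executor recommendation is the usual trade-off: fewer cores per executor wastes memory on JVM overhead, while more than 5 tends to hurt HDFS/S3 I/O throughput.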

Environment Variables

You can also configure Spark Helper using environment variables:

export AVAILABLE_MEMORY_GB=64
export AVAILABLE_LOGICAL_CORES=16
export SPARK_MEMORY_FRACTION=0.8
export SPARK_MEMORY_STORAGE_FRACTION=0.6
export TIMEZONE="Europe/Athens"

or by creating a .env file:

AVAILABLE_MEMORY_GB=64
AVAILABLE_LOGICAL_CORES=16
SPARK_MEMORY_FRACTION=0.8
SPARK_MEMORY_STORAGE_FRACTION=0.6
TIMEZONE="Europe/Athens"
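The package depends on python-dotenv for loading the .env file. As a minimal sketch of the equivalent lookup using only the standard library (the fallback defaults shown are assumptions for illustration, not documented library behavior):

```python
import os

def read_spark_env() -> dict:
    """Read Spark Helper settings from the environment, with illustrative defaults."""
    return {
        "available_memory_gb": int(os.environ.get("AVAILABLE_MEMORY_GB", "64")),
        "logical_cores": int(os.environ.get("AVAILABLE_LOGICAL_CORES", "16")),
        "memory_fraction": float(os.environ.get("SPARK_MEMORY_FRACTION", "0.8")),
        "storage_fraction": float(os.environ.get("SPARK_MEMORY_STORAGE_FRACTION", "0.6")),
        "timezone": os.environ.get("TIMEZONE", "UTC"),
    }

os.environ["AVAILABLE_MEMORY_GB"] = "128"
print(read_spark_env()["available_memory_gb"])  # → 128
```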

Advanced Usage

Custom Memory Settings

from kalyfo_spark_helper import SparkHelper

spark_helper = SparkHelper(
    available_memory_gb=128,
    driver_memory_gb=40,
    logical_cores=32,
    spark_memory_fraction=0.8,
    spark_memory_storage_fraction=0.6,
    logical_cores_per_executor=5
)
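The two fraction parameters correspond to Spark's unified memory model settings (`spark.memory.fraction` and `spark.memory.storageFraction`). As a rough sketch of what those fractions mean for a single JVM heap (Spark reserves roughly 300 MB of heap before the fractions apply):

```python
def unified_memory_pools(heap_gb: float,
                         memory_fraction: float = 0.8,
                         storage_fraction: float = 0.6) -> dict:
    """Approximate Spark unified-memory pool sizes for one JVM heap, in GB."""
    RESERVED_GB = 0.3  # Spark sets aside ~300 MB of heap up front
    usable = (heap_gb - RESERVED_GB) * memory_fraction   # execution + storage
    storage = usable * storage_fraction                  # cached data, evictable
    execution = usable - storage                         # shuffles, joins, sorts
    return {"usable": round(usable, 2),
            "storage": round(storage, 2),
            "execution": round(execution, 2)}

# An 8 GB executor heap with the fractions from the example above
print(unified_memory_pools(8.0))
# → {'usable': 6.16, 'storage': 3.7, 'execution': 2.46}
```

The storage/execution boundary is soft: execution can borrow from storage by evicting cached blocks, so `storage_fraction` sets a protected minimum for caching rather than a hard split.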

Requirements

  • Python >= 3.9
  • PySpark == 4.0.1
  • delta-spark == 4.0.0
  • findspark == 2.0.1
  • python-dotenv == 1.1.1

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions, please open an issue on the GitHub repository.
