Create truly fresh local Spark sessions with isolated temp dirs and reliable teardown.
freshspark
Small helpers for local PySpark that start each run in a clean sandbox and tear sessions down reliably: isolated warehouse temp dirs, optional embedded Derby kept out of the working tree, in-memory catalog by default, randomized Spark UI port, and aggressive Py4J / JVM shutdown so the process can exit normally.
Use it when notebooks or scripts leave metastore locks, derby.log in the wrong place, Spark UI port collisions, or JVMs that refuse to die after SparkSession.stop().
Requirements
| Component | Requirement |
|---|---|
| Python | 3.9 or newer |
| JDK | On `PATH` (or via `JAVA_HOME`). Spark 3.x line: Java 8, 11, or 17; PySpark 3.5+ is also validated against Java 21. |
| PySpark | Declared dependency is PySpark 3.5.x (`pyspark>=3.5,<4`) for predictable local startup. Spark 4 is not pinned here; if you override to PySpark 4.x, use Java 17 or 21 and expect to manage compatibility yourself. |
Install
    pip install freshspark

Development (editable install, tests, Ruff):

    pip install -e ".[dev]"
    ruff format --check freshspark tests
    ruff check freshspark tests
    mypy freshspark tests
    pytest
Quick start
    from freshspark import fresh_local_spark, get_fresh_local_spark

    # Context manager: new session every `with` block, cleanup on exit
    with fresh_local_spark(app_name="etl", preset="dev") as spark:
        spark.range(10).show()

    # Manual lifecycle: always call cleanup() when finished (or use try/finally)
    spark, cleanup = get_fresh_local_spark(app_name="demo", preset="fat")
    try:
        spark.range(1000).summary().show()
    finally:
        cleanup()
Public API
| Symbol | Role |
|---|---|
| `fresh_local_spark(...)` | Context manager yielding a new `SparkSession` per `with` block (no reuse). |
| `get_fresh_local_spark(...)` | Returns `(spark, cleanup)`. You must call `cleanup()` when done unless you use the context manager. |
| `reset_active_session()` | Stops the active session, closes the gateway, and clears in-process reuse cache entries that pointed at that session (or are already dead). Safe to call repeatedly. |
| `ensure_fresh(...)` | Decorator that runs the wrapped function inside `fresh_local_spark` and injects `spark` as a keyword argument. Do not pass `spark=` yourself (a `TypeError` is raised if you do). |
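The keyword injection described for `ensure_fresh` can be illustrated with a minimal sketch. This is not freshspark's implementation: `fake_session` is a hypothetical stand-in for `fresh_local_spark` so the example runs without a JVM, and `ensure_fresh_sketch` and `job` are illustrative names.

```python
import functools
from contextlib import contextmanager


@contextmanager
def fake_session(**kwargs):
    # Hypothetical stand-in for fresh_local_spark: yields a placeholder
    # dict instead of a real SparkSession.
    session = {"app_name": kwargs.get("app_name", "sketch"), "stopped": False}
    try:
        yield session
    finally:
        session["stopped"] = True  # mimics teardown when the block exits


def ensure_fresh_sketch(**session_kwargs):
    """Decorator factory (assumed shape) that injects `spark` by keyword."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if "spark" in kwargs:
                # Callers must not supply the session themselves.
                raise TypeError("spark is injected by the decorator")
            with fake_session(**session_kwargs) as spark:
                return func(*args, spark=spark, **kwargs)

        return wrapper

    return decorator


@ensure_fresh_sketch(app_name="etl")
def job(n, *, spark):
    return (spark["app_name"], n)
```

Calling `job(3)` runs the body with a session opened for that call only, while `job(3, spark=...)` fails fast with `TypeError`.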
Configuration highlights
Presets
preset is one of tiny, dev, or fat. They set driver memory and maxResultSize to sensible defaults. Any other string logs a warning and applies no preset keys (you can still set everything via extra_confs).
| Preset | `spark.driver.memory` | `spark.driver.maxResultSize` |
|---|---|---|
| `tiny` | 1g | 512m |
| `dev` | 2g | 1g |
| `fat` | 4g | 2g |
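As a rough illustration of the rules above (known preset applies its keys, unknown preset warns and applies nothing, `extra_confs` wins), here is a small sketch. The function name `resolve_confs` and the logger name are assumptions for this example, not part of freshspark's API; only the preset values come from the table.

```python
import logging

logger = logging.getLogger("preset_sketch")

# Values copied from the preset table.
PRESETS = {
    "tiny": {"spark.driver.memory": "1g", "spark.driver.maxResultSize": "512m"},
    "dev": {"spark.driver.memory": "2g", "spark.driver.maxResultSize": "1g"},
    "fat": {"spark.driver.memory": "4g", "spark.driver.maxResultSize": "2g"},
}


def resolve_confs(preset, extra_confs=None):
    """Hypothetical helper: preset keys first, extra_confs merged last."""
    confs = {}
    if preset in PRESETS:
        confs.update(PRESETS[preset])
    else:
        # Unknown preset: warn and apply no preset keys.
        logger.warning("unknown preset %r; applying no preset keys", preset)
    confs.update(extra_confs or {})
    return confs
```

Under these assumptions, `resolve_confs("fat", {"spark.driver.memory": "6g"})` keeps the `fat` result-size limit but overrides driver memory.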
Catalog and warehouse
- Default (`hive_metastore=False`): `spark.sql.catalogImplementation=in-memory` and an isolated `spark.sql.warehouse.dir` under a temp root. There is no embedded Derby in the default path, so you avoid the usual Derby lock files in the project directory.
- Hive-style metastore (`hive_metastore=True`): warehouse and Derby home (`-Dderby.system.home=...`) both live under the same isolated temp tree.
If you pass extra_confs with spark.driver.extraJavaOptions while hive_metastore=True, that value is merged after the required Derby system home flag so your JVM flags do not accidentally wipe metastore configuration.
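The merge order can be sketched as plain string handling. The helper name `merge_java_options` and the example paths are hypothetical; the only behaviour taken from the text is that the Derby system home flag comes first and the user's options follow it.

```python
def merge_java_options(derby_home, user_options=None):
    """Hypothetical helper: required Derby flag first, user JVM flags after."""
    required = f"-Dderby.system.home={derby_home}"
    if not user_options or not user_options.strip():
        return required
    # Appending (rather than replacing) keeps the metastore flag intact.
    return f"{required} {user_options.strip()}"
```

For example, `merge_java_options("/tmp/fs/derby", "-Xss4m")` yields a single option string carrying both flags.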
Other knobs
- `enable_ui` / `print_ui_url`: Spark UI on a free port (`spark.ui.port=0`); optionally print the URL once the session is up.
- `extra_confs`: flat `dict[str, str]` merged last so you can override presets or Spark defaults.
- `reuse_within_process=True`: same Python process + same `app_name` returns the same `(spark, cleanup)` until `cleanup()` or `reset_active_session()` runs; dead cached sessions are replaced automatically on the next request.
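The reuse semantics can be pictured with a small cache keyed by `app_name`. This is only a sketch of the idea, not the library's code: `FakeSession`, `get_or_create`, and the `alive` attribute are invented for illustration so the example runs without Spark.

```python
class FakeSession:
    """Hypothetical stand-in for a SparkSession with a liveness flag."""

    def __init__(self):
        self.alive = True


_cache = {}  # app_name -> (session, cleanup)


def get_or_create(app_name, factory=FakeSession):
    entry = _cache.get(app_name)
    if entry is not None and entry[0].alive:
        return entry  # same (session, cleanup) pair while it is alive

    session = factory()

    def cleanup():
        session.alive = False
        # Drop the cache entry only if it still points at this session.
        cached = _cache.get(app_name)
        if cached is not None and cached[0] is session:
            del _cache[app_name]

    entry = (session, cleanup)
    _cache[app_name] = entry  # replaces a missing or dead entry
    return entry
```

Two calls with the same name return the same pair; after `cleanup()`, or if the cached session has died, the next call builds a new one.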
CLI
    # Python REPL with `spark` already constructed
    freshspark repl --preset fat

    # Stop the active SparkSession in this interpreter (also reconciles reuse cache)
    freshspark reset

| Command | Common flags |
|---|---|
| `freshspark repl` | `--app-name`, `--preset tiny` |
| `freshspark reset` | (none) |
Jupyter and long-running kernels
Prefer an explicit cleanup cell so temp dirs and the JVM are released even if the kernel stays alive:
    from freshspark import get_fresh_local_spark

    spark, cleanup = get_fresh_local_spark(app_name="nb", preset="dev")
    # ... work ...
    cleanup()
If another library left a sticky session in this kernel, call reset_active_session() here. The freshspark reset CLI only affects the interpreter where that command runs (for example a terminal REPL), not a separate Jupyter kernel.
Environment variables
| Variable | Effect |
|---|---|
| `FRESHSPARK_SKIP_JAVA_CHECK` | If set to `1`, `true`, or `yes`, an unsupported Java / Spark pairing warns instead of raising during session construction. |
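A flag like this is typically read once and used to choose between warning and raising. The sketch below shows that pattern; the function names, the case-insensitive comparison, and the use of `RuntimeError` are assumptions made for the example, not documented freshspark behaviour.

```python
import os
import warnings

_TRUTHY = {"1", "true", "yes"}


def skip_java_check(environ=None):
    """Return True when the variable holds one of the accepted values."""
    env = os.environ if environ is None else environ
    # Case-insensitive matching is an assumption of this sketch.
    value = env.get("FRESHSPARK_SKIP_JAVA_CHECK", "")
    return value.strip().lower() in _TRUTHY


def check_pairing(supported, environ=None):
    """Hypothetical check: warn when skipping is enabled, else raise."""
    if supported:
        return
    message = "unsupported Java / Spark pairing"
    if skip_java_check(environ):
        warnings.warn(message)
    else:
        raise RuntimeError(message)  # exception type is an assumption
```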
Why this exists
Local PySpark is great until it is not: JVMs that linger, Derby files under cwd, warehouse dirs shared across runs, and UI ports that collide. freshspark centralizes a small set of Spark configs and lifecycle rules so each run gets an isolated temp layout and a cleanup path that actually runs (including an atexit safety net, with idempotent cleanup so manual cleanup() plus process exit does not misbehave).
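The idempotent cleanup plus `atexit` safety net can be sketched with the standard library alone. This is an illustration of the pattern, not freshspark's code: `make_sandbox` is a hypothetical name, and a real cleanup would also stop the session and gateway before removing the directory.

```python
import atexit
import shutil
import tempfile
from pathlib import Path


def make_sandbox():
    """Create an isolated temp root and return (root, cleanup)."""
    root = Path(tempfile.mkdtemp(prefix="sandbox-sketch-"))
    done = False

    def cleanup():
        nonlocal done
        if done:
            return  # second call (manual or at exit) is a no-op
        done = True
        shutil.rmtree(root, ignore_errors=True)

    # Safety net: runs at interpreter exit if the caller forgot cleanup().
    atexit.register(cleanup)
    return root, cleanup
```

Because the closure tracks whether it already ran, calling `cleanup()` manually and then letting the `atexit` hook fire does not raise or delete anything twice.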
Project links
- Homepage / source: github.com/eddiethedean/freshspark
- Issues: github.com/eddiethedean/freshspark/issues
License
Apache 2.0 (see LICENSE).