An Apache Beam pipeline runner built on Apache Spark's Python API
PySpark Apache Beam Runner
Overview
(WHY? Doesn't Beam ship with a Spark runner?)
This project introduces a custom Apache Beam runner that uses PySpark directly. It is not compliant with Beam's portability framework! It is designed for environments where a SparkSession is available but a Spark master server is not, for example serverless environments where jobs are triggered without a long-running cluster. This sidesteps the expectations of Beam's default Spark runner.
The other benefit is that this strategy for building a runner keeps the stack as Python-centric as possible. The compilation process, the optimizations, and the execution planning all happen in Python (for better or worse). Depending on your needs, this might be a significant advantage.
Features
- Direct Integration with PySpark: Uses a SparkSession that is assumed to already exist in the environment.
- Serverless Compatibility: Ideal for environments without a dedicated Spark master, supporting execution in serverless frameworks.
- Simplified Setup: Potentially reduces the complexity of job submission by avoiding the need for a Spark master listening on a port.
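As a rough illustration of how a runner like this plugs into Beam, the sketch below builds a tiny pipeline and hands it a runner through the standard `beam.Pipeline(runner=...)` argument. The class name and import path of this project's runner are not stated in this description, so the runner is passed in by the caller. The helper name `build_and_run` is hypothetical, and whether every transform used here is supported by this runner is an assumption.

```python
from typing import Any, List


def build_and_run(runner: Any, lines: List[str]) -> None:
    """Run a small pipeline on the given Beam runner.

    `runner` is a Beam runner instance (or a runner name string); the
    runner exported by this package would be supplied here by the caller.
    """
    # Imported lazily so this module loads even where Beam is not installed.
    import apache_beam as beam

    with beam.Pipeline(runner=runner) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(lines)
            | "SplitWords" >> beam.FlatMap(lambda line: line.split())
            | "Uppercase" >> beam.Map(lambda word: word.upper())
            | "Print" >> beam.Map(print)
        )
```

Passing the runner as a parameter keeps the pipeline definition independent of where it executes, so the same function can be tried first with Beam's built-in `"DirectRunner"` and then with the PySpark runner.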
Getting Started
Prerequisites
- Apache Spark
- Apache Beam
- Python 3.8 or later
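The prerequisites above can be checked from Python before installing. This is a minimal sketch using only the standard library. It assumes that "Apache Spark" and "Apache Beam" mean the importable `pyspark` and `apache_beam` packages, and the function name `check_prerequisites` is hypothetical.

```python
import importlib.util
import sys
from typing import Dict

MIN_PYTHON = (3, 8)  # minimum version listed in the prerequisites


def check_prerequisites() -> Dict[str, bool]:
    """Report which prerequisites are satisfied in the current interpreter."""
    return {
        "python>=3.8": sys.version_info >= MIN_PYTHON,
        # Assumption: the prerequisites refer to these importable packages.
        "pyspark": importlib.util.find_spec("pyspark") is not None,
        "apache_beam": importlib.util.find_spec("apache_beam") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```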
Installation
To use this custom runner, install it with pip as you would any library:
pip install beam-pyspark-runner
Hashes for beam_pyspark_runner-0.0.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 1a02ecbf325f9d8c8885a92218c79edfbd43ade2d4f4502a76afa05dd3ccd44d
MD5 | 359cbf0b0dfda90b45a69694f9a2332f
BLAKE2b-256 | cc0ca51d5b39b2beda69129da1e52825dcdc99b055ac11810258b21deb348fac
Hashes for beam_pyspark_runner-0.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 33c458c2f1b48d7a5042732d4fd55b2739cfad19f6d7a1f485d270e57c1d5141
MD5 | ae2c6a090c4ed8839def0abe3cd1ab44
BLAKE2b-256 | 5ec44e55ec84c154902a1b334ecc12caab5fb38cdf79e0ea7bd809136dd581fe
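A downloaded file can be compared against the published SHA256 digest with the standard library. The sketch below assumes the wheel has already been downloaded to a local path. The function names and the example path are hypothetical; the digest constant is copied from the table above.

```python
import hashlib

# SHA256 listed above for beam_pyspark_runner-0.0.3-py3-none-any.whl
WHEEL_SHA256 = "33c458c2f1b48d7a5042732d4fd55b2739cfad19f6d7a1f485d270e57c1d5141"


def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Check a local file against a published hex digest."""
    return sha256_hex(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the file was saved.
    local_wheel = "beam_pyspark_runner-0.0.3-py3-none-any.whl"
    try:
        print("digest matches:", matches_published(local_wheel, WHEEL_SHA256))
    except FileNotFoundError:
        print("file not found:", local_wheel)
```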