SparkPipelineFramework

SparkPipelineFramework implements a few design patterns that make it easier to create Spark applications. It:

  1. Separates data transformation logic from pipeline execution code, so you can compose pipelines by simply stringing together transformers (based on the SparkML Pipeline class, but enhanced to work for both ML and non-ML transformations)
  2. Enables running SQL transformations without writing any code
  3. Enables versioning of transformations, so different pipelines can use older or newer versions of each transformer and you can upgrade each pipeline on your own schedule
  4. Enables autocompletion of transformations when creating pipelines (in PyCharm)
  5. Implements many cross-cutting concerns for you, e.g., logging, performance monitoring, and error reporting
  6. Supports non-ML, ML, and mixed workloads
  7. Has an additional (optional) library, SparkAutoMapper (https://github.com/icanbwell/SparkAutoMapper), that enables data engineers and data analysts to easily map data without writing code
  8. Has an additional (optional) library, SparkPipelineFramework.Testing (https://github.com/icanbwell/SparkPipelineFramework.Testing), that allows you to create unit tests without writing code; you just provide the input file and the expected output file

PyPI Package

This code is available as a package to import into your project: https://pypi.org/project/sparkpipelineframework/
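
You can install it with pip:

pip install sparkpipelineframework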

Using it in your project

(For an example project that uses SparkPipelineFramework, see https://github.com/imranq2/TestSparkPipelineFramework)

  1. Add the sparkpipelineframework package to your project's requirements.txt/Pipfile
  2. Run make init (this will set up Spark and Docker, which is used to run Spark)
  3. Create a folder called library in your project (a sample layout is shown below)
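
For illustration only (the folder and file names below are hypothetical; only the top-level library folder is required), a project using SQL transformers might be laid out like this:

my_project/
    requirements.txt                # includes sparkpipelineframework
    library/
        features/
            carriers/
                v1/
                    carriers.sql    # creates/updates a view called carriers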

Using PyCharm

You can run the SparkPipelineFramework project from PyCharm:

  1. Add a new Docker Compose interpreter
  2. Choose docker-compose.yml as the configuration file
  3. Choose dev as the Service
  4. Click OK and give PyCharm a couple of minutes to index the contents of the Docker container
  5. Right-click on the tests folder and click "Run 'pytest in tests'"

To create a new pipeline

  1. Create a class derived from FrameworkPipeline
  2. In your __init__ function, set self.transformers to the list of transformers to run for this pipeline. For example:

class MyPipeline(FrameworkPipeline):
    def __init__(self, parameters: Dict[str, Any], progress_logger: ProgressLogger):
        super().__init__(parameters=parameters,
                         progress_logger=progress_logger)
        self.transformers = self.create_steps([
            FrameworkCsvLoader(
                view="flights",
                path_to_csv=parameters["flights_path"]
            ),
            FeaturesCarriers(parameters=parameters).transformers,
        ])
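
Running a pipeline follows the standard SparkML fit/transform pattern, as in the test example later in this document (sketch only; it assumes you have a DataFrame df and a parameters dict like the one above):

with ProgressLogger() as progress_logger:
    pipeline = MyPipeline(parameters=parameters, progress_logger=progress_logger)
    transformer = pipeline.fit(df)
    transformer.transform(df)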

To Add a SQL transformation

  1. Create a new folder and a .sql file in that folder. This folder should be inside the library folder or any subfolder you choose under the library folder.
  2. The name of the file is the name of the view that will be created/updated to store the result of your SQL code; e.g., carriers.sql means we will create/update a view called carriers with the results of your SQL.
  3. Add your SQL to the file. This can be any valid Spark SQL and can refer to any view created by the pipeline before this transformer is run. For example:
SELECT carrier, crsarrtime FROM flights
  4. Run the generate_proxies command as shown in the Generating Proxies section below
  5. Now go to your Pipeline class __init__ and add it to self.transformers. Start typing the folder name and hit Ctrl-Space for PyCharm to autocomplete the name (an example is shown below)
  6. That's it. Your SQL has been automatically wrapped in a Transformer which will do logging, monitor performance, and do error checking
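
For illustration (assuming the folder library/features/carriers/v1 contains carriers.sql and that generate_proxies produced a proxy class named FeaturesCarriersV1, as in the test examples later in this document), the generated proxy is used like any other transformer:

from library.features.carriers.v1.features_carriers_v1 import FeaturesCarriersV1

# inside your FrameworkPipeline subclass's __init__:
self.transformers = self.create_steps([
    FrameworkCsvLoader(
        view="flights",
        path_to_csv=parameters["flights_path"]
    ),
    # generated proxy that runs carriers.sql and stores the result in the carriers view
    FeaturesCarriersV1(parameters=parameters).transformers,
])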

To Add a Python transformation

  1. Create a new folder and a .py file in that folder. This folder should be inside the library folder or any subfolder you choose under the library folder.
  2. In the .py file, create a new class and derive from Transformer (from Spark ML). Implement the _transform() function. For example:
from typing import Optional, Dict, Any

from pyspark import keyword_only
from pyspark.sql.dataframe import DataFrame

from spark_pipeline_framework.progress_logger.progress_logger import ProgressLogger
from spark_pipeline_framework.proxy_generator.python_proxy_base import PythonProxyBase


class FeatureTransformer(PythonProxyBase):
    # noinspection PyUnusedLocal
    @keyword_only
    def __init__(self,
                 name: Optional[str] = None,
                 parameters: Optional[Dict[str, Any]] = None,
                 progress_logger: Optional[ProgressLogger] = None,
                 verify_count_remains_same: bool = False
                 ) -> None:
        super(FeatureTransformer, self).__init__(name=name,
                                                 parameters=parameters,
                                                 progress_logger=progress_logger,
                                                 verify_count_remains_same=verify_count_remains_same)

    def _transform(self, df: DataFrame) -> DataFrame:
        # implement your transformation here and return the resulting DataFrame
        return df
  3. Run the generate_proxies command as shown in the Generating Proxies section below
  4. Now go to your Pipeline class __init__ and add it to self.transformers. Start typing the folder name and hit Ctrl-Space for PyCharm to autocomplete the name (an example is shown below)
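
As with SQL transformations, after running generate_proxies the Python transformer is exposed as a proxy class that can be dropped into a pipeline; for example (the class name here is taken from the test example later in this document):

from library.features.carriers_python.v1.features_carriers_python_v1 import FeaturesCarriersPythonV1

# inside your FrameworkPipeline subclass's __init__:
self.transformers = self.create_steps([
    FeaturesCarriersPythonV1(parameters=parameters, progress_logger=progress_logger).transformers,
])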

To Add a Machine Learning training transformation (called fit or Estimator in SparkML lingo)

  1. Create a new folder and .py file in that folder. This folder should be in the library folder or any subfolder you choose under the library folder.
  2. In the .py file, create a new class and derive from Estimator (from Spark ML). Implement the fit() function (a minimal sketch follows this list)
  3. Run the generate_proxies command as shown in the Generating Proxies section below
  4. Now go to your Pipeline class __init__ and add it to self.estimators. Start typing the folder name and hit Ctrl-Space for PyCharm to autocomplete the name
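
A minimal sketch using the plain pyspark.ml base classes (the class names here are purely illustrative; note that with the standard Estimator base class the method you actually override is _fit(), which the public fit() calls):

from pyspark.ml.base import Estimator, Model
from pyspark.sql.dataframe import DataFrame


class MyFeatureModel(Model):
    """Fitted model returned by MyFeatureEstimator; used at prediction time."""

    def _transform(self, dataset: DataFrame) -> DataFrame:
        # apply the learned transformation; this placeholder passes the data through unchanged
        return dataset


class MyFeatureEstimator(Estimator):
    """Learns from the training data in _fit() and returns a fitted model."""

    def _fit(self, dataset: DataFrame) -> MyFeatureModel:
        # compute whatever statistics or parameters you need from the dataset here
        return MyFeatureModel()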

To Add a Machine Learning prediction transformation

  1. Create a new folder and .py file in that folder. This folder should be in the library folder or any subfolder you choose under the library folder.
  2. In the .py file, create a new class and derive from Estimator (from Spark ML). Implement the _transform() function. Note that this can be the same class you use for training and prediction.
  3. Run the generate_proxies command as shown in the Generating Proxies section below
  4. Now go to your Pipeline class __init__ and add it to self.transformers. Start typing the folder name and hit Ctrl-Space for PyCharm to autocomplete the name

Including pipelines in other pipelines

Pipelines are fully composable so you can include one pipeline as a transformer in another pipeline. For example:

class MyPipeline(FrameworkPipeline):
    def __init__(self, parameters: Dict[str, Any], progress_logger: ProgressLogger):
        super(MyPipeline, self).__init__(parameters=parameters,
                                         progress_logger=progress_logger)
        self.transformers = self.create_steps([
            FrameworkCsvLoader(
                view="flights",
                path_to_csv=parameters["flights_path"]
            ),
            PipelineFoo(parameters=parameters).transformers,
            FeaturesCarriers(parameters=parameters).transformers,
        ])

Generating Proxies

  1. Run the following command to generate proxy classes. These automatically wrap your SQL and Python transformations in Spark Transformers that can be included in a Pipeline with no additional code.

python3 spark_pipeline_framework/proxy_generator/generate_proxies.py

You can also add this to your project Makefile to make it easier to run:

.PHONY:proxies
proxies:
	python3 spark_pipeline_framework/proxy_generator/generate_proxies.py

Testing

Test a pipeline

A pipeline can be tested by providing test data in CSV (or Parquet) format, running the pipeline, and then asserting on the data in any resulting view or DataFrame.

from pathlib import Path
from typing import Dict, Any

from pyspark.sql.dataframe import DataFrame
from pyspark.sql.session import SparkSession
from pyspark.sql.types import StructType

from library.features.carriers.v1.features_carriers_v1 import FeaturesCarriersV1
from library.features.carriers_python.v1.features_carriers_python_v1 import FeaturesCarriersPythonV1
from spark_pipeline_framework.pipelines.framework_pipeline import FrameworkPipeline
from spark_pipeline_framework.progress_logger.progress_logger import ProgressLogger
from spark_pipeline_framework.transformers.framework_csv_loader import FrameworkCsvLoader
from spark_pipeline_framework.utilities.flattener import flatten


class MyPipeline(FrameworkPipeline):
    def __init__(self, parameters: Dict[str, Any], progress_logger: ProgressLogger):
        super(MyPipeline, self).__init__(parameters=parameters,
                                         progress_logger=progress_logger)
        self.transformers = self.create_steps([
            FrameworkCsvLoader(
                view="flights",
                path_to_csv=parameters["flights_path"],
                progress_logger=progress_logger
            ),
            FeaturesCarriersV1(parameters=parameters, progress_logger=progress_logger).transformers,
            FeaturesCarriersPythonV1(parameters=parameters, progress_logger=progress_logger).transformers
        ])


def test_can_run_framework_pipeline(spark_session: SparkSession) -> None:
    # Arrange
    data_dir: Path = Path(__file__).parent.joinpath('./')
    flights_path: str = f"file://{data_dir.joinpath('flights.csv')}"

    schema = StructType([])

    df: DataFrame = spark_session.createDataFrame(
        spark_session.sparkContext.emptyRDD(), schema)

    spark_session.sql("DROP TABLE IF EXISTS default.flights")

    # Act
    parameters = {
        "flights_path": flights_path
    }

    with ProgressLogger() as progress_logger:
        pipeline: MyPipeline = MyPipeline(parameters=parameters, progress_logger=progress_logger)
        transformer = pipeline.fit(df)
        transformer.transform(df)

    # Assert
    result_df: DataFrame = spark_session.sql("SELECT * FROM flights2")
    result_df.show()

    assert result_df.count() > 0

Testing a single Transformer directly

Each Transformer can be tested individually by setting up the data to pass into it (e.g., loading it from CSV) and then checking the result of running the transformer.

from pathlib import Path

from pyspark.sql.dataframe import DataFrame
from pyspark.sql.session import SparkSession
from pyspark.sql.types import StructType
from spark_pipeline_framework.transformers.framework_csv_loader import FrameworkCsvLoader
from spark_pipeline_framework.utilities.attr_dict import AttrDict

from library.features.carriers.v1.features_carriers_v1 import FeaturesCarriersV1


def test_carriers_v1(spark_session: SparkSession):
    # Arrange
    data_dir: Path = Path(__file__).parent.joinpath('./')
    flights_path: str = f"file://{data_dir.joinpath('flights.csv')}"

    schema = StructType([])

    df: DataFrame = spark_session.createDataFrame(
        spark_session.sparkContext.emptyRDD(), schema)

    spark_session.sql("DROP TABLE IF EXISTS default.flights")

    FrameworkCsvLoader(
        view="flights",
        path_to_csv=flights_path
    ).transform(dataset=df)

    parameters = {}

    FeaturesCarriersV1(parameters=parameters).transformers[0].transform(dataset=df)

    result_df: DataFrame = spark_session.sql("SELECT * FROM flights2")
    result_df.show()

    assert result_df.count() > 0

Contributing

Run make init. This will install Java, Scala, Spark, and other required packages.

Publishing a new package

  1. Create a new release
  2. The GitHub Action should automatically kick in and publish the package
  3. You can see the status in the Actions tab
