Python library for converting Apache Spark ML pipelines to PMML

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
- Science/Research
Operating System
- OS Independent
Programming Language
- Python
Topic
- Scientific/Engineering
- Software Development

Project description

PySpark2PMML

Python package for converting Apache Spark ML pipelines to PMML.

Features

This package is a thin PySpark wrapper for the JPMML-SparkML library.

News and Updates

See the NEWS.md file.

Prerequisites

PySpark 3.0.X through 3.5.X, 4.0.X or 4.1.X.
Python 3.8 or newer.
Java 8 or newer (as required by PySpark).
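
The Python and PySpark requirements above can be checked before installation with a short script. This is a minimal sketch, not part of PySpark2PMML: the names check_prerequisites, release_line and SUPPORTED_PYSPARK_LINES are hypothetical, and the Java requirement is not checked.

```python
import sys
from importlib import metadata

# Supported PySpark release lines, as listed above: 3.0 through 3.5, 4.0 and 4.1.
SUPPORTED_PYSPARK_LINES = {(3, minor) for minor in range(0, 6)} | {(4, 0), (4, 1)}


def release_line(version):
    """Reduce a version string such as '3.5.1' to its (major, minor) release line."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_prerequisites(python_version=sys.version_info, pyspark_version=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version[:2]) < (3, 8):
        problems.append("Python 3.8 or newer is required")
    if pyspark_version is None:
        # Fall back to the installed PySpark distribution, if there is one.
        try:
            pyspark_version = metadata.version("pyspark")
        except metadata.PackageNotFoundError:
            problems.append("PySpark is not installed")
            return problems
    if release_line(pyspark_version) not in SUPPORTED_PYSPARK_LINES:
        problems.append(f"PySpark {pyspark_version} is outside the supported release lines")
    return problems
```

Calling check_prerequisites() with no arguments inspects the running interpreter and the installed PySpark distribution.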

Installation

Install a release version from PyPI:

pip install pyspark2pmml

Alternatively, install the latest snapshot version from GitHub:

pip install --upgrade git+https://github.com/jpmml/pyspark2pmml.git

Configuration

A single PySpark2PMML version works across all supported PySpark release lines. Version variance is confined to the underlying JPMML-SparkML library, where each Apache Spark release line maps to a dedicated JPMML-SparkML release line.

PySpark2PMML must be paired with JPMML-SparkML based on the following compatibility matrix:

Apache Spark version	JPMML-SparkML branch	Latest JPMML-SparkML version
4.1.X	`master`	3.3.3
4.0.X	`3.2.X`	3.2.10
3.5.X	`3.1.X`	3.1.11
3.4.X	`3.0.X`	3.0.11

Additionally, PySpark2PMML should be interoperable with now-legacy Apache Spark 3.0 through 3.3 release lines. Please see the JPMML-SparkML documentation for extended compatibility matrices.
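
For scripted environments, the compatibility matrix above can be written out as a lookup table. This is an illustrative sketch only: the COMPATIBILITY dictionary copies the four rows shown above, the function name jpmml_sparkml_for is hypothetical, and the legacy 3.0 through 3.3 release lines are deliberately left out because their versions are not listed here.

```python
# Rows copied from the compatibility matrix above:
# Spark release line -> (JPMML-SparkML branch, latest JPMML-SparkML version)
COMPATIBILITY = {
    "4.1": ("master", "3.3.3"),
    "4.0": ("3.2.X", "3.2.10"),
    "3.5": ("3.1.X", "3.1.11"),
    "3.4": ("3.0.X", "3.0.11"),
}


def jpmml_sparkml_for(spark_version):
    """Look up the JPMML-SparkML branch and version for a Spark version string."""
    line = ".".join(spark_version.split(".")[:2])
    try:
        return COMPATIBILITY[line]
    except KeyError:
        raise LookupError(
            f"Spark {line}.X is not in this table; see the JPMML-SparkML documentation"
        ) from None
```

For example, jpmml_sparkml_for("3.5.3") returns the pair for the 3.5.X row.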

Local setup

PySpark2PMML versions 0.11.0 and newer bundle JPMML-SparkML JAR files for quick programmatic setup.

Use the pyspark2pmml.spark_jars() utility function to obtain a PySpark version-dependent classpath string, and pass it as the spark.jars configuration entry when building a Spark session:

from pyspark.sql import SparkSession

import pyspark2pmml

spark = SparkSession.builder \
	.config("spark.jars", pyspark2pmml.spark_jars()) \
	.getOrCreate()
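
If the session also needs other JAR files, the values have to be combined into one spark.jars entry, which Spark expects as a comma-separated list. The helper below is a sketch under the assumption that pyspark2pmml.spark_jars() returns such a comma-separated string; merge_spark_jars is a hypothetical name, not part of PySpark2PMML.

```python
def merge_spark_jars(*jar_lists):
    """Join several comma-separated JAR lists, dropping blanks and duplicates."""
    merged = []
    for jar_list in jar_lists:
        # Treat None the same as an empty list of JARs.
        for jar in (jar_list or "").split(","):
            jar = jar.strip()
            if jar and jar not in merged:
                merged.append(jar)
    return ",".join(merged)


# Intended use (the second path is a made-up placeholder):
# .config("spark.jars", merge_spark_jars(pyspark2pmml.spark_jars(), "/opt/libs/extra.jar"))
```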

Cluster setup

Use the pyspark2pmml.spark_jars_packages() utility function to obtain a PySpark version-dependent string of Apache Maven package coordinates:

import pyspark2pmml

print(pyspark2pmml.spark_jars_packages())

Pass this value to pyspark or spark-submit using the --packages command-line option:

$SPARK_HOME/bin/pyspark --packages $(python -c "import pyspark2pmml; print(pyspark2pmml.spark_jars_packages())")
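
When jobs are launched from a Python script rather than a shell, the same command line can be assembled as an argument list. This is a rough sketch: build_submit_command is a hypothetical helper, and the coordinates string is expected to come from pyspark2pmml.spark_jars_packages().

```python
import os
import shlex


def build_submit_command(packages, app_path, spark_home=None):
    """Build a spark-submit argument list that passes --packages."""
    spark_home = spark_home or os.environ.get("SPARK_HOME", "")
    # Fall back to whatever spark-submit is on PATH when SPARK_HOME is unknown.
    launcher = os.path.join(spark_home, "bin", "spark-submit") if spark_home else "spark-submit"
    return [launcher, "--packages", packages, app_path]


# Intended use (job.py is a placeholder application file):
# cmd = build_submit_command(pyspark2pmml.spark_jars_packages(), "job.py")
# print(shlex.join(cmd))        # inspect the command line
# subprocess.run(cmd, check=True)  # or launch it
```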

Usage

PySpark2PMML is designed to operate on fitted pipeline models.

The PMML representation can capture pipelines of any size and complexity, ranging from isolated models to multi-model chains (with feature pre-processing and decision post-processing stages interspersed between the model stages).

The main requirement for a successful conversion is that every transformer class used in the pipeline is known to the underlying JPMML-SparkML library. Check the list of supported transformer classes, and develop and register converters for custom transformer classes as needed.
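
Before attempting a conversion, it can help to list the stage classes of a fitted pipeline model and compare them with the supported transformer list. The sketch below relies only on the stages attribute of a fitted PipelineModel; the function names are hypothetical, and the supported set must be filled in by hand from the JPMML-SparkML documentation, since it is not reproduced here.

```python
def stage_class_names(model):
    """Collect the class names of all stages, descending into nested pipeline models."""
    names = []
    for stage in getattr(model, "stages", []):
        if hasattr(stage, "stages"):
            # A nested pipeline model: recurse into its own stages.
            names.extend(stage_class_names(stage))
        else:
            names.append(type(stage).__name__)
    return names


def unsupported_stages(model, supported):
    """Return the sorted stage class names that are missing from the supported set."""
    return sorted(set(stage_class_names(model)) - set(supported))
```

For a pipeline built from RFormula and DecisionTreeClassifier, the fitted stages are named RFormulaModel and DecisionTreeClassificationModel.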

Fitting a Spark ML pipeline:

from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import RFormula

df = spark.read.csv("Iris.csv", header = True, inferSchema = True)

formula = RFormula(formula = "Species ~ .")
classifier = DecisionTreeClassifier()
pipeline = Pipeline(stages = [formula, classifier])
pipelineModel = pipeline.fit(df)

The PySpark2PMML API mirrors the JPMML-SparkML API:

Construct a PMMLBuilder object based on the data schema and pipeline model. The data schema (column names and types) may be fetched from the training data frame, or constructed manually.
Configure the PMML builder by calling the putOption(stage: Transformer, key: str, value: Any) and verify(df: DataFrame) methods on it.
Get the PMML XML text in memory by calling one of the buildString() or buildByteArray() methods, or dump it to a file by calling the buildFile(pmml_path: str) method.

Exporting the fitted Spark ML pipeline to a PMML file:

from pyspark2pmml import PMMLBuilder

pmmlBuilder = PMMLBuilder(df.schema, pipelineModel) \
	.verify(df.sample(0.05))

# Dump PMML to file in the driver's filesystem
pmml_path = pmmlBuilder.buildFile("DecisionTreeIris.pmml")
print(pmml_path)

# Keep PMML in memory
#pmml_str = pmmlBuilder.buildString()
#print(pmml_str)
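
When the PMML is kept in memory, a quick structural check can confirm that the conversion produced what was expected. The following sketch uses only the Python standard library; summarize_pmml is a hypothetical helper, and it assumes that the string returned by buildString() is a complete PMML XML document.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_pmml(pmml_str):
    """Return the root element name, PMML version and top-level element names."""
    root = ET.fromstring(pmml_str)
    return {
        "root": local_name(root.tag),
        "version": root.get("version"),
        "children": [local_name(child.tag) for child in root],
    }


# Intended use:
# print(summarize_pmml(pmmlBuilder.buildString()))
```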

The representation of individual Spark ML pipeline stages can be customized via conversion options:

from pyspark2pmml import PMMLBuilder

classifierModel = pipelineModel.stages[1]

pmmlBuilder = PMMLBuilder(df.schema, pipelineModel) \
	.putOption(classifierModel, "compact", False) \
	.putOption(classifierModel, "estimate_featureImportances", True) \
	.verify(df.sample(0.05))

pmmlBuilder.buildFile("DecisionTreeIris.pmml")
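
When several options apply to the same stage, they can be kept in a dictionary and applied in a loop. This is a small convenience sketch, not part of the PySpark2PMML API: apply_options is a hypothetical helper, and it assumes that putOption returns a builder, as the chained calls above suggest.

```python
def apply_options(builder, stage, options):
    """Call putOption once per dictionary entry and return the resulting builder."""
    for key, value in options.items():
        builder = builder.putOption(stage, key, value)
    return builder


# Intended use, with the option names from the example above:
# options = {"compact": False, "estimate_featureImportances": True}
# pmmlBuilder = apply_options(PMMLBuilder(df.schema, pipelineModel), classifierModel, options)
```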

License

PySpark2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use PySpark2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes PySpark2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

PySpark2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io

Release history

Version	Release date
0.11.1 (this version)	Jun 14, 2026
0.11.0	Apr 19, 2026
0.10.0	Mar 17, 2026
0.9.0	Mar 16, 2026
0.8.2	Dec 27, 2025
0.8.1	Nov 19, 2025
0.8.0	Nov 16, 2025
0.7.2	Nov 10, 2025
0.7.1	Nov 7, 2025
0.7.0	Nov 6, 2025
0.6.1	Aug 8, 2025
0.6.0	Aug 5, 2025
0.5.1	Apr 8, 2019
0.5.0	Feb 20, 2019

Download files

Source Distribution

pyspark2pmml-0.11.1.tar.gz (7.0 MB)

Uploaded Jun 14, 2026 Source

Built Distribution

pyspark2pmml-0.11.1-py3-none-any.whl (7.0 MB)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file pyspark2pmml-0.11.1.tar.gz.

File metadata

Download URL: pyspark2pmml-0.11.1.tar.gz
Upload date: Jun 14, 2026
Size: 7.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for pyspark2pmml-0.11.1.tar.gz
Algorithm	Hash digest
SHA256	`9c88b4d60648293d58b065978e0b3efbdcc665c11cccc94f437fb4d60f1a11c6`
MD5	`7124a3eb76fcb3868441daf58b5ee6b8`
BLAKE2b-256	`123a8db71b4e9ab43de768c91c5491cfc2090ba0478e6c6184d632dc4626626d`
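
A downloaded file can be compared against the published SHA256 digest with the Python standard library. This is a generic sketch, not a PyPI or PySpark2PMML tool; the function names are hypothetical, and the file is assumed to have been saved under its original name.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True when the file's SHA256 digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()


# Intended use, with the SHA256 value from the table above:
# matches_published_hash(
#     "pyspark2pmml-0.11.1.tar.gz",
#     "9c88b4d60648293d58b065978e0b3efbdcc665c11cccc94f437fb4d60f1a11c6",
# )
```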

File details

Details for the file pyspark2pmml-0.11.1-py3-none-any.whl.

File metadata

Download URL: pyspark2pmml-0.11.1-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 7.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for pyspark2pmml-0.11.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`04b9103196b6d167f220fb661447976419b8994d79bb84137afbf666dcdc102a`
MD5	`49a2c7016946a11fe3fcc81b1be92291`
BLAKE2b-256	`cff9be9271dff1451c64ad22cf848ab239ee5e8a7ff1b47d0c616bf82273a7dd`
