A library to parse PMML models into Scikit-learn estimators.

sklearn-pmml-model

A library to effortlessly import models trained on other platforms and in other programming languages into scikit-learn in Python. First, export your model to PMML (a widely supported format). Next, load the exported PMML file with this library and use the resulting class like any other scikit-learn estimator.

Installation

The easiest way is to use pip:

$ pip install sklearn-pmml-model

Status

The library currently supports the following models:

Model Classification Regression Categorical features
Decision Trees 1
Random Forests 1
Gradient Boosting 1
Linear Regression 3
Ridge 2 3
Lasso 2 3
ElasticNet 2 3
Gaussian Naive Bayes 3
Support Vector Machines 3
Nearest Neighbors
Neural Networks

1 Categorical feature support using slightly modified internals, based on scikit-learn#12866.

2 These models differ only in how they are trained; the resulting model has the same form. Classification is supported using PMMLLogisticRegression for regression models and PMMLRidgeClassifier for general regression models.

3 By one-hot encoding categorical features automatically.
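
To make footnote 3 concrete, the sketch below shows what one-hot encoding does to a single categorical column: each category becomes its own 0/1 indicator column. It is only an illustration of the transformation, not the library's internal implementation; the helper name `one_hot` and the sample data are hypothetical.

```python
def one_hot(values, categories):
    """Return one 0/1 indicator row per value, with one column per category.

    Hypothetical helper for illustration; not part of sklearn-pmml-model.
    """
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        rows.append(row)
    return rows


colors = ["red", "green", "red", "blue"]
encoded = one_hot(colors, categories=["blue", "green", "red"])
print(encoded)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

After encoding, a model that only understands numeric inputs, such as a linear model, can consume the categorical feature as three numeric columns.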

Example

A minimal working example (using this PMML file) is shown below:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn_pmml_model.ensemble import PMMLForestClassifier
from sklearn_pmml_model.auto_detect import auto_detect_estimator

# Prepare the data
iris = load_iris()
X = pd.DataFrame(iris.data)
X.columns = np.array(iris.feature_names)
y = pd.Series(np.array(iris.target_names)[iris.target])
y.name = "Class"
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.33, random_state=123)

# Specify the model type for the least overhead...
#clf = PMMLForestClassifier(pmml="models/randomForest.pmml")

# ...or simply let the library auto-detect the model type
clf = auto_detect_estimator(pmml="models/randomForest.pmml")

# Use the model as any other scikit-learn model
clf.predict(Xte)
clf.score(Xte, yte)

More examples can be found in the following subpackages: tree, ensemble, linear_model, naive_bayes, svm, neighbors and neural_network.
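
Auto-detection needs some way to tell which kind of model a PMML file contains. The sketch below shows one way this could be done with the standard library alone: look for the first top-level element whose tag is a PMML model element. This is an assumption about the general approach, not the actual implementation of auto_detect_estimator; `find_model_tag`, `MODEL_TAGS` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# A subset of the model elements defined by the PMML specification.
MODEL_TAGS = {
    "TreeModel",
    "MiningModel",
    "RegressionModel",
    "GeneralRegressionModel",
    "NaiveBayesModel",
    "SupportVectorMachineModel",
    "NearestNeighborModel",
    "NeuralNetwork",
}


def find_model_tag(pmml_text):
    """Return the tag of the first top-level model element, or None."""
    root = ET.fromstring(pmml_text)
    for child in root:
        tag = child.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        if tag in MODEL_TAGS:
            return tag
    return None


# Tiny hand-written document for illustration; real PMML files hold much more.
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary/>
  <MiningModel functionName="classification"/>
</PMML>"""

print(find_model_tag(sample))  # MiningModel
```

If you already know the model type, instantiating the matching class directly (as in the commented-out line of the example) skips this kind of inspection.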

Benchmark

Depending on the data set and model, sklearn-pmml-model is between 5 and 1,000 times faster than competing libraries, because it leverages the optimization and industry-tested robustness of scikit-learn. Source code for this benchmark can be found in the corresponding Jupyter notebook.

Running times (load + predict, in seconds)

Data set       Library             Linear model  Naive Bayes  Decision tree  Random Forest  Gradient boosting
Wine           PyPMML              0.773291      0.77384      0.777425       0.895204       0.902355
Wine           sklearn-pmml-model  0.005813      0.006357     0.002693       0.108882       0.121823
Breast cancer  PyPMML              3.849855      3.878448     3.83623        4.16358        4.13766
Breast cancer  sklearn-pmml-model  0.015723      0.011278     0.002807       0.146234       0.044016

Improvement

Data set       Linear model  Naive Bayes  Decision tree  Random Forest  Gradient boosting
Wine           133×          122×         289×           8×             7×
Breast cancer  245×          344×         1,367×         28×            94×
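
Each improvement factor is simply the PyPMML running time divided by the sklearn-pmml-model running time for the same data set and model. The short sketch below redoes that arithmetic from the running times listed above; the variable and function names are illustrative only.

```python
# Running times (load + predict, in seconds) copied from the table above.
MODELS = ["Linear model", "Naive Bayes", "Decision tree",
          "Random Forest", "Gradient boosting"]

TIMES = {
    "Wine": {
        "PyPMML": [0.773291, 0.77384, 0.777425, 0.895204, 0.902355],
        "sklearn-pmml-model": [0.005813, 0.006357, 0.002693, 0.108882, 0.121823],
    },
    "Breast cancer": {
        "PyPMML": [3.849855, 3.878448, 3.83623, 4.16358, 4.13766],
        "sklearn-pmml-model": [0.015723, 0.011278, 0.002807, 0.146234, 0.044016],
    },
}


def improvement(dataset):
    """Return the rounded speedup factor for each model on one data set."""
    slow = TIMES[dataset]["PyPMML"]
    fast = TIMES[dataset]["sklearn-pmml-model"]
    return [round(s / f) for s, f in zip(slow, fast)]


for dataset in TIMES:
    factors = improvement(dataset)
    print(dataset, dict(zip(MODELS, factors)))
```

The rounded ratios reproduce the factors listed in the Improvement table; for the Wine random forest and gradient boosting models the ratios round to 8× and 7×.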

Development

Prerequisites

Tests are run with pytest. First, grab a local copy of the source:

$ git clone http://github.com/iamDecode/sklearn-pmml-model
$ cd sklearn-pmml-model

create a virtual environment and activate it:

$ python3 -m venv venv
$ source venv/bin/activate

and install the dependencies:

$ pip install -r requirements.txt

The final step is to build the Cython extensions:

$ python setup.py build_ext --inplace

Testing

You can execute the tests with pytest by running:

$ python setup.py pytest

Contributing

Feel free to make a contribution. Please read CONTRIBUTING.md for more details.

License

This project is licensed under the BSD 2-Clause License - see the LICENSE file for details.

Download files

Source Distribution

sklearn_pmml_model-1.0.7.tar.gz (895.4 kB view hashes)

Uploaded Source

Built Distributions

sklearn_pmml_model-1.0.7-cp312-cp312-win_amd64.whl (458.1 kB view hashes)

Uploaded CPython 3.12 Windows x86-64

sklearn_pmml_model-1.0.7-cp312-cp312-win32.whl (408.9 kB view hashes)

Uploaded CPython 3.12 Windows x86

sklearn_pmml_model-1.0.7-cp312-cp312-musllinux_1_1_x86_64.whl (2.3 MB view hashes)

Uploaded CPython 3.12 musllinux: musl 1.1+ x86-64

sklearn_pmml_model-1.0.7-cp312-cp312-musllinux_1_1_i686.whl (2.2 MB view hashes)

Uploaded CPython 3.12 musllinux: musl 1.1+ i686

sklearn_pmml_model-1.0.7-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

sklearn_pmml_model-1.0.7-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (2.1 MB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sklearn_pmml_model-1.0.7-cp312-cp312-macosx_11_0_arm64.whl (462.6 kB view hashes)

Uploaded CPython 3.12 macOS 11.0+ ARM64

sklearn_pmml_model-1.0.7-cp312-cp312-macosx_10_9_x86_64.whl (482.8 kB view hashes)

Uploaded CPython 3.12 macOS 10.9+ x86-64

sklearn_pmml_model-1.0.7-cp311-cp311-win_amd64.whl (462.9 kB view hashes)

Uploaded CPython 3.11 Windows x86-64

sklearn_pmml_model-1.0.7-cp311-cp311-win32.whl (411.6 kB view hashes)

Uploaded CPython 3.11 Windows x86

sklearn_pmml_model-1.0.7-cp311-cp311-musllinux_1_1_x86_64.whl (2.4 MB view hashes)

Uploaded CPython 3.11 musllinux: musl 1.1+ x86-64

sklearn_pmml_model-1.0.7-cp311-cp311-musllinux_1_1_i686.whl (2.3 MB view hashes)

Uploaded CPython 3.11 musllinux: musl 1.1+ i686

sklearn_pmml_model-1.0.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

sklearn_pmml_model-1.0.7-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (2.0 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sklearn_pmml_model-1.0.7-cp311-cp311-macosx_11_0_arm64.whl (465.3 kB view hashes)

Uploaded CPython 3.11 macOS 11.0+ ARM64

sklearn_pmml_model-1.0.7-cp311-cp311-macosx_10_9_x86_64.whl (482.8 kB view hashes)

Uploaded CPython 3.11 macOS 10.9+ x86-64

sklearn_pmml_model-1.0.7-cp310-cp310-win_amd64.whl (461.9 kB view hashes)

Uploaded CPython 3.10 Windows x86-64

sklearn_pmml_model-1.0.7-cp310-cp310-win32.whl (413.0 kB view hashes)

Uploaded CPython 3.10 Windows x86

sklearn_pmml_model-1.0.7-cp310-cp310-musllinux_1_1_x86_64.whl (2.2 MB view hashes)

Uploaded CPython 3.10 musllinux: musl 1.1+ x86-64

sklearn_pmml_model-1.0.7-cp310-cp310-musllinux_1_1_i686.whl (2.1 MB view hashes)

Uploaded CPython 3.10 musllinux: musl 1.1+ i686

sklearn_pmml_model-1.0.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

sklearn_pmml_model-1.0.7-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.9 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sklearn_pmml_model-1.0.7-cp310-cp310-macosx_11_0_arm64.whl (464.9 kB view hashes)

Uploaded CPython 3.10 macOS 11.0+ ARM64

sklearn_pmml_model-1.0.7-cp310-cp310-macosx_10_9_x86_64.whl (481.9 kB view hashes)

Uploaded CPython 3.10 macOS 10.9+ x86-64

sklearn_pmml_model-1.0.7-cp39-cp39-win_amd64.whl (465.4 kB view hashes)

Uploaded CPython 3.9 Windows x86-64

sklearn_pmml_model-1.0.7-cp39-cp39-win32.whl (416.8 kB view hashes)

Uploaded CPython 3.9 Windows x86

sklearn_pmml_model-1.0.7-cp39-cp39-musllinux_1_1_x86_64.whl (2.2 MB view hashes)

Uploaded CPython 3.9 musllinux: musl 1.1+ x86-64

sklearn_pmml_model-1.0.7-cp39-cp39-musllinux_1_1_i686.whl (2.2 MB view hashes)

Uploaded CPython 3.9 musllinux: musl 1.1+ i686

sklearn_pmml_model-1.0.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

sklearn_pmml_model-1.0.7-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.9 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sklearn_pmml_model-1.0.7-cp39-cp39-macosx_11_0_arm64.whl (468.4 kB view hashes)

Uploaded CPython 3.9 macOS 11.0+ ARM64

sklearn_pmml_model-1.0.7-cp39-cp39-macosx_10_9_x86_64.whl (485.4 kB view hashes)

Uploaded CPython 3.9 macOS 10.9+ x86-64

sklearn_pmml_model-1.0.7-cp38-cp38-win_amd64.whl (465.4 kB view hashes)

Uploaded CPython 3.8 Windows x86-64

sklearn_pmml_model-1.0.7-cp38-cp38-win32.whl (416.6 kB view hashes)

Uploaded CPython 3.8 Windows x86

sklearn_pmml_model-1.0.7-cp38-cp38-musllinux_1_1_x86_64.whl (2.3 MB view hashes)

Uploaded CPython 3.8 musllinux: musl 1.1+ x86-64

sklearn_pmml_model-1.0.7-cp38-cp38-musllinux_1_1_i686.whl (2.2 MB view hashes)

Uploaded CPython 3.8 musllinux: musl 1.1+ i686

sklearn_pmml_model-1.0.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

sklearn_pmml_model-1.0.7-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.9 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sklearn_pmml_model-1.0.7-cp38-cp38-macosx_11_0_arm64.whl (472.8 kB view hashes)

Uploaded CPython 3.8 macOS 11.0+ ARM64

sklearn_pmml_model-1.0.7-cp38-cp38-macosx_10_9_x86_64.whl (490.4 kB view hashes)

Uploaded CPython 3.8 macOS 10.9+ x86-64

sklearn_pmml_model-1.0.7-cp37-cp37m-win_amd64.whl (463.3 kB view hashes)

Uploaded CPython 3.7m Windows x86-64

sklearn_pmml_model-1.0.7-cp37-cp37m-win32.whl (412.7 kB view hashes)

Uploaded CPython 3.7m Windows x86

sklearn_pmml_model-1.0.7-cp37-cp37m-musllinux_1_1_x86_64.whl (2.0 MB view hashes)

Uploaded CPython 3.7m musllinux: musl 1.1+ x86-64

sklearn_pmml_model-1.0.7-cp37-cp37m-musllinux_1_1_i686.whl (2.0 MB view hashes)

Uploaded CPython 3.7m musllinux: musl 1.1+ i686

sklearn_pmml_model-1.0.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64

sklearn_pmml_model-1.0.7-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.8 MB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sklearn_pmml_model-1.0.7-cp37-cp37m-macosx_10_9_x86_64.whl (487.4 kB view hashes)

Uploaded CPython 3.7m macOS 10.9+ x86-64

sklearn_pmml_model-1.0.7-cp36-cp36m-win_amd64.whl (454.4 kB view hashes)

Uploaded CPython 3.6m Windows x86-64

sklearn_pmml_model-1.0.7-cp36-cp36m-win32.whl (404.2 kB view hashes)

Uploaded CPython 3.6m Windows x86

sklearn_pmml_model-1.0.7-cp36-cp36m-musllinux_1_1_x86_64.whl (1.9 MB view hashes)

Uploaded CPython 3.6m musllinux: musl 1.1+ x86-64

sklearn_pmml_model-1.0.7-cp36-cp36m-musllinux_1_1_i686.whl (1.9 MB view hashes)

Uploaded CPython 3.6m musllinux: musl 1.1+ i686

sklearn_pmml_model-1.0.7-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.17+ x86-64

sklearn_pmml_model-1.0.7-cp36-cp36m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.7 MB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sklearn_pmml_model-1.0.7-cp36-cp36m-macosx_10_9_x86_64.whl (473.1 kB view hashes)

Uploaded CPython 3.6m macOS 10.9+ x86-64
