A library to parse PMML models into Scikit-learn estimators.
sklearn-pmml-model
A Python library that provides import functionality to all major estimator classes of the popular machine learning library scikit-learn using PMML. This enables portability and interoperability with a wide range of different languages, toolkits and enterprise software.
Installation
The easiest way is to use pip:
$ pip install sklearn-pmml-model
Status
This library is in beta, and not all models are supported yet. The following models are currently supported:
Model | Classification | Regression | Categorical features |
---|---|---|---|
Decision Trees | ✅ | ✅ | ✅¹ |
Random Forests | ✅ | ✅ | ✅¹ |
Gradient Boosting | ✅ | ✅ | ✅¹ |
Linear Regression | ✅ | ✅ | ✅³ |
Ridge | ✅² | ✅ | ✅³ |
Lasso | ✅² | ✅ | ✅³ |
ElasticNet | ✅² | ✅ | ✅ |
Gaussian Naive Bayes | ✅ | | ✅³ |
Support Vector Machines | ✅ | ✅ | ✅³ |
¹ Categorical feature support using slightly modified internals, based on scikit-learn#12866.
² These models differ only in training characteristics; the resulting model is of the same form. Classification is supported using PMMLLogisticRegression for regression models and PMMLRidgeClassifier for general regression models.
³ By one-hot encoding categorical features automatically.
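To illustrate what one-hot encoding means in the footnote above: a categorical column is replaced by one 0/1 indicator column per category. The sketch below is a plain-Python illustration of that idea only. The `one_hot` helper and the sample data are hypothetical, and this is not the library's internal implementation.

```python
def one_hot(values, categories=None):
    """Return (categories, rows); each row has a single 1 in its category's column."""
    # Fix the column order up front: sorted unique values unless given explicitly.
    if categories is None:
        categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot(colors)
print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

A linear model can then learn one coefficient per indicator column instead of treating the category as a number.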
The following part of the specification is covered:
- Array (including typed variants)
- SparseArray (including typed variants)
  - Indices
  - Entries (including typed variants)
- DataDictionary
  - DataField (continuous, categorical, ordinal)
    - Value
    - Interval
- TransformationDictionary / LocalTransformations
  - DerivedField
- TreeModel
  - SimplePredicate
  - SimpleSetPredicate
- Segmentation ('majorityVote' for Random Forests, 'modelChain' and 'sum' for Gradient Boosting)
- Regression
  - RegressionTable
    - NumericPredictor
    - CategoricalPredictor
- GeneralRegressionModel (only linear models)
  - PPMatrix
    - PPCell
  - ParamMatrix
    - PCell
- NaiveBayesModel
  - BayesInputs
    - BayesInput
      - TargetValueStats
        - TargetValueStat
          - GaussianDistribution
      - PairCounts
        - TargetValueCounts
          - TargetValueCount
- SupportVectorMachineModel
  - LinearKernelType
  - PolynomialKernelType
  - RadialBasisKernelType
  - SigmoidKernelType
  - VectorDictionary
    - VectorFields
    - VectorInstance
  - SupportVectorMachine
    - SupportVectors
      - SupportVector
    - Coefficients
      - Coefficient
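Before loading a model, it can be useful to see which elements a PMML file uses that do not appear in the list above. The following is a minimal sketch using only the standard library. The `unlisted_elements` helper, the abridged `SUPPORTED` set, the `IGNORED` set of structural elements, and the sample document are all assumptions made for illustration and are not part of sklearn-pmml-model.

```python
import xml.etree.ElementTree as ET

# Element names copied from the coverage list (abridged to tree models).
SUPPORTED = {
    "DataDictionary", "DataField", "Value", "Interval",
    "TransformationDictionary", "LocalTransformations", "DerivedField",
    "TreeModel", "SimplePredicate", "SimpleSetPredicate", "Segmentation",
}

# Structural elements this sketch chooses to ignore (an assumption).
IGNORED = {"PMML", "Header", "MiningSchema", "MiningField", "Node", "True"}


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}TreeModel' -> 'TreeModel'."""
    return tag.rsplit("}", 1)[-1]


def unlisted_elements(pmml_text):
    """Return the sorted element names that are neither listed nor ignored."""
    root = ET.fromstring(pmml_text)
    names = {local_name(element.tag) for element in root.iter()}
    return sorted(names - SUPPORTED - IGNORED)


SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header/>
  <DataDictionary>
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="categorical" dataType="string">
      <Value value="a"/>
      <Value value="b"/>
    </DataField>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <Node score="a">
      <True/>
      <Node score="b">
        <CompoundPredicate booleanOperator="and">
          <SimplePredicate field="x" operator="greaterThan" value="1"/>
          <SimplePredicate field="x" operator="lessThan" value="5"/>
        </CompoundPredicate>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

print(unlisted_elements(SAMPLE))
```

An element reported here is only a hint that the file may rely on something outside the listed coverage. It does not by itself prove that loading will fail.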
Example
A minimal working example (using this PMML file) is shown below:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn_pmml_model.ensemble import PMMLForestClassifier

# Prepare data: name the columns and use string class labels
iris = load_iris()
X = pd.DataFrame(iris.data)
X.columns = np.array(iris.feature_names)
y = pd.Series(np.array(iris.target_names)[iris.target])
y.name = "Class"
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.33, random_state=123)

# Load the model from the PMML file, then predict and score on the test split
clf = PMMLForestClassifier(pmml="models/randomForest.pmml")
clf.predict(Xte)
clf.score(Xte, yte)
More examples can be found in the following subpackages: tree, ensemble, linear_model, naive_bayes and svm.
Benchmark
Depending on the data set and model, sklearn-pmml-model is between 5 and 1,000 times faster than competing libraries, because it leverages the optimization and industry-tested robustness of sklearn. Source code for this benchmark can be found in the corresponding Jupyter notebook.
Running times (load + predict, in seconds)
Dataset | Library | Linear model | Naive Bayes | Decision tree | Random Forest | Gradient boosting |
---|---|---|---|---|---|---|
Wine | PyPMML | 0.773291 | 0.77384 | 0.777425 | 0.895204 | 0.902355 |
Wine | sklearn-pmml-model | 0.005813 | 0.006357 | 0.002693 | 0.108882 | 0.121823 |
Breast cancer | PyPMML | 3.849855 | 3.878448 | 3.83623 | 4.16358 | 4.13766 |
Breast cancer | sklearn-pmml-model | 0.015723 | 0.011278 | 0.002807 | 0.146234 | 0.044016 |
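A "load + predict" time like the ones above can be measured with a small wall-clock harness. The sketch below is not the benchmark notebook's code. The `time_load_and_predict` helper, the best-of-`repeats` strategy, and the dummy model are assumptions chosen so that the example runs without any PMML file.

```python
import time


def time_load_and_predict(load, predict, data, repeats=3):
    """Return the best wall-clock time in seconds of load() plus predict(model, data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        model = load()
        predict(model, data)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical stand-ins for a real model: "loading" builds a threshold,
# "predicting" applies it to every value.
def load_dummy():
    return {"threshold": 0.5}


def predict_dummy(model, values):
    return [value > model["threshold"] for value in values]


elapsed = time_load_and_predict(load_dummy, predict_dummy, [0.1, 0.7, 0.9])
print(f"load + predict: {elapsed:.6f} s")
```

To time a real model, `load` could be a function that constructs an estimator from a PMML file and `predict` one that calls its `predict` method.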
Improvement
Dataset | Linear model | Naive Bayes | Decision tree | Random Forest | Gradient boosting |
---|---|---|---|---|---|
Wine | 133× | 122× | 289× | 8× | 7× |
Breast cancer | 245× | 344× | 1,367× | 28× | 94× |
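Each improvement factor is the PyPMML running time divided by the sklearn-pmml-model running time, rounded to the nearest integer. The short check below recomputes the factors from the running times reported above. The variable and function names are chosen for this illustration only.

```python
# Running times in seconds, copied from the running-times table.
models = ["Linear model", "Naive Bayes", "Decision tree", "Random Forest", "Gradient boosting"]
running_times = {
    "Wine": {
        "PyPMML": [0.773291, 0.77384, 0.777425, 0.895204, 0.902355],
        "sklearn-pmml-model": [0.005813, 0.006357, 0.002693, 0.108882, 0.121823],
    },
    "Breast cancer": {
        "PyPMML": [3.849855, 3.878448, 3.83623, 4.16358, 4.13766],
        "sklearn-pmml-model": [0.015723, 0.011278, 0.002807, 0.146234, 0.044016],
    },
}


def improvement(times):
    """Return the rounded speed-up factor for each model."""
    return [
        round(slow / fast)
        for slow, fast in zip(times["PyPMML"], times["sklearn-pmml-model"])
    ]


for dataset, times in running_times.items():
    factors = improvement(times)
    print(dataset, ", ".join(f"{m}: {f:,}×" for m, f in zip(models, factors)))
```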
Development
Prerequisites
Tests can be run using Py.test. Grab a local copy of the source:
$ git clone http://github.com/iamDecode/sklearn-pmml-model
$ cd sklearn-pmml-model
Create a virtual environment and activate it:
$ python3 -m venv venv
$ source venv/bin/activate
Then install the dependencies:
$ pip install -r requirements.txt
The final step is to build the Cython extensions:
$ python setup.py build_ext --inplace
Testing
You can execute tests with py.test by running:
$ python setup.py pytest
Contributing
Feel free to make a contribution. Please read CONTRIBUTING.md for more details.
License
This project is licensed under the BSD 2-Clause License - see the LICENSE file for details.