A library to parse PMML models into Scikit-learn estimators.
sklearn-pmml-model
A Python library that provides import functionality to all major estimator classes of the popular machine learning library scikit-learn using PMML.
Installation
The easiest way is to use pip:
$ pip install sklearn-pmml-model
Status
This library is in beta, and not all models are supported yet. The following models are currently supported:
Model | Classification | Regression | Categorical features |
---|---|---|---|
Decision Trees | ✅ | ✅ | ✅¹ |
Random Forests | ✅ | ✅ | ✅¹ |
Gradient Boosting | ✅ | ✅ | ✅¹ |
Linear Regression | ✅ | ✅ | ✅³ |
Ridge | ✅² | ✅ | ✅³ |
Lasso | ✅² | ✅ | ✅³ |
ElasticNet | ✅² | ✅ | ✅ |
Gaussian Naive Bayes | ✅ | | ✅³ |
Support Vector Machines | ✅ | ✅ | ✅³ |
¹ Categorical feature support using slightly modified internals, based on scikit-learn#12866.
² These models differ only in training characteristics; the resulting model is of the same form. Classification is supported using PMMLLogisticRegression for regression models and PMMLRidgeClassifier for general regression models.
³ By one-hot encoding categorical features automatically.
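To make the one-hot encoding footnote concrete, here is a minimal pure-Python sketch of what one-hot encoding a categorical column means. It only illustrates the idea; the helper name `one_hot` is hypothetical, and this is not the library's internal implementation.

```python
def one_hot(values, categories=None):
    """Return (categories, rows) with one 0/1 indicator column per category.

    Hypothetical helper for illustration only; not part of sklearn-pmml-model.
    """
    cats = sorted(set(values)) if categories is None else list(categories)
    rows = [[1 if value == cat else 0 for cat in cats] for value in values]
    return cats, rows


colors = ["red", "green", "red", "blue"]
columns, encoded = one_hot(colors)

print(columns)  # ['blue', 'green', 'red']
for row in encoded:
    print(row)  # first row is [0, 0, 1] because "red" is the third category
```

Each categorical value becomes a row with exactly one 1, so a linear model can assign a separate coefficient to every category.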
The following parts of the PMML specification are covered:
- Array (including typed variants)
- SparseArray (including typed variants)
  - Indices
  - Entries (including typed variants)
- DataDictionary
  - DataField (continuous, categorical, ordinal)
    - Value
    - Interval
- TransformationDictionary / LocalTransformations
  - DerivedField
- TreeModel
  - SimplePredicate
  - SimpleSetPredicate
- Segmentation ('majorityVote' for Random Forests, 'modelChain' and 'sum' for Gradient Boosting)
- Regression
  - RegressionTable
    - NumericPredictor
    - CategoricalPredictor
- GeneralRegressionModel (only linear models)
  - PPMatrix
    - PPCell
  - ParamMatrix
    - PCell
- NaiveBayesModel
  - BayesInputs
    - BayesInput
      - TargetValueStats
        - TargetValueStat
          - GaussianDistribution
      - PairCounts
        - TargetValueCounts
          - TargetValueCount
- SupportVectorMachineModel
  - LinearKernelType
  - PolynomialKernelType
  - RadialBasisKernelType
  - SigmoidKernelType
  - VectorDictionary
    - VectorFields
    - VectorInstance
  - SupportVectorMachine
    - SupportVectors
      - SupportVector
    - Coefficients
      - Coefficient
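For readers unfamiliar with PMML, the sketch below shows how a few of the elements listed above (DataDictionary, DataField, Value, TreeModel, SimplePredicate) appear in a document. The PMML fragment is hand-written for illustration and the field names are made up; it is read here with Python's standard `xml.etree.ElementTree`, which is not necessarily how the library parses files.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical PMML fragment: a one-split decision tree.
PMML_DOC = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="hand-written illustration"/>
  <DataDictionary numberOfFields="2">
    <DataField name="petal_length" optype="continuous" dataType="double"/>
    <DataField name="Class" optype="categorical" dataType="string">
      <Value value="setosa"/>
      <Value value="other"/>
    </DataField>
  </DataDictionary>
  <TreeModel functionName="classification" splitCharacteristic="binarySplit">
    <MiningSchema>
      <MiningField name="petal_length"/>
      <MiningField name="Class" usageType="target"/>
    </MiningSchema>
    <Node id="0">
      <True/>
      <Node id="1" score="setosa">
        <SimplePredicate field="petal_length" operator="lessOrEqual" value="2.45"/>
      </Node>
      <Node id="2" score="other">
        <SimplePredicate field="petal_length" operator="greaterThan" value="2.45"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
""".strip()

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}
root = ET.fromstring(PMML_DOC)

# Map each DataField name to its optype (continuous, categorical, ordinal).
fields = {
    field.get("name"): field.get("optype")
    for field in root.findall("pmml:DataDictionary/pmml:DataField", NS)
}

# Collect every split condition found anywhere in the tree.
predicates = [
    (p.get("field"), p.get("operator"), float(p.get("value")))
    for p in root.iter("{http://www.dmg.org/PMML-4_4}SimplePredicate")
]

print(fields)
print(predicates)
```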
Example
A minimal working example is shown below:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn_pmml_model.ensemble import PMMLForestClassifier

# Prepare data: name the columns after the iris features and use class names as labels
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(np.array(iris.target_names)[iris.target], name="Class")
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.33, random_state=123)

# Load the model from its PMML file and evaluate it on the held-out data
clf = PMMLForestClassifier(pmml="models/randomForest.pmml")
print(clf.predict(Xte))     # predicted class labels
print(clf.score(Xte, yte))  # mean accuracy on the test set
More examples can be found in the following packages: tree, ensemble, linear_model and naive_bayes.
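The example above ends by calling score. For scikit-learn classifiers, score returns the mean accuracy on the given data; the sketch below spells out that computation in plain Python, assuming the PMML estimator follows the same convention. The function name `accuracy` and the label lists are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Hypothetical labels: three of the four predictions are correct.
y_true = ["setosa", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica"]

print(accuracy(y_true, y_pred))  # 0.75
```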
Benchmark
Depending on the data set and model, sklearn-pmml-model is between 5 and 1,000 times faster than competing libraries, by leveraging the optimization and industry-tested robustness of sklearn. Source code for this benchmark can be found in the corresponding Jupyter notebook.
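A combined "load + predict" running time, as reported below, could be measured along the following lines. This is only a sketch of one reasonable approach, not the code from the benchmark notebook; `time_load_and_predict`, `fake_load` and `fake_predict` are hypothetical names, and the stand-in functions would be replaced by real model loading and prediction calls.

```python
import time


def time_load_and_predict(load, predict, repeats=3):
    """Return the best wall-clock time in seconds for load() followed by predict(model)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        model = load()      # e.g. parse the PMML file into an estimator
        predict(model)      # e.g. call model.predict on the test data
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical stand-ins so the sketch runs without any model file.
def fake_load():
    return {"threshold": 2.45}


def fake_predict(model):
    return [x <= model["threshold"] for x in (1.0, 3.0, 5.0)]


elapsed = time_load_and_predict(fake_load, fake_predict)
print(f"{elapsed:.6f} s")
```

Taking the best of several repeats reduces noise from other processes; a real benchmark may prefer the mean or median instead.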
Running times (load + predict, in seconds)
Dataset | Library | Linear model | Naive Bayes | Decision tree | Random Forest | Gradient boosting |
---|---|---|---|---|---|---|
Wine | PyPMML | 0.773291 | 0.77384 | 0.777425 | 0.895204 | 0.902355 |
Wine | sklearn-pmml-model | 0.005813 | 0.006357 | 0.002693 | 0.108882 | 0.121823 |
Breast cancer | PyPMML | 3.849855 | 3.878448 | 3.83623 | 4.16358 | 4.13766 |
Breast cancer | sklearn-pmml-model | 0.015723 | 0.011278 | 0.002807 | 0.146234 | 0.044016 |
Improvement
Dataset | Linear model | Naive Bayes | Decision tree | Random Forest | Gradient boosting |
---|---|---|---|---|---|
Wine | 133× | 122× | 289× | 8× | 7× |
Breast cancer | 245× | 344× | 1,367× | 28× | 94× |
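The improvement factors are the PyPMML running time divided by the sklearn-pmml-model running time, rounded to the nearest integer. The short script below recomputes them from the running times listed above; the variable and function names are chosen for illustration.

```python
MODELS = ["Linear model", "Naive Bayes", "Decision tree",
          "Random Forest", "Gradient boosting"]

# Running times (load + predict, in seconds), copied from the table above.
TIMINGS = {
    "Wine": {
        "PyPMML": [0.773291, 0.77384, 0.777425, 0.895204, 0.902355],
        "sklearn-pmml-model": [0.005813, 0.006357, 0.002693, 0.108882, 0.121823],
    },
    "Breast cancer": {
        "PyPMML": [3.849855, 3.878448, 3.83623, 4.16358, 4.13766],
        "sklearn-pmml-model": [0.015723, 0.011278, 0.002807, 0.146234, 0.044016],
    },
}


def improvement(dataset):
    """Speed-up factor per model: slower time divided by faster time, rounded."""
    slow = TIMINGS[dataset]["PyPMML"]
    fast = TIMINGS[dataset]["sklearn-pmml-model"]
    return [round(s / f) for s, f in zip(slow, fast)]


for name in TIMINGS:
    factors = improvement(name)
    print(name, dict(zip(MODELS, factors)))
```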
Development
Prerequisites
Tests are run with pytest. First, grab a local copy of the source:
$ git clone http://github.com/iamDecode/sklearn-pmml-model
$ cd sklearn-pmml-model
then create a virtual environment and activate it:
$ python3 -m venv venv
$ source venv/bin/activate
and install the dependencies:
$ pip install -r requirements.txt
The final step is to build the Cython extensions:
$ python setup.py build_ext --inplace
Testing
You can execute the tests with pytest by running:
$ python setup.py pytest
Contributing
Feel free to make a contribution. Please read CONTRIBUTING.md for more details.
License
This project is licensed under the BSD 2-Clause License - see the LICENSE file for details.