JSON serialization for scikit-learn models
sklearn-serialize
JSON serialization for scikit-learn pipelines and the Python/NumPy/SciPy/pandas types that appear inside them. The goal is lossless round-trips: json_to_data(data_to_json(obj)) reproduces the original object, including fitted model state.
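from sklearn_serialize import data_to_json, json_to_data

# fitted_pipeline: an already-fitted scikit-learn estimator or Pipeline
json_str = data_to_json(fitted_pipeline)  # serialize to a JSON string
restored = json_to_data(json_str)  # rebuild the object, including fitted state
restored.predict(X)  # identical output to the original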
Installation
pip install sklearn-serialize
Supported types
- sklearn: Pipeline, FeatureUnion, ColumnTransformer, and any BaseEstimator subclass
- NumPy: ndarray, scalar integer/float/complex types, datetime64, dtype, Generator, RandomState
- SciPy: sparse matrices (csr, csc, coo, lil, dok)
- pandas: Series, DataFrame
- polars: Series, DataFrame (optional; requires polars to be installed)
- Python: tuple, set, frozenset, bytes, bytearray, slice, complex, OrderedDict, namedtuple, datetime, date
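Plain JSON has no tuple, set, complex, or bytes type, so a lossless round-trip needs some way to record the original type. The sketch below shows one common approach, a tagged dictionary per value, using only the standard library. It is an illustration of the idea, not sklearn-serialize's actual wire format: the `__type__` key and the `encode`/`decode` helpers are hypothetical names.

```python
import base64
import json

# Hypothetical tag key; sklearn-serialize's real wire format may differ.
TAG = "__type__"


def encode(obj):
    """Recursively convert obj to JSON-native values, tagging types JSON lacks."""
    if isinstance(obj, tuple):
        return {TAG: "tuple", "items": [encode(x) for x in obj]}
    if isinstance(obj, (set, frozenset)):
        name = "frozenset" if isinstance(obj, frozenset) else "set"
        # Sort by repr so the encoded output is deterministic.
        return {TAG: name, "items": [encode(x) for x in sorted(obj, key=repr)]}
    if isinstance(obj, complex):
        return {TAG: "complex", "real": obj.real, "imag": obj.imag}
    if isinstance(obj, bytes):
        return {TAG: "bytes", "b64": base64.b64encode(obj).decode("ascii")}
    if isinstance(obj, list):
        return [encode(x) for x in obj]
    if isinstance(obj, dict):
        return {k: encode(v) for k, v in obj.items()}
    return obj


def decode(obj):
    """Invert encode(): rebuild tagged values into their original Python types."""
    if isinstance(obj, list):
        return [decode(x) for x in obj]
    if isinstance(obj, dict):
        kind = obj.get(TAG)
        if kind == "tuple":
            return tuple(decode(x) for x in obj["items"])
        if kind == "set":
            return set(decode(x) for x in obj["items"])
        if kind == "frozenset":
            return frozenset(decode(x) for x in obj["items"])
        if kind == "complex":
            return complex(obj["real"], obj["imag"])
        if kind == "bytes":
            return base64.b64decode(obj["b64"])
        return {k: decode(v) for k, v in obj.items()}
    return obj


original = {"shape": (3, 2), "labels": {"a", "b"}, "z": 1 + 2j, "raw": b"\x00\xff"}
restored = decode(json.loads(json.dumps(encode(original))))
assert restored == original
assert isinstance(restored["shape"], tuple)  # a bare json round-trip would give a list
```

Without the tags, `json.dumps` would turn the tuple into a list and reject the set, the complex number, and the bytes outright. A real implementation would also need to handle user dictionaries that happen to contain the tag key.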
Custom estimators
Custom estimators need no extra serialization code as long as their class is importable at deserialization time. Because json_to_data only instantiates classes from trusted modules, call trust_module once at startup to allow deserialization from your package:
from sklearn_serialize import trust_module
trust_module("my_package.transformers")
The argument is a module prefix: "my_package" covers my_package.transformers, my_package.pipelines, etc. A prefix matches only the exact module name or its dotted submodules, so "my_pack" does not cover "my_package".
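The matching rule can be written in a few lines. This is a minimal sketch of the rule as described here, not the library's own code; `is_trusted` is a hypothetical helper name.

```python
from typing import Iterable


def is_trusted(module_name: str, trusted_prefixes: Iterable[str]) -> bool:
    """True if module_name equals a trusted prefix or is a dotted submodule of one."""
    return any(
        module_name == prefix or module_name.startswith(prefix + ".")
        for prefix in trusted_prefixes
    )


trusted = {"my_package"}
print(is_trusted("my_package", trusted))               # exact match
print(is_trusted("my_package.transformers", trusted))  # dotted submodule
print(is_trusted("my_package_extra.hooks", trusted))   # shares a string prefix only
```

Appending the dot before calling `startswith` is what separates a dotted submodule from a module that merely begins with the same characters. A plain string-prefix test would let "my_pack" cover "my_package".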
json_to_data will raise ValueError if it encounters a class from an untrusted module. This prevents arbitrary code execution when deserializing JSON from untrusted sources. Only call json_to_data on JSON you produced yourself or received from a trusted source.
The default trusted set covers sklearn, numpy, scipy, pandas, builtins, and sklearn_serialize.
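To see why the check matters, consider how a deserializer turns a dotted class path stored in JSON back into a class. Importing a module runs its top-level code, so the trust check has to happen before the import. The sketch below illustrates that ordering using only the standard library. `resolve_class` and `DEFAULT_TRUSTED` are hypothetical names, and the library's internals may be organized differently.

```python
import importlib

# Mirrors the default trusted set listed above; the name is hypothetical.
DEFAULT_TRUSTED = ("sklearn", "numpy", "scipy", "pandas", "builtins", "sklearn_serialize")


def resolve_class(qualified_name, trusted=DEFAULT_TRUSTED):
    """Resolve 'package.module.ClassName' to a class, refusing untrusted modules."""
    module_name, _, class_name = qualified_name.rpartition(".")
    allowed = any(
        module_name == prefix or module_name.startswith(prefix + ".")
        for prefix in trusted
    )
    if not allowed:
        # Refuse before importing: an import executes the module's top-level code.
        raise ValueError(f"untrusted module: {module_name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


print(resolve_class("builtins.dict"))

try:
    resolve_class("subprocess.Popen")
except ValueError as exc:
    print("refused:", exc)
```

Here `builtins.dict` resolves because `builtins` is in the trusted set, while `subprocess.Popen` is rejected before `subprocess` is ever imported by the resolver.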
To trust modules globally without calling trust_module in every script, create ~/.sklearnserialize:
[trusted_modules]
my_package
polars
One module prefix per line. Blank lines and lines starting with # are ignored. This file is loaded once at import time.
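The file format is simple enough to parse by hand. The following sketch follows the rules stated above (one prefix per line, blank lines and # comments skipped). `parse_trusted_modules` and `load_trusted_modules` are hypothetical names, and ignoring lines outside the [trusted_modules] section is an assumption, since the behavior for other sections is not described here.

```python
from pathlib import Path


def parse_trusted_modules(text):
    """Collect the module prefixes listed under the [trusted_modules] section."""
    prefixes = []
    in_section = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if line.startswith("[") and line.endswith("]"):
            # Assumption: only [trusted_modules] is meaningful; other sections are skipped.
            in_section = line == "[trusted_modules]"
            continue
        if in_section:
            prefixes.append(line)
    return prefixes


def load_trusted_modules(path=None):
    """Read the config file if it exists; return an empty list otherwise."""
    path = Path(path) if path is not None else Path.home() / ".sklearnserialize"
    if not path.is_file():
        return []
    return parse_trusted_modules(path.read_text(encoding="utf-8"))


sample = """\
# modules trusted on this machine
[trusted_modules]
my_package

polars
"""
print(parse_trusted_modules(sample))  # ['my_package', 'polars']
```

Each parsed entry is a module prefix and follows the same matching rule as an argument to trust_module.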