JSON serialization for scikit-learn models

sklearn-serialize

JSON serialization for scikit-learn pipelines and the Python/NumPy/SciPy/pandas types that appear inside them. The goal is lossless round-trips: json_to_data(data_to_json(obj)) reproduces the original object, including fitted model state.

from sklearn_serialize import data_to_json, json_to_data

json_str = data_to_json(fitted_pipeline)
restored  = json_to_data(json_str)

restored.predict(X)  # identical output to the original

Installation

pip install sklearn-serialize

Supported types

  • sklearn: Pipeline, FeatureUnion, ColumnTransformer, and any BaseEstimator subclass
  • NumPy: ndarray, scalar integer/float/complex types, datetime64, dtype, Generator, RandomState
  • SciPy: sparse matrices (csr, csc, coo, lil, dok)
  • pandas: Series, DataFrame
  • polars: Series, DataFrame (optional — requires polars to be installed)
  • Python: tuple, set, frozenset, bytes, bytearray, slice, complex, OrderedDict, namedtuple, datetime, date
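Several of the Python types listed above have no native JSON representation, so a lossless round-trip needs some way to record the original type. The sketch below shows one common approach, a tagged dictionary, for a handful of those types. The `__type__` key and the `encode`/`decode` helpers are hypothetical and written for illustration only; the format sklearn-serialize actually emits may differ.

```python
import json

# Hypothetical tag key, chosen for this sketch; not the library's wire format.
TAG = "__type__"


def encode(obj):
    """Recursively convert obj to JSON-safe values, tagging non-JSON types."""
    if isinstance(obj, tuple):
        return {TAG: "tuple", "items": [encode(x) for x in obj]}
    if isinstance(obj, (set, frozenset)):
        kind = "frozenset" if isinstance(obj, frozenset) else "set"
        # Sort by repr so the output is deterministic across runs.
        return {TAG: kind, "items": [encode(x) for x in sorted(obj, key=repr)]}
    if isinstance(obj, complex):
        return {TAG: "complex", "real": obj.real, "imag": obj.imag}
    if isinstance(obj, bytes):
        return {TAG: "bytes", "hex": obj.hex()}
    if isinstance(obj, list):
        return [encode(x) for x in obj]
    if isinstance(obj, dict):
        # Assumes string keys that do not collide with TAG.
        return {k: encode(v) for k, v in obj.items()}
    return obj  # str, int, float, bool, None pass through unchanged


def decode(value):
    """Invert encode(): rebuild the tagged types from plain JSON values."""
    if isinstance(value, list):
        return [decode(x) for x in value]
    if isinstance(value, dict):
        kind = value.get(TAG)
        if kind == "tuple":
            return tuple(decode(x) for x in value["items"])
        if kind == "set":
            return {decode(x) for x in value["items"]}
        if kind == "frozenset":
            return frozenset(decode(x) for x in value["items"])
        if kind == "complex":
            return complex(value["real"], value["imag"])
        if kind == "bytes":
            return bytes.fromhex(value["hex"])
        return {k: decode(v) for k, v in value.items()}
    return value


original = {"shape": (2, 3), "labels": {"a", "b"}, "z": 1 + 2j, "raw": b"\x00\xff"}
restored = decode(json.loads(json.dumps(encode(original))))
```

Without the tags, `json.dumps` would turn the tuple into a list and reject the set, the complex number, and the bytes outright, which is why plain JSON alone cannot give a lossless round-trip.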

Custom estimators

Custom estimators need no extra serialization code, provided their class is importable at deserialization time and its module is trusted. Call trust_module once at startup to allow deserialization from your package:

from sklearn_serialize import trust_module

trust_module("my_package.transformers")

The argument is a module prefix: "my_package" covers my_package.transformers, my_package.pipelines, and so on. A prefix matches only the module itself and its dotted submodules, so "my_pack" does not cover "my_package".
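The matching rule can be sketched in a few lines. The `is_trusted` helper below is hypothetical and only illustrates the rule as described here; it is not taken from the library's source.

```python
def is_trusted(module_name, trusted_prefixes):
    """Return True if module_name equals a prefix or is a dotted submodule of one."""
    return any(
        module_name == prefix or module_name.startswith(prefix + ".")
        for prefix in trusted_prefixes
    )


covers_submodule = is_trusted("my_package.transformers", {"my_package"})
covers_exact = is_trusted("my_package", {"my_package"})
partial_name = is_trusted("my_package", {"my_pack"})
```

Appending the dot before calling `startswith` is what rules out partial-name matches: "my_package" does not start with "my_pack.", so `partial_name` is False, while the other two checks are True.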

json_to_data raises ValueError if it encounters a class from an untrusted module. This guards against arbitrary code execution when deserializing JSON from untrusted sources. Even so, only call json_to_data on JSON you produced yourself or received from a trusted source.
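To show why the trust check matters, here is a minimal sketch of trust-gated class lookup using only the standard library. The `resolve_class` function and the `ILLUSTRATIVE_TRUSTED` set are assumptions made for this example; the library's internals may be organized differently.

```python
import importlib
from collections import OrderedDict

# Illustrative trusted set for this sketch only; not the library's default.
ILLUSTRATIVE_TRUSTED = {"builtins", "collections"}


def resolve_class(module_name, class_name, trusted=ILLUSTRATIVE_TRUSTED):
    """Import and return a class, but only from a trusted module prefix."""
    allowed = any(
        module_name == prefix or module_name.startswith(prefix + ".")
        for prefix in trusted
    )
    if not allowed:
        # Refuse BEFORE importing: importing a module runs its top-level code.
        raise ValueError(
            f"Refusing to load {class_name!r} from untrusted module {module_name!r}"
        )
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


ok = resolve_class("collections", "OrderedDict")

try:
    resolve_class("os", "system")
    rejected = False
except ValueError:
    rejected = True
```

The check runs before `importlib.import_module` because the import itself executes code. In application code, the equivalent step would be wrapping the json_to_data call in a `try`/`except ValueError` block and treating the failure as a rejected payload.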

The default trusted set covers sklearn, numpy, scipy, pandas, builtins, and sklearn_serialize.

To trust modules globally without calling trust_module in every script, create ~/.sklearnserialize:

[trusted_modules]
my_package
polars

One module prefix per line. Blank lines and lines starting with # are ignored. This file is loaded once at import time.
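The file format described above is simple enough to parse by hand. The sketch below is a hypothetical parser that follows the stated rules; the function name and the choice to ignore entries outside the `[trusted_modules]` section are assumptions, not the library's documented behavior.

```python
def parse_trusted_modules(text):
    """Return the module prefixes listed under [trusted_modules]."""
    prefixes = []
    in_section = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line.startswith("[") and line.endswith("]"):
            # Assumption: only the [trusted_modules] section is meaningful.
            in_section = line == "[trusted_modules]"
            continue
        if in_section:
            prefixes.append(line)
    return prefixes


sample = """
# modules trusted on this machine
[trusted_modules]
my_package

polars
"""

parsed = parse_trusted_modules(sample)
```

For the sample above, `parsed` is `["my_package", "polars"]`. Reading the real file would be a matter of passing the contents of `~/.sklearnserialize` to the same function.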


This version

0.1

Provenance

The following attestation bundles were made for sklearn_serialize-0.1.tar.gz:

Publisher: release.yaml on jessegrabowski/sklearn-serialize
