A utility to migrate scikit-learn models between versions.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

sklearn-migrator 🧪

A Python library to serialize and migrate scikit-learn models across incompatible versions.

🚀 Motivation

Machine learning teams frequently store trained scikit-learn models using pickle or joblib.
However:

❌ These serialized models break when scikit-learn versions change

Internal attributes change
APIs evolve (e.g., affinity → metric)
Tree and boosting internals get reorganized
New default parameters appear

❌ This creates real problems:

Production services fail after dependency upgrades
Research becomes non-reproducible
Long-term model governance becomes impossible
Models can't be migrated or audited reliably

✅ What `sklearn-migrator` provides

✔ Serialize any supported model into a JSON-compatible dictionary

✔ Deserialize and reconstruct the model in a different scikit-learn version

✔ Remove dependency on pickle/joblib for long-term storage

✔ Enable reproducible ML pipelines across environments

This library has been validated across 1,024 version migration pairs (from → to), covering scikit-learn 0.21.3 through 1.7.2.

💡 Supported Models (21 models)

sklearn-migrator supports 21 core models across classification, regression, clustering, and dimensionality reduction.

📘 Classification

Model	Supported
DecisionTreeClassifier	✅
RandomForestClassifier	✅
GradientBoostingClassifier	✅
LogisticRegression	✅
KNeighborsClassifier	✅
SVC (Support Vector Classifier)	✅
MLPClassifier	✅

📗 Regression

Model	Supported
DecisionTreeRegressor	✅
RandomForestRegressor	✅
GradientBoostingRegressor	✅
LinearRegression	✅
Ridge	✅
Lasso	✅
KNeighborsRegressor	✅
SVR (Support Vector Regressor)	✅
AdaBoostRegressor	✅
MLPRegressor	✅

📙 Clustering

Model	Supported
KMeans	✅
MiniBatchKMeans	✅
AgglomerativeClustering	✅

📘 Dimensionality Reduction

Model	Supported
PCA	✅

🔢 Version Compatibility Matrix

The library supports model migrations across the full matrix:

32 versions
1,024 migration pairs
Fully tested using automated environments via CI/CD on every push

versions = [
    '0.21.3', '0.22.0', '0.22.1', '0.23.0', '0.23.1', '0.23.2',
    '0.24.0', '0.24.1', '0.24.2', '1.0.0', '1.0.1', '1.0.2',
    '1.1.0', '1.1.1', '1.1.2', '1.1.3', '1.2.0', '1.2.1', '1.2.2',
    '1.3.0', '1.3.1', '1.3.2', '1.4.0', '1.4.2', '1.5.0', '1.5.1',
    '1.5.2', '1.6.0', '1.6.1', '1.7.0', '1.7.1', '1.7.2'
]

From \ To	0.21.3	0.22.0	...	1.7.2
0.21.3	✅	✅	...	✅
0.22.0	✅	✅	...	✅
...	...	...	...	...
1.7.2	✅	✅	...	✅
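
As a quick sanity check on the figures above, the ordered (from, to) pairs can be enumerated directly from the version list. This is only an illustrative sketch: the variable name `migration_pairs` is ours, and the count includes same-version pairs such as 0.21.3 → 0.21.3, which also appear in the matrix.

```python
from itertools import product

versions = [
    '0.21.3', '0.22.0', '0.22.1', '0.23.0', '0.23.1', '0.23.2',
    '0.24.0', '0.24.1', '0.24.2', '1.0.0', '1.0.1', '1.0.2',
    '1.1.0', '1.1.1', '1.1.2', '1.1.3', '1.2.0', '1.2.1', '1.2.2',
    '1.3.0', '1.3.1', '1.3.2', '1.4.0', '1.4.2', '1.5.0', '1.5.1',
    '1.5.2', '1.6.0', '1.6.1', '1.7.0', '1.7.1', '1.7.2'
]

# Every ordered (version_in, version_out) pair, including same-version pairs.
migration_pairs = list(product(versions, repeat=2))

print(len(versions))         # 32
print(len(migration_pairs))  # 1024
```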

📊 Validation Metric

Each migration pair (version_in, version_out) is validated using:

$$\max |y_{\text{in}} - y_{\text{out}}| < 10^{-2}$$

Where:

$y_{\text{in}}$: predictions from the model in the source version
$y_{\text{out}}$: predictions from the migrated model in the target version

The worst case across all 1,024 pairs is obtained via:

df_performance.abs().max().max()  # global worst case (32x32 matrix)
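
The per-pair check can be sketched in plain Python as follows. The function names and the sample predictions are hypothetical and are not part of sklearn-migrator; only the tolerance of 10^-2 comes from the formula above.

```python
TOLERANCE = 1e-2  # threshold taken from the validation formula


def max_abs_error(y_in, y_out):
    """Return max |y_in - y_out| over paired predictions."""
    if len(y_in) != len(y_out):
        raise ValueError("prediction vectors must have the same length")
    return max(abs(a - b) for a, b in zip(y_in, y_out))


def migration_passes(y_in, y_out, tolerance=TOLERANCE):
    """True when the largest prediction difference is below the tolerance."""
    return max_abs_error(y_in, y_out) < tolerance


# Hypothetical predictions from the source and target environments.
y_in = [0.10, 0.25, 0.90]
y_out = [0.10, 0.251, 0.899]

print(migration_passes(y_in, y_out))  # True
```

With NumPy arrays, the same quantity is `np.max(np.abs(y_in - y_out))`.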

⚠️ All 1,024 combinations and 21 models are automatically tested on every push via CI/CD, using isolated Docker environments for each sklearn version. Each model is validated under a representative parameter configuration; exhaustive combinatorial testing of all parameter combinations is outside the current scope.

📂 Installation

pip install sklearn-migrator

📚 API Documentation

For full API documentation covering all 21 models, function signatures, parameters, return types, and usage examples, see API.md.

💥 Use Cases

Long-term model storage: Store models in a future-proof format across teams and systems.
Production model migration: Move models safely across major scikit-learn upgrades.
Auditing and inspection: Read serialized models as JSON, inspect structure, hyperparameters, and internals.
Cross-platform inference: Serialize in Python, serve elsewhere (e.g., microservices).
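
For the auditing use case, a serialized model can be explored with the standard library alone. The sketch below walks any JSON-compatible dictionary and reports the type found at each key. It makes no assumption about the actual schema produced by sklearn-migrator; the helper `describe` and the example payload are made up for illustration.

```python
def describe(obj, prefix="", max_depth=2):
    """Return 'path: type' lines for a nested JSON-compatible object."""
    lines = []
    if isinstance(obj, dict) and max_depth > 0:
        # Descend into dictionaries until the depth budget is used up.
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            lines.extend(describe(value, path, max_depth - 1))
    elif isinstance(obj, list):
        lines.append(f"{prefix}: list[{len(obj)}]")
    else:
        lines.append(f"{prefix}: {type(obj).__name__}")
    return lines


# Hypothetical payload; the real keys depend on the model and library version.
example = {
    "meta": {"version": "1.5.0"},
    "params": {"n_estimators": 100},
    "values": [1.0, 2.0],
}

for line in describe(example):
    print(line)
```

To audit a real file, load it with `json.load` and pass the resulting dictionary to `describe`.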

1. Using two Python environments

Serialize the model in an environment with the source scikit-learn version (for example, 1.5.0), then deserialize it in a second environment with the target version (for example, 1.7.0).

The deserialized model belongs to the scikit-learn version of the environment where it was deserialized, in this case 1.7.0.

Decide which version you are migrating from and which version you are migrating to before you start, so that you can create the matching environments for serialization and deserialization.

a. Serialize the model

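import json
import sklearn
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn_migrator.regression.random_forest_reg import serialize_random_forest_reg

# Train a small model in the source environment (e.g., scikit-learn 1.5.0)
X, y = make_regression(n_samples=200, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor().fit(X_train, y_train)

# Serialize the model into a JSON-compatible dictionary and write it to disk
data = serialize_random_forest_reg(model, sklearn.__version__)
with open("model.json", "w") as f:
    json.dump(data, f)

# Save the test inputs and reference predictions so the target environment can reuse them
reference = {"X_test": X_test.tolist(), "predictions": model.predict(X_test).tolist()}
with open("reference.json", "w") as f:
    json.dump(reference, f)

b. Deserialize the model

import json
import numpy as np
import sklearn
from sklearn_migrator.regression.random_forest_reg import deserialize_random_forest_reg

with open("model.json") as f:
    data = json.load(f)
with open("reference.json") as f:
    reference = json.load(f)

# Rebuild the model in the target environment (e.g., scikit-learn 1.7.0)
new_model = deserialize_random_forest_reg(data, sklearn.__version__)
new_predictions = new_model.predict(np.array(reference["X_test"]))

# Largest absolute difference against the predictions from the source environment
print(np.max(np.abs(new_predictions - np.array(reference["predictions"]))))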

2. Docker: Step by Step

Suppose you have a RandomForestClassifier saved in pickle format as model.pkl, and the model was trained with scikit-learn 1.5.0.

i. Create the following folder on your Desktop:

/test_github

Then copy model.pkl into this folder.

ii. The Dockerfiles and requirements for all supported input versions are available in the integration/environments/input/ directory of this repository. Copy the files for your input version (e.g., 1.5.0):

/test_github/input/1.5.0/Dockerfile_input
/test_github/input/1.5.0/requirements_input.txt

iii. The Dockerfiles and requirements for all supported output versions are available in the integration/environments/output/ directory of this repository. Copy the files for your output version (e.g., 1.7.0):

/test_github/output/1.7.0/Dockerfile_output
/test_github/output/1.7.0/requirements_output.txt

iv. Create input.py:

import json
import sklearn
import numpy as np
import pandas as pd
from joblib import load

from sklearn_migrator.classification.random_forest_clf import serialize_random_forest_clf

version_sklearn_in = sklearn.__version__

# Load the pickled model inside the source-version container
model = load('model.pkl')

all_data = serialize_random_forest_clf(model, version_sklearn_in)

def convert(o):
    """Convert NumPy scalars and arrays to plain Python types for json.dump."""
    if isinstance(o, np.integer):
        return int(o)
    elif isinstance(o, np.floating):
        return float(o)
    elif isinstance(o, np.ndarray):
        return o.tolist()
    else:
        raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

with open("input_model/all_data.json", "w") as f:
    json.dump(all_data, f, default=convert)

# Reference prediction on a sample row (this example assumes a model trained on 10 features)
fake_row = np.array([[0.5, -1.2, 0.3, 1.1, -0.7, 0.9, 0.0, -0.3, 1.5, 0.2]])

y_pred = pd.DataFrame(model.predict_proba(fake_row))
y_pred.to_csv('input_model/y_pred.csv', index=False)

v. Create output.py:

import json
import joblib
import sklearn
import numpy as np
import pandas as pd

from sklearn_migrator.classification.random_forest_clf import deserialize_random_forest_clf

version_sklearn_out = sklearn.__version__

# Read the dictionary written by input.py
with open("input_model/all_data.json", "r") as f:
    all_data = json.load(f)

# Rebuild the model inside the target-version container
new_model = deserialize_random_forest_clf(all_data, version_sklearn_out)

joblib.dump(new_model, 'output_model/new_model.pkl')

# Predict on the same sample row used in input.py so the outputs can be compared
fake_row = np.array([[0.5, -1.2, 0.3, 1.1, -0.7, 0.9, 0.0, -0.3, 1.5, 0.2]])

y_pred_new = pd.DataFrame(new_model.predict_proba(fake_row))
y_pred_new.to_csv('output_model/y_pred_new.csv', index=False)

vi. From the root of test_github/, copy all the files into that folder:

cp input/1.5.0/* output/1.7.0/* .

vii. Create two folders: input_model/ and output_model/.

viii. Run the following commands in your terminal from the root of the test_github/ folder:

docker build -f Dockerfile_input -t image_input_1.5.0 .
docker build -f Dockerfile_output -t image_output_1.7.0 .

docker run --rm \
  -v "$(pwd)/input_model:/app/input_model" \
  -v "$(pwd)/model.pkl:/app/model.pkl" \
  image_input_1.5.0

docker run --rm \
  -v "$(pwd)/input_model:/app/input_model" \
  -v "$(pwd)/output_model:/app/output_model" \
  image_output_1.7.0

ix. The migrated model is saved as new_model.pkl in the output_model/ folder. It is a scikit-learn 1.7.0 model.
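
To confirm the migration, the two prediction files written by input.py and output.py can be compared against the 10^-2 tolerance from the Validation Metric section. This is a minimal sketch using only the standard library. The helper name `max_abs_difference` is hypothetical, and it assumes both CSV files contain a header row followed by numeric rows, which is what `DataFrame.to_csv(index=False)` writes.

```python
import csv


def max_abs_difference(path_in, path_out):
    """Largest absolute element-wise difference between two numeric CSV files."""
    def read_rows(path):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            return [[float(v) for v in row] for row in reader]

    rows_in, rows_out = read_rows(path_in), read_rows(path_out)
    if len(rows_in) != len(rows_out):
        raise ValueError("files contain a different number of rows")
    return max(
        abs(a - b)
        for row_in, row_out in zip(rows_in, rows_out)
        for a, b in zip(row_in, row_out)
    )
```

Run from the root of test_github/, `max_abs_difference("input_model/y_pred.csv", "output_model/y_pred_new.csv") < 1e-2` should hold if the migrated model reproduces the original probabilities within the tolerance.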

🔧 Development

Run tests locally

pytest tests/

Integration tests run automatically on every push via CI/CD.

🤝 Contributing

Fork the repository
Create a new branch feature/my-feature
Open a pull request
Please ensure your pull request is tested for all combinations of functions; otherwise, it may be rejected.

We welcome bug reports, suggestions, and contributions of new models.

📄 License

MIT License — see LICENSE for details.

🔍 Author

Alberto Valdés

ML/AI Engineer | MLOps Engineer | Open Source Contributor

GitHub: @anvaldes

Release history

Version	Release date
0.22.0 (this version)	May 8, 2026
0.21.2	Apr 5, 2026
0.21.1	Dec 13, 2025
0.20.0	Dec 7, 2025
0.19.0	Dec 6, 2025
0.18.0	Dec 3, 2025
0.17.0	Dec 2, 2025
0.16.0	Nov 30, 2025
0.15.1	Nov 29, 2025
0.15.0	Nov 29, 2025
0.14.0	Nov 25, 2025
0.13.0	Nov 23, 2025
0.12.0	Nov 17, 2025
0.11.0	Nov 16, 2025
0.10.0	Nov 9, 2025
0.9.1	Oct 12, 2025
0.9.0	Oct 12, 2025
0.8.3	Aug 3, 2025
0.8.2	Aug 3, 2025
0.8.1	Aug 3, 2025
0.8.0	Aug 1, 2025
0.7.0	Jul 31, 2025
0.6.1	Jul 30, 2025
0.6.0	Jul 28, 2025
0.5.0	Jul 27, 2025
0.4.1	Jul 25, 2025
0.4.0	Jul 20, 2025
0.3.1	Jul 19, 2025
0.3.0	Jul 19, 2025
0.2.0	Jul 17, 2025
0.1.0	Jul 13, 2025

Download files

Source Distribution

sklearn_migrator-0.22.0.tar.gz (26.5 kB view details)

Uploaded May 8, 2026 Source

Built Distribution

sklearn_migrator-0.22.0-py3-none-any.whl (40.8 kB view details)

Uploaded May 8, 2026 Python 3

File details

Details for the file sklearn_migrator-0.22.0.tar.gz.

File metadata

Download URL: sklearn_migrator-0.22.0.tar.gz
Upload date: May 8, 2026
Size: 26.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for sklearn_migrator-0.22.0.tar.gz
Algorithm	Hash digest
SHA256	`aae6698345b39c56057b9db0c5418a4ced1e94c05d9b586c3748ec31183a7e22`
MD5	`b909f8a4112aa30a9f8b116830073bef`
BLAKE2b-256	`a52516473a029de8e14410d11edf1576fd5b553b86659fd2ebf0b0368eef23cd`

File details

Details for the file sklearn_migrator-0.22.0-py3-none-any.whl.

File metadata

Download URL: sklearn_migrator-0.22.0-py3-none-any.whl
Upload date: May 8, 2026
Size: 40.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for sklearn_migrator-0.22.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5213c15b0541acde94b16e248097d29bad38e46377026497e09506db208211b4`
MD5	`2cf7795b3131219d309c23bc02e5541d`
BLAKE2b-256	`f081f8ba1ac67ccfee961b66f478171f8070852a95a4660908517933798497c2`
