A Python package for efficient pickling of ML models.

Slim Trees

slim-trees is a Python package for saving and loading compressed sklearn tree-based and lightgbm models. It achieves the compression by changing how Python's pickle module serializes the model.
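
To make the idea of "changing how the model is pickled" concrete, here is a minimal sketch that uses only the standard library. It is not the slim-trees implementation: `ToyTree`, `_reduce_toy_tree`, `_rebuild_toy_tree` and `dump_compact` are hypothetical names invented for illustration, and the float32 packing is just one possible space-saving trick. It shows how a per-pickler `dispatch_table` can swap in a more compact representation while the result still loads with plain `pickle.loads`.

```python
import copyreg
import io
import pickle
from array import array


class ToyTree:
    """Hypothetical stand-in for a fitted tree: holds split thresholds."""

    def __init__(self, thresholds):
        self.thresholds = list(thresholds)


def _rebuild_toy_tree(packed: bytes) -> ToyTree:
    # Inverse of _reduce_toy_tree: unpack float32 bytes into a ToyTree.
    values = array("f")
    values.frombytes(packed)
    return ToyTree(values.tolist())


def _reduce_toy_tree(tree: ToyTree):
    # Store the thresholds as packed float32 bytes instead of a list of
    # Python floats. This trades precision for size (an assumption of
    # this sketch, not a statement about what slim-trees does).
    return _rebuild_toy_tree, (array("f", tree.thresholds).tobytes(),)


def dump_compact(obj, file) -> None:
    """Pickle obj, routing ToyTree instances through the compact reducer."""
    pickler = pickle.Pickler(file)
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[ToyTree] = _reduce_toy_tree
    pickler.dump(obj)


# Values chosen to be exactly representable in float32.
tree = ToyTree([0.5, 1.25, 3.0] * 1000)

plain = pickle.dumps(tree)

buffer = io.BytesIO()
dump_compact(tree, buffer)
compact = buffer.getvalue()

# The compact payload needs no custom unpickler.
restored = pickle.loads(compact)
print(len(plain), len(compact))
```

The custom reducer only affects writing; reading goes through the ordinary unpickler, which calls the rebuild function stored in the pickle stream.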

We presented this library at PyData Berlin 2023. Check out the slides!

Installation

pip install slim-trees
# or
micromamba install slim-trees -c conda-forge
# or
pixi add slim-trees

Usage

Using slim-trees does not affect your training pipeline. Simply call dump_sklearn_compressed or dump_lgbm_compressed to save your model.

Warning: slim-trees does not save all the data that sklearn would save; it keeps only the parameters that are relevant for inference. If you need the full model (including impurity etc.) for analytic purposes, we suggest saving both versions: the original with pickle.dump for analytics and the slimmed-down one with slim-trees for production.

Example for a RandomForestClassifier:

# example, you can also use other Tree-based models
from sklearn.ensemble import RandomForestClassifier
from slim_trees import dump_sklearn_compressed

# load training data
X, y = ...
model = RandomForestClassifier()
model.fit(X, y)

dump_sklearn_compressed(model, "model.pkl")
# or alternatively with compression
dump_sklearn_compressed(model, "model.pkl.lzma")

Example for a LGBMRegressor:

from lightgbm import LGBMRegressor
from slim_trees import dump_lgbm_compressed

# load training data
X, y = ...
model = LGBMRegressor()
model.fit(X, y)

dump_lgbm_compressed(model, "model.pkl")
# or alternatively with compression
dump_lgbm_compressed(model, "model.pkl.lzma")

Later, you can load the model using load_compressed or pickle.load.

import pickle
from slim_trees import load_compressed

model = load_compressed("model.pkl")

# or alternatively with pickle.load
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
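
The pickle.load variant above reads the uncompressed "model.pkl". For a file such as "model.pkl.lzma", the stream has to be decompressed before unpickling. The sketch below assumes the file is a standard LZMA stream wrapped around a pickle; this README does not state the on-disk layout, so treat that as an assumption and prefer load_compressed when in doubt. `load_pickle_lzma` is a hypothetical helper, and a plain dict stands in for a model so the snippet runs without slim-trees.

```python
import lzma
import os
import pickle
import tempfile


def load_pickle_lzma(path):
    """Unpickle an object from an LZMA-compressed file (assumed layout)."""
    with lzma.open(path, "rb") as f:
        return pickle.load(f)


# Round trip with a plain dict standing in for a model.
payload = {"n_estimators": 100, "max_depth": 7}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pkl.lzma")
    with lzma.open(path, "wb") as f:
        pickle.dump(payload, f)
    loaded = load_pickle_lzma(path)
```

When loading a real model this way, slim-trees still has to be importable, because the pickle stream refers to its reconstruction functions.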

Save your model as bytes

You can also serialize the model to bytes instead of writing it to a file, similar to pickle.dumps.

from sklearn.ensemble import RandomForestClassifier
from slim_trees import dumps_sklearn_compressed, loads_compressed

X, y = ...
model = RandomForestClassifier()
model.fit(X, y)

data = dumps_sklearn_compressed(model, compression="lzma")
...
model_loaded = loads_compressed(data, compression="lzma")

Drop-in replacement for pickle

You can also use the slim_trees.sklearn_tree.dump or slim_trees.lgbm_booster.dump functions as drop-in replacements for pickle.dump.

from slim_trees import sklearn_tree, lgbm_booster

# for sklearn models
with open("model.pkl", "wb") as f:
    sklearn_tree.dump(model, f)  # instead of pickle.dump(...)

# for lightgbm models
with open("model.pkl", "wb") as f:
    lgbm_booster.dump(model, f)  # instead of pickle.dump(...)

Development Installation

You can install the package in development mode using the new conda package manager pixi:

❯ git clone https://github.com/quantco/slim-trees.git
❯ cd slim-trees

❯ pixi install
❯ pixi run postinstall
❯ pixi run test
[...]
❯ pixi run py312 python
>>> import slim_trees
[...]

Benchmark

As a rough guide to the savings you can expect, we benchmarked a 1.2 GB sklearn RandomForestRegressor.

The file written by slim-trees is 9x smaller than the original pickle file.
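
To check the savings on your own model, you can compare file sizes directly. The helper below is a small sketch; `size_ratio` is a hypothetical name, and the demo writes two dummy files of known size so that it runs without any model.

```python
import os
import tempfile


def size_ratio(original_path, slim_path):
    """Return how many times smaller slim_path is than original_path."""
    return os.path.getsize(original_path) / os.path.getsize(slim_path)


# Demo with dummy files: 9000 bytes versus 1000 bytes.
with tempfile.TemporaryDirectory() as tmp:
    big = os.path.join(tmp, "original.pkl")
    small = os.path.join(tmp, "slim.pkl")
    with open(big, "wb") as f:
        f.write(b"\0" * 9000)
    with open(small, "wb") as f:
        f.write(b"\0" * 1000)
    ratio = size_ratio(big, small)

print(f"{ratio:.1f}x smaller")
```

With a real model, write the original with pickle.dump and the slim version with dump_sklearn_compressed, then pass both paths to `size_ratio`. The ratio depends on the model and on the compression you choose.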
