A lightweight gradient boosting implementation in Rust.

Forust

A lightweight gradient boosting package

Forust is a lightweight package for building gradient boosted decision tree ensembles. All of the algorithm code is written in Rust, with a Python wrapper. The Rust package can be used directly; however, most examples shown here use the Python wrapper. For a self-contained Rust example, see here. Forust implements the same algorithm as the XGBoost package, and in many cases will give nearly identical results.

I developed this package for a few reasons: mainly to better understand the XGBoost algorithm, but also to have a fun project to work on in Rust, and to be able to experiment with adding new features to the algorithm in a smaller, simpler codebase.

All of the Rust code for the package can be found in the src directory, while all of the Python wrapper code is in the py-forust directory.

Documentation

Documentation for the Python API can be found here.

Installation

The package can be installed directly from PyPI.

pip install forust

To use in a Rust project, add the following to your Cargo.toml file.

[dependencies]
forust-ml = "0.4.7"

Usage

For details on all of the methods and their respective parameters, see the Python API documentation.

The GradientBooster class is currently the only public-facing class in the package, and can be used to train gradient boosted decision tree ensembles with multiple objective functions.

Training and Predicting

Once the booster has been initialized, it can be fit on a provided dataset and performance field. After fitting, the model can be used to predict on a dataset. In this example, the predictions are the log odds of a given record being 1.

# Small example dataset
from seaborn import load_dataset

df = load_dataset("titanic")
X = df.select_dtypes("number").drop(columns=["survived"])
y = df["survived"]

# Initialize a booster with defaults.
from forust import GradientBooster
model = GradientBooster(objective_type="LogLoss")
model.fit(X, y)

# Predict on data
model.predict(X.head())
# array([-1.94919663,  2.25863229,  0.32963671,  2.48732194, -3.00371813])

# Predict contributions (the output shown below is truncated)
model.predict_contributions(X.head())
# array([[-0.63014213,  0.33880048, -0.16520798, -0.07798772, -0.85083578,
#        -1.07720813],
#       [ 1.05406709,  0.08825999,  0.21662544, -0.12083538,  0.35209258,
#        -1.07720813],
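Because the predictions above are log odds, they can be mapped to probabilities with the logistic (sigmoid) function. The following is a minimal sketch using only NumPy; the helper name sigmoid and the 0.5 cutoff are illustrative choices, not part of the forust API.

```python
import numpy as np

# Log odds copied from the predict() output shown above.
log_odds = np.array([-1.94919663, 2.25863229, 0.32963671, 2.48732194, -3.00371813])


def sigmoid(z):
    """Map log odds to probabilities in the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


probabilities = sigmoid(log_odds)

# A probability above 0.5 is equivalent to a positive log odds value.
labels = (probabilities > 0.5).astype(int)

print(probabilities.round(3))
print(labels)
```

A different cutoff can be chosen if false positives and false negatives carry different costs.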

The maximum iteration used for prediction can be set with the set_prediction_iteration method. If early_stopping_rounds has been set, this defaults to the best iteration; otherwise, all of the trees are used.

If early stopping was used, the evaluation history can be retrieved with the get_evaluation_history method.

model = GradientBooster(objective_type="LogLoss")
model.fit(X, y, evaluation_data=[(X, y)])

model.get_evaluation_history()[0:3]

# array([[588.9158873 ],
#        [532.01055803],
#        [496.76933646]])
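To illustrate the idea of a "best iteration", here is a small sketch that works on an array shaped like the evaluation history (one row per iteration, one column per evaluation dataset). The values are made up, the function names are hypothetical, and the sketch assumes a metric where lower is better; it is not a description of how forust implements early stopping internally.

```python
import numpy as np

# Hypothetical evaluation history (made-up values for illustration):
# one row per boosting iteration, one column per evaluation dataset.
history = np.array([[0.62], [0.55], [0.51], [0.50], [0.52], [0.53]])


def best_iteration(history, column=0):
    """Return the index of the lowest metric value in the chosen column."""
    return int(np.argmin(history[:, column]))


def stopped_early(history, rounds, column=0):
    """True if the last `rounds` iterations did not improve on the best value."""
    best = best_iteration(history, column)
    return (len(history) - 1 - best) >= rounds


best = best_iteration(history)
print(best, stopped_early(history, rounds=2))
```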

Inspecting the Model

Once the booster has been fit, each tree's structure can be retrieved in text form using the text_dump method. This method returns a list with one entry per tree in the model.

model.text_dump()[0]
# 0:[0 < 3] yes=1,no=2,missing=2,gain=91.50833,cover=209.388307
#       1:[4 < 13.7917] yes=3,no=4,missing=4,gain=28.185467,cover=94.00148
#             3:[1 < 18] yes=7,no=8,missing=8,gain=1.4576768,cover=22.090348
#                   7:[1 < 17] yes=15,no=16,missing=16,gain=0.691266,cover=0.705011
#                         15:leaf=-0.15120,cover=0.23500
#                         16:leaf=0.154097,cover=0.470007
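If the text form needs to be processed programmatically, each line can be parsed with a regular expression. The sketch below assumes the line layout matches the excerpt shown above exactly (split nodes and leaf nodes); the names SPLIT_RE, LEAF_RE, and parse_line are hypothetical helpers, not part of forust. For structured access, the JSON dump is likely the better starting point.

```python
import re

# Patterns written against the excerpt above; other layouts are not handled.
SPLIT_RE = re.compile(
    r"^(?P<node>\d+):\[(?P<feature>\S+) < (?P<threshold>\S+)\] "
    r"yes=(?P<yes>\d+),no=(?P<no>\d+),missing=(?P<missing>\d+),"
    r"gain=(?P<gain>[-+\d.eE]+),cover=(?P<cover>[-+\d.eE]+)$"
)
LEAF_RE = re.compile(
    r"^(?P<node>\d+):leaf=(?P<leaf>[-+\d.eE]+),cover=(?P<cover>[-+\d.eE]+)$"
)


def parse_line(line):
    """Parse one line of a text dump into a dictionary."""
    line = line.strip()
    m = SPLIT_RE.match(line)
    if m:
        d = m.groupdict()
        return {
            "node": int(d["node"]),
            "kind": "split",
            "feature": d["feature"],
            "threshold": float(d["threshold"]),
            "yes": int(d["yes"]),
            "no": int(d["no"]),
            "missing": int(d["missing"]),
            "gain": float(d["gain"]),
            "cover": float(d["cover"]),
        }
    m = LEAF_RE.match(line)
    if m:
        d = m.groupdict()
        return {
            "node": int(d["node"]),
            "kind": "leaf",
            "leaf": float(d["leaf"]),
            "cover": float(d["cover"]),
        }
    raise ValueError(f"Unrecognized line: {line!r}")


# Lines copied from the text_dump() excerpt above.
dump = """0:[0 < 3] yes=1,no=2,missing=2,gain=91.50833,cover=209.388307
      1:[4 < 13.7917] yes=3,no=4,missing=4,gain=28.185467,cover=94.00148
            3:[1 < 18] yes=7,no=8,missing=8,gain=1.4576768,cover=22.090348
                  7:[1 < 17] yes=15,no=16,missing=16,gain=0.691266,cover=0.705011
                        15:leaf=-0.15120,cover=0.23500
                        16:leaf=0.154097,cover=0.470007"""

nodes = [parse_line(line) for line in dump.splitlines()]
for node in nodes:
    print(node)
```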

The json_dump method performs the same action, but returns the model as a JSON representation rather than a text string.

To see an estimate of how a given feature is used in the model, the partial_dependence method is provided. This method calculates the partial dependence values of a feature. For each unique value of the feature, this gives the estimated prediction at that value, with the effects of all other features averaged out. This gives an estimate of how the feature impacts the model.
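The general idea of partial dependence can be sketched in a few lines of NumPy. This follows the standard textbook definition (fix the feature at one value for every row, predict, and average) and uses a toy prediction function as a stand-in for a trained model; it is not necessarily how forust computes the values internally, and toy_predict is purely hypothetical.

```python
import numpy as np


def partial_dependence(predict, X, feature_idx):
    """Return an array of (feature value, average prediction) pairs.

    For each unique value of the feature, every row has that feature
    overwritten with the value, and the predictions are averaged.
    """
    values = np.unique(X[:, feature_idx])
    out = np.empty((len(values), 2))
    for i, value in enumerate(values):
        X_mod = X.copy()
        X_mod[:, feature_idx] = value
        out[i] = (value, predict(X_mod).mean())
    return out


def toy_predict(X):
    """Hypothetical stand-in for a fitted model's predict method."""
    return 2.0 * X[:, 0] + X[:, 1] ** 2


X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [1.0, 3.0]])
pd_values = partial_dependence(toy_predict, X, feature_idx=0)
print(pd_values)
```

The first column holds the feature values and the second the averaged predictions, which is the same layout the plotting code in this section indexes with pd_values[:, 0] and pd_values[:, 1].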

This information can be plotted to visualize how a feature is used in the model, like so.

from seaborn import lineplot
import matplotlib.pyplot as plt

pd_values = model.partial_dependence(X=X, feature="age", samples=None)

fig = lineplot(x=pd_values[:, 0], y=pd_values[:, 1])
plt.title("Partial Dependence Plot")
plt.xlabel("Age")
plt.ylabel("Log Odds")

We can see how this changes when a model is created with a monotonicity constraint applied to the feature, using the monotone_constraints parameter.

model = GradientBooster(
    objective_type="LogLoss",
    monotone_constraints={"age": -1},
)
model.fit(X, y)

pd_values = model.partial_dependence(X=X, feature="age")
fig = lineplot(
    x=pd_values[:, 0],
    y=pd_values[:, 1],
)
plt.title("Partial Dependence Plot with Monotonicity")
plt.xlabel("Age")
plt.ylabel("Log Odds")
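Beyond inspecting the plot, the direction of the partial dependence values can be checked numerically. The sketch below uses a made-up array in the same two-column layout (feature value, prediction); is_monotone is a hypothetical helper, and the numbers are illustrative rather than real model output.

```python
import numpy as np


def is_monotone(values, direction):
    """Check that values never move against the requested direction.

    direction: -1 for non-increasing, 1 for non-decreasing, 0 for no constraint.
    """
    diffs = np.diff(values)
    if direction < 0:
        return bool(np.all(diffs <= 0))
    if direction > 0:
        return bool(np.all(diffs >= 0))
    return True


# Hypothetical partial dependence values (made up for illustration):
# column 0 is the feature value, column 1 is the prediction.
pd_values = np.array([[1.0, 0.9], [20.0, 0.4], [40.0, 0.4], [60.0, -0.3]])

# Sort by feature value before comparing neighbouring predictions.
order = np.argsort(pd_values[:, 0])
decreasing = is_monotone(pd_values[order, 1], direction=-1)
print(decreasing)
```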

Feature importance values can be calculated with the calculate_feature_importance method. This function returns a dictionary of the features and their importances. Note that a feature that was never used for splitting will not be included in the importance dictionary. The example below requests the "Gain" importance.

model.calculate_feature_importance("Gain")
# {
#   'parch': 0.0713072270154953, 
#   'age': 0.11609109491109848,
#   'sibsp': 0.1486879289150238,
#   'fare': 0.14309120178222656,
#   'pclass': 0.5208225250244141
# }
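Since the result is a plain dictionary, it can be ranked and checked with standard Python. The sketch below uses the values from the output above; the name "unused_feature" is hypothetical and only demonstrates how to handle a feature that is absent from the dictionary because it was never used in a split.

```python
# Importance values copied from the example output above.
importance = {
    "parch": 0.0713072270154953,
    "age": 0.11609109491109848,
    "sibsp": 0.1486879289150238,
    "fare": 0.14309120178222656,
    "pclass": 0.5208225250244141,
}

# Rank features from most to least important.
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name:>8}: {value:.3f}")

# In this example the values sum to approximately 1.
total = sum(importance.values())

# A feature missing from the dictionary can be treated as zero importance.
unused = importance.get("unused_feature", 0.0)
```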

Saving the model

To save and subsequently load a trained booster, the save_booster and load_booster methods can be used. Each accepts a path: save_booster writes the model to it, and load_booster reads the model back from it. The model is saved and loaded as a JSON object.

# Save the fitted model from the examples above to a JSON file.
model.save_booster("model_path.json")

# To load a model from a json path.
loaded_model = GradientBooster.load_booster("model_path.json")

Release history

This version

0.4.7

Apr 12, 2024

0.4.6

Mar 24, 2024

0.4.5

Dec 17, 2023

0.4.4

Dec 13, 2023

0.4.3

Dec 5, 2023

0.4.2

Oct 20, 2023

0.4.1

Oct 17, 2023

0.4.0

Oct 16, 2023

0.3.4

Oct 13, 2023

0.3.3

Oct 10, 2023

0.3.2

Oct 9, 2023

0.3.1

Oct 2, 2023

0.3.0

Oct 2, 2023

0.2.26

Sep 19, 2023

0.2.25

Sep 19, 2023

0.2.24

Sep 12, 2023

0.2.23

Sep 7, 2023

0.2.22

Sep 6, 2023

0.2.21

Aug 23, 2023

0.2.20

Aug 8, 2023

0.2.19

Aug 2, 2023

0.2.18

Jul 13, 2023

0.2.17

Jul 5, 2023

0.2.16

Jun 29, 2023

0.2.15

Jun 24, 2023

0.2.14

Jun 19, 2023

0.2.13

Jun 9, 2023

0.2.12

May 24, 2023

0.2.11

May 22, 2023

0.2.10

May 19, 2023

0.2.9

May 18, 2023

0.2.8

May 18, 2023

0.2.7

May 15, 2023

0.2.6

May 9, 2023

0.2.5

May 8, 2023

0.2.4

May 6, 2023

0.2.3

May 1, 2023

0.2.2

Apr 23, 2023

0.2.1

Apr 23, 2023

0.2.0

Apr 20, 2023

0.1.7

Aug 20, 2022

0.1.6

Aug 19, 2022

0.1.5

Jul 31, 2022

0.1.4

Jun 18, 2022

0.1.3

Jun 17, 2022

0.1.2

Jun 9, 2022

0.1.0

Jun 8, 2022
