Distributed Gradient Boosting Forest — deep graph tree ensemble algorithm
DeepGBoost
A machine learning algorithm based on the Distributed Gradient Boosting Forest that combines the strengths of tree ensembles with neural network architectures.
⚙️ Installation
```bash
pip install deepgboost
```
Optional plotting support (quote the extra so shells such as zsh do not expand the brackets):
```bash
pip install "deepgboost[plotting]"
```
To install from source in editable mode (for development):
```bash
git clone https://github.com/iamthinbaker/deepgboost.git
cd deepgboost
pip install -e .
```
🚀 Usage
Quick Start
```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from deepgboost import DeepGBoostRegressor

# Load a small regression dataset and hold out 20% for testing
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a DGBF regressor and predict on the held-out split
model = DeepGBoostRegressor(
    n_trees=10,
    n_layers=15,
    max_depth=4,
    learning_rate=0.1,
).fit(X_train, y_train)

predictions = model.predict(X_test)
```
📓 Examples
Detailed usage examples are available in the examples/ directory:
- quickstart.ipynb — full tour of the API (regression, classification, callbacks, feature importances)
- classifier.ipynb — binary and multiclass classification walkthrough
- regressor.ipynb — regression walkthrough
- serialization.ipynb — saving and loading trained models with pickle
🧠 DeepGBoost
Algorithm
DeepGBoost implements the Distributed Gradient Boosting Forest (DGBF), a novel tree ensemble algorithm introduced in:
Delgado-Panadero, Á., Benítez-Andrades, J. A., & García-Ordás, M. T. (2023). A generalized decision tree ensemble based on the NeuralNetworks architecture: Distributed Gradient Boosting Forest (DGBF). Applied Intelligence, 53, 22991–23003. https://doi.org/10.1007/s10489-023-04735-w
Classical tree ensemble methods — RandomForest (bagging) and GradientBoosting (boosting) — are powerful for tabular data but cannot perform hierarchical representation learning as Neural Networks do. DGBF addresses this by mathematically combining both bagging and boosting into a unified formulation that defines a graph-structured tree ensemble with distributed representation learning, without requiring back-propagation or parametric models.
The core idea is to distribute the gradient descent of each boosting step across the individual trees of a RandomForest layer, so that each tree learns an independent gradient component:
$$F(x) = \sum_{l=1}^{L} RF_l(x) = \frac{1}{T} \sum_{l=1}^{L} \sum_{t=1}^{T} h_{l,t}(x)$$
where L is the number of boosting layers, T is the number of trees per layer, $h_{l,t}$ is the t-th tree of layer l, and $RF_l(x) = \frac{1}{T} \sum_{t=1}^{T} h_{l,t}(x)$ is the averaged output of layer l. This structure is a direct analogue of a dense neural network: each RandomForest layer corresponds to a network layer, and distributed gradients replace back-propagation.
Fig. 1 — NeuralNetwork vs DGBF architecture: In the NN (left), each neuron's output feeds into every neuron of the next layer, and the weights are trained by back-propagation. In DGBF (right), the distributed gradients of all trees in each layer are forwarded to every tree of the following layer.
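To make the layered structure concrete, here is a minimal sketch of a DGBF-style regressor built from NumPy and scikit-learn's `DecisionTreeRegressor`. It is not the deepgboost implementation. The class name `DGBFSketch`, the constant initial prediction, the learning-rate scaling of each layer, and the choice to have every tree in a layer fit the same squared-error residual on its own bootstrap sample are all simplifying assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class DGBFSketch:
    """Hypothetical DGBF-style regressor: L layers of T trees (squared-error loss).

    Assumptions (not taken from the deepgboost source): each tree in a layer
    fits the current residual y - F, i.e. the negative gradient of
    0.5 * (y - F)^2, on its own bootstrap sample; the layer output RF_l is the
    mean of its trees and is added to F scaled by a learning rate.
    """

    def __init__(self, n_layers=10, n_trees=5, max_depth=3, learning_rate=0.1, seed=0):
        self.n_layers = n_layers
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.seed = seed

    @staticmethod
    def _layer_output(layer, X):
        # RF_l(x) = (1/T) * sum_t h_{l,t}(x)
        return np.mean([tree.predict(X) for tree in layer], axis=0)

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        n = len(y)
        self.base_ = float(np.mean(y))  # assumed constant starting prediction
        F = np.full(n, self.base_)
        self.layers_ = []
        for _ in range(self.n_layers):
            residual = y - F  # negative gradient of the squared loss
            layer = []
            for _ in range(self.n_trees):
                idx = rng.integers(0, n, size=n)  # bootstrap sample (bagging)
                tree = DecisionTreeRegressor(
                    max_depth=self.max_depth,
                    random_state=int(rng.integers(1 << 31)),
                )
                tree.fit(X[idx], residual[idx])
                layer.append(tree)
            self.layers_.append(layer)
            # Forward the averaged layer output to the next layer's targets
            F = F + self.learning_rate * self._layer_output(layer, X)
        return self

    def predict(self, X):
        F = np.full(X.shape[0], self.base_)
        for layer in self.layers_:
            F = F + self.learning_rate * self._layer_output(layer, X)
        return F
```

Setting `n_layers=1` leaves a single bagged layer of trees, while `n_trees=1` leaves a sequence of single trees fitted to residuals, which is the structure of gradient boosting (here still with bootstrap resampling).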
Both RandomForest and GradientBoosting emerge naturally as special cases of DGBF: RandomForest is recovered with a single layer (L = 1) and GradientBoosting with a single tree per layer (T = 1).
Fig. 2 — RandomForest & GradientBoosting as DGBF special cases: RandomForest (left) and GradientBoosting (right) represented as particular graph architectures of DGBF.
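Substituting the two limiting cases into the DGBF formula makes the reduction explicit. Here the ensemble output is written as $F(x)$ and $h_{l,t}$ denotes the t-th tree of layer l; in the $T = 1$ case each tree fits the gradient left by the preceding layers.

```latex
\begin{aligned}
L = 1:&\quad F(x) = RF_1(x) = \frac{1}{T} \sum_{t=1}^{T} h_{1,t}(x)
  &&\text{(average of } T \text{ trees, as in RandomForest)} \\
T = 1:&\quad F(x) = \sum_{l=1}^{L} RF_l(x) = \sum_{l=1}^{L} h_{l,1}(x)
  &&\text{(sum of } L \text{ sequential trees, as in GradientBoosting)}
\end{aligned}
```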
📊 Benchmark
DGBF was evaluated against RandomForest (RF) and GradientBoosting (GBDT) on 9 regression datasets from the UCI Machine Learning Repository (Parkinson, Wine, Concrete, Obesity, NavalVessel, Temperature, Cargo2000, BikeSales, Superconduct), using 200 randomized simulations per dataset with an 80/20 train-test split.
🏆 Winner: DeepGBoost. DGBF achieves a higher mean R² score than both GradientBoosting and RandomForest on 7 of the 9 datasets.
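The evaluation protocol (repeated random 80/20 splits, scored by mean test R²) can be sketched with scikit-learn as follows. This is not the `run_experiments.py` script: the helper `mean_r2`, the use of the bundled diabetes dataset as a stand-in for the UCI datasets, the hyperparameters, and `n_runs=5` instead of 200 are assumptions chosen to keep the example small. `DeepGBoostRegressor` from the Quick Start could be added as another entry in `models`.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def mean_r2(make_model, X, y, n_runs=5):
    """Mean and std of test R² over repeated random 80/20 train-test splits."""
    scores = []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed
        )
        model = make_model(seed).fit(X_tr, y_tr)
        scores.append(r2_score(y_te, model.predict(X_te)))
    return float(np.mean(scores)), float(np.std(scores))


X, y = load_diabetes(return_X_y=True)
# Each factory builds a fresh model seeded per run
models = {
    "RF": lambda s: RandomForestRegressor(n_estimators=50, random_state=s),
    "GBDT": lambda s: GradientBoostingRegressor(n_estimators=100, random_state=s),
}
results = {name: mean_r2(make, X, y) for name, make in models.items()}
for name, (mean, std) in results.items():
    print(f"{name}: R² = {mean:.3f} ± {std:.3f}")
```

Re-seeding both the split and the model per run means the spread reflects both data sampling and model randomness, which is what repeated randomized simulations are meant to capture.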
To reproduce the benchmark, run the experiment script from the benchmark/ directory:
```bash
cd benchmark
python run_experiments.py
```
The script reads its configuration from benchmark/config.json, where you can adjust the models, hyperparameters, datasets, and experiment settings (e.g. number of bootstrap runs). Results are saved to benchmark/results/.
🤝 Contributing
Contributions are welcome. See CONTRIBUTING.md for development setup, code style, and pull request guidelines.
📄 Citation
If you use DeepGBoost in your research, please cite using the metadata in CITATION.cff or the BibTeX entry provided by GitHub ("Cite this repository" button).
File details
Details for the file deepgboost-0.3.4.tar.gz.
File metadata
- Download URL: deepgboost-0.3.4.tar.gz
- Size: 45.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 60a9a43e74bcaaf160f22a9107a3a614bec16e1819bee923474ac7892d54326c |
| MD5 | a14edf8c5d1d95b769c6248031bb02ca |
| BLAKE2b-256 | 13196f5f7bcf7e834512b6406c6c02f39f1275c2454a85e84e041b93062b3684 |

Provenance
The following attestation bundles were made for deepgboost-0.3.4.tar.gz:
- Publisher: publish.yml on iamthinbaker/deepgboost
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: deepgboost-0.3.4.tar.gz
- Subject digest: 60a9a43e74bcaaf160f22a9107a3a614bec16e1819bee923474ac7892d54326c
- Sigstore transparency entry: 1641876681
- Permalink: iamthinbaker/deepgboost@776fb6f70de12dca0451001542ceb7d398364d62
- Branch / Tag: refs/heads/main
- Owner: https://github.com/iamthinbaker
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@776fb6f70de12dca0451001542ceb7d398364d62
- Trigger Event: push
File details
Details for the file deepgboost-0.3.4-py3-none-any.whl.
File metadata
- Download URL: deepgboost-0.3.4-py3-none-any.whl
- Size: 45.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e1daae7867a5686bb8b275ed13634cf8ffe3575332c1a8f86c1da8674be02aad |
| MD5 | 491c80463cf787c08a3596a15359ccbc |
| BLAKE2b-256 | 4e95c8c582a45f3ce1d0d6e126305801a0cc3ac75e9951e7b2d34f684da12a1b |

Provenance
The following attestation bundles were made for deepgboost-0.3.4-py3-none-any.whl:
- Publisher: publish.yml on iamthinbaker/deepgboost
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: deepgboost-0.3.4-py3-none-any.whl
- Subject digest: e1daae7867a5686bb8b275ed13634cf8ffe3575332c1a8f86c1da8674be02aad
- Sigstore transparency entry: 1641876791
- Permalink: iamthinbaker/deepgboost@776fb6f70de12dca0451001542ceb7d398364d62
- Branch / Tag: refs/heads/main
- Owner: https://github.com/iamthinbaker
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@776fb6f70de12dca0451001542ceb7d398364d62
- Trigger Event: push