Implementations of algorithms from Prof. Dr. Ethem Alpaydın's research papers and ML textbook (MIT Press).
neural-trees
PyTorch + sklearn implementations of the tree and mixture-of-experts algorithms from Alpaydın's research papers — the ones that never got a proper open-source home.
Why?
I was reading through Alpaydın's Introduction to Machine Learning and his papers and kept hitting the same wall: interesting algorithms, no usable Python code anywhere. The Soft Decision Trees paper (ICPR 2012) alone has hundreds of citations, but the implementations floating around are incomplete, undocumented, or years out of date.
So I wrote them myself — clean, tested, and fully compatible with the sklearn API.
Covered so far:
| Algorithm | Paper | Status |
|---|---|---|
| Soft Decision Trees | İrsoy, Yıldız, Alpaydın (ICPR 2012) | ✅ PyTorch + sklearn API |
| Omnivariate Decision Trees | Yıldız & Alpaydın (IEEE TNN 2001) | ✅ |
| Hierarchical Mixture of Experts + Dropout | İrsoy & Alpaydın (Neurocomputing 2021) | ✅ PyTorch |
| GAL: Grow and Learn Networks | Alpaydın (IJPRAI 1994) | ✅ |
| Combined 5×2cv F Test | Alpaydın (Neural Computation 1999) | ✅ Gold-standard classifier comparison |
| McNemar's Test | — | ✅ |
| Naive Bayes (Gaussian/Bernoulli/Multinomial) | Textbook Ch. 3 | ✅ |
| Distance-Weighted KNN + Condensed Nearest Neighbor (CNN) | Alpaydın (AIR 1997) | ✅ |
Installation
```bash
pip install neural-trees
```
Or install from source:
```bash
git clone https://github.com/cgrtml/neural-trees.git
cd neural-trees
pip install -e ".[dev]"
```
Quick Start
Soft Decision Trees
The flagship algorithm. A hard decision tree sends each sample down a single path; a soft tree instead routes every sample to every leaf with some probability. This makes the tree fully differentiable and trainable end-to-end with backpropagation.
```python
from neural_trees import SoftDecisionTree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

sdt = SoftDecisionTree(depth=4, max_epochs=40, penalty_coef=1e-3)
sdt.fit(X_train, y_train)
print(f"Accuracy: {sdt.score(X_test, y_test):.4f}")

# Inspect what each leaf learned
leaf_distributions = sdt.get_leaf_distributions()  # shape: (n_leaves, n_classes)

# Inspect the split direction at each internal node
split_weights = sdt.get_split_weights()  # list of weight vectors
```
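Key idea (İrsoy, Yıldız, Alpaydın, 2012):
At each internal node $i$, a sigmoid gate gives the probability of taking the left branch:
$$p_i(\mathbf{x}) = \sigma(\mathbf{w}_i^\top \mathbf{x} + b_i)$$
The probability $\mu_\ell(\mathbf{x})$ of reaching leaf $\ell$ is the product of the gate values along the root-to-leaf path, using $p_i(\mathbf{x})$ where the path goes left and $1 - p_i(\mathbf{x})$ where it goes right. With $Q_\ell(y)$ the class distribution stored at leaf $\ell$, the final prediction is:
$$P(y \mid \mathbf{x}) = \sum_\ell \mu_\ell(\mathbf{x}) \cdot Q_\ell(y)$$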
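To make the routing concrete, here is a minimal NumPy sketch of the forward pass described by these formulas, for a complete tree stored in heap order. It treats $p_i$ as the probability of the left branch (an orientation assumption), and the names `soft_tree_predict_proba`, `W`, `b`, and `Q` are hypothetical; this is not the library's `SoftDecisionTree` code, and training is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def soft_tree_predict_proba(X, W, b, Q):
    """Forward pass of a soft decision tree stored in heap order (sketch).

    X: (n, d) inputs
    W: (n_internal, d) gate weights; b: (n_internal,) gate biases
    Q: (n_leaves, n_classes) class distribution at each leaf
    Node i has left child 2i + 1 and right child 2i + 2; the leaves are
    the last n_internal + 1 nodes, so n_internal = 2**depth - 1.
    """
    n, n_internal = X.shape[0], W.shape[0]
    if Q.shape[0] != n_internal + 1:
        raise ValueError("a binary tree has one more leaf than internal nodes")
    p = sigmoid(X @ W.T + b)  # (n, n_internal): probability of going left
    mu = np.zeros((n, 2 * n_internal + 1))  # reach probability of every node
    mu[:, 0] = 1.0  # every sample starts at the root
    for i in range(n_internal):  # parents come before children in heap order
        mu[:, 2 * i + 1] = mu[:, i] * p[:, i]
        mu[:, 2 * i + 2] = mu[:, i] * (1.0 - p[:, i])
    leaf_mu = mu[:, n_internal:]  # mu_ell(x); each row sums to 1
    return leaf_mu @ Q, leaf_mu  # P(y | x) = sum_ell mu_ell(x) Q_ell(y)
```

Every operation here is differentiable in `W`, `b`, and `Q`, so writing the same loop with PyTorch tensors would let autograd train all gates and leaves jointly.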
Comparing Two Classifiers — The Gold Standard Test
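Alpaydın's Combined 5×2cv F Test (Neural Computation, 1999) is a widely used test for deciding whether two classifiers differ significantly. It refines Dietterich's 5×2cv paired t-test, which uses only one of the ten per-fold differences in its numerator; the combined version uses all ten and, in Alpaydın's experiments, has lower Type I error and higher power.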
```python
from neural_trees.statistical_tests import combined_5x2cv_f_test
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

result = combined_5x2cv_f_test(
    clf_A=DecisionTreeClassifier(),
    clf_B=SVC(kernel="rbf"),
    X=X, y=y,
    alpha=0.05,
)
print(result)
```
```text
StatisticalTestResult(
    test = Alpaydın's Combined 5×2cv F Test
    statistic = 12.4731
    p-value = 0.0083
    alpha = 0.05
    decision = ✓ REJECT H0
    note = Classifiers significantly differ
)
```
Why not just use a t-test? In cross-validation the training sets overlap, so the per-fold differences in error are correlated, and a t-test that treats them as independent has an inflated false positive rate. The 5×2cv design uses 2-fold splits, whose two training sets are disjoint, and estimates variance within each split; pooling all ten differences in the F statistic gives a better calibrated test.
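The statistic is short enough to compute directly. Following Alpaydın (1999), let $p_i^{(j)}$ be the difference in error rates between the two classifiers on fold $j \in \{1, 2\}$ of replication $i \in \{1, \dots, 5\}$, with $\bar{p}_i = (p_i^{(1)} + p_i^{(2)})/2$ and $s_i^2 = (p_i^{(1)} - \bar{p}_i)^2 + (p_i^{(2)} - \bar{p}_i)^2$. Then

$$f = \frac{\sum_{i=1}^{5} \sum_{j=1}^{2} \bigl(p_i^{(j)}\bigr)^2}{2 \sum_{i=1}^{5} s_i^2}$$

is approximately F-distributed with 10 and 5 degrees of freedom. The sketch below computes it from scratch with scikit-learn and SciPy. The helper names `five_by_two_error_differences` and `combined_5x2cv_f` are hypothetical, and this is not the `neural_trees` implementation of `combined_5x2cv_f_test`.

```python
import numpy as np
from scipy.stats import f as f_dist
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold


def five_by_two_error_differences(clf_a, clf_b, X, y, seed=0):
    """Return a (5, 2) array of differences p_i^(j) = error_A - error_B."""
    X, y = np.asarray(X), np.asarray(y)
    diffs = np.empty((5, 2))
    for i in range(5):
        # A fresh random 50/50 stratified split for each replication
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed + i)
        for j, (train_idx, test_idx) in enumerate(cv.split(X, y)):
            errors = []
            for clf in (clf_a, clf_b):
                model = clone(clf).fit(X[train_idx], y[train_idx])
                errors.append(1.0 - model.score(X[test_idx], y[test_idx]))
            diffs[i, j] = errors[0] - errors[1]
    return diffs


def combined_5x2cv_f(diffs):
    """Combined 5x2cv F statistic and its p-value under F(10, 5)."""
    p = np.asarray(diffs, dtype=float)
    if p.shape != (5, 2):
        raise ValueError("expected a (5, 2) array of error differences")
    p_bar = p.mean(axis=1, keepdims=True)  # mean difference per replication
    s2 = ((p - p_bar) ** 2).sum(axis=1)  # variance estimate per replication
    denom = 2.0 * s2.sum()
    if denom == 0.0:
        raise ValueError("zero variance: the F statistic is undefined")
    f_stat = (p ** 2).sum() / denom
    return f_stat, f_dist.sf(f_stat, 10, 5)  # upper-tail p-value
```

Stratified splits keep class proportions equal in both halves. A different `seed` gives a different set of five replications, so it is worth reporting the seed alongside the result.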
Hierarchical Mixture of Experts with Dropout
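```python
from neural_trees import HierarchicalMixtureOfExperts
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

moe = HierarchicalMixtureOfExperts(
    depth=2,
    branching_factor=4,  # 4^2 = 16 expert leaves
    dropout_rate=0.3,    # Dropout on gating networks (İrsoy & Alpaydın, 2021)
    max_epochs=50,
    verbose=True,
)
moe.fit(X_train, y_train)
# Score on held-out data rather than on the training set
print(f"Test accuracy: {moe.score(X_test, y_test):.4f}")
```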
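For intuition about what the gating hierarchy computes, here is a minimal NumPy sketch of a standard two-level hierarchical mixture of experts forward pass with the same shape as the example above (depth 2, branching factor 4, so 16 experts). It assumes softmax gating networks and linear softmax experts with biases omitted, and it leaves out dropout and training; `hme_two_level_proba` and its weight arrays are hypothetical names, not the library's API.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def hme_two_level_proba(X, V_top, V_sub, W_exp):
    """Two-level HME with branching factor k (sketch, biases omitted).

    X:     (n, d) inputs
    V_top: (d, k) top-level gating weights
    V_sub: (k, d, k) second-level gating weights, one gate per top branch
    W_exp: (k, k, d, C) linear softmax experts, one per leaf (k * k leaves)
    Returns (n, C) class probabilities.
    """
    g_top = softmax(X @ V_top)  # (n, k): weight of each top branch
    g_sub = softmax(np.einsum("nd,idj->nij", X, V_sub))  # (n, k, k): sums to 1 over j
    experts = softmax(np.einsum("nd,ijdc->nijc", X, W_exp))  # (n, k, k, C)
    leaf_weights = g_top[:, :, None] * g_sub  # path product; sums to 1 over (i, j)
    return np.einsum("nij,nijc->nc", leaf_weights, experts)
```

Each expert's weight is the product of the gate values on its path, the same path-product idea as in the soft decision tree, but with k-way softmax gates instead of binary sigmoids.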
GAL — Grow and Learn Networks
You don't need to fix the architecture in advance: the network adds hidden units when it can't learn the training data and prunes units that become redundant.
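```python
from neural_trees.classical import GALNetwork
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

gal = GALNetwork(
    initial_hidden=2,
    max_hidden=40,
    grow_threshold=0.15,
    prune_threshold=1e-4,
    max_epochs=100,
    verbose=True,
)
gal.fit(X_train, y_train)
print(f"Final hidden units: {gal.n_hidden_final_}")
# Score on held-out data rather than on the training set
print(f"Test accuracy: {gal.score(X_test, y_test):.4f}")
```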
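The exact growth and pruning criteria behind `grow_threshold` and `prune_threshold` are not described here, so the following is only an illustration of the grow-and-prune idea under a simpler assumption: each unit stores one training exemplar, a unit is added whenever the current network misclassifies a training sample, and a unit is removed if training accuracy does not drop without it. `GrowPruneExemplarNet` is a hypothetical class, not `GALNetwork`.

```python
import numpy as np


class GrowPruneExemplarNet:
    """Hypothetical nearest-exemplar network that grows and prunes its units."""

    def __init__(self, max_passes=10):
        self.max_passes = max_passes

    @staticmethod
    def _nearest_label(units_X, units_y, X):
        # Each unit holds one exemplar; the closest exemplar decides the class
        d = ((X[:, None, :] - units_X[None, :, :]) ** 2).sum(axis=2)
        return units_y[d.argmin(axis=1)]

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        keep = [0]  # start with a single unit
        # Grow: add a unit for every sample the current network gets wrong
        for _ in range(self.max_passes):
            grew = False
            for t in range(len(X)):
                pred = self._nearest_label(X[keep], y[keep], X[t:t + 1])[0]
                if pred != y[t]:
                    keep.append(t)
                    grew = True
            if not grew:  # a full pass with no additions: stop growing
                break
        # Prune: drop a unit if training accuracy stays at least as high
        base = (self._nearest_label(X[keep], y[keep], X) == y).mean()
        for u in list(keep):
            trial = [k for k in keep if k != u]
            if trial and (self._nearest_label(X[trial], y[trial], X) == y).mean() >= base:
                keep = trial
        self.units_X_, self.units_y_ = X[keep], y[keep]
        return self

    def predict(self, X):
        return self._nearest_label(self.units_X_, self.units_y_, np.asarray(X, dtype=float))
```

The growth step is essentially condensed nearest neighbor; the greedy pruning pass then removes units that no longer change any training prediction.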
Omnivariate Decision Trees
At each node, the tree automatically selects the best split type (univariate, linear via LDA, or nonlinear via an MLP) using cross-validation.
```python
from neural_trees import OmnivariateDecisionTree
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)

odt = OmnivariateDecisionTree(max_depth=4, cv_folds=3)
odt.fit(X, y)

# See how many nodes used each split type
print(odt.get_split_type_distribution())
# {'univariate': 3, 'linear': 4, 'nonlinear': 1}
```
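The node-level selection step can be sketched with scikit-learn, assuming plain mean cross-validated accuracy as the criterion. For simplicity each candidate is treated as a classifier on the node's data rather than as a binary split, and the three split types are approximated by a depth-1 tree, LDA, and a small MLP. `choose_node_model` is a hypothetical helper, not part of the library; a full tree would call something like it recursively on each node's data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def choose_node_model(X, y, cv_folds=3, seed=0):
    """Pick a split model for one node by mean CV accuracy (hypothetical helper)."""
    candidates = {
        # Univariate: a depth-1 tree thresholds a single feature
        "univariate": DecisionTreeClassifier(max_depth=1, random_state=seed),
        # Linear: LDA draws linear boundaries over all features
        "linear": LinearDiscriminantAnalysis(),
        # Nonlinear: a small MLP on standardized features
        "nonlinear": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=seed),
        ),
    }
    scores = {
        name: cross_val_score(model, X, y, cv=cv_folds).mean()
        for name, model in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, candidates[best].fit(X, y), scores
```

Taking the highest score favors complex models on small nodes; a more conservative rule would keep the simplest candidate unless a more complex one is significantly better, for example according to the combined 5×2cv F test.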
All sklearn-compatible
Every model follows the fit / predict / predict_proba / score interface:
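```python
from neural_trees import SoftDecisionTree
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Standardize features before the soft tree
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("sdt", SoftDecisionTree(depth=4, max_epochs=30)),
])
pipe.fit(X_train, y_train)
print(f"Pipeline accuracy: {pipe.score(X_test, y_test):.4f}")

# Tune depth and penalty with 5-fold CV; the "sdt__" prefix routes each
# parameter to the pipeline step named "sdt"
param_grid = {"sdt__depth": [3, 4, 5], "sdt__penalty_coef": [1e-4, 1e-3, 1e-2]}
gs = GridSearchCV(pipe, param_grid, cv=5)
gs.fit(X_train, y_train)
print(gs.best_params_)
```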
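The contract that makes `Pipeline` and `GridSearchCV` work is small. As an illustration of the scikit-learn conventions (assuming the library's estimators follow the same pattern), here is a hypothetical toy classifier, `ToyNearestCentroid`, which is not part of neural-trees: `__init__` only stores its arguments, `fit` sets attributes ending in `_` and returns `self`, and `ClassifierMixin` supplies `score`.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class ToyNearestCentroid(ClassifierMixin, BaseEstimator):
    """Hypothetical minimal estimator illustrating the sklearn contract."""

    def __init__(self, temperature=1.0):
        # Store constructor arguments unchanged so get_params/set_params/clone work
        self.temperature = temperature

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)  # fitted attributes end with "_"
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Softmax over negative squared distances to each class centroid
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        logits = -d / self.temperature
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]
```

Because `get_params` is derived from the `__init__` signature, this drops into a pipeline step such as `("toy", ToyNearestCentroid())` and can be tuned with a grid over `toy__temperature`.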
Notebooks
| Notebook | Description |
|---|---|
| 01_soft_decision_trees.ipynb | Training, visualization, comparison with CART |
| 02_classifier_comparison_tests.ipynb | When to use which statistical test |
| 03_hierarchical_moe.ipynb | HMoE training and expert specialization |
| 04_gal_network.ipynb | Dynamic architecture growth/pruning |
| 05_omnivariate_trees.ipynb | Node-level split type analysis |
About the Author
Prof. Dr. Ethem Alpaydın is one of the world's leading machine learning researchers.
- Professor Emeritus at Boğaziçi University (Istanbul), now at Özyeğin University
- Author of Introduction to Machine Learning (MIT Press, 4 editions, 2004–2020) — used in hundreds of universities globally
- Author of Machine Learning: The New AI (MIT Press, 2016)
- PhD from EPFL (1990); research stays at UC Berkeley, MIT, and IDIAP
- 34,000+ citations on Google Scholar
- IEEE Senior Member; Pattern Recognition journal editorial board
His 1999 paper on the Combined 5×2cv F Test is a standard reference for classifier comparison, and the Soft Decision Trees paper (2012, with İrsoy and Yıldız) remains one of the most elegant proposals for differentiable tree models, predating much of the modern neural tree literature.
Citation
If you use this library in academic work, please cite the original papers:
```bibtex
@book{alpaydin2020introduction,
  title     = {Introduction to Machine Learning},
  author    = {Alpayd{\i}n, Ethem},
  year      = {2020},
  edition   = {4th},
  publisher = {MIT Press}
}

@article{irsoy2021dropout,
  title   = {Dropout Regularization in Hierarchical Mixture of Experts},
  author  = {\.{I}rsoy, O{\u{g}}uzhan and Alpayd{\i}n, Ethem},
  journal = {Neurocomputing},
  volume  = {419},
  pages   = {148--156},
  year    = {2021}
}

@inproceedings{irsoy2012soft,
  title     = {Soft Decision Trees},
  author    = {\.{I}rsoy, O{\u{g}}uzhan and Y{\i}ld{\i}z, Olcay Taner and Alpayd{\i}n, Ethem},
  booktitle = {Proceedings of the 21st International Conference on Pattern Recognition (ICPR)},
  year      = {2012}
}

@article{alpaydin1999combined,
  title   = {Combined 5x2cv {F} Test for Comparing Supervised Classification Learning Algorithms},
  author  = {Alpayd{\i}n, Ethem},
  journal = {Neural Computation},
  volume  = {11},
  number  = {8},
  pages   = {1885--1892},
  year    = {1999}
}
```
Roadmap
Things I'm planning to add:
- Multiple Kernel Learning (Gönen & Alpaydın, JMLR 2011)
- Localized Multiple Kernel Learning (Gönen & Alpaydın, ICML 2008)
- Convolutional Soft Decision Trees (ICANN 2018)
- Decision boundary visualization utilities
- Benchmark comparison on UCI datasets
If you find a bug or want to implement one of these, open an issue.
License
MIT — see LICENSE.
Download files
File details
Details for the file neural_trees-0.1.1.tar.gz.
File metadata
- Download URL: neural_trees-0.1.1.tar.gz
- Upload date:
- Size: 24.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ef6b86bcbd76af3f810e89495b9be63d76ca76b9bca0b9a2bbc5bc194e50b9bb |
| MD5 | 3a789bd14561b4709b75d83bd8597f7c |
| BLAKE2b-256 | 4f59edd2310574bb8803d0d3b4b79066f7f86e2b27a22bc46029344ace21cf2b |
File details
Details for the file neural_trees-0.1.1-py3-none-any.whl.
File metadata
- Download URL: neural_trees-0.1.1-py3-none-any.whl
- Upload date:
- Size: 25.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3581fe59fc98277ebad3eddd99fa8f73df09448fc48cb7086719f36facf6c069 |
| MD5 | 8e96d08bd54e197497f511e8d8bb3bfa |
| BLAKE2b-256 | dec50209775292b830ebdc715817fcdf165e73550fc17030840f8587db6f69fa |