morfist: mixed-output-rf
Multi-target Random Forest implementation that can mix both classification and regression tasks.
Morfist implements the Random Forest algorithm (Breiman, 2001) with support for mixed-task multi-task learning, i.e., it is possible to train the model on any number of classification tasks and regression tasks, simultaneously. Morfist's mixed multi-task learning implementation follows that proposed by Linusson (2013).
- Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.
- Linusson, H. (2013). Multi-output random forests.
Installation
With pip:
pip install decision-tree-morfist
With conda:
conda install -c systemallica decision-tree-morfist
Usage
Initialising the model
- Similarly to a scikit-learn RandomForestClassifier, a MixedRandomForest can be initialised like this:

from morfist import MixedRandomForest

mrf = MixedRandomForest(
    n_estimators=100,
    min_samples_leaf=1,
    classification_targets=[0]
)
- The available parameters are:
  - n_estimators (int): the number of trees in the forest. Optional. Default value: 10.
  - max_features (int | float | str): the number of features to consider when looking for the best split. Optional. Default value: 'sqrt'.
    - If int, then max_features features are considered at each split.
    - If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
    - If 'sqrt', then max_features=sqrt(n_features) (same as 'auto').
    - If 'log2', then max_features=log2(n_features).
    - If None, then max_features=n_features.
    - Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires effectively inspecting more than max_features features.
  - min_samples_leaf (int): the minimum number of samples required to be at a leaf node. Optional. Default value: 5.
    - Note: a split point at any depth is only considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
  - choose_split (str): the method used to find the best split. Optional. Default value: 'mean'.
    - Possible values:
      - 'mean': the mean information gain across tasks is used.
      - 'max': the maximum information gain across tasks is used.
  - classification_targets (int[]): indices of the target variables that form the classification tasks. Optional. Default value: None. If no classification_targets are specified, the random forest treats all target variables as regression targets.
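The max_features rules above can be sketched as a small helper. Note that resolve_max_features is purely illustrative and not part of morfist's API; it just maps each accepted setting to the number of features inspected at a split:

```python
import math

def resolve_max_features(max_features, n_features):
    """Illustrative helper (not part of morfist): map a max_features
    setting to the number of features inspected at each split."""
    if max_features is None:
        return n_features
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # A fraction of the total number of features.
        return int(max_features * n_features)
    if max_features in ("sqrt", "auto"):
        return int(math.sqrt(n_features))
    if max_features == "log2":
        return int(math.log2(n_features))
    raise ValueError(f"unknown max_features: {max_features!r}")

print(resolve_max_features("sqrt", 16))  # 4
print(resolve_max_features(0.5, 10))     # 5
print(resolve_max_features(None, 7))     # 7
```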
Training the model
- Once the model is initialised, it can be fitted like this:

mrf.fit(X, y)

where X contains the training examples and y their respective labels (if the target is categorical) or values (if it is numerical).
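To make the mixed-target layout concrete, here is a hypothetical toy dataset. The assumption (multi-output convention, as in scikit-learn) is that each row of y holds one value per target, with column 0 categorical to match classification_targets=[0] from the initialisation example; the exact array types morfist expects may differ:

```python
# Hypothetical toy data: two features per example; target column 0 is
# a class label (0/1) and target column 1 is a numeric value, matching
# classification_targets=[0] in the initialisation example above.
X = [
    [0.1, 2.3],
    [1.5, 0.7],
    [0.9, 1.1],
]
y = [
    [0, 10.5],  # class 0, regression value 10.5
    [1, 7.2],   # class 1, regression value 7.2
    [0, 9.8],   # class 0, regression value 9.8
]

# With morfist installed, the model would then be fitted as:
#   mrf.fit(X, y)
print(len(X), len(y))
```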
Prediction
- The model can now be used to predict new instances.
- Class/value:
mrf.predict(x)
- Probability:
mrf.predict_proba(x)
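To clarify what these two calls return for mixed targets, here is a sketch of the standard random-forest aggregation step (Breiman, 2001): majority vote for a classification target, the mean for a regression target, and class vote fractions for probabilities. This is illustrative only, not morfist's actual internals:

```python
from collections import Counter
from statistics import mean

# Toy per-tree outputs for one test instance: each tree emits a class
# label for the classification target and a number for the regression
# target.
tree_classes = ["a", "b", "a", "a", "b"]
tree_values = [1.0, 1.2, 0.9, 1.1, 1.3]

# predict: majority vote (classification) and mean (regression).
votes = Counter(tree_classes)
predicted_class = votes.most_common(1)[0][0]
predicted_value = mean(tree_values)

# predict_proba: fraction of trees voting for each class.
proba = {c: n / len(tree_classes) for c, n in votes.items()}

print(predicted_class)            # a
print(round(predicted_value, 2))  # 1.1
print(proba)                      # {'a': 0.6, 'b': 0.4}
```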
TODO:
- Speed up the learning algorithm implementation (morfist is currently much slower than the Random Forest implementation available in scikit-learn)