Combination Robust Cut Forests
Isolation Forests [Liu+2008] and Robust Random Cut Trees [Guha+2016] are very similar in many ways, as outlined in the supporting overview. Most notably, they are extremes of the same outlier scoring function:
$$\theta \textrm{Depth} + (1 - \theta) \textrm{[Co]Disp}$$
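Setting $\theta = 1$ scores by depth alone, as an isolation forest does, and setting $\theta = 0$ scores by [Co]Disp alone, as a robust random cut tree does. The combination robust cut forest lets you blend both scores by choosing a $\theta$ strictly between 0 and 1.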
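As a minimal sketch of how the weighting behaves, the snippet below evaluates the formula above for a single point. It does not use this package's API: the function name `combined_score` and the example depth and [Co]Disp values are hypothetical, chosen only for illustration.

```python
def combined_score(depth, codisp, theta):
    """Blend a depth score and a [Co]Disp score using weight theta.

    Hypothetical helper for illustration; not part of the crcf package.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * depth + (1.0 - theta) * codisp


# Made-up per-point values, used only to show the effect of theta.
depth, codisp = 4.0, 10.0

for theta in (0.0, 0.5, 1.0):
    # theta = 0.0 returns codisp, theta = 1.0 returns depth,
    # and values in between interpolate linearly.
    print(theta, combined_score(depth, codisp, theta))
```

With these values the score moves linearly from 10.0 at theta = 0 to 4.0 at theta = 1, passing through 7.0 at theta = 0.5.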
Install
You can install the package with pip:
pip install crcf
Alternatively, you can download the repository and, from its root directory, run either
python3 setup.py install
or
pip3 install .
Please note that this package uses features from Python 3.7+
and is not compatible with earlier Python versions.
Tasks
- complete basic implementation
- provide clear documentation and usage instructions
- ensure interface allows for fitting and scoring on multiple points at the same time
- implement a better saving method than pickling
- use randomized (property-based) tests with Hypothesis
- implement tree down in Cython
- accelerate forests with multi-threading
- incorporate categorical variable support, including categorical rules
- complete the write-up document with a benchmarking of performance
References
- [Liu+2008]: Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation forest." In 2008 Eighth IEEE International Conference on Data Mining, pp. 413-422. IEEE, 2008.
- [Guha+2016]: Guha, Sudipto, Nina Mishra, Gourav Roy, and Okke Schrijvers. "Robust random cut forest based anomaly detection on streams." In International Conference on Machine Learning, pp. 2712-2721. 2016.