Implementation of Reconstruction-based Anomaly Detection with Completely Random Forest

This is the implementation of RecForest for anomaly detection, proposed in the paper “Reconstruction-based Anomaly Detection with Completely Random Forest,” SIAM International Conference on Data Mining (SDM), 2021. It is highly optimized and provides scikit-learn-like APIs.

Installation

RecForest is available on PyPI:

$ pip install recforest

Build from Source

Alternatively, RecForest can be built and installed from source:

$ git clone https://github.com/xuyxu/RecForest.git
$ cd RecForest
$ python setup.py install

Note that a C compiler is required to compile the .pyx files (e.g., GCC on Linux or MSVC on Windows). Please refer to the Cython installation guide for details.

Example

The code snippet below shows a minimal example of how to use RecForest for anomaly detection. Scripts for reproducing the experimental results in the paper are available in the examples directory.

from recforest import RecForest

# X_train and X_test are numpy arrays of shape (n_samples, n_features)
model = RecForest()
model.fit(X_train)
# Anomaly scores on X_test
y_pred = model.predict(X_test)

Documentation

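RecForest has only two hyper-parameters: n_estimators and max_depth. The other two input parameters, n_jobs and random_state, control parallelization and reproducibility. Docstrings on all input parameters are listed below.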

  • n_estimators: Specify the number of decision trees in RecForest;

  • max_depth: Specify the maximum depth of decision trees in RecForest;

  • n_jobs: Specify the number of workers for joblib parallelization. -1 means using all processors;

  • random_state: Specify the random state for reproducibility.
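
As an illustration, the four input parameters above might be set explicitly as in the sketch below. The values are arbitrary choices for demonstration, not the package defaults or recommended settings, and the training data is synthetic. The import is guarded so that the snippet still runs where recforest is not installed; in that case the model step is simply skipped.

```python
import numpy as np

try:
    from recforest import RecForest
except ImportError:  # recforest is not installed; the model step is skipped
    RecForest = None

# Illustrative values only (assumptions, not defaults or recommendations).
params = {
    "n_estimators": 50,  # number of decision trees
    "max_depth": 8,      # maximum depth of each decision tree
    "n_jobs": -1,        # use all processors for joblib parallelization
    "random_state": 0,   # fixed seed for reproducibility
}

# Synthetic training data of shape (n_samples, n_features).
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 4))

if RecForest is not None:
    model = RecForest(**params)
    model.fit(X_train)
```

Fixing random_state should make repeated runs comparable, while n_estimators and max_depth are the two settings worth tuning.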

RecForest has three public methods. Docstrings on these methods are listed below. Note that all methods accept the input X as a numpy array of shape (n_samples, n_features).

  • fit(X): Fit a RecForest using the input data X;

  • apply(X): Return the leaf node ID of input data X in each decision tree;

  • predict(X): Return the anomaly score on the input data X.
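
Since predict(X) returns anomaly scores rather than labels, a common follow-up step is to threshold them. The sketch below is one possible way to do that and is not part of the RecForest API. It assumes that predict returns one score per sample and that a larger score means "more anomalous"; neither is stated above. The helper flag_anomalies and its contamination argument are hypothetical names, and the scores are made-up stand-ins for the output of model.predict(X_test).

```python
import numpy as np


def flag_anomalies(scores, contamination=0.05):
    """Return a boolean mask marking the highest-scoring samples.

    Assumption: a larger score means "more anomalous".
    `contamination` is a hypothetical knob for the expected fraction
    of anomalies; it is not a RecForest parameter.
    """
    scores = np.asarray(scores, dtype=float)
    # Threshold at the (1 - contamination) quantile of the scores.
    threshold = np.quantile(scores, 1.0 - contamination)
    return scores > threshold


# Made-up scores standing in for `model.predict(X_test)`.
scores = np.array([0.10, 0.20, 0.15, 0.12, 0.90, 0.11, 0.18, 0.13, 0.14, 0.16])
mask = flag_anomalies(scores, contamination=0.1)
print(np.flatnonzero(mask))  # indices of the flagged samples
```

When ground-truth labels are available, a threshold-free metric computed directly on the scores avoids having to pick a contamination value at all.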

Package Dependencies

  • numpy >= 1.13.3

  • scipy >= 0.19.1

  • joblib >= 0.12

  • cython >= 0.28.5

  • scikit-learn >= 0.22

A Python environment installed from conda is highly recommended. In that case, none of the packages listed above needs to be installed separately.
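
To see which of these packages are present in the current environment, a small check along the following lines may help. It is a sketch that relies only on the standard library module importlib.metadata (Python 3.8 or later); the helper name installed_versions is hypothetical. It reports installed versions next to the minimum versions from the dependency list above and leaves the comparison to the reader.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above.
MINIMUM_VERSIONS = {
    "numpy": "1.13.3",
    "scipy": "0.19.1",
    "joblib": "0.12",
    "cython": "0.28.5",
    "scikit-learn": "0.22",
}


def installed_versions(requirements):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in requirements:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(MINIMUM_VERSIONS).items():
    print(f"{name}: installed={installed}, required>={MINIMUM_VERSIONS[name]}")
```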


Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • RecForest-0.1.0-cp38-cp38-win_amd64.whl (76.2 kB): CPython 3.8, Windows x86-64

  • RecForest-0.1.0-cp38-cp38-manylinux2010_x86_64.whl (370.9 kB): CPython 3.8, manylinux: glibc 2.12+ x86-64

  • RecForest-0.1.0-cp38-cp38-manylinux1_x86_64.whl (370.9 kB): CPython 3.8

  • RecForest-0.1.0-cp37-cp37m-win_amd64.whl (75.1 kB): CPython 3.7m, Windows x86-64

  • RecForest-0.1.0-cp37-cp37m-manylinux2010_x86_64.whl (335.6 kB): CPython 3.7m, manylinux: glibc 2.12+ x86-64

  • RecForest-0.1.0-cp37-cp37m-manylinux1_x86_64.whl (335.6 kB): CPython 3.7m

  • RecForest-0.1.0-cp36-cp36m-win_amd64.whl (75.1 kB): CPython 3.6m, Windows x86-64

  • RecForest-0.1.0-cp36-cp36m-manylinux2010_x86_64.whl (335.5 kB): CPython 3.6m, manylinux: glibc 2.12+ x86-64

  • RecForest-0.1.0-cp36-cp36m-manylinux1_x86_64.whl (335.5 kB): CPython 3.6m
