Concentration Free Outlier Factor

CFOF (Concentration Free Outlier Factor)

🚧 Work in progress.

Python implementation of Concentration Free Outlier Factor (CFOF) [1].

CFOF properties

  • Concentration free
  • Does not suffer from the hubness problem
  • Semi-locality
  • The fast-CFOF algorithm reliably computes CFOF scores at a cost linear in both the dataset size and the dimensionality

Installation

To install the latest release:

$ pip install cfof

Usage

Import CFOF and FastCFOF.

>>> from cfof import CFOF, FastCFOF
>>> import numpy as np

Load data.

>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

Instantiate CFOF or FastCFOF, then call .compute(X) to calculate the scores. The call returns an array sc, where sc[i, l] is the score of object i for ϱ_l (rhos[l]).

You can also calculate CFOF scores from a precomputed distance matrix using .compute_from_distance_matrix().
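As a rough illustration of the sc[i, l] layout, the sketch below ranks objects by score for one chosen ϱ. It does not call the library: the array is copied from the hard-CFOF example in this README, and top_outliers is a hypothetical helper, not part of cfof. It assumes that a higher CFOF score marks a more anomalous object.

```python
import numpy as np

# Scores copied from the hard-CFOF example in this README (rhos=[0.5, 0.6]).
rhos = [0.5, 0.6]
sc = np.array([[0.5, 0.66666667],
               [0.33333333, 0.83333333],
               [0.5, 1.0],
               [0.5, 0.66666667],
               [0.33333333, 0.83333333],
               [0.5, 1.0]])


def top_outliers(sc, l, n_top):
    """Hypothetical helper: indices of the n_top highest scores for rhos[l]."""
    # Stable sort on the negated column: highest score first,
    # ties keep their original row order.
    return np.argsort(-sc[:, l], kind="stable")[:n_top]


# Column 1 holds the scores for rhos[1] = 0.6.
print(top_outliers(sc, l=1, n_top=2))  # -> [2 5]
```

Each column is ranked independently, so the set of top-scoring objects may change with ϱ.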

CFOF (hard-CFOF)

Use compute to compute CFOF scores directly from data.

>>> cfof_clf = CFOF(metric='euclidean', rhos=[0.5, 0.6], n_jobs=1)
>>> cfof_clf.compute(X)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])

Use compute_from_distance_matrix to compute CFOF scores from a precomputed distance matrix.

>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])
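The hard-CFOF values above can be cross-checked against the definition given in [1], as understood here: the score of x is the smallest k/n such that x appears among the k nearest neighbors of at least n·ϱ objects, with each object counted as its own first neighbor. The following is a minimal brute-force sketch under that reading, with quadratic cost and Euclidean distance only. brute_force_cfof is a hypothetical name and this is not the library's implementation.

```python
import math


def brute_force_cfof(points, rhos):
    """Brute-force hard-CFOF sketch (hypothetical helper, not the cfof API)."""
    n = len(points)
    # ranks_of[i] collects the 1-based position of object i in the
    # neighbor list of every object j (j itself is always at position 1).
    ranks_of = [[] for _ in range(n)]
    for j in range(n):
        order = sorted(
            range(n),
            key=lambda i: (math.dist(points[j], points[i]), i != j),
        )
        for position, i in enumerate(order, start=1):
            ranks_of[i].append(position)

    scores = []
    for ranks in ranks_of:
        ranks.sort()
        row = []
        for rho in rhos:
            # Number of objects that must have i in their k-neighborhood.
            needed = max(1, math.ceil(n * rho))
            # The needed-th smallest rank is the smallest k that suffices.
            row.append(ranks[needed - 1] / n)
        scores.append(row)
    return scores


X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
for row in brute_force_cfof(X, rhos=[0.5, 0.6]):
    print([round(value, 8) for value in row])
```

On this six-point dataset the sketch reproduces the array shown above. Distance ties are broken by index order here, which may differ from what the library does, and math.ceil(n * rho) is subject to floating-point rounding when n·ϱ is very close to an integer.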

FastCFOF (soft-CFOF)

Use compute to compute CFOF scores directly from data.

>>> np.random.seed(10)
>>> X = np.random.randint(0, 100, size=(1000, 3))
>>>
>>> fast_cfof_clf = FastCFOF(metric='euclidean',
...                          rhos=[0.001, 0.005, 0.01, 0.05, 0.1],
...                          epsilon=0.1, delta=0.1, n_bins=50, n_jobs=1)
>>> fast_cfof_clf.compute(X)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])

Use compute_from_distance_matrix to compute CFOF scores from a precomputed distance matrix.

>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> fast_cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])

CFOFiSAX

This library provides a wrapper for pyCFOFiSAX [2].

>>> from cfof.cfof_isax import CFOFiSAXWrapper

Refer to pyCFOFiSAX documentation for more details.

TODOs

  • Add support for faiss (GPU).
  • Parallelize FastCFOF.
  • Add unit tests.
  • Add benchmarks.
  • Wrap pyCFOFiSAX.

References

[1] ANGIULLI, Fabrizio. CFOF: a concentration free measure for anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 2020, vol. 14, no 1, p. 1-53.

[2] FOULON, Lucas, FENET, Serge, RIGOTTI, Christophe, et al. Scoring Message Stream Anomalies in Railway Communication Systems. In : 2019 International Conference on Data Mining Workshops (ICDMW). IEEE, 2019. p. 769-776.
