Concentration Free Outlier Factor

CFOF (Concentration Free Outlier Factor)

🚧 Work in progress.

Python implementation of Concentration Free Outlier Factor (CFOF) [1].

CFOF properties

  • Concentration free
  • Does not suffer from the hubness problem
  • Semi–locality
  • The fast-CFOF algorithm computes reliable CFOF scores at a cost linear in both dataset size and dimensionality

Installation

To install the latest release:

$ pip install cfof

Usage

Import CFOF and FastCFOF.

>>> from cfof import CFOF, FastCFOF
>>> import numpy as np

Load data.

>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

Instantiate CFOF or FastCFOF, then call .compute(X) to calculate the scores. The call returns an array sc, where sc[i, l] is the score of object i for ϱ_l (rhos[l]).

You can also calculate CFOF scores from a precomputed distance matrix using .compute_from_distance_matrix().
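The indexing convention sc[i, l] can be illustrated with plain NumPy, without calling the library. The sketch below is only an illustration: the array literal is copied from the hard-CFOF example output in this README, and it assumes that a higher score means a more outlying object. The variable names are ours, not part of the cfof API.

```python
import numpy as np

rhos = [0.5, 0.6]

# Scores copied from the hard-CFOF example output in this README:
# one row per object, one column per entry of rhos.
sc = np.array([[0.5,        0.66666667],
               [0.33333333, 0.83333333],
               [0.5,        1.0],
               [0.5,        0.66666667],
               [0.33333333, 0.83333333],
               [0.5,        1.0]])

# Column l holds the scores of all objects for rhos[l].
l = rhos.index(0.6)
scores_for_rho = sc[:, l]

# Order objects from highest to lowest score (assumption: higher = more outlying).
# A stable sort keeps the original order among tied scores.
ranking = np.argsort(-scores_for_rho, kind="stable")
top2 = ranking[:2]
print(top2)
```

Here top2 holds the indices of the two objects with the highest score for ϱ = 0.6.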

CFOF (hard-CFOF)

Use compute to compute CFOF scores directly from data.

>>> cfof_clf = CFOF(metric='euclidean', rhos=[0.5, 0.6], n_jobs=1)
>>> cfof_clf.compute(X)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])

Use compute_from_distance_matrix to compute CFOF scores from a precomputed distance matrix.

>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])
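To make the scores above more concrete, here is a brute-force sketch of the hard-CFOF definition as we read it in [1]: the score of object i for ϱ is the smallest k/n such that at least a fraction ϱ of the n objects have i among their k nearest neighbors, with each object counted as its own nearest neighbor. This is an assumption about the definition, not the library's implementation; the function name brute_force_cfof is hypothetical, and the quadratic cost makes it suitable only for small inputs.

```python
import numpy as np

def brute_force_cfof(dist, rhos):
    """Hypothetical helper: hard-CFOF scores from a full (n, n) distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # order[j] lists all objects from nearest to farthest as seen from object j.
    # Object j itself (distance 0) comes first, assuming no duplicate points.
    order = np.argsort(dist, axis=1, kind="stable")
    # ranks[j, i] = 1-based position of object i in the neighbor list of object j.
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    # After sorting column i, its m-th entry is the smallest k for which
    # at least m objects have i among their k nearest neighbors.
    received = np.sort(ranks, axis=0)
    scores = np.empty((n, len(rhos)))
    for l, rho in enumerate(rhos):
        needed = max(1, int(np.ceil(n * rho)))  # objects that must "see" i
        scores[:, l] = received[needed - 1] / n
    return scores

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
# Pairwise Euclidean distances, computed with NumPy only.
dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
scores = brute_force_cfof(dist, rhos=[0.5, 0.6])
print(scores)
```

On the six-point example, this sketch yields the same values as the output printed above, which suggests the reading of the definition is consistent with the library on this input.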

FastCFOF (soft-CFOF)

Use compute to compute CFOF scores directly from data.

>>> np.random.seed(10)
>>> X = np.random.randint(0, 100, size=(1000, 3))
>>>
>>> fast_cfof_clf = FastCFOF(metric='euclidean',
...                          rhos=[0.001, 0.005, 0.01, 0.05, 0.1],
...                          epsilon=0.1, delta=0.1, n_bins=50, n_jobs=1)
>>> fast_cfof_clf.compute(X)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])

Use compute_from_distance_matrix to compute CFOF scores from a precomputed distance matrix.

>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> fast_cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])
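This README does not explain epsilon and delta. In sampling-based estimators such as the one described in [1], parameters like these typically bound the estimation error (epsilon) and the probability of exceeding it (delta), and a Hoeffding-style bound then gives the number of samples needed. The sketch below computes that bound; whether FastCFOF derives its sample size with exactly this formula is an assumption, and hoeffding_sample_size is a hypothetical helper, not part of the cfof API.

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Hypothetical helper: smallest s with s >= ln(2 / delta) / (2 * epsilon**2).

    With s independent samples of a quantity bounded in [0, 1], Hoeffding's
    inequality guarantees that the sample mean deviates from the true mean
    by more than epsilon with probability at most delta.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# Values used in the FastCFOF example above.
s = hoeffding_sample_size(epsilon=0.1, delta=0.1)
print(s)
```

Under this assumption, smaller epsilon or delta values require more samples, and the sample size does not depend on the number of objects in the dataset.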

CFOFiSAX

This library provides a wrapper for pyCFOFiSAX [2].

>>> from cfof.cfof_isax import CFOFiSAXWrapper

Refer to the pyCFOFiSAX documentation for more details.

TODOs

  • Add support for faiss (GPU).
  • Parallelize FastCFOF.
  • Add unit tests.
  • Add benchmarks.
  • Wrap pyCFOFiSAX.

References

[1] ANGIULLI, Fabrizio. CFOF: a concentration free measure for anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 2020, vol. 14, no 1, p. 1-53.

[2] FOULON, Lucas, FENET, Serge, RIGOTTI, Christophe, et al. Scoring Message Stream Anomalies in Railway Communication Systems. In : 2019 International Conference on Data Mining Workshops (ICDMW). IEEE, 2019. p. 769-776.
