Concentration Free Outlier Factor
CFOF (Concentration Free Outlier Factor)
🚧 Work In Progress..
Python implementation of Concentration Free Outlier Factor (CFOF) [1].
CFOF properties
- Concentration free
- Does not suffer from the hubness problem
- Semi-locality
- The fast-CFOF algorithm reliably computes CFOF scores at a cost linear in both dataset size and dimensionality
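To make the score concrete, here is a brute-force sketch of the hard-CFOF definition as described in [1]: the score of an object is the smallest neighborhood size k, divided by the dataset size n, at which the object appears among the k nearest neighbors of at least ϱ·n objects. The function name `brute_force_cfof`, the choice to count each object as its own first neighbor, and the rounding of ϱ·n up to the next integer are assumptions of this sketch, not a description of how this package computes scores. The cost is quadratic in the dataset size, so it only suits small inputs.

```python
import numpy as np


def brute_force_cfof(X, rhos):
    """Illustrative O(n^2) CFOF scores; hypothetical helper, not the package API."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # rank[j, i] = 1-based position of object i in the neighbor list of j.
    # Assumption: each object counts as its own first neighbor.
    order = np.argsort(dist, axis=1, kind="stable")
    rank = np.empty_like(order)
    rows = np.arange(n)[:, None]
    rank[rows, order] = np.arange(1, n + 1)[None, :]

    # Column i, sorted ascending: the ranks object i receives from all objects.
    sorted_ranks = np.sort(rank, axis=0)

    scores = np.empty((n, len(rhos)))
    for l, rho in enumerate(rhos):
        # Assumption: the reverse neighborhood must hold at least ceil(n * rho) objects.
        need = min(max(int(np.ceil(n * rho)), 1), n)
        # The need-th smallest rank is the smallest k that reaches that size.
        scores[:, l] = sorted_ranks[need - 1, :] / n
    return scores


X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
scores = brute_force_cfof(X, rhos=[0.5, 0.6])
```

On the six-point array used in the Usage section with rhos=[0.5, 0.6], this reading gives the same values as the hard-CFOF output shown there.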
Installation
To install the latest release:
$ pip install cfof
Usage
Import CFOF and FastCFOF.
>>> from cfof import CFOF, FastCFOF
>>> import numpy as np
Load data.
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Instantiate CFOF or FastCFOF, then call .compute(X) to calculate the scores. .compute(X) returns an array sc, where sc[i, l] is the score of object i for ϱ_l (rhos[l]).
You can also calculate CFOF scores from a precomputed distance matrix using .compute_from_distance_matrix().
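Since column l of the returned array corresponds to rhos[l], ranking objects for a single ϱ comes down to selecting that column. The sketch below uses plain NumPy, with the hard-CFOF values from this README hard-coded as a stand-in for the result of .compute(X). The variable names and the choice to keep the top two objects are illustrative only.

```python
import numpy as np

rhos = [0.5, 0.6]

# Stand-in for the array returned by .compute(X): one row per object,
# one column per entry of rhos.
sc = np.array([[0.5, 0.66666667],
               [0.33333333, 0.83333333],
               [0.5, 1.0],
               [0.5, 0.66666667],
               [0.33333333, 0.83333333],
               [0.5, 1.0]])

# Pick the column that belongs to rho = 0.6.
l = rhos.index(0.6)
scores_for_rho = sc[:, l]

# Higher score = more outlying; a stable sort keeps index order among ties.
ranking = np.argsort(-scores_for_rho, kind="stable")
top_two = ranking[:2]
```

Here `top_two` holds the row indices of the two objects with the highest score for ϱ = 0.6.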
CFOF (hard-CFOF)
Use compute to compute CFOF scores directly from data.
>>> cfof_clf = CFOF(metric='euclidean', rhos=[0.5, 0.6], n_jobs=1)
>>> cfof_clf.compute(X)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])
Use compute_from_distance_matrix to compute CFOF scores from a precomputed distance matrix.
>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])
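If scikit-learn is not available, a square Euclidean distance matrix of the same (n, n) shape that pairwise_distances returns can be built with NumPy alone. This is a minimal sketch; the helper name `euclidean_distance_matrix` is hypothetical, and the broadcasting approach allocates an intermediate (n, n, d) array, so it is only practical for small datasets.

```python
import numpy as np


def euclidean_distance_matrix(X):
    """Return the (n, n) matrix of pairwise Euclidean distances between rows of X."""
    X = np.asarray(X, dtype=float)
    # diff[i, j, :] = X[i] - X[j], built by broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
distance_matrix = euclidean_distance_matrix(X)
```

The resulting `distance_matrix` is symmetric with a zero diagonal and can be passed wherever the examples above use the output of pairwise_distances.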
FastCFOF (soft-CFOF)
Use compute to compute CFOF scores directly from data.
>>> np.random.seed(10)
>>> X = np.random.randint(0, 100, size=(1000, 3))
>>>
>>> fast_cfof_clf = FastCFOF(metric='euclidean',
...                          rhos=[0.001, 0.005, 0.01, 0.05, 0.1],
...                          epsilon=0.1, delta=0.1, n_bins=50, n_jobs=1)
>>> fast_cfof_clf.compute(X)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])
Use compute_from_distance_matrix to compute CFOF scores from a precomputed distance matrix.
>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> fast_cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])
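This README does not spell out what epsilon and delta control. A common convention for sampling-based estimators is that epsilon is the tolerated absolute error and delta the allowed failure probability, in which case Hoeffding's inequality gives a sample size that depends on neither the dataset size nor its dimensionality. The sketch below computes that bound. Treating FastCFOF's parameters this way is an assumption, and `hoeffding_sample_size` is a hypothetical helper, not part of this package.

```python
import math


def hoeffding_sample_size(epsilon, delta):
    """Samples needed to estimate a [0, 1]-bounded mean within epsilon
    with probability at least 1 - delta, by Hoeffding's inequality."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Same values as in the FastCFOF example above.
s = hoeffding_sample_size(epsilon=0.1, delta=0.1)
```

Under this reading, tightening epsilon is costly (the bound grows with 1/epsilon²), while tightening delta is cheap (it grows only logarithmically).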
CFOFiSAX
This library provides a wrapper for pyCFOFiSAX [2].
>>> from cfof.cfof_isax import CFOFiSAXWrapper
Refer to the pyCFOFiSAX documentation for more details.
TODOs
- Add support for faiss (GPU).
- Parallelize FastCFOF.
- Add unit tests.
- Add benchmarks.
- Wrap pyCFOFiSAX.
References
[1] ANGIULLI, Fabrizio. CFOF: a concentration free measure for anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 2020, vol. 14, no 1, p. 1-53.
[2] FOULON, Lucas, FENET, Serge, RIGOTTI, Christophe, et al. Scoring Message Stream Anomalies in Railway Communication Systems. In : 2019 International Conference on Data Mining Workshops (ICDMW). IEEE, 2019. p. 769-776.