Concentration Free Outlier Factor
CFOF (Concentration Free Outlier Factor)
🚧 Work in progress.
Python implementation of Concentration Free Outlier Factor (CFOF) [1].
CFOF properties
- Concentration free
- Does not suffer from the hubness problem
- Semi-locality
- The fast-CFOF algorithm reliably computes CFOF scores at a cost that is linear in both dataset size and dimensionality
Installation
To install the latest release:
$ pip install cfof
Usage
Import CFOF and FastCFOF.
>>> from cfof import CFOF, FastCFOF
>>> import numpy as np
Load data.
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Instantiate CFOF or FastCFOF, then call .compute(X) to calculate the scores. .compute(X) returns an array sc, where sc[i, l] is the score of object i for ϱ_l (rhos[l]).
You can also calculate CFOF scores from a precomputed distance matrix using
.compute_from_distance_matrix().
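As a small illustration of the sc[i, l] layout, the sketch below picks the column for one ϱ and ranks objects by score. The array is copied from the hard-CFOF example in this README rather than computed, so the snippet runs without cfof installed. The names top_k and ranking are illustrative, and treating higher scores as more anomalous follows the definition in [1].

```python
import numpy as np

rhos = [0.5, 0.6]

# Scores copied from the hard-CFOF example in this README:
# one row per object, one column per value in rhos.
sc = np.array([
    [0.5, 0.66666667],
    [0.33333333, 0.83333333],
    [0.5, 1.0],
    [0.5, 0.66666667],
    [0.33333333, 0.83333333],
    [0.5, 1.0],
])

l = rhos.index(0.6)        # column that corresponds to rho = 0.6
scores_for_rho = sc[:, l]  # one score per object

# Rank objects from highest to lowest score; a stable sort keeps the
# original order among tied scores.
top_k = 2  # illustrative choice, not a cfof parameter
ranking = np.argsort(-scores_for_rho, kind="stable")[:top_k]
print(ranking.tolist())
```

For this data the two highest-scoring objects at ϱ = 0.6 are objects 2 and 5, the outermost point of each cluster.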
CFOF (hard-CFOF)
Use compute to compute CFOF scores directly from data.
>>> cfof_clf = CFOF(metric='euclidean', rhos=[0.5, 0.6], n_jobs=1)
>>> cfof_clf.compute(X)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])
Use compute_from_distance_matrix to compute CFOF scores from a precomputed
distance matrix.
>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])
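For intuition about what these numbers mean, here is a brute-force sketch of the hard-CFOF definition as described in [1]: the score of an object is the smallest neighborhood size k, expressed as a fraction of n, for which the object is among the k nearest neighbors of at least ϱ·n objects. This is not the package's implementation. brute_force_cfof is a hypothetical helper, and two conventions are assumptions: every object counts as its own first neighbor, and ϱ·n is rounded up. Both were chosen because they reproduce the output above for the six-point example. The code needs Python 3.8+ for math.dist.

```python
import math


def brute_force_cfof(points, rhos):
    """Quadratic-time CFOF sketch; returns scores[i][l] for object i and rhos[l]."""
    n = len(points)

    # ranks[j][i] = 1-based position of object i in object j's neighbor list.
    ranks = []
    for j in range(n):
        # Sort by distance from j; the second key puts j itself first on ties
        # (assumption: each object is its own nearest neighbor).
        order = sorted(
            range(n),
            key=lambda i: (math.dist(points[j], points[i]), i != j),
        )
        row = [0] * n
        for position, i in enumerate(order, start=1):
            row[i] = position
        ranks.append(row)

    scores = []
    for i in range(n):
        # Neighborhood sizes at which each object j starts to include i.
        received = sorted(ranks[j][i] for j in range(n))
        row = []
        for rho in rhos:
            # Assumption: at least ceil(rho * n) objects must include i.
            needed = max(1, math.ceil(rho * n))
            row.append(received[needed - 1] / n)
        scores.append(row)
    return scores


X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
scores = brute_force_cfof(X, rhos=[0.5, 0.6])
for row in scores:
    print(row)
```

The sketch costs O(n² log n) time and O(n²) memory, so it is only useful for checking small examples by hand.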
FastCFOF (soft-CFOF)
Use compute to compute CFOF scores directly from data.
>>> np.random.seed(10)
>>> X = np.random.randint(0, 100, size=(1000, 3))
>>>
>>> fast_cfof_clf = FastCFOF(metric='euclidean',
...                          rhos=[0.001, 0.005, 0.01, 0.05, 0.1],
...                          epsilon=0.1, delta=0.1, n_bins=50, n_jobs=1)
>>> fast_cfof_clf.compute(X)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])
Use compute_from_distance_matrix to compute CFOF scores from a precomputed
distance matrix.
>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> fast_cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])
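This README does not document epsilon and delta. In [1], fast-CFOF is a sampling scheme whose accuracy is stated as an absolute error ε that holds with probability at least 1 - δ. Assuming the parameters here play the same role (not verified against this package's source), the standard Hoeffding bound gives a feel for how many sampled objects such a guarantee needs. hoeffding_sample_size is a hypothetical helper and not part of cfof.

```python
import math


def hoeffding_sample_size(epsilon, delta):
    """Smallest s with 2 * exp(-2 * s * epsilon**2) <= delta (Hoeffding bound)."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


# Values used in the FastCFOF example of this README.
s = hoeffding_sample_size(epsilon=0.1, delta=0.1)
print(s)
```

Under this assumption the sample size depends only on ε and δ, not on the number of objects, which is consistent with the linear cost claimed for fast-CFOF. Halving ε roughly quadruples the required sample.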
CFOFiSAX
This library provides a wrapper for pyCFOFiSAX [2].
>>> from cfof.cfof_isax import CFOFiSAXWrapper
Refer to the pyCFOFiSAX documentation for more details.
TODOs
- Add support for faiss (GPU).
- Parallelize FastCFOF.
- Add unit tests.
- Add benchmarks.
- Wrap pyCFOFiSAX.
References
[1] ANGIULLI, Fabrizio. CFOF: a concentration free measure for anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 2020, vol. 14, no. 1, p. 1-53.
[2] FOULON, Lucas, FENET, Serge, RIGOTTI, Christophe, et al. Scoring Message Stream Anomalies in Railway Communication Systems. In: 2019 International Conference on Data Mining Workshops (ICDMW). IEEE, 2019. p. 769-776.