
Project description

Module unavoids

Functions

getAllNCDFs(X, p=0.0625, ncpus=4) : Calculate the NCDF for all samples in parallel using a specified norm.

Parameters
----------
X : numpy array of shape (n_samples, n_features)
    Data matrix where `n_samples` is the number of samples
    and `n_features` is the number of features.
p : float or np.inf constant, default=0.0625
    The norm to use when calculating the distance between
    samples in `X`. If np.inf is supplied, then Chebyshev
    distance is used.
ncpus : int, default=4
    The number of parallel processes.

Returns
----------
NCDFs : numpy array of shape (n_samples, n_samples)
    The i-th row equals the NCDF for the i-th sample in `X`,
    while the j-th column of the i-th row equals NCDF_xi(j).
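
To make the shape of the returned array concrete, here is a minimal NumPy sketch. It assumes an NCDF row can be read as the sorted p-norm distances from one sample to every sample, scaled to [0, 1]. That reading, and the helper names `pairwise_minkowski` and `sketch_ncdfs`, are illustrative assumptions and not the package's implementation.

```python
import numpy as np

def pairwise_minkowski(X, p):
    """Pairwise p-norm distances between rows of X (hypothetical helper)."""
    diff = np.abs(X[:, None, :] - X[None, :, :])  # shape (n, n, n_features)
    if np.isinf(p):
        return diff.max(axis=2)  # Chebyshev distance when p is np.inf
    return (diff ** p).sum(axis=2) ** (1.0 / p)

def sketch_ncdfs(X, p=0.0625):
    """Assumed reading of an NCDF: each row holds sorted, scaled distances."""
    D = pairwise_minkowski(X, p)
    D_sorted = np.sort(D, axis=1)  # row i: distances from sample i, ascending
    max_d = D_sorted.max()
    return D_sorted / max_d if max_d > 0 else D_sorted

rng = np.random.default_rng(0)
X = rng.random((6, 3))  # already in [0, 1]
ncdfs = sketch_ncdfs(X, p=np.inf)
print(ncdfs.shape)  # (6, 6)
```

Under this assumption the first column is always zero, because each sample is at distance zero from itself.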

getBetaFractions(NCDFs_L, BetaSorted, BetaRanks, fraction_WSS, index) : Calculate the UNAVOIDS outlier score for a given sample using the fractions-of-all-gaps method.

Parameters
----------
NCDFs_L : numpy array of shape (n_samples, L_levels)
    An array containing the intercepts for n NCDFs at L beta
    levels, where `n_samples` is the number of samples and
    `L_levels` is the number of beta levels.
BetaSorted : numpy array of shape (n_samples, L_levels)
    The same as `NCDFs_L` but the intercepts are sorted along
    the L beta levels (column-wise sort of `NCDFs_L`).
BetaRanks : numpy array of shape (n_samples, L_levels)
    The same as `NCDFs_L` but the value at `NCDFs_L[i,j]` is
    replaced with the rank of `NCDFs_L[i,j]` on a given beta
    horizontal.
fraction_WSS : int 
    The number of nearest intercepts to be encompassed by the
    gap whose size will be the score for a given beta level
    and NCDF intercept. Assumed to be less than 
    `n_samples/2`.
index : int 
    The row index of the NCDF in `NCDFs_L` which we are
    finding the outlier score of.

Returns
----------
score : numpy array of shape (1, 1)
    The highest outlier score for `NCDFs_L[index,:]` across
    all beta levels.
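
The relationship between `NCDFs_L`, `BetaSorted`, and `BetaRanks` described above can be sketched with plain NumPy. This is an illustration under two assumptions: the sort runs within each column (one column per beta level), and ranks are zero-based. The package may use a different rank convention or tie handling.

```python
import numpy as np

# Toy intercepts: 3 NCDFs (rows) at 2 beta levels (columns).
NCDFs_L = np.array([[0.1, 0.5],
                    [0.3, 0.2],
                    [0.2, 0.9]])

# Sort the intercepts within each beta level (column-wise sort).
BetaSorted = np.sort(NCDFs_L, axis=0)

# Zero-based rank of each intercept within its column (assumed convention).
BetaRanks = np.argsort(np.argsort(NCDFs_L, axis=0), axis=0)

print(BetaRanks)
# [[0 1]
#  [2 0]
#  [1 2]]
```

With these conventions, `BetaSorted[BetaRanks[i, j], j]` recovers `NCDFs_L[i, j]`, which is what lets a row's position among its neighbors be looked up at each beta level.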

getBetaHist(NCDFs_L, BetaSorted, index) : Calculate the UNAVOIDS outlier score for a given sample using the histogram method.

Parameters
----------
NCDFs_L : numpy array of shape (n_samples, L_levels)
    An array containing the intercepts for n NCDFs at L beta
    levels, where `n_samples` is the number of samples and
    `L_levels` is the number of beta levels.
BetaSorted : numpy array of shape (n_samples, L_levels)
    The same as `NCDFs_L` but the intercepts are sorted along
    the L beta levels (column-wise sort of `NCDFs_L`).
index : int 
    The row index of the NCDF in `NCDFs_L` which we are
    finding the outlier score of.

Returns
----------
score : numpy array of shape (1, 1)
    The highest outlier score for `NCDFs_L[index,:]` across
    all beta levels.

getNCDF(X, p, index) : Calculate the NCDF for a single sample using a specified norm.

Parameters
----------
X : numpy array of shape (n_samples, n_features)
    Data matrix, assumed to be min-max scaled to [0,1], where
    `n_samples` is the number of samples and `n_features` is
    the number of features.
p : float or np.inf constant
    The norm to use when calculating the distance between
    samples in `X`. If np.inf is supplied, then Chebyshev
    distance is used.
index : int
    The index of the sample in `X` which we are finding the
    NCDF of. Assumed to be less than `n_samples`.

Returns
----------
NCDFxi : numpy array of shape (1, n_samples)
    The NCDF of `X[i,:]`, where i = `index`; the j-th value equals
    NCDF_xi(j).
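
Because `getNCDF` assumes `X` is min-max scaled to [0,1], the data usually needs rescaling first. The following is a minimal sketch of per-feature min-max scaling in NumPy. The helper name `min_max_scale` and its handling of constant columns are assumptions for illustration and are not part of the package.

```python
import numpy as np

def min_max_scale(X):
    """Scale each feature (column) of X to [0, 1] (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to 0, avoiding 0/0
    return (X - col_min) / col_range

X_raw = np.array([[1.0, 10.0, 5.0],
                  [2.0, 30.0, 5.0],
                  [4.0, 20.0, 5.0]])
X = min_max_scale(X_raw)
print(X.min(), X.max())  # 0.0 1.0
```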

unavoidsScore(X, precomputed=False, p=0.0625, returnNCDFs=True, method='fractions', r=0.01, L=100, ncpus=4) : Calculate the UNAVOIDS outlier score for all samples in `X`.

Parameters
----------
X : numpy array of shape (n_samples, n_features)
    Data matrix where `n_samples` is the number of samples
    and `n_features` is the number of features.
precomputed : bool, default=False
    If True, `X` is assumed to be an NCDF array in the same
    format as that returned by `getAllNCDFs`.
p : float or np.inf constant, default=0.0625
    The norm to use when calculating the distance between
    samples in `X`. If np.inf is supplied, then Chebyshev
    distance is used.
returnNCDFs : bool, default=True
    If True, NCDF array is returned along with outlier
    scores.
method : {"fractions", "histogram"}, default="fractions"
    Specifies which method to use for calculating outlier
    scores; either "fractions" or "histogram".
r : float, default=0.01
    Percentage of nearest intercepts to be encompassed by the
    gap whose size will be the score for a given beta and
    NCDF intercept in the "fractions" method. Ignored if
    `method` == "histogram".
L : int, default=100
    The number of beta levels to use.
ncpus : int, default=4
    The number of parallel processes to use.

Returns
----------
scores : numpy array of shape (n_samples, 1)
    The i-th element in `scores` is the UNAVOIDS outlier score
    for the i-th sample (row) in `X`.
NCDFs : numpy array of shape (n_samples, n_samples)
    The i-th row equals the NCDF for the i-th sample in `X`,
    while the j-th column of the i-th row equals NCDF_xi(j).
    Only returned if `returnNCDFs` == True.

References
----------
.. [1] W. A. Yousef, I. Traore and W. Briguglio, (2021)
   "UN-AVOIDS: Unsupervised and Nonparametric Approach for
   Visualizing Outliers and Invariant Detection Scoring",
   IEEE Transactions on Information Forensics and Security,
   vol. 16, pp. 5195-5210, doi: 10.1109/TIFS.2021.3125608.

Examples
--------
>>> import numpy as np
>>> from joblib import load
>>> from unavoids import unavoids
>>> from sklearn import metrics
>>>
>>> X_all = load("simData.joblib")
>>> Y = np.zeros((X_all.shape[0],))
>>> Y[-3:] = 1         # last three samples are outliers
>>> X = X_all[:,:4]    # grab first 4 features
>>>
>>> scores, NCDFs = unavoids.unavoidsScore(X, p=0.0625, returnNCDFs=True, method="fractions")
>>> fpr, tpr, thresholds = metrics.roc_curve(Y, scores)
>>> metrics.auc(fpr, tpr)
1.0
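
When no labels are available, the `(n_samples, 1)` score array can be flattened and ranked to pick the most outlying samples. The sketch below uses made-up placeholder scores rather than real package output, and assumes that higher scores mean more outlying, which matches the labeling used in the ROC example above. The cut-off `k` is an arbitrary illustrative choice.

```python
import numpy as np

# Placeholder scores with the documented (n_samples, 1) shape; not real output.
scores = np.array([[0.12], [0.95], [0.08], [0.40], [0.88]])

flat = scores.ravel()                # shape (n_samples,)
k = 2                                # number of samples to flag (arbitrary)
top_k = np.argsort(flat)[::-1][:k]   # indices of the k highest scores
print(top_k)  # [1 4]
```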

Download files


Source Distribution

unavoids-1.3.tar.gz (18.4 kB)

Uploaded Source

Built Distribution

unavoids-1.3-py3-none-any.whl (21.1 kB)

Uploaded Python 3
