
Outlier detection algorithm for graph datasets

outgraph

outgraph is a simple outlier detection algorithm for graph datasets. Given a list of graphs, it uses Mahalanobis distance to detect which graphs are outliers based on their topology, their node attributes, or both.

Note: outgraph only works for datasets where each graph has an equal number of nodes.
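Because of this constraint, it can help to check a dataset before running detection. The following is a minimal sketch using only numpy; the helper name same_node_count is hypothetical and is not part of outgraph.

```python
import numpy as np

def same_node_count(adjacency_matrices):
    """Return True if every adjacency matrix is square and all share one size.

    Hypothetical helper for illustration; not part of the outgraph API.
    """
    shapes = {np.asarray(a).shape for a in adjacency_matrices}
    if len(shapes) != 1:
        # Either the list is empty or the graphs differ in size.
        return False
    (shape,) = shapes
    return len(shape) == 2 and shape[0] == shape[1]

a = np.eye(3)        # 3-node graph
b = np.ones((3, 3))  # another 3-node graph
c = np.eye(4)        # 4-node graph, incompatible with the others

print(same_node_count([a, b]))
print(same_node_count([a, b, c]))
```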

Installation

You can install outgraph with pip:

$ pip install outgraph

How it Works

Unlike most approaches to graph outlier detection, outgraph does not use machine learning. Instead, each graph is converted into a vector representation using one of three available methods:

  1. Averaging the node feature/attribute vectors
  2. Flattening the adjacency matrix
  3. A concatenation of 1 and 2
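The three methods above can be sketched with plain numpy. This is an illustration of the idea, not outgraph's actual source: the function name vectorize is hypothetical, and it assumes node attributes are stored as a (nodes, features) array.

```python
import numpy as np

def vectorize(node_attrs, adj_matrix, method=1):
    """Turn one graph into a 1-D vector (hypothetical helper, not outgraph's API)."""
    node_attrs = np.asarray(node_attrs, dtype=float)
    adj_matrix = np.asarray(adj_matrix, dtype=float)

    mean_attrs = node_attrs.mean(axis=0)  # method 1: average over nodes
    flat_adj = adj_matrix.ravel()         # method 2: row-major flattening

    if method == 1:
        return mean_attrs
    if method == 2:
        return flat_adj
    if method == 3:
        # method 3: attribute average followed by the flattened adjacency
        return np.concatenate([mean_attrs, flat_adj])
    raise ValueError("method must be 1, 2, or 3")

node_attrs = np.array([[-1], [0], [1]])
adj_matrix = np.array([[1, 1, 0],
                       [1, 1, 1],
                       [0, 1, 1]])

for m in (1, 2, 3):
    print(m, vectorize(node_attrs, adj_matrix, method=m).shape)
```

Note that flattening only yields comparable vectors when every graph has the same number of nodes and the same node ordering.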

Then, the Mahalanobis distance between each vector and the distribution of vectors is calculated. Lastly, a Chi-Squared distribution is used to model the distribution of distances and identify the distances outside a cutoff threshold (e.g. p < 0.05).
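The distance and cutoff steps might look roughly like the sketch below, which uses numpy and scipy. It rests on several assumptions that the description above does not confirm: the squared distance is compared against a Chi-Squared distribution whose degrees of freedom equal the vector length, and a pseudo-inverse is used so that a singular covariance matrix does not raise an error. The function name mahalanobis_outliers is hypothetical, and outgraph's own implementation may differ.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_outliers(vectors, p_value=0.05):
    """Return (outlier_indices, squared_distances) for an (n, d) array of vectors.

    Hypothetical sketch; not outgraph's actual implementation.
    """
    X = np.asarray(vectors, dtype=float)
    mean = X.mean(axis=0)
    # atleast_2d handles the d == 1 case, where np.cov returns a scalar.
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Assumption: pseudo-inverse, since flattened adjacency matrices
    # often give a singular covariance matrix.
    inv_cov = np.linalg.pinv(cov)

    diff = X - mean
    # Squared Mahalanobis distance of each row: diff_i^T * inv_cov * diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

    # Assumption: degrees of freedom equal the vector dimension.
    cutoff = chi2.ppf(1.0 - p_value, df=X.shape[1])
    return np.flatnonzero(d2 > cutoff), d2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[0] = [10.0, 10.0]  # plant one obvious outlier

indices, d2 = mahalanobis_outliers(X, p_value=0.05)
print(0 in indices)
```

Lowering p_value raises the cutoff, so fewer graphs are flagged.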

This approach is based off this article.

Usage

Each graph in your dataset needs to be an instance of outgraph.Graph. This object takes two parameters, node_attrs and adjacency_matrix, both numpy arrays whose indices correspond to nodes. Example:

import numpy as np
from outgraph import Graph

node_attrs = np.array([[-1], [0], [1]])
adj_matrix = np.array([[1, 1, 0],
                       [1, 1, 1],
                       [0, 1, 1]])
graph = Graph(node_attrs, adj_matrix)


Once you have a list of Graph objects, pass them to outgraph.detect_outliers:

from outgraph import Graph, detect_outliers

graphs = [Graph(), ...]
outliers, indices = detect_outliers(graphs, method=1, p_value=0.05)

Notice the method and p_value parameters. The method parameter is an integer between 1 and 3 that selects one of the three graph vectorization methods described in the How it Works section. The p_value parameter is the outlier cutoff threshold.
