Outlier detection algorithm for graph datasets
outgraph
outgraph is a simple outlier detection algorithm for graph datasets. Given a list of graphs, it uses the Mahalanobis distance to detect which graphs are outliers based on their topology, their node attributes, or both.
Note: outgraph only works for datasets where each graph has an equal number of nodes.
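Since every graph must have the same number of nodes, it can help to check a dataset before building Graph objects. The following is a minimal sketch using plain NumPy arrays; same_node_count is a hypothetical helper written for illustration and is not part of outgraph.

```python
import numpy as np


def same_node_count(adjacency_matrices):
    """Return True if all adjacency matrices are square and share one size.

    Hypothetical helper for illustration; not part of the outgraph API.
    """
    shapes = {np.asarray(a).shape for a in adjacency_matrices}
    if len(shapes) != 1:
        # Either the list is empty or at least two shapes differ.
        return False
    (shape,) = shapes
    return len(shape) == 2 and shape[0] == shape[1]


three_nodes = np.eye(3, dtype=int)
four_nodes = np.eye(4, dtype=int)

print(same_node_count([three_nodes, three_nodes]))  # True
print(same_node_count([three_nodes, four_nodes]))   # False
```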
Installation
You can install outgraph with pip:
$ pip install outgraph
How it Works
Unlike most approaches to graph outlier detection, outgraph does not use machine learning. Instead, each graph is converted into a vector representation using one of three available methods:
1. Averaging the node feature/attribute vectors
2. Flattening the adjacency matrix
3. Concatenating the results of 1 and 2
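The three vectorization methods listed above can be sketched in a few lines of NumPy. This is an illustrative approximation, not outgraph's actual implementation: the vectorize function and its argument order are assumptions, and it assumes node_attrs has shape (nodes, features).

```python
import numpy as np


def vectorize(node_attrs, adjacency_matrix, method=1):
    """Turn one graph into a 1-D vector (hypothetical helper, not outgraph's code)."""
    node_attrs = np.asarray(node_attrs, dtype=float)
    adjacency_matrix = np.asarray(adjacency_matrix, dtype=float)

    mean_attrs = node_attrs.mean(axis=0)  # method 1: average over nodes
    flat_adj = adjacency_matrix.ravel()   # method 2: row-major flattening

    if method == 1:
        return mean_attrs
    if method == 2:
        return flat_adj
    if method == 3:
        return np.concatenate([mean_attrs, flat_adj])
    raise ValueError("method must be 1, 2, or 3")


node_attrs = np.array([[-1], [0], [1]])
adj_matrix = np.array([[1, 1, 0],
                       [1, 1, 1],
                       [0, 1, 1]])

print(vectorize(node_attrs, adj_matrix, method=1).shape)  # (1,)
print(vectorize(node_attrs, adj_matrix, method=2).shape)  # (9,)
print(vectorize(node_attrs, adj_matrix, method=3).shape)  # (10,)
```

Because every graph has the same number of nodes, each method yields vectors of the same length for every graph, so they can be stacked into a single matrix.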
Then, the Mahalanobis distance between each vector and the distribution of vectors is calculated. Lastly, a chi-squared distribution is used to model the distribution of distances and flag the graphs whose distances fall outside a cutoff threshold (e.g. p < 0.05).
This approach is based off this article.
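The distance and cutoff steps can be sketched as follows with NumPy and SciPy. This is a sketch under stated assumptions rather than outgraph's own code: mahalanobis_outliers is a hypothetical name, the squared distance is compared against a chi-squared distribution whose degrees of freedom equal the vector length, and a pseudo-inverse is used so that a singular covariance matrix (likely for flattened adjacency matrices) does not raise an error.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_outliers(vectors, p_value=0.05):
    """Return indices of rows whose Mahalanobis distance is unusually large.

    Hypothetical helper for illustration; not part of the outgraph API.
    """
    X = np.asarray(vectors, dtype=float)
    centered = X - X.mean(axis=0)

    # Covariance of the columns; atleast_2d handles the one-feature case.
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # assumption: pseudo-inverse for robustness

    # Squared Mahalanobis distance of each row from the sample mean.
    sq_dist = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

    # Upper-tail probability under chi-squared with df = number of features.
    p = chi2.sf(sq_dist, df=X.shape[1])
    return np.flatnonzero(p < p_value)


rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 2))
vectors[0] = [10.0, 10.0]  # plant one obvious outlier at index 0

print(mahalanobis_outliers(vectors, p_value=0.05))
```

When the covariance matrix is rank-deficient, using its rank rather than the full vector length as the degrees of freedom may be more appropriate.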
Usage
Each graph in your dataset needs to be an instance of outgraph.Graph. This object has two parameters, node_attrs and adjacency_matrix, both NumPy arrays whose indices correspond to nodes. Example:
import numpy as np
from outgraph import Graph
node_attrs = np.array([[-1], [0], [1]])
adj_matrix = np.array([[1, 1, 0],
                       [1, 1, 1],
                       [0, 1, 1]])
graph = Graph(node_attrs, adj_matrix)
Once you have a list of Graph objects, pass them to outgraph.detect_outliers:
from outgraph import Graph, detect_outliers
graphs = [Graph(), ...]
outliers, indices = detect_outliers(graphs, method=1, p_value=0.05)
Notice the method and p_value parameters. The method parameter is an integer between 1 and 3 that corresponds to one of the three graph vectorization methods described in the How it Works section. p_value is the outlier cutoff threshold.