A Python Library for Graph Outlier Detection (Anomaly Detection)
PyGOD is a Python library for graph outlier detection (anomaly detection). This exciting yet challenging field has many key applications, e.g., detecting suspicious activities in social networks [1] and security systems [2].
PyGOD includes more than 10 recent graph-based detection algorithms, such as DOMINANT (SDM’19) and GUIDE (BigData’21). For consistency and accessibility, PyGOD is built on top of PyTorch Geometric (PyG) and PyTorch, and follows the API design of PyOD. See the example below for detecting outliers with PyGOD in 5 lines!
PyGOD features:
Unified APIs, detailed documentation, and interactive examples across various graph-based algorithms.
Comprehensive coverage of more than 10 recent graph outlier detectors.
Full support for detection at multiple levels, such as node-, edge- (WIP), and graph-level tasks (WIP).
Scalable design for processing large graphs via mini-batch and sampling.
Streamlined data processing with PyG: fully compatible with PyG data objects.
Outlier Detection Using PyGOD with 5 Lines of Code:
# train a DOMINANT detector
from pygod.models import DOMINANT
model = DOMINANT(num_layers=4, epoch=20)  # hyperparameters can be set here
model.fit(data)  # data is a PyTorch Geometric data object
# get outlier scores on the input data
outlier_scores = model.decision_scores_  # raw outlier scores on the input data
# predict on the new data in the inductive setting
outlier_scores = model.decision_function(test_data)  # raw outlier scores on the test data
Citing PyGOD:
The PyGOD paper is available on arXiv. If you use PyGOD in a scientific publication, we would appreciate citations to the following paper:
@article{pygod2022,
  author  = {Liu, Kay and Dou, Yingtong and Zhao, Yue and Ding, Xueying and Hu, Xiyang and Zhang, Ruitong and Ding, Kaize and Chen, Canyu and Peng, Hao and Shu, Kai and Chen, George H. and Jia, Zhihao and Yu, Philip S.},
  title   = {PyGOD: A Python Library for Graph Outlier Detection},
  journal = {arXiv preprint arXiv:2204.12095},
  year    = {2022},
}
or:
Liu, K., Dou, Y., Zhao, Y., Ding, X., Hu, X., Zhang, R., Ding, K., Chen, C., Peng, H., Shu, K., Chen, G.H., Jia, Z., and Yu, P.S. 2022. PyGOD: A Python Library for Graph Outlier Detection. arXiv preprint arXiv:2204.12095.
Installation
It is recommended to use pip or conda (wip) for installation. Please make sure the latest version is installed, as PyGOD is updated frequently:
pip install pygod # normal install
pip install --upgrade pygod # or update if needed
Alternatively, you can clone the repository and install from the setup.py file:
git clone https://github.com/pygod-team/pygod.git
cd pygod
pip install .
Required Dependencies:
Python 3.6+
numpy>=1.19.4
scikit-learn>=0.22.1
scipy>=1.5.2
setuptools>=50.3.1.post20201107
Note on PyG and PyTorch Installation: PyGOD depends on PyTorch Geometric (PyG), PyTorch, and networkx. To streamline the installation, PyGOD does NOT install these libraries for you. Please install them separately, following each project's own installation instructions, before running PyGOD:
torch>=1.10
pytorch_geometric>=2.0.3
networkx>=2.6.3
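Because these three libraries are not installed automatically, it can help to confirm they are present before running PyGOD. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper, not part of PyGOD, and the distribution names passed to it are assumptions (PyG, for instance, is published on PyPI as `torch-geometric`). It relies on `importlib.metadata`, which requires Python 3.8 or later.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Distribution names below are assumptions; adjust them to your environment.
for name, version in installed_versions(["torch", "torch-geometric", "networkx"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

The sketch only reports what is installed; comparing the reported versions against the minimums listed above is left to the reader.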
API Cheatsheet & Reference
Full API Reference: (https://docs.pygod.org). API cheatsheet for all detectors:
fit(G): Fit the detector on PyG data G.
decision_function(G): Predict raw anomaly score of PyG data G using the fitted detector.
Key Attributes of a fitted model:
decision_scores_: The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores.
labels_: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.
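To illustrate how the two attributes relate, here is a minimal sketch of turning raw scores into binary labels by flagging the highest-scoring fraction of nodes. The `contamination` parameter and the thresholding rule follow a convention common in outlier detection libraries and are assumptions here; `labels_from_scores` is a hypothetical helper, not PyGOD's implementation.

```python
import math


def labels_from_scores(scores, contamination=0.1):
    """Label the top `contamination` fraction of scores as outliers (1), the rest as inliers (0)."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    n_outliers = max(1, math.ceil(contamination * len(scores)))
    # The threshold is the n_outliers-th highest score.
    threshold = sorted(scores, reverse=True)[n_outliers - 1]
    # Ties at the threshold are all flagged, so slightly more than n_outliers may be labeled 1.
    labels = [1 if s >= threshold else 0 for s in scores]
    return labels, threshold


decision_scores = [0.12, 0.08, 0.95, 0.10, 0.11, 0.09, 0.13, 0.07, 0.10, 0.14]
labels, threshold = labels_from_scores(decision_scores, contamination=0.1)
print(labels)     # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(threshold)  # 0.95
```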
For the inductive setting:
predict(G): Predict whether each node in PyG data G is an outlier using the fitted detector.
predict_proba(G): Predict the probability of each node in PyG data G being an outlier using the fitted detector.
predict_confidence(G): Predict the model’s node-wise confidence (available in predict and predict_proba) [3].
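One simple way to read raw scores on new data as outlier probabilities is to min-max scale them against the scores seen during training. The sketch below shows that idea only; whether PyGOD uses this exact conversion is an assumption, and `scores_to_proba` is a hypothetical helper rather than a library function.

```python
def scores_to_proba(train_scores, new_scores):
    """Min-max scale new scores using the training score range, clipped to [0, 1]."""
    lo, hi = min(train_scores), max(train_scores)
    span = hi - lo
    if span == 0:
        # All training scores are identical, so no score stands out.
        return [0.0 for _ in new_scores]
    return [min(1.0, max(0.0, (s - lo) / span)) for s in new_scores]


train_scores = [1.0, 2.0, 5.0]
test_scores = [0.5, 3.0, 9.0]
print(scores_to_proba(train_scores, test_scores))  # [0.0, 0.5, 1.0]
```

Scores below the training minimum or above the training maximum are clipped, which keeps the output a valid probability in the inductive setting.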
Input of PyGOD: Please pass in a PyTorch Geometric (PyG) data object. See PyG data processing examples.
Implemented Algorithms
The PyGOD toolkit consists of two major functional groups:
(i) Node-level detection:
Type | Backbone | Abbr | Year | Sampling | Ref |
---|---|---|---|---|---|
Unsupervised | MLP | MLPAE | 2014 | Yes | |
Unsupervised | GNN | GCNAE | 2016 | Yes | |
Unsupervised | MF | ONE | 2019 | No | |
Unsupervised | GNN | DOMINANT | 2019 | Yes | |
Unsupervised | GNN | DONE | 2020 | Yes | |
Unsupervised | GNN | AdONE | 2020 | Yes | |
Unsupervised | GNN | AnomalyDAE | 2020 | Yes | |
Unsupervised | GAN | GAAN | 2020 | Yes | |
Unsupervised | GNN | OCGNN | 2021 | Yes | |
Unsupervised/SSL | GNN | CoLA (beta) | 2021 | In progress | |
Unsupervised/SSL | GNN | ANEMONE (beta) | 2021 | In progress | |
Unsupervised | GNN | GUIDE | 2021 | Yes | |
Unsupervised/SSL | GNN | CONAD | 2022 | Yes | |
(ii) Utility functions:
Type | Name | Function | Documentation |
---|---|---|---|
Metric | eval_precision_at_k | Calculating Precision@k | |
Metric | eval_recall_at_k | Calculating Recall@k | |
Metric | eval_roc_auc | Calculating ROC-AUC Score | |
Metric | eval_average_precision | Calculating average precision | |
Data | gen_structure_outliers | Generating structural outliers | |
Data | gen_attribute_outliers | Generating attribute outliers | |
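To make the two ranking metrics above concrete, the following sketch computes Precision@k and Recall@k in plain Python from ground-truth labels and outlier scores. The function names and argument order are hypothetical and chosen for illustration; they are not the signatures of eval_precision_at_k or eval_recall_at_k, which may differ.

```python
def top_k_indices(scores, k):
    """Indices of the k highest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def precision_at_k(labels, scores, k):
    """Fraction of the k highest-scoring nodes that are true outliers."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(labels[i] for i in top_k_indices(scores, k))
    return hits / k


def recall_at_k(labels, scores, k):
    """Fraction of all true outliers that appear among the k highest-scoring nodes."""
    total = sum(labels)
    if total == 0:
        raise ValueError("labels must contain at least one outlier")
    hits = sum(labels[i] for i in top_k_indices(scores, k))
    return hits / total


labels = [0, 1, 0, 1, 0, 0]                  # 1 marks a true outlier
scores = [0.1, 0.9, 0.8, 0.3, 0.2, 0.05]     # higher means more abnormal
print(precision_at_k(labels, scores, k=2))   # 0.5
print(recall_at_k(labels, scores, k=2))      # 0.5
```

Here the two highest-scoring nodes are indices 1 and 2, of which only index 1 is a true outlier, so one of two predictions is correct and one of two true outliers is recovered.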
Quick Start for Outlier Detection with PyGOD
“A Blitz Introduction” demonstrates the basic API of PyGOD using the DOMINANT detector. The API is consistent across all other algorithms.
How to Contribute
You are welcome to contribute to this exciting project.
See the contribution guide for more information.
PyGOD Team
PyGOD is a great team effort by researchers from UIC, IIT, BUAA, ASU, and CMU. Our core team members include:
Kay Liu (UIC), Yingtong Dou (UIC), Yue Zhao (CMU), Xueying Ding (CMU), Xiyang Hu (CMU), Ruitong Zhang (BUAA), Kaize Ding (ASU), Canyu Chen (IIT).
Reach out to us by submitting an issue report or sending an email to dev@pygod.org.
Reference