Mining graphs with Subgroup Discovery

GraphSD

GraphSD (Graph-based Subgroup Discovery) is a Python package for detecting exceptional interaction patterns in graphs. It builds spatio-temporal graphs from position and attribute data, then applies rule-based subgroup discovery and outlier detection techniques to uncover meaningful and rare behaviors.

License: BSD-3-Clause


✨ Features

  • Directed and multi-directed interaction graph construction
  • Subgroup discovery using interpretable rule-based conditions
  • Outlier detection and quality-based ranking
  • Spatio-temporal interaction filtering using distance and velocity
  • Binning and discretization utilities
  • Built-in graph visualizations with pattern overlays
  • Pure Python: no dependency on Orange3 or external mining engines
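To make the distance and velocity filtering concrete, here is a minimal standard-library sketch of one way such a filter could work. It is not GraphSD's implementation: the record layout `(time, entity, x, y)`, the function name `interaction_counts`, and the thresholds `max_distance` and `max_speed` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical records: (time, entity_id, x, y). This is NOT GraphSD's input schema.
positions = [
    (0, "a", 0.0, 0.0), (0, "b", 1.0, 0.0), (0, "c", 5.0, 5.0),
    (1, "a", 0.5, 0.0), (1, "b", 1.0, 0.0), (1, "c", 5.0, 5.0),
]


def interaction_counts(records, max_distance=1.5, max_speed=1.0):
    """Count directed (source, target) interactions across time steps.

    An edge source -> target is counted at a time step when both entities
    are within max_distance of each other and the source moved no faster
    than max_speed since it was last observed (assumed rule, for illustration).
    """
    # Group positions by time step: time -> {entity: (x, y)}
    by_time = {}
    for t, entity, x, y in records:
        by_time.setdefault(t, {})[entity] = (x, y)

    counts = Counter()
    last_seen = {}  # entity -> (time, x, y) of its previous observation
    for t in sorted(by_time):
        current = by_time[t]
        for src, (sx, sy) in current.items():
            # Speed is unknown at the first sighting, so it is treated as 0.
            speed = 0.0
            if src in last_seen:
                pt, px, py = last_seen[src]
                speed = math.hypot(sx - px, sy - py) / (t - pt)
            if speed > max_speed:
                continue  # source is moving too fast to count as interacting
            for dst, (dx, dy) in current.items():
                if dst != src and math.hypot(sx - dx, sy - dy) <= max_distance:
                    counts[(src, dst)] += 1
        # Update previous observations only after the whole step is processed.
        for entity, (x, y) in current.items():
            last_seen[entity] = (t, x, y)
    return counts


counts = interaction_counts(positions)
for (src, dst), n in sorted(counts.items()):
    print(f"{src} -> {dst}: {n}")
```

The resulting counts can serve as edge weights of a directed graph. Keeping one edge per time step instead of a summed count would give a multi-directed variant.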

📦 Installation

Install via PyPI:

pip install graph-sd

🚀 Example Usage

from graphsd.mining import DigraphSDMining
from graphsd.utils import make_bins
from graphsd._base import load_data
from graphsd.viz import graph_viz
import networkx as nx

# Load sample position and social data
position_df, social_df = load_data("playground_a")

# Discretize social attributes
social_df = make_bins(social_df)

# Initialize the subgroup discovery engine
dig = DigraphSDMining(random_state=42)

# Build the interaction graph using position and attribute data
dig.read_data(position_df, social_df, time_step=10)

# Discover subgroups with quality constraints
subgroups = dig.subgroup_discovery(
    mode="to",
    min_support=0.2,
    metric="mean",
    quality_measure="global_proportion"
)

# Convert to a DataFrame and print
df = dig.to_dataframe(subgroups)
print(df)

# Visualize the graph and highlighted subgroups
graph_viz(dig.graph, layout=nx.spring_layout)
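The example above calls `make_bins` to discretize the social attributes, but this page does not say which binning strategy it uses. As a rough illustration of discretization in general, the sketch below applies equal-width binning to a numeric column. The function `equal_width_bins` and the sample `ages` are hypothetical and are not part of GraphSD.

```python
def equal_width_bins(values, n_bins=3):
    """Map each numeric value to a bin index in the range 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum lands exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical attribute column (e.g. ages of the observed individuals).
ages = [4, 5, 6, 7, 8, 9, 10]
binned = equal_width_bins(ages, n_bins=3)
print(binned)
```

Discretized attributes like these are what make rule conditions of the form "attribute = bin" possible in the subgroup discovery step.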

🧠 Code Structure

Module        Purpose
mining.py     Main API for graph construction and subgroup discovery
patterns.py   Logic for rule quality, coverage, and pattern filters
outlier.py    Tools for subgroup scoring and ranking
utils.py      Preprocessing, binning, and distance computations
viz.py        Graph and subgroup visualizations
_base.py      Sample data loader (e.g. load_data("playground_a"))
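The terms rule coverage, support, and quality used above can be illustrated with a small self-contained sketch. It is not the code in `patterns.py` or `outlier.py`: the edge records, the helpers `covers` and `evaluate`, and the quality measure (subgroup mean weight minus global mean weight) are assumptions chosen for simplicity, and the measure is not the package's `global_proportion`.

```python
from statistics import mean

# Hypothetical edge records: source-node attributes plus an interaction weight.
edges = [
    {"gender": "F", "age_bin": "young", "weight": 4},
    {"gender": "F", "age_bin": "old", "weight": 6},
    {"gender": "M", "age_bin": "young", "weight": 1},
    {"gender": "M", "age_bin": "old", "weight": 1},
]


def covers(rule, edge):
    """A rule is a dict of attribute == value conditions that must all hold."""
    return all(edge.get(attr) == value for attr, value in rule.items())


def evaluate(rule, edges, min_support=0.2):
    """Return support and quality for a rule, or None if it covers too few edges."""
    covered = [e for e in edges if covers(rule, e)]
    support = len(covered) / len(edges)
    if not covered or support < min_support:
        return None
    # Assumed quality measure: how far the subgroup's mean weight deviates
    # from the mean weight over all edges.
    quality = mean(e["weight"] for e in covered) - mean(e["weight"] for e in edges)
    return {"rule": rule, "support": support, "quality": quality}


rules = [{"gender": "F"}, {"gender": "M"}, {"gender": "F", "age_bin": "old"}]
results = [r for r in (evaluate(rule, edges, min_support=0.5) for rule in rules) if r]
ranked = sorted(results, key=lambda r: r["quality"], reverse=True)
for r in ranked:
    print(r["rule"], r["support"], r["quality"])
```

With `min_support=0.5`, the two-condition rule is dropped because it covers only a quarter of the edges, which shows how a support threshold prunes overly specific patterns before ranking.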

📄 License

This project is licensed under the BSD 3-Clause License.


👥 Authors

  • Carolina Centeio Jorge – TU Delft
  • Cláudio Rebelo de Sá – Leiden University

📚 Citation

If you use GraphSD in your research, please cite:

📝 Journal Article (Expert Systems, 2023)

Jorge, C.C., Atzmueller, M., Heravi, B.M., Gibson, J.L., Rossetti, R.J.F., & Rebelo de Sá, C.
"Want to come play with me?" Outlier subgroup discovery on spatio-temporal interactions.
Expert Systems, 40(5), 2023.
https://doi.org/10.1111/exsy.12686

@article{DBLP:journals/es/JorgeAHGRS23,
  author  = {Carolina Centeio Jorge and Martin Atzmueller and Behzad Momahed Heravi and
             Jenny L. Gibson and Rosaldo J. F. Rossetti and Cl{\'a}udio Rebelo de S{\'a}},
  title   = {"Want to come play with me?" Outlier subgroup discovery on spatio-temporal interactions},
  journal = {Expert Syst. J. Knowl. Eng.},
  volume  = {40},
  number  = {5},
  year    = {2023},
  doi     = {10.1111/EXSY.12686}
}

📘 Conference Paper (EPIA 2019)

Jorge, C.C., Atzmueller, M., Heravi, B.M., Gibson, J.L., Rebelo de Sá, C., & Rossetti, R.J.F.
Mining Exceptional Social Behaviour. In EPIA 2019, LNCS 11805, Springer.
https://doi.org/10.1007/978-3-030-30244-3_38

@inproceedings{DBLP:conf/epia/JorgeAHGSR19,
  author    = {Carolina Centeio Jorge and Martin Atzmueller and Behzad Momahed Heravi and
               Jenny L. Gibson and Cl{\'a}udio Rebelo de S{\'a} and Rosaldo J. F. Rossetti},
  title     = {Mining Exceptional Social Behaviour},
  booktitle = {Progress in Artificial Intelligence - 19th EPIA 2019},
  series    = {Lecture Notes in Computer Science},
  volume    = {11805},
  pages     = {460--472},
  publisher = {Springer},
  year      = {2019},
  doi       = {10.1007/978-3-030-30244-3_38}
}
