Implementation of frequent subgraph mining algorithm gSpan

gSpan

For the Chinese README, please see README-Chinese.

gSpan is an algorithm for mining frequent subgraphs.

This program implements gSpan in Python. The repository on GitHub is https://github.com/betterenvi/gSpan. This implementation borrows some ideas from gboost.
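To make "frequent" concrete: a pattern's support is the number of graphs in the database that contain it, and the pattern is frequent when its support reaches the minimum support. The sketch below counts support for single labeled edges only, which is the simplest case. It is an illustration, not code from this package: the toy database layout and the function names `edge_pattern_support` and `frequent_edge_patterns` are assumptions made for this example.

```python
from collections import Counter


def edge_pattern_support(graphs):
    """Count, for each labeled edge pattern, how many graphs contain it.

    Each graph is a (vertex_labels, edges) pair (a hypothetical layout):
    vertex_labels maps vertex id -> label, and edges is a list of
    (u, v, edge_label) tuples describing undirected edges.
    """
    support = Counter()
    for vertex_labels, edges in graphs:
        patterns = set()  # a pattern counts at most once per graph
        for u, v, edge_label in edges:
            # Sort the endpoint labels so that A-B and B-A are the same pattern.
            a, b = sorted((vertex_labels[u], vertex_labels[v]))
            patterns.add((a, edge_label, b))
        support.update(patterns)
    return support


def frequent_edge_patterns(graphs, min_support):
    """Keep only the edge patterns found in at least min_support graphs."""
    support = edge_pattern_support(graphs)
    return {p: s for p, s in support.items() if s >= min_support}


toy_db = [
    ({0: "A", 1: "B", 2: "C"}, [(0, 1, "x"), (1, 2, "y")]),
    ({0: "B", 1: "A"}, [(0, 1, "x")]),
    ({0: "A", 1: "C"}, [(0, 1, "y")]),
]
result = frequent_edge_patterns(toy_db, min_support=2)
print(result)
```

Only the A-x-B edge occurs in two of the three toy graphs, so it is the only pattern kept at a minimum support of 2. gSpan itself goes further and grows such frequent edges into larger subgraphs.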

Undirected Graphs

This program supports undirected graphs, and produces the same results as gboost on the dataset graphdata/graph.data.

Directed Graphs

As of 2016-10-29, gboost does not support directed graphs. This program implements gSpan for directed graphs. More specifically, it can mine frequent directed subgraphs that have at least one node that can reach the other nodes in the subgraph. Correctness is not guaranteed, however, because the author has not tested it extensively. Several runs on the datasets graphdata/graph.data.directed.1 and graph.data.simple.5 revealed no faults.
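The reachability condition can be illustrated with a short sketch. It reads the condition as "some node can reach every other node along directed edges", which is an interpretation of the sentence above. The function names `reachable_from` and `find_roots` are hypothetical and are not part of this package.

```python
from collections import deque


def reachable_from(start, adjacency):
    """Return the set of nodes reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def find_roots(nodes, edges):
    """Return the nodes that can reach every node in the directed graph."""
    adjacency = {n: [] for n in nodes}
    for src, dst in edges:
        adjacency[src].append(dst)
    all_nodes = set(nodes)
    return [n for n in nodes if reachable_from(n, adjacency) == all_nodes]


# 0 -> 1 and 0 -> 2: node 0 reaches every other node.
print(find_roots([0, 1, 2], [(0, 1), (0, 2)]))
# 0 -> 2 and 1 -> 2: no single node reaches all the others.
print(find_roots([0, 1, 2], [(0, 2), (1, 2)]))
```

Under this reading, the first toy graph satisfies the condition and the second does not.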

How to install

This program supports both Python 2 and Python 3.

Method 1

Install this project using pip:

pip install gspan-mining

Method 2

First, clone the project:

git clone https://github.com/betterenvi/gSpan.git
cd gSpan

Optionally, install this project as a third-party library so that you can run it from any directory.

python setup.py install

How to run

The command is:

python -m gspan_mining [-s min_support] [-n num_graph] [-l min_num_vertices] [-u max_num_vertices] [-d True/False] [-v True/False] [-p True/False] [-w True/False] [-h] database_file_name

Some examples

  • Read graph data from ./graphdata/graph.data and mine undirected subgraphs with a minimum support of 5000:
    python -m gspan_mining -s 5000 ./graphdata/graph.data
  • Read graph data from ./graphdata/graph.data, mine undirected subgraphs with a minimum support of 5000, and visualize the frequent subgraphs (matplotlib and networkx are required):
    python -m gspan_mining -s 5000 -p True ./graphdata/graph.data
  • Read graph data from ./graphdata/graph.data and mine directed subgraphs with a minimum support of 5000:
    python -m gspan_mining -s 5000 -d True ./graphdata/graph.data
  • Print help info:
    python -m gspan_mining -h
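When driving the program from a script, it can be convenient to assemble the command line in code. The sketch below builds the argument list using only the -s, -d and -p flags shown above. The helper name `build_gspan_command` and its parameters are hypothetical and are not part of the package.

```python
import sys


def build_gspan_command(database_file, min_support=None, directed=False, plot=False):
    """Build the argv list for `python -m gspan_mining` from a few options.

    Only the flags documented in this README are used; options left at
    their defaults here are omitted from the command.
    """
    cmd = [sys.executable, "-m", "gspan_mining"]
    if min_support is not None:
        cmd += ["-s", str(min_support)]
    if directed:
        cmd += ["-d", "True"]
    if plot:
        cmd += ["-p", "True"]
    cmd.append(database_file)
    return cmd


cmd = build_gspan_command("./graphdata/graph.data", min_support=5000, directed=True)
print(" ".join(cmd[1:]))
```

With gspan-mining installed, the resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the miner with the current interpreter.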

The author also wrote example code in a Jupyter Notebook, which presents mining results and visualizations. For details, please refer to main.ipynb.

Running time

  • Environment

    • OS: Windows 10
    • Python version: Python 2.7.12
    • Processor: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz 3.60 GHz
    • RAM: 8.00 GB
  • Running time

    On the dataset ./graphdata/graph.data, the running times are listed below:

Min support   Number of frequent subgraphs   Time
5000          26                             51.48 s
3000          52                             69.07 s
1000          455                            3 m 49 s
600           1235                           7 m 29 s
400           2710                           12 m 53 s

Reference

gSpan: Graph-Based Substructure Pattern Mining, by X. Yan and J. Han. Proc. 2002 Int. Conf. on Data Mining (ICDM'02).

One C++ implementation of gSpan.
