A Python library for graph kernels, graph edit distances, and graph pre-images

Project description

graphkit-learn

A Python package for graph kernels, graph edit distances, and the graph pre-image problem.

Requirements

  • python>=3.6
  • numpy>=1.16.2
  • scipy>=1.1.0
  • matplotlib>=3.1.0
  • networkx>=2.2
  • scikit-learn>=0.20.0
  • tabulate>=0.8.2
  • tqdm>=4.26.0
  • control>=0.8.2 (for generalized random walk kernels only)
  • slycot==0.3.3 (for generalized random walk kernels only; requires a Fortran compiler such as gfortran — see the install note after this list)
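
If you need the generalized random walk kernels, the two optional dependencies can be installed separately. A Fortran compiler must be available for slycot; on Debian/Ubuntu, for example, it can be obtained with apt:

$ sudo apt-get install gfortran
$ pip install "control>=0.8.2" "slycot==0.3.3"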

How to use?

Install the library

  • Install the stable version from PyPI (may not be up to date):
$ pip install graphkit-learn
  • Install the latest version from GitHub:
$ git clone https://github.com/jajupmochi/graphkit-learn.git
$ cd graphkit-learn/
$ python setup.py install
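
To quickly verify the installation, import the package (note that the top-level package name is gklearn, not graphkit-learn):

$ python -c "import gklearn; print(gklearn.__file__)"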

Run the test

A series of tests can be run to check if the library works correctly:

$ pip install -U pip pytest codecov coverage pytest-cov
$ pytest -v --cov-config=.coveragerc --cov-report term --cov=gklearn gklearn/tests/

Check examples

A series of demos showing how to use the library can be found on Google Colab and in the examples folder.

Other demos

Check the notebooks directory for more demos:

  • the notebooks directory includes test code for graph kernels based on linear patterns;
  • the notebooks/tests directory includes code that tests some libraries and functions;
  • the notebooks/utils directory includes some useful tools, such as a Gram matrix checker and a function to get properties of datasets (a minimal version of the Gram matrix check is sketched after this list);
  • the notebooks/else directory includes other code that we used for experiments.
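
The following is a minimal sketch of such a Gram matrix check (it is not the checker shipped in notebooks/utils): a valid kernel matrix must be symmetric and positive semi-definite, which can be verified with NumPy.

import numpy as np

def check_gram_matrix(gram, tol=1e-10):
    """Return True if gram is symmetric and positive semi-definite."""
    gram = np.asarray(gram, dtype=float)
    if not np.allclose(gram, gram.T, atol=tol):
        return False
    # Eigenvalues of a symmetric matrix; tolerate tiny negative values
    # caused by floating-point round-off.
    return bool(np.linalg.eigvalsh(gram).min() >= -tol)

print(check_gram_matrix([[2.0, 1.0], [1.0, 2.0]]))  # True
print(check_gram_matrix([[0.0, 1.0], [1.0, 0.0]]))  # False (indefinite)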

Documentation

The docs of the library can be found here.

Main contents

1 List of graph kernels

A demo of computing graph kernels can be found on Google Colab and in the examples folder.
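
A graph kernel yields a Gram matrix over a set of graphs, which can then be fed to any kernel method. The sketch below is not the library's own API (the Colab demos show that); it only illustrates the standard way of passing such a precomputed matrix to scikit-learn's SVC, with a hand-made stand-in for the Gram matrix.

import numpy as np
from sklearn.svm import SVC

# Stand-in for a graph-kernel Gram matrix over 6 graphs: any positive
# semi-definite matrix of pairwise kernel values has this shape.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
                     [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]])
gram = features @ features.T
labels = np.array([0, 0, 0, 1, 1, 1])

train_idx = np.array([0, 1, 3, 4])
test_idx = np.array([2, 5])

clf = SVC(kernel='precomputed')
clf.fit(gram[np.ix_(train_idx, train_idx)], labels[train_idx])
# Prediction needs the kernel values between test graphs and training graphs.
print(clf.predict(gram[np.ix_(test_idx, train_idx)]))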

2 Graph Edit Distances

3 Graph preimage methods

A demo of generating graph preimages can be found on Google Colab and in the examples folder.

4 Interface to GEDLIB

GEDLIB is an easily extensible C++ library for (suboptimally) computing the graph edit distance between attributed graphs. A Python interface to GEDLIB is integrated in this library, based on the gedlibpy library.

5 Computation optimization methods

  • Python’s multiprocessing.Pool module is used to parallelize the computation of all kernels as well as the model selection.
  • The Fast Computation of Shortest Path Kernel (FCSP) method [8] is implemented in the random walk kernel, the shortest path kernel, and the structural shortest path kernel, where FCSP is applied to both vertex and edge kernels (a minimal sketch of the FCSP idea follows this list).
  • The trie data structure [9] is employed in the path kernel up to length h to store paths in graphs.
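
As a rough illustration of the FCSP idea (this is not the library's implementation, and the node label name 'atom' is only an example), the vertex-kernel values between the two graphs are computed once and cached, instead of being re-evaluated for every pair of shortest paths:

import networkx as nx
import numpy as np

def sp_kernel_fcsp(g1, g2, node_label='atom'):
    """Toy shortest-path kernel with FCSP-style caching of vertex kernels."""
    n1, n2 = list(g1.nodes()), list(g2.nodes())
    # FCSP step: evaluate the vertex kernel (a Dirac kernel on labels) once
    # for every pair of vertices and cache the values.
    vk = np.array([[1.0 if g1.nodes[u][node_label] == g2.nodes[v][node_label] else 0.0
                    for v in n2] for u in n1])
    d1 = dict(nx.all_pairs_shortest_path_length(g1))
    d2 = dict(nx.all_pairs_shortest_path_length(g2))
    k = 0.0
    for i, u1 in enumerate(n1):
        for j, v1 in enumerate(n1):
            if u1 == v1 or v1 not in d1[u1]:
                continue
            for a, u2 in enumerate(n2):
                for b, v2 in enumerate(n2):
                    if u2 == v2 or v2 not in d2[u2]:
                        continue
                    # Dirac kernel on shortest-path lengths; vertex kernels come
                    # from the cached matrix instead of being recomputed.
                    if d1[u1][v1] == d2[u2][v2]:
                        k += vk[i, a] * vk[j, b]
    return k

g = nx.path_graph(3)
nx.set_node_attributes(g, {0: 'C', 1: 'O', 2: 'C'}, 'atom')
h = nx.path_graph(3)
nx.set_node_attributes(h, {0: 'C', 1: 'C', 2: 'O'}, 'atom')
print(sp_kernel_fcsp(g, h))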

Issues

  • This library uses the multiprocessing.Pool.imap_unordered function for parallelization, which may not run correctly on Windows. For now, Windows users may need to comment out the parallel code and uncomment the serial code below it. We will consider adding a parameter to switch between serial and parallel computation as needed.

  • Some modules (such as NumPy, SciPy, and scikit-learn) use OpenBLAS for parallel computation by default, which conflicts with other parallelization modules such as multiprocessing.Pool and can greatly increase the computing time. Setting its thread count to 1 forces OpenBLAS to use a single thread/CPU and avoids the conflict. For now, this has to be done manually. Under Linux, type this command in a terminal before running the code:

$ export OPENBLAS_NUM_THREADS=1

Or add export OPENBLAS_NUM_THREADS=1 at the end of your ~/.bashrc file, then run

$ source ~/.bashrc

to make the change permanent.
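
Alternatively, the variable can be set from within Python, as long as it is done before NumPy (and the modules built on it) is imported; a minimal sketch:

# Force OpenBLAS to a single thread *before* importing numpy/scipy/sklearn,
# so that it does not compete with multiprocessing.Pool for CPU cores.
import os
os.environ['OPENBLAS_NUM_THREADS'] = '1'

import numpy as np  # imported only after the variable has been set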

Results

Check this paper for a detailed description of graph kernels and experimental results:

Linlin Jia, Benoit Gaüzère, and Paul Honeine. Graph Kernels Based on Linear Patterns: Theoretical and Experimental Comparisons. working paper or preprint, March 2019. URL https://hal-normandie-univ.archives-ouvertes.fr/hal-02053946.

A comparison of performances of graph kernels on benchmark datasets can be found here.

How to contribute

Fork the library and open a pull request! Make your own contribution to the community!

Authors

Citation

Still waiting...

Acknowledgments

This research was supported by CSC (China Scholarship Council) and the French national research agency (ANR) under the grant APi (ANR-18-CE23-0014). The authors would like to thank the CRIANN (Le Centre Régional Informatique et d’Applications Numériques de Normandie) for providing computational resources.

References

[1] Thomas Gärtner, Peter Flach, and Stefan Wrobel. On graph kernels: Hardness results and efficient alternatives. Learning Theory and Kernel Machines, pages 129–143, 2003.

[2] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, United States, 2003.

[3] Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.M., 2010. Graph kernels. Journal of Machine Learning Research 11, 1201–1242.

[4] K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In Proceedings of the International Conference on Data Mining, pages 74-81, 2005.

[5] Liva Ralaivola, Sanjay J Swamidass, Hiroto Saigo, and Pierre Baldi. Graph kernels for chemical informatics. Neural networks, 18(8):1093–1110, 2005.

[6] F. Suard, A. Rakotomamonjy, and A. Bensrhair. Kernel on bag of paths for measuring similarity of shapes. In ESANN, pages 355-360, April 2007.

[7] Mahé, P., Ueda, N., Akutsu, T., Perret, J.L., Vert, J.P., 2004. Extensions of marginalized graph kernels, in: Proc. the twenty-first international conference on Machine learning, ACM. p. 70.

[8] Lifan Xu, Wei Wang, M Alvarez, John Cavazos, and Dongping Zhang. Parallelization of shortest path graph kernels on multi-core cpus and gpus. Proceedings of the Programmability Issues for Heterogeneous Multicores (MultiProg), Vienna, Austria, 2014.

[9] Edward Fredkin. Trie memory. Communications of the ACM, 3(9):490–499, 1960.

[10] Gaüzere, B., Brun, L., Villemin, D., 2012. Two new graphs kernels in chemoinformatics. Pattern Recognition Letters 33, 2038–2047.

[11] Shervashidze, N., Schweitzer, P., Leeuwen, E.J.v., Mehlhorn, K., Borgwardt, K.M., 2011. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research 12, 2539–2561.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

graphkit-learn-0.2.0.tar.gz (191.2 kB)

Uploaded Source

Built Distribution

graphkit_learn-0.2.0-py3-none-any.whl (272.4 kB)

Uploaded Python 3

File details

Details for the file graphkit-learn-0.2.0.tar.gz.

File metadata

  • Download URL: graphkit-learn-0.2.0.tar.gz
  • Upload date:
  • Size: 191.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.9

File hashes

Hashes for graphkit-learn-0.2.0.tar.gz:

  • SHA256: 7a20bc4ad5cb520d14312c81e097bd0fab9db58a12e37446a27c6f9f91f0ba07
  • MD5: 13602606d56501c6ba525a83a8ccab82
  • BLAKE2b-256: 215bc0ef78c5ff4d8d64a06264fa3bed1efd819adbc54d95c9db0c73640a39ce

See more details on using hashes here.

File details

Details for the file graphkit_learn-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: graphkit_learn-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 272.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.9

File hashes

Hashes for graphkit_learn-0.2.0-py3-none-any.whl:

  • SHA256: dde5166f1a07bc89ccc1d66383943914352345a48df94e4e0be0121aa70ac126
  • MD5: d3bcc5834870f58cb64fe08f33b8fffa
  • BLAKE2b-256: fc3cc8ad5dd9f350cc2520602194d5001abf9f23e209db8fb37dcb3ad4b36b5e

See more details on using hashes here.
