
fastwlk is a Python package that implements a fast version of the Weisfeiler-Lehman kernel.



What does fastwlk do?

fastwlk is a Python package that implements a fast version of the Weisfeiler-Lehman kernel. It outperforms current state-of-the-art implementations on sparse graphs thanks to three improvements over vanilla implementations:

  1. It parallelizes the Weisfeiler-Lehman hash computations: each graph’s hash can be computed independently before the kernel is computed.

  2. It parallelizes the computation of graph similarity in the reproducing kernel Hilbert space (RKHS) by computing batches of inner products independently.

  3. When comparing graphs, much of the computation is spent on positions/hashes that do not overlap between Weisfeiler-Lehman histograms. fastwlk therefore loops only over the overlapping keys, which outperforms numpy dot product-based implementations on collections of sparse graphs.
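The three points above can be illustrated with a minimal, standard-library-only sketch. This is not fastwlk's actual code or API: the function names (`wl_histogram`, `sparse_dot`, `gram_matrix`), the adjacency-dict graph representation, and the use of SHA-1 as the relabeling hash are all assumptions made for illustration. A thread pool is used only to show that the per-graph hashing is independent; in pure Python a real speed-up would typically need processes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import hashlib


def wl_histogram(adjacency, labels, n_iter=2):
    """Count Weisfeiler-Lehman labels of one graph over n_iter refinement rounds.

    adjacency: dict mapping every node -> list of neighbour nodes (hypothetical format)
    labels:    dict mapping every node -> initial label
    """
    current = {node: str(label) for node, label in labels.items()}
    histogram = Counter(current.values())
    for _ in range(n_iter):
        refined = {}
        for node, neighbours in adjacency.items():
            # New label: own label plus the sorted multiset of neighbour labels.
            neighbour_part = ",".join(sorted(current[n] for n in neighbours))
            signature = current[node] + "|" + neighbour_part
            refined[node] = hashlib.sha1(signature.encode()).hexdigest()
        current = refined
        histogram.update(current.values())
    return histogram


def sparse_dot(hist_a, hist_b):
    """Inner product of two histograms, visiting only keys present in both."""
    if len(hist_a) > len(hist_b):
        hist_a, hist_b = hist_b, hist_a  # iterate over the smaller histogram
    return sum(count * hist_b[key] for key, count in hist_a.items() if key in hist_b)


def gram_matrix(graphs, n_iter=2, n_jobs=2):
    """Hash every graph independently, then fill the symmetric kernel matrix.

    graphs: sequence of (adjacency, labels) pairs.
    """
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        histograms = list(pool.map(lambda g: wl_histogram(g[0], g[1], n_iter), graphs))
    n = len(histograms)
    gram = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            gram[i][j] = gram[j][i] = sparse_dot(histograms[i], histograms[j])
    return gram
```

Because `sparse_dot` never touches keys that occur in only one histogram, its cost scales with the number of shared labels rather than with the size of a dense feature vector, which is the effect the third point describes.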

This implementation works best when graphs have relatively few connections compared to the number of possible connections and are reasonably dissimilar from one another. If you are not sure whether your graphs are sparse or dissimilar enough, benchmark this package against others using the script in examples/benchmark.py.
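As a rough first check before benchmarking, the two conditions can be estimated directly. The sketch below is an illustration only: `edge_density` and `key_overlap` are hypothetical helper names, the graph is assumed to be an undirected adjacency dict, and no threshold for "sparse enough" or "dissimilar enough" is given here, so the benchmark remains the deciding test.

```python
def edge_density(adjacency):
    """Fraction of possible undirected edges that are present (0.0 to 1.0).

    adjacency: dict mapping every node -> list of neighbours, symmetric.
    """
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Each undirected edge appears twice in a symmetric adjacency mapping.
    n_edges = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return n_edges / (n * (n - 1) / 2)


def key_overlap(hist_a, hist_b):
    """Jaccard overlap between the key sets of two label histograms."""
    keys_a, keys_b = set(hist_a), set(hist_b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0
```

Low density and low key overlap across the collection both point towards the regime described above, where few histogram entries are shared between graphs.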

How fast is fastwlk?

Running the benchmark script in examples/benchmark.py shows that, for the graphs in data/graphs.pkl, we get an approximately 80% speed improvement over other implementations like grakel. The example dataset contains 2-nn graphs extracted from 100 random proteins of the human proteome, taken from the AlphaFold EBI database.

To see how much faster this implementation is for your use case:

$ git clone https://github.com/pjhartout/fastwlk
$ cd fastwlk
$ poetry install
$ poetry run python examples/benchmark.py

You will need to swap out the provided graphs.pkl with a pickled iterable of graphs from the database you are interested in.
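A replacement graphs.pkl can be written with Python's pickle module. The sketch below is a minimal example under assumptions: `save_graphs` and `load_graphs` are hypothetical helper names, and the graph object type the benchmark expects is not stated here, so check examples/benchmark.py for the required type and node attributes before swapping the file.

```python
import pickle
from pathlib import Path


def save_graphs(graphs, path):
    """Pickle an iterable of graph objects as a list and return the path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(list(graphs), handle, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_graphs(path):
    """Load the pickled list back, e.g. to sanity-check it before benchmarking."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

For example, `save_graphs(my_graphs, "data/graphs.pkl")` would overwrite the provided file with your own collection. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.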


