
DeepWalk: online learning of social representations.

Project description

DeepWalk uses short random walks to learn representations for vertices in graphs.
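To illustrate the idea (this is a simplified sketch, not the package's internal code), the example below generates truncated random walks over a small toy graph and trains a skip-gram Word2Vec model on them with gensim. The toy graph, walk counts, and parameter values are assumptions chosen for the example, and the vector_size argument assumes gensim >= 4.0:

    # Simplified sketch of the DeepWalk idea: random walks as "sentences"
    # for a skip-gram Word2Vec model. Toy graph and parameters are
    # illustrative only, not the package's defaults.
    import random
    from gensim.models import Word2Vec  # vector_size assumes gensim >= 4.0

    graph = {                      # tiny undirected graph as an adjacency list
        "1": ["2", "3"],
        "2": ["1", "3"],
        "3": ["1", "2", "4"],
        "4": ["3"],
    }

    def random_walk(start, walk_length, rng):
        walk = [start]
        while len(walk) < walk_length:
            walk.append(rng.choice(graph[walk[-1]]))
        return walk

    rng = random.Random(0)
    walks = [random_walk(node, walk_length=10, rng=rng)
             for _ in range(20) for node in graph]   # 20 walks per node

    # One skip-gram embedding per vertex.
    model = Word2Vec(walks, vector_size=16, window=5, min_count=0, sg=1, workers=1)
    print(model.wv["1"])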

Usage

Example Usage

$deepwalk --input example_graphs/karate.adjlist --output karate.embeddings

--input: input_filename

  1. --format adjlist for an adjacency list, e.g.:

    1 2 3 4 5 6 7 8 9 11 12 13 14 18 20 22 32
    2 1 3 4 8 14 18 20 22 31
    3 1 2 4 8 9 10 14 28 29 33
    ...
  2. --format edgelist for an edge list, e.g.:

    1 2
    1 3
    1 4
    ...
  3. --format mat for a Matlab .mat file containing an adjacency matrix

    (note: you must also specify the variable name of the adjacency matrix via --matfile-variable-name; see the sketch below for one way to produce such a file)
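This sketch shows one way such a .mat file could be produced with scipy. The file name toy_graph.mat and the variable name network are arbitrary choices for this example; whatever variable name you use is the one to pass via --matfile-variable-name:

    # Sketch: write a toy adjacency matrix to a Matlab .mat file for
    # use with --format mat. File and variable names are arbitrary.
    import numpy as np
    from scipy.io import savemat
    from scipy.sparse import csr_matrix

    A = np.array([[0, 1, 1, 0],     # 4-node undirected toy graph
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]])

    savemat("toy_graph.mat", {"network": csr_matrix(A)})
    # then, e.g.:
    #   deepwalk --format mat --matfile-variable-name network \
    #            --input toy_graph.mat --output toy_graph.embeddings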

--output: output_filename

The output representations are written in skipgram (word2vec text) format: the first line is a header giving the number of nodes and the representation size d, and each remaining line contains a node id followed by its d-dimensional representation. For example:

34 64
1 0.016579 -0.033659 0.342167 -0.046998 ...
2 -0.007003 0.265891 -0.351422 0.043923 ...
...
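Because this is the standard word2vec text format, the embeddings can be read back with, for example, gensim's KeyedVectors (the file name below matches the karate example above; this is just one way to load them):

    # Load DeepWalk output: a word2vec-format text file whose header is
    # "<number of nodes> <dimensions>" and whose keys are node ids.
    from gensim.models import KeyedVectors

    emb = KeyedVectors.load_word2vec_format("karate.embeddings")
    print(emb["1"])               # the 64-dimensional vector for node 1
    print(emb.most_similar("1"))  # nodes closest to node 1 in embedding space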
Full Command List

The full list of command line options is available with $deepwalk --help

Evaluation

Here, we will show how to evaluate DeepWalk on the BlogCatalog dataset used in the DeepWalk paper. First, we run the following command to produce its DeepWalk embeddings:

deepwalk --format mat --input example_graphs/blogcatalog.mat \
  --max-memory-data-size 0 --number-walks 80 --representation-size 128 \
  --walk-length 40 --window-size 10 --workers 1 \
  --output example_graphs/blogcatalog.embeddings

The parameters specified here are the same as those in the paper. If you are using a multi-core machine, try setting --workers to a larger number for faster training. On a single machine with 24 Xeon E5-2620 @ 2.00GHz CPUs, this command takes about 20 minutes to finish with --workers set to 20. Then, we evaluate the learned embeddings on a multi-label node classification task with example_graphs/scoring.py:

python example_graphs/scoring.py --emb example_graphs/blogcatalog.embeddings \
  --network example_graphs/blogcatalog.mat \
  --num-shuffle 10 --all

This command finishes in 8 minutes on the same machine. For faster evaluation, you can set --num-shuffle to a smaller number, but expect more fluctuation in performance. The Micro-F1 and Macro-F1 scores we get with different ratios of labeled nodes are as follows:

% Labeled Nodes   10%     20%     30%     40%     50%     60%     70%     80%     90%
Micro-F1 (%)      35.86   38.51   39.96   40.76   41.51   41.85   42.27   42.35   42.40
Macro-F1 (%)      21.08   23.98   25.71   26.73   27.68   28.28   28.88   28.70   28.21
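For context, scoring.py evaluates the embeddings with a multi-label, one-vs-rest logistic regression; the snippet below is a simplified sketch of that kind of evaluation using scikit-learn. It assumes a feature matrix X built from the embeddings and a binary label matrix Y taken from the .mat file, and it omits details of the actual script (such as how many labels are predicted per node), so its numbers will not match the table exactly:

    # Simplified multi-label evaluation sketch (not a drop-in replacement
    # for scoring.py). Assumes X: embedding matrix, Y: binary label matrix,
    # with rows aligned by node id.
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import f1_score

    def evaluate(X, Y, train_ratio=0.1, seed=0):
        X_tr, X_te, Y_tr, Y_te = train_test_split(
            X, Y, train_size=train_ratio, random_state=seed)
        clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
        clf.fit(X_tr, Y_tr)
        Y_pred = clf.predict(X_te)
        micro = f1_score(Y_te, Y_pred, average="micro")
        macro = f1_score(Y_te, Y_pred, average="macro")
        return micro, macro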

Note that the current version of DeepWalk is based on a newer version of gensim, whose word2vec implementation may differ. To completely reproduce the results in our paper, you will probably have to install an older version of gensim (0.10.2).
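For example, the older release can be pinned with pip (installing it into a separate virtual environment is advisable, since it predates the current gensim API):

    pip install gensim==0.10.2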

Requirements

  • numpy

  • scipy

(these may need to be installed independently)

Installation

  1. cd deepwalk

  2. pip install -r requirements.txt

  3. python setup.py install

Citing

If you find DeepWalk useful in your research, we ask that you cite the following paper:

@inproceedings{Perozzi:2014:DOL:2623330.2623732,
 author = {Perozzi, Bryan and Al-Rfou, Rami and Skiena, Steven},
 title = {DeepWalk: Online Learning of Social Representations},
 booktitle = {Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
 series = {KDD '14},
 year = {2014},
 isbn = {978-1-4503-2956-9},
 location = {New York, New York, USA},
 pages = {701--710},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/2623330.2623732},
 doi = {10.1145/2623330.2623732},
 acmid = {2623732},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {deep learning, latent representations, learning with partial labels, network classification, online learning, social networks},
}

Misc

DeepWalk - Online learning of social representations.


History

1.0.3 (2018-03-23)

  • Now compatible with the latest versions of gensim and sklearn

  • Better support for Python 3

1.0.2 (2014-09-19)

  • Fixed gensim at 0.10.2 for now

1.0.1 (2014-09-19)

  • Added utilities to support generated embeddings for larger graphs

  • Support for additional input file formats

1.0.0 (2014-08-24)

  • First release on PyPI.



Download files

Download the file for your platform.

Source Distribution

deepwalk-1.0.3.tar.gz (32.0 kB)


Built Distribution

deepwalk-1.0.3-py2.py3-none-any.whl (10.7 kB)


File details

Details for the file deepwalk-1.0.3.tar.gz.

File metadata

  • Download URL: deepwalk-1.0.3.tar.gz
  • Size: 32.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for deepwalk-1.0.3.tar.gz
Algorithm    Hash digest
SHA256       507a8fc85363fb14a2838eb2304b8a04a08ac0d8ff57611fbea22db671a44674
MD5          6c167f4dc1a8a6abce988a91adb431ff
BLAKE2b-256  aa30bbbf62ca65e9c427b91eb685e3ce37490324969f8a4141971f5a12a3e2bf


File details

Details for the file deepwalk-1.0.3-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for deepwalk-1.0.3-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       fb15c59c94fb35467071f45700ae33e6fecb11750776c53b4f2d38fffb33a57a
MD5          52440ef0ac822475882821ff6aad73fa
BLAKE2b-256  178497fcfdea22ebe0f61e1a6740daae98d47cc1ba49fffa2b117c550b1a76ee

