Python bindings for the high‑performance Rust graph/network embedding library graphembed

GraphEmbed: Efficient and Robust Network Embedding via High-Order Proximity Preservation or Recursive Sketching

This crate provides an executable and a library for embedding directed or undirected graphs with positively weighted edges. We engineered and optimized current network embedding algorithms for large-scale networks, especially biological networks. This crate was developed by Jianshu Zhao and Jean-Pierre Both (jpboth). A copy is also available on GitHub.

  • For simple graphs, without data attached to nodes, there are two modules, nodesketch and atp. A simple executable with a validation option based on link prediction is also provided.

Quick Install

Pre-built binaries on Linux

wget https://gitlab.com/-/project/64961144/uploads/ea72ca007e9e4899e0c830e708f52939/graphembed_Linux_x86-64_v0.1.4.zip
unzip graphembed_Linux_x86-64_v0.1.4.zip
chmod a+x ./graphembed
./graphembed -h

Bioconda on Linux

conda install -c conda-forge -c bioconda graphembed

Homebrew on macOS

brew tap jianshu93/graphembed
brew update
brew install graphembed

In Python (please install Python first)

pip install graphembed_rs

### or you can build from source (Linux) after installing maturin
git clone https://gitlab.com/Jianshu_Zhao/graphembed
cd graphembed
pip install maturin
### note: for macOS, you need to change the line "features = ["pyo3/extension-module", "intel-mkl-static", "simdeez_f"]" in pyproject.toml to "features = ["pyo3/extension-module","openblas-system","stdsimd"]"
maturin develop --release

#### Prepare some data (shell command)
wget https://gitlab.com/-/project/64961144/uploads/4e341383d62d86d1dd66e668e91b2c07/BlogCatalog.txt
#### Then, in Python
import os
os.environ["RUST_LOG"] = "graphembed=info"
import graphembed as ge
help(ge)
### HOPE
ge.embed_hope_rank("BlogCatalog.txt", target_rank=128, nbiter=4)

### Sketching
### sketching only
ge.embed_sketching("BlogCatalog.txt", decay=0.3, dim=128, nbiter=5, symetric=True, output="embedding_output")
### validate accuracy
auc_scores = ge.validate_sketching("BlogCatalog.txt", decay=0.3, dim=128, nbiter=3, nbpass=1, skip_frac=0.2, symetric=True, centric=True)
print("Standard AUC per pass:", auc_scores)

Methods

The embedding algorithms used in this crate are based on the following papers

  • nodesketch

NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching, KDD 2019. See nodesketch.
D. Yang, P. Rosso, B. Li, P. Cudré-Mauroux.

It is based on multi-hop neighbourhood identification via sensitive hashing, using the recent probminhash algorithm. See arxiv or ieee-2022.

The algorithm associates with each node a probability distribution over its neighbours, depending on edge weights and distance to the node. This distribution is then hashed to build a (discrete) embedding vector consisting of node identifiers.
The distance between embedded vectors is the Jaccard distance, so the symmetric embedding comes with a true distance on the embedding space.
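
As a rough illustration of why hashing neighbourhoods leads to a Jaccard-type distance, here is a minimal sketch that uses plain (unweighted) MinHash in place of probminhash. It ignores edge weights and the multi-hop recursion, and all function names are hypothetical; none of them belong to graphembed.

```python
import hashlib


def _slot_hash(slot: int, node: int) -> int:
    """Deterministic 64-bit hash of a node id for a given sketch slot."""
    data = f"{slot}:{node}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(neighbours, dim):
    """Return `dim` node ids: in each slot, the neighbour with the smallest hash."""
    nodes = sorted(set(neighbours))
    if not nodes:
        raise ValueError("a sketch needs at least one neighbour")
    return [min(nodes, key=lambda n: _slot_hash(slot, n)) for slot in range(dim)]


def jaccard_distance_estimate(sketch_a, sketch_b):
    """Estimate the Jaccard distance as the fraction of slots that disagree."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same dimension")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return 1.0 - matches / len(sketch_a)


def exact_jaccard_distance(set_a, set_b):
    """Exact Jaccard distance between two non-empty sets of node ids."""
    set_a, set_b = set(set_a), set(set_b)
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    neighbours_u = range(0, 30)
    neighbours_v = range(10, 40)
    sketch_u = minhash_sketch(neighbours_u, dim=512)
    sketch_v = minhash_sketch(neighbours_v, dim=512)
    print("estimated:", jaccard_distance_estimate(sketch_u, sketch_v))
    print("exact:    ", exact_jaccard_distance(neighbours_u, neighbours_v))
```

Each slot of the sketch holds a node identifier, and two sketches agree in a slot with probability equal to the Jaccard similarity of the underlying sets, which is why the embedding vectors are sequences of node ids rather than real numbers.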

An extension of the paper is also implemented to get an asymmetric embedding for directed graphs. The similarity is again based on the hash of sets (nodes going to or from a given node), but the dissimilarity is then no longer a distance (it lacks symmetry and shows some discrepancy with the triangle inequality).

The Orkut graph, with 3 million nodes and 100 million edges, is embedded in 5 minutes on a 24-core i9 laptop with this algorithm, giving an AUC of 0.95.

  • atp

Asymmetric Transitivity Preserving Graph Embedding, 2016. M. Ou, P. Cui, J. Pei, Z. Zhang and W. Zhu. See hope.

The objective is to provide an asymmetric graph embedding and to estimate the precision of the embedding as a function of its dimension.

We use the Adamic-Adar matrix representation of the graph. (Note that weighting a node by the number of pairs it joins is called Resource Allocation in the graph kernel literature.) The asymmetric embedding is obtained from the left and right singular vectors of the Adamic-Adar representation of the graph. Source nodes are related to the left singular vectors and target nodes to the right ones.
The similarity measure is the dot product, so it is not a norm.
The SVD is approximated by randomization as described in Halko-Tropp 2011, as implemented in the annembed crate.
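
To make the matrix representation more concrete, the following is a minimal pure-Python sketch of an Adamic-Adar style proximity matrix for a small directed graph. It assumes the form S = A·D·A with D_kk = 1 / Σ_j (A_kj + A_jk), as given in the HOPE paper; the weighting actually used by the crate may differ, and the function name is hypothetical.

```python
def adamic_adar_proximity(nb_nodes, edges):
    """Return S = A * D * A as a dense list of lists.

    `edges` is an iterable of (source, target, weight) on nodes 0..nb_nodes-1.
    D is diagonal, with D_kk the inverse of the total (in + out) weight of node k.
    """
    adjacency = [[0.0] * nb_nodes for _ in range(nb_nodes)]
    for source, target, weight in edges:
        adjacency[source][target] = weight

    inverse_degree = []
    for k in range(nb_nodes):
        total = sum(adjacency[k][j] + adjacency[j][k] for j in range(nb_nodes))
        inverse_degree.append(1.0 / total if total > 0 else 0.0)

    # S_ij sums, over intermediate nodes k, the paths i -> k -> j weighted by D_kk
    return [
        [
            sum(adjacency[i][k] * inverse_degree[k] * adjacency[k][j] for k in range(nb_nodes))
            for j in range(nb_nodes)
        ]
        for i in range(nb_nodes)
    ]


if __name__ == "__main__":
    # Two directed paths from node 0 to node 2: 0 -> 1 -> 2 and 0 -> 3 -> 2
    edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 3, 1.0), (3, 2, 1.0)]
    proximity = adamic_adar_proximity(4, edges)
    print("S[0][2] =", proximity[0][2])
    print("S[2][0] =", proximity[2][0])
```

The proximity from node 0 to node 2 is positive while the reverse is zero, which is the kind of asymmetry that the left (source) and right (target) singular vectors are meant to preserve.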

Validation

Validation of embeddings is assessed via standard AUC with random deletion of edges. See the documentation of the link module and the embed binary. We also give a variation based on centric quality assessment, as explained at cauc.
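
The principle of AUC validation by edge deletion can be sketched as follows. This is only an illustration under simple assumptions (a fraction of edges is held out, and the similarity scores of held-out edges are compared with those of non-edges); it does not reproduce how the crate samples edges or computes the centric variant, and the function names are hypothetical.

```python
import random


def split_edges(edges, skip_frac, seed=0):
    """Randomly hold out a fraction `skip_frac` of the edges.

    Returns (kept_edges, deleted_edges); the embedding would be computed
    on the kept edges only.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    nb_deleted = int(len(shuffled) * skip_frac)
    return shuffled[nb_deleted:], shuffled[:nb_deleted]


def auc(deleted_edge_scores, non_edge_scores):
    """Probability that a deleted edge scores higher than a non-edge (ties count 0.5)."""
    if not deleted_edge_scores or not non_edge_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for positive in deleted_edge_scores:
        for negative in non_edge_scores:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5
    return wins / (len(deleted_edge_scores) * len(non_edge_scores))


if __name__ == "__main__":
    edges = [(i, i + 1) for i in range(10)]
    kept, deleted = split_edges(edges, skip_frac=0.2)
    print(len(kept), "edges kept,", len(deleted), "edges deleted")
    # Scores would come from the embedding (e.g. dot product or 1 - Jaccard distance)
    print("AUC:", auc([0.9, 0.8], [0.1, 0.2]))
```

An AUC of 0.5 corresponds to scores that do not separate deleted edges from non-edges, and 1.0 to a perfect separation.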

Some data sets

Small datasets are given in the Data subdirectory (with 7z compression) to run tests.
Larger datasets can be downloaded from the SNAP data collections https://snap.stanford.edu/data

Some small test graphs are provided in a Data subdirectory

Some larger data sets for users to download

These graphs were used in the results below.

Beware of the possible need to convert Windows end-of-line characters to Linux ones; see the dos2unix utility.
Some data may also need to be converted from TSV to CSV format before being read by the program.
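
Both conversions can be done in one pass with the Python standard library. The helper below is a hypothetical sketch, assuming a tab-separated edge list as input and a comma-separated file with Unix line endings as the desired output; it does not treat header or comment lines specially.

```python
import csv


def tsv_to_csv(src_path, dst_path):
    """Rewrite a tab-separated file as comma-separated with "\n" line endings.

    Returns the number of rows written. Blank lines are skipped.
    """
    nb_rows = 0
    # newline="" lets the csv module handle "\r\n" and "\n" endings itself
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter=",", lineterminator="\n")
        for row in reader:
            if not row:
                continue
            writer.writerow(row)
            nb_rows += 1
    return nb_rows


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        written = tsv_to_csv(sys.argv[1], sys.argv[2])
        print(f"{written} rows written to {sys.argv[2]}")
    else:
        print("usage: python tsv_to_csv.py input.tsv output.csv")
```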

Some results

Results for the atp and nodesketch modules

Embedding and link prediction evaluations for the above data sets are given in file resultats.md. A more global analysis of the embedding with the nodesketch module is done for the Orkut graph in file orkut.md.

A preliminary node-centric quality estimation is provided in the validation module (see documentation in validation::link).

Some qualitative comments

  • For the embedding using the randomized SVD, increasing the embedding dimension is worthwhile as long as the corresponding eigenvalues continue to decrease significantly.

  • The munmun_twitter_social graph shows that treating a directed graph as an undirected graph gives significantly different results in terms of link prediction AUC.
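
The first comment above can be turned into a simple rule of thumb. The helper below is a hypothetical heuristic, not something provided by the crate: given a spectrum, it keeps adding dimensions until the drop between consecutive values, relative to the largest one, falls below a chosen threshold. The default threshold is an arbitrary assumption.

```python
def suggest_rank(spectrum, min_relative_drop=0.05):
    """Suggest an embedding dimension from a spectrum.

    Values are sorted in decreasing order; the rank stops growing at the first
    position where (previous - current) / largest < min_relative_drop.
    """
    values = sorted(spectrum, reverse=True)
    if not values or values[0] <= 0:
        raise ValueError("the spectrum must contain a positive leading value")
    for k in range(1, len(values)):
        relative_drop = (values[k - 1] - values[k]) / values[0]
        if relative_drop < min_relative_drop:
            return k
    return len(values)


if __name__ == "__main__":
    # The spectrum flattens after the fourth value
    print(suggest_rank([10.0, 6.0, 3.0, 1.5, 1.4, 1.39]))
```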

Generalized SVD

An implementation of the generalized SVD comes as a by-product in module gsvd.

Detailed Installation and Usage

Installation

The crate provides features (with a default configuration), required by the annembed dependency, to specify which version of LAPACK to use and which SIMD implementation to choose.

  • For example, compile with cargo build --release --features="openblas-system" to link dynamically with OpenBLAS. Choosing one such feature is mandatory, as it provides the required linear algebra library.
  • On Intel, the simdeez_f feature can be used. On other CPUs the stdsimd feature can be chosen, but it requires compiler >= 1.79.

Usage

The documentation of the embed module can be generated with the standard: cargo doc --no-deps --bin embed.

  • The Hope embedding relies on matrix computations, which limits the size of the graph to a few hundred thousand nodes. It is intrinsically asymmetric. It nevertheless gives access to the spectrum of the Adamic-Adar matrix representing the graph, and so to the dimension required to get a valid embedding in $R^{n}$.

  • The Sketching embedding is much faster for large graphs but embeds in a space consisting of sequences of node ids equipped with the Jaccard distance. It is particularly efficient on low-degree graphs.

  • The embed module takes embedding and, optionally, validation commands (link prediction task) in one directive.
    The general syntax is:

    graphembed file_description [validation_command --validation_arguments] sketching mode --embedding_arguments
    For example, for a symmetric graph we get:

  • just embedding: graphembed --csv ./Data/Graphs/Orkut/com-orkut.ungraph.txt --symetric sketching --decay 0.2 --dim 200 --nbiter

  • embedding and validation:

      graphembed --csv ./Data/Graphs/Orkut/com-orkut.ungraph.txt  --symetric  validation --nbpass 5 --skip 0.15 sketching --decay 0.2  --dim 200 --nbiter 5
    

For an asymmetric graph we get:

   graphembed --csv ./Data/Graphs/asymetric.csv  validation --nbpass 5 --skip 0.15 sketching --decay 0.2  --dim 200 --nbiter 5 


More details can be found in the docs of the embed module. Use cargo doc --no-deps --bin embed (and cargo doc --no-deps) as usual.
  • The environment variable RUST_LOG gives access to logging information at various levels (debug, info, error) via the log and env_logger crates.
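
For scripted runs, the command lines above can be assembled and launched from Python with RUST_LOG set for the child process. This is a minimal sketch: the helper names are hypothetical, only the flags shown in the examples above are used, and the graphembed executable is assumed to be on the PATH.

```python
import os
import subprocess


def build_sketching_command(csv_path, decay, dim, nbiter, symetric=False,
                            nbpass=None, skip=None, binary="graphembed"):
    """Assemble a sketching command line, with validation if nbpass and skip are given."""
    command = [binary, "--csv", csv_path]
    if symetric:
        command.append("--symetric")
    if nbpass is not None and skip is not None:
        command += ["validation", "--nbpass", str(nbpass), "--skip", str(skip)]
    command += ["sketching", "--decay", str(decay), "--dim", str(dim), "--nbiter", str(nbiter)]
    return command


def run_with_logging(command, level="info"):
    """Run the command with RUST_LOG set to `level` in the child environment."""
    environment = dict(os.environ, RUST_LOG=level)
    return subprocess.run(command, env=environment, check=True)


if __name__ == "__main__":
    command = build_sketching_command(
        "./Data/Graphs/Orkut/com-orkut.ungraph.txt",
        decay=0.2, dim=200, nbiter=5, symetric=True, nbpass=5, skip=0.15,
    )
    print(" ".join(command))
    # Uncomment once the executable and the data file are available:
    # run_with_logging(command, level="info")
```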

License

Licensed under either of

at your option.
