
Python bindings for the high‑performance Rust graph/network embedding library graphembed


GraphEmbed: Efficient and Robust Network Embedding via High-Order Proximity Preservation or Recursive Sketching

This crate provides an executable and a library for embedding directed or undirected graphs with positively weighted edges. We engineered and optimized current network embedding algorithms for large-scale network embedding, especially of biological networks. This crate was developed by Jianshu Zhao and Jean-Pierre Both (jpboth). A copy is also hosted on GitHub.

  • For simple graphs (without data attached to nodes), two modules are provided: nodesketch and atp. A simple executable with a validation option based on link prediction is also provided.

Quick Install

Pre-built binaries on Linux

```shell
wget https://gitlab.com/-/project/64961144/uploads/9d7d0b038140cb67c584f01cd6dafac9/graphembed_Linux_x86-64_v0.1.6.zip
unzip graphembed_Linux_x86-64_v0.1.6.zip
chmod a+x ./graphembed
./graphembed -h
```

Bioconda on Linux/macOS

```shell
conda install -c conda-forge -c bioconda graphembed
```

Homebrew on macOS

```shell
brew tap jianshu93/graphembed
brew update
brew install graphembed
```

In Python (requires Python >= 3.9)

```shell
pip install graphembed_rs

### or build from source (Linux) with maturin
git clone https://gitlab.com/Jianshu_Zhao/graphembed
cd graphembed
pip install maturin
### note for macOS: in pyproject.toml, change the line
###   features = ["pyo3/extension-module", "intel-mkl-static", "simdeez_f"]
### to
###   features = ["pyo3/extension-module","openblas-system","stdsimd"]
### and install OpenBLAS via Homebrew, adding it to the system library path
maturin develop --release
```

Prepare some data

```shell
wget https://gitlab.com/-/project/64961144/uploads/4e341383d62d86d1dd66e668e91b2c07/BlogCatalog.txt
```

Then, in Python:

```python
import os
os.environ["RUST_LOG"] = "info"
import graphembed_rs.graphembed_rs as ge
import graphembed_rs.load_utils as ge_utils

help(ge)
help(ge_utils)

### HOPE
ge.embed_hope_rank("BlogCatalog.txt", target_rank=128, nbiter=4, output="embedding_output")
out_vectors = ge_utils.load_embedding_bson("embedding_output.bson")
print("OUT embedding shape :", out_vectors.shape)
print("first OUT vector    :", out_vectors[0])

### Sketching only
ge.embed_sketching("BlogCatalog.txt", decay=0.3, dim=128, nbiter=5, symetric=True, output="embedding_output")
out_vectors = ge_utils.load_embedding_bson("embedding_output.bson")
print("OUT embedding shape :", out_vectors.shape)
print("first OUT vector    :", out_vectors[0])

### Validate accuracy (link prediction AUC)
auc_scores = ge.validate_sketching("BlogCatalog.txt", decay=0.3, dim=128, nbiter=3, nbpass=1,
                                   skip_frac=0.2, symetric=True, centric=True)
print("Standard AUC per pass:", auc_scores)
```

Methods

The embedding algorithms used in this crate are based on the following papers:

  • nodesketch

NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching. D. Yang, P. Rosso, B. Li, P. Cudre-Mauroux. KDD 2019. See nodesketch.

It is based on multi-hop neighbourhood identification via locality-sensitive hashing, using the recent probminhash algorithm (see arXiv or IEEE 2022).

The algorithm associates with each node a probability distribution over its neighbours, depending on edge weights and on the distance to the node. This distribution is then hashed to build a (discrete) embedding vector consisting of node identifiers.
The distance between embedded vectors is the Jaccard distance, so the symmetric embedding yields a true distance on the embedding space.
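The two steps above (build a weighted neighbourhood distribution, then hash it into a vector of node ids) can be illustrated with a small Python sketch. It is not the crate's implementation: the per-hop `decay` weighting in the hypothetical `neighbour_distribution` helper is a simplification, and the hashing uses plain P-MinHash (an exponential race on -ln(U)/p) rather than probminhash, which targets the same probability Jaccard similarity with a faster algorithm. All function names here are illustrative.

```python
import math
import random
from collections import defaultdict


def neighbour_distribution(adj, node, decay, hops):
    """Hypothetical weighting: walks of length h from `node` contribute
    decay**h times the product of their edge weights; the node itself gets 1."""
    weights = defaultdict(float)
    weights[node] += 1.0
    frontier = {node: 1.0}
    for _ in range(hops):
        nxt = defaultdict(float)
        for u, mass in frontier.items():
            for v, w in adj.get(u, {}).items():
                nxt[v] += decay * mass * w
        for v, mass in nxt.items():
            weights[v] += mass
        frontier = nxt
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def _uniform(seed, item, slot):
    """Deterministic pseudo-random number in (0, 1) for an (item, slot) pair."""
    u = random.Random(f"{seed}:{item}:{slot}").random()
    return u if u > 0.0 else 1e-300


def pminhash(dist, dim, seed=0):
    """P-MinHash: for each slot, keep the item minimising -ln(U) / p(item)."""
    sketch = []
    for slot in range(dim):
        best, best_val = None, math.inf
        for item, p in dist.items():
            if p <= 0.0:
                continue
            val = -math.log(_uniform(seed, item, slot)) / p
            if val < best_val:
                best, best_val = item, val
        sketch.append(best)
    return sketch


def sketch_distance(a, b):
    """Fraction of differing slots: an estimate of 1 - J_P between the distributions."""
    return sum(x != y for x, y in zip(a, b)) / len(a)
```

For example, with an adjacency dict such as `{0: {1: 1.0, 2: 1.0}, ...}`, `sketch_distance(pminhash(neighbour_distribution(adj, 0, 0.3, 2), 128), pminhash(neighbour_distribution(adj, 1, 0.3, 2), 128))` estimates how different the neighbourhoods of nodes 0 and 1 are. Because two P-MinHash slots collide with probability equal to the probability Jaccard similarity, the fraction of mismatched slots is an unbiased estimate of the corresponding distance.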

An extension of the paper is also implemented to obtain asymmetric embeddings for directed graphs. The similarity is again based on hashing sets of nodes (the nodes going to, or coming from, a given node), but the resulting dissimilarity is no longer a distance: it is not symmetric and may violate the triangle inequality.
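Why the directed variant loses symmetry can be seen with a toy Python sketch. It assumes, hypothetically, that the score for a pair src → dst compares a MinHash of src's out-set with a MinHash of dst's in-set; the crate's actual construction may differ, and all names below are illustrative.

```python
import random


def _hash(seed, item, slot):
    """Deterministic pseudo-random value in [0, 1) for an (item, slot) pair."""
    return random.Random(f"{seed}:{item}:{slot}").random()


def minhash(items, dim, seed=0):
    """Classic MinHash of a set: per slot, the item with the smallest hash."""
    return [min(items, key=lambda it: _hash(seed, it, slot)) for slot in range(dim)]


def neighbour_sets(edges):
    """Out-set = node plus its successors; in-set = node plus its predecessors."""
    out_sets, in_sets = {}, {}
    for u, v in edges:
        for n in (u, v):
            out_sets.setdefault(n, {n})
            in_sets.setdefault(n, {n})
        out_sets[u].add(v)
        in_sets[v].add(u)
    return out_sets, in_sets


def directed_similarity(out_sets, in_sets, src, dst, dim=128):
    """Hypothetical asymmetric score for src -> dst: estimated Jaccard index
    between src's out-set and dst's in-set."""
    a = minhash(out_sets[src], dim)
    b = minhash(in_sets[dst], dim)
    return sum(x == y for x, y in zip(a, b)) / dim
```

On the path 0 → 1 → 2, the score for 0 → 1 is 1.0 (both sets are {0, 1}) while the score for 1 → 0 is 0.0 ({1, 2} against {0}), so the dissimilarity is not symmetric.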

The Orkut graph (3 million nodes, 100 million edges) is embedded in 5 minutes on a 24-core i9 laptop with this algorithm, giving an AUC of 0.95.

  • atp

Asymmetric Transitivity Preserving Graph Embedding. M. Ou, P. Cui, J. Pei, Z. Zhang and W. Zhu, 2016. See hope.

The objective is to provide an asymmetric graph embedding and to estimate the precision of the embedding as a function of its dimension.

We use the Adamic-Adar matrix representation of the graph. (Note that weighting a node by the number of pairs it joins is called Resource Allocation in the graph kernel literature.) The asymmetric embedding is obtained from the left and right singular vectors of the Adamic-Adar representation of the graph: source nodes are associated with left singular vectors and target nodes with right singular vectors.
The similarity measure is the dot product, so it is not derived from a norm (it is not a distance).
The SVD is approximated by randomization, as described in Halko-Tropp 2011 and implemented in the annembed crate.
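The pipeline described above (Adamic-Adar matrix, randomized SVD, source/target vectors compared by dot product) can be sketched with NumPy. This is an illustration under assumptions, not the annembed code: the Adamic-Adar matrix is taken as S = A D A with D_kk = 1 / (in-degree + out-degree), as in the HOPE paper, and the range finder with power iterations follows the textbook Halko, Martinsson and Tropp scheme. Function names are hypothetical.

```python
import numpy as np


def adamic_adar_matrix(adj):
    """S = A D A with D_kk = 1 / (in-degree + out-degree of k), the HOPE paper's
    Adamic-Adar proximity (assumed; the crate may weight nodes differently)."""
    a = np.asarray(adj, dtype=float)
    deg = a.sum(axis=0) + a.sum(axis=1)
    inv = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    return (a * inv) @ a  # (a * inv)[i, k] = A[i, k] * D[k, k]


def randomized_svd(m, rank, oversample=5, n_iter=4, seed=0):
    """Randomized range finder with power (subspace) iterations, then an exact
    SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    width = min(rank + oversample, min(m.shape))
    q, _ = np.linalg.qr(m @ rng.standard_normal((m.shape[1], width)))
    for _ in range(n_iter):  # power iterations sharpen the spectral decay
        q, _ = np.linalg.qr(m.T @ q)
        q, _ = np.linalg.qr(m @ q)
    u_small, s, vt = np.linalg.svd(q.T @ m, full_matrices=False)
    return (q @ u_small)[:, :rank], s[:rank], vt[:rank].T


def hope_embedding(adj, rank, **kwargs):
    """Source vectors from left singular vectors, target vectors from right ones."""
    u, s, v = randomized_svd(adamic_adar_matrix(adj), rank, **kwargs)
    root = np.sqrt(s)
    return u * root, v * root, s  # source, target, singular values
```

The similarity of the ordered pair (i, j) is then `source[i] @ target[j]`, which approximates S[i, j] and generally differs from `source[j] @ target[i]`. Splitting each singular value as a square root between the two sides is one common convention; it keeps source and target vectors on the same scale.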

Validation

Embeddings are validated with the standard AUC on link prediction after random deletion of edges. See the documentation of the link module and of the embed binary. We also provide a variation based on node-centric quality assessment, as explained at cauc.
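The standard AUC protocol can be illustrated with a minimal Python sketch: delete a fraction of edges, embed the remaining graph, then estimate the probability that a deleted (true) edge scores higher than a random non-edge. The `skip_frac` name mirrors the `validate_sketching` parameter, but the sampling details and helper names below are assumptions; the crate's exact procedure is documented in its link module, and the node-centric variant is not shown.

```python
import random


def split_edges(edges, skip_frac, seed=0):
    """Randomly remove a fraction `skip_frac` of the edges; return (kept, removed)."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    n_removed = int(len(shuffled) * skip_frac)
    return shuffled[n_removed:], shuffled[:n_removed]


def link_prediction_auc(score, removed_edges, non_edges, n_samples=10_000, seed=0):
    """Monte Carlo AUC: probability that a removed (true) edge outscores a
    random non-edge, counting ties as 1/2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        pos = score(*rng.choice(removed_edges))
        neg = score(*rng.choice(non_edges))
        total += 1.0 if pos > neg else (0.5 if pos == neg else 0.0)
    return total / n_samples
```

Here `score(u, v)` would be computed from an embedding of the kept edges only (for example, a dot product or one minus a sketch distance). A perfect scorer gives an AUC of 1.0 and a constant scorer gives 0.5.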

Some data sets

Small test graphs are provided (7z-compressed) in the Data subdirectory to run tests.
Larger datasets can be downloaded from the SNAP data collections https://snap.stanford.edu/data

These graphs were used in the results below.

Beware: you may need to convert Windows line endings to Unix ones (see the dos2unix utility).
Some data may also need to be converted from TSV to CSV format before being read by the program.
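Both conversions can be done in one pass with Python's standard csv module. This is a minimal sketch assuming the input is a tab-separated edge list and that the program expects comma-separated fields with Unix (LF) line endings; the handling of `#` comment lines (copied unchanged, as in SNAP files) is also an assumption, and `tsv_to_csv` is a hypothetical helper name.

```python
import csv


def tsv_to_csv(src_path, dst_path):
    """Rewrite a tab-separated edge list as comma-separated with LF line endings.
    Blank lines are dropped; lines starting with '#' are copied unchanged."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        for line in src:
            line = line.rstrip("\r\n")  # strips both CRLF and LF endings
            if not line:
                continue
            if line.startswith("#"):
                dst.write(line + "\n")
            else:
                writer.writerow(line.split("\t"))
```

Opening the input with `newline=""` keeps the original `\r\n` endings visible so they can be stripped explicitly, and opening the output the same way ensures `\n` is written as-is on every platform.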

Some results

Results for the atp and nodesketch modules

Embedding and link prediction evaluations for the above data sets are given in the file resultats.md. A more global analysis of the embedding with the nodesketch module is done for the Orkut graph in the file orkut.md.

A preliminary node-centric quality estimation is provided in the validation module (see the documentation in validation::link).

Some qualitative comments

  • For the embedding based on the randomized SVD, increasing the embedding dimension is worthwhile only as long as the corresponding singular values continue to decrease significantly.

  • The munmun_twitter_social graph shows that treating a directed graph as an undirected one gives significantly different results in terms of link prediction AUC.
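The first comment can be turned into a rough rule of thumb. The Python sketch below is a hypothetical heuristic, not something the crate provides: given singular values sorted in decreasing order (for example those of the Adamic-Adar matrix), it keeps adding dimensions while each new singular value is at least `min_drop` (relatively) smaller than the previous one. The 5% default is arbitrary.

```python
import numpy as np


def suggest_dimension(singular_values, min_drop=0.05):
    """Hypothetical heuristic: return the number of leading dimensions kept before
    the relative decrease between consecutive singular values falls below min_drop."""
    s = np.asarray(singular_values, dtype=float)
    for k in range(1, len(s)):
        if s[k - 1] <= 0.0:
            return k
        if (s[k - 1] - s[k]) / s[k - 1] < min_drop:
            return k
    return len(s)
```

For the spectrum [10, 5, 2.5, 2.45, 2.4], the drop from 2.5 to 2.45 is only 2%, so the sketch suggests keeping 3 dimensions.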

Generalized SVD

An implementation of the generalized SVD comes as a by-product in the gsvd module.

Detailed Installation and Usage

Installation

The crate provides features (with a default configuration), required by the annembed dependency, to select the LAPACK implementation and the SIMD implementation.

  • For example, to link dynamically with OpenBLAS, compile with: cargo build --release --features="openblas-system". Choosing one such feature is mandatory, as it provides the required linear algebra library.
  • On Intel CPUs the simdeez_f feature can be used. On other CPUs the stdsimd feature can be chosen, but it requires a Rust compiler >= 1.79.

Usage

The documentation of the embed module can be generated with the standard command: cargo doc --no-deps --bin embed.

  • The HOPE embedding relies on matrix computations, which limits the graph size to a few hundred thousand nodes. It is intrinsically asymmetric. It nevertheless gives access to the spectrum of the Adamic-Adar matrix representing the graph, and hence to the dimension required to get a valid embedding in $\mathbb{R}^{n}$.

  • The sketching embedding is much faster for large graphs, but it embeds into a space of node-id sequences equipped with the Jaccard distance. It is particularly efficient on graphs with low degrees.

  • The embed module takes the embedding command and, optionally, a validation command (link prediction task) in one directive. The general syntax is:

    ```
    graphembed file_description [validation_command --validation_arguments] sketching mode --embedding_arguments
    ```

  • For example, for a symmetric graph, embedding only:

    ```shell
    graphembed --csv ./Data/Graphs/Orkut/com-orkut.ungraph.txt --symetric sketching --decay 0.2 --dim 200 --nbiter
    ```

  • For a symmetric graph, embedding and validation:

    ```shell
    graphembed --csv ./Data/Graphs/Orkut/com-orkut.ungraph.txt --symetric validation --nbpass 5 --skip 0.15 sketching --decay 0.2 --dim 200 --nbiter 5
    ```

For an asymmetric graph:

```shell
graphembed --csv ./Data/Graphs/asymetric.csv validation --nbpass 5 --skip 0.15 sketching --decay 0.2 --dim 200 --nbiter 5
```


More details can be found in the documentation of the embed module. Generate it with cargo doc --no-deps --bin embed (and cargo doc --no-deps for the library) as usual.
  • Setting the environment variable RUST_LOG gives access to information at various levels (debug, info, error) via the log and env_logger crates.

License

Licensed under either of

at your option.


Download files

No source distribution is available for this release. Built distributions (wheels) of graphembed_rs 0.1.2 are available for CPython 3.8, 3.9, 3.10, 3.11, 3.12 and 3.13 on:

  • Linux, manylinux (glibc 2.17+), x86-64 (4.8 MB)
  • macOS 11.0+, ARM64 (1.7 MB)
  • macOS 10.12+, x86-64 (1.7 MB)
