Skip to main content

Python bindings for the high‑performance Rust graph/network embedding library graphembed

Project description

Graphembed

This crate provides ,as a library and an executable,embedding of directed or undirected graphs with positively weighted edges.

  • For simple graphs, without data attached to nodes/labels, there are 2 (rust) modules nodesketch and atp. A simple executable with a validation option based on link prediction is also provided.

  • To complement the embeddings we provide also core decomposition of graphs (see the module structure). We give try to analyze how Orkut communities are preserved through an embedding. (See Notebooks directory).

  • The module gkernel is dedicated to graphs with discrete labels attached to nodes/edges. We use the petgraph crate for graph description. The algorithm is based on an extension of the hashing strategy used in the module nodesketch.
    In the undirected case, this module also computes a global embedding vector for the whole graph. It is still in an early version.

Methods

The embedding algorithms used in this crate are based on the following papers

  • nodesketch

NodeSketch : Highly-Efficient Graph Embeddings via Recursive Sketching KDD 2019. see nodesketch
D. Yang,P. Rosso,Bin-Li, P. Cudre-Mauroux.

It is based on multi hop neighbourhood identification via sensitive hashing based on the recent algorithm probminhash. See arxiv or ieee-2022.

The algorithm associates a probability distribution on neighbours of each point depending on edge weights and distance to the point. Then this distribution is hashed to build a (discrete) embedding vector consisting in nodes identifiers.
The distance between embedded vectors is the Jaccard distance so we get a real distance on the embedding space for the symetric embedding.

An extension of the paper is also implemented to get asymetric embedding for directed graph. The similarity is also based on the hash of sets (nodes going to or from) a given node but then the dissimilarity is no more a distance (no symetry and some discrepancy with the triangular inequality).

The orkut graph with 3 millions nodes and 100 millions of edge is embedded in 5' with a 24 core i9 laptop with this algorithm giving an AUC of 0.95.

  • atp

Asymetric Transitivity Preserving Graph Embedding 2016. M. Ou, P Cui, J. Pei, Z. Zhang and W. Zhu. See hope.

The objective is to provide an asymetric graph embedding and get estimate of the precision of the embedding in function of its dimension.

We use the Adamic-Adar matricial representation of the graph. (It must be noted that the ponderation of a node by the number of couples joined by it is called Resource Allocation in the Graph Kernel litterature). The asymetric embedding is obtained from the left and right singular eigenvectors of the Adamic-Adar representation of the graph. Source node are related to left singular vectors and target nodes to the right ones.
The similarity measure is the dot product, so it is not a norm.
The svd is approximated by randomization as described in Halko-Tropp 2011 as implemented in the annembed crate.

The core decomposition algorithms

  • Density-friendly decomposition

Large Scale decomposition via convex programming 2017
M.Danisch T.H Hubert Chan and M.Sozio

The decomposition of the graph in maximally dense groups of nodes is implemented and used to assess the quality of the embeddings in a structural way. See module validation and the comments on the embedding of the Orkut graph where we can use the community data provided with the graph to analyze the behaviour of embedded edge lengths.

In particular it is shown that :

  • embedding of edges internal to a community are consistently smaller than embedded edges crossing a block frontier.
  • The transition probabilities of edge from one block to another are similar (low kullback divergence) in the original graph and in the embedded graph.

See results in orkut.md and examples directory together with a small Rust notebook in directory Notebooks

Validation

Validation of embeddings is assessed via standard Auc with random deletion of edges. See documentation in the link module and embed binary. We give also a variation based on centric quality assessment as explained at cauc

Some data sets

Without labels

Small datasets are given in the Data subdirectory (with 7z compression) to run tests.
Larger datasets can be downloaded from the SNAP data collections https://snap.stanford.edu/data

Some small test graphs are provided in a Data subdirectory

Some larger data tests for user to download

These graphs were used in results see below.

Beware of the possible need to convert from Windows to Linux End Of Line, see the dos2unix utility.
Possibly some data can need to be converted from Tsv format to Csv, before being read by the program.

Some results

results for the atp and nodesketch modules

Embedding and link prediction evaluation for the above data sets are given in file resultats.md A more global analysis of the embedding with the nodesketch module is done for the orkut graph in file orkut.md

A preliminary of node centric quality estimation is provided in the validation module (see documentation in validation::link).

Some qualitative comments

  • For the embedding using the randomized svd, increasing the embedding dimension is interesting as far as the corresponding eigenvalues continue to decrease significantly.

  • The munmun_twitter_social graph shows that treating a directed graph as an undirected graph give significantly different results in terms of link prediction AUC.

Generalized Svd

An implementation of Generalized Svd comes as a by-product in module gsvd.

Installation and Usage

Installation

The crate provides features (with a default configuration), required by the annembed dependency, to specify which version of lapack you want to use or the choice of simd implementation.

  • For example compilation is done by : cargo build --release --features="openblas-system" to use a dynamic link with openblas. The choice of one feature is mandatory to provide required linear algebra library.
  • On Intel the simdeez_f feature can be used. On other cpus the stdsimd feature can be chosen but it requires compiler >= 1.79

Usage

The embed module can be generated with the standard : cargo doc --no-deps --bin embed.

  • The Hope embedding relying on matrices computations limits the size of the graph to some hundred thousands nodes. It is intrinsically asymetric in nature. It nevertheless gives access to the spectrum of Adamic Adar matrix representing the graph and so to the required dimension to get a valid embedding in $R^{n}$.

  • The Sketching embedding is much faster for large graphs but embeds in a space consisting in sequences of node id equipped with the Jaccard distance. It is particularly efficient in low degrees graph.

  • The embed module takes embedding and possibly validation commands (link prediction task) in one directive.
    The general syntax is :

    embed file_description [validation_command --validation_arguments] sketching mode --embedding_arguments
    for example:

    For a symetric graph we get:

  • just embedding: embed --csv ./Data/Graphs/Orkut/com-orkut.ungraph.txt --symetric sketching --decay 0.2 --dim 200 --nbiter

  • embedding and validation:

      embed --csv ./Data/Graphs/Orkut/com-orkut.ungraph.txt  --symetric  validation --nbpass 5 --skip 0.15 sketching --decay 0.2  --dim 200 --nbiter 5
    

For an asymetric graph we get

   embed --csv ./Data/Graphs/asymetric.csv  validation --nbpass 5 --skip 0.15 sketching --decay 0.2  --dim 200 --nbiter 5 


More details can be found in docs of the embed module. Use cargo doc --no-dep --bin embed (and cargo doc --no-dep) as usual.
  • Use the environment variable RUST_LOG gives access to some information at various level (debug, info, error) via the log and env_logger crates.

License

Licensed under either of

at your option.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

graphembed_rs-0.1.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ x86-64

graphembed_rs-0.1.0-cp313-cp313-macosx_11_0_arm64.whl (1.7 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

graphembed_rs-0.1.0-cp313-cp313-macosx_10_12_x86_64.whl (1.7 MB view details)

Uploaded CPython 3.13macOS 10.12+ x86-64

graphembed_rs-0.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

graphembed_rs-0.1.0-cp312-cp312-macosx_11_0_arm64.whl (1.7 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

graphembed_rs-0.1.0-cp312-cp312-macosx_10_12_x86_64.whl (1.7 MB view details)

Uploaded CPython 3.12macOS 10.12+ x86-64

graphembed_rs-0.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

graphembed_rs-0.1.0-cp311-cp311-macosx_11_0_arm64.whl (1.7 MB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

graphembed_rs-0.1.0-cp311-cp311-macosx_10_12_x86_64.whl (1.7 MB view details)

Uploaded CPython 3.11macOS 10.12+ x86-64

graphembed_rs-0.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

graphembed_rs-0.1.0-cp310-cp310-macosx_11_0_arm64.whl (1.7 MB view details)

Uploaded CPython 3.10macOS 11.0+ ARM64

graphembed_rs-0.1.0-cp310-cp310-macosx_10_12_x86_64.whl (1.7 MB view details)

Uploaded CPython 3.10macOS 10.12+ x86-64

graphembed_rs-0.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB view details)

Uploaded CPython 3.9manylinux: glibc 2.17+ x86-64

graphembed_rs-0.1.0-cp39-cp39-macosx_11_0_arm64.whl (1.7 MB view details)

Uploaded CPython 3.9macOS 11.0+ ARM64

graphembed_rs-0.1.0-cp39-cp39-macosx_10_12_x86_64.whl (1.7 MB view details)

Uploaded CPython 3.9macOS 10.12+ x86-64

graphembed_rs-0.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB view details)

Uploaded CPython 3.8manylinux: glibc 2.17+ x86-64

File details

Details for the file graphembed_rs-0.1.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 31f456c264bc77e07328ef9769074324b631d688e714fd11a28899532907b383
MD5 079e8b6379e948189da580d24dbb0891
BLAKE2b-256 5f6a35b68d0724976d4e9af5dd7855a05af7bfd93d0a36a2f9bc42aa3607201f

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ea61155725ec412e13bd126329816b7b6a504d74da2621524c31e5006b690c34
MD5 46b06c5e7850fe4961fb50a6bbd9ae10
BLAKE2b-256 c169d76ef2e586d014512999fa6e91e44269b6a31bf77dca25c22c8fac714ca4

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp313-cp313-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 8d8d33b95d2b634ae897ebe7d24100588838a64e02ed84299c5990a60c33cbb0
MD5 3d53f272e73de12a60557e7b3b964718
BLAKE2b-256 85c4ac1e75161dc3abac11ae13f7b2e414b283d5549d0090e95d872057c05d12

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9e57c19b505a5e706818cbe8e58163fcf65c82c292f77e0f8fd246106dacfe11
MD5 f60e69614305520eb51cbae35b4aff2d
BLAKE2b-256 26499a0130a2d27405935db7629eb90cfac2e9825f9b8de178512180a6f008b7

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a05bc4c0c482bcd495aff92916ec95718a77617465127af6ba4a257df97532d4
MD5 56bc6c041a2905b879b8f597dac451ab
BLAKE2b-256 ae76d433ec0b88bf7a2e019c7e66d4b0676d9ca3d6170733acade27df0540082

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp312-cp312-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ba11e9f3fb502ca32d895cea905fccc6f018b79532c1e4898fc989ded66b64b9
MD5 40f7c88ed6290634004218919dfb98d7
BLAKE2b-256 8d7e2a9ed5c6682b249557c5535fd1204af25d321af99347216c02afaa1b2f03

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a78cde379f9180827a7631bb6c0c9438fb62ab54000087423348527cde230e10
MD5 9d9303e36f6516b232e9c2ee412c58f8
BLAKE2b-256 d2507ca16a76b4cc3dd0a5b4b4270e813636000824aa579bfa4b33305a949b27

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e202861db7725145dd8c2d9894858ecddbbc5925ad28286044194117d9e21610
MD5 191045edefa41b7c7c7d70f82a25ecd9
BLAKE2b-256 e233c07c5333f4c349b7c2e7e60430d33c2d34a84790964d595714d22602be87

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp311-cp311-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 dc9d004f1b4b481be1b6958ae9d4cc936a6db48051679b9e4d9be9d27b46882a
MD5 57fc83cb9479f6e6697d1477d634eb71
BLAKE2b-256 beeb2eed3b684ecac3fc84e2313f2c10f6361a16cd724e19dd8ed4f20afc5f02

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 578a35b1110524b1cc4fed8bcf8bea978f52ec2298a956e189f87e314c5d7dc1
MD5 ff21913f2c5a15cc4e2b492199eb532a
BLAKE2b-256 b602b6e8f9ecc9a8811adb9d7e8836f1692efbabe19db54ae67ffc102b2d7775

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a367ba58dc8d9b237c2e7d75f44798eaa24054b7d009729c5da87344314aa975
MD5 deaedb1eba4b4de4a261fcad14f64e5a
BLAKE2b-256 8843b64da142dc81b56373e69fa505e598e78868c0b05b3f2868002d4c4b8152

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp310-cp310-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp310-cp310-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 54c6b7d3101c4775ce0262432413ddaa5fa20383028da07a4ade034950dada6f
MD5 851d1ed4ab03f198b6e9b7455a401357
BLAKE2b-256 61a56a42fcd1730fea3add8774f610fe7f988e1255b4ec8cc30176d82ed6cf4a

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ac028d31a9f33761aa86f98ea678cafe8f0ca22c8fafde4f07eabf3e07286273
MD5 852e66fc9344616f02e247da2341bd14
BLAKE2b-256 7a638a7686f3b3d0c174ff24b6a2e36c25d3120ec14ca07e1099b3f30f499b47

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp39-cp39-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 62b8318ac2640c1a50390eacdf9a975c6e7052691c5e8a098ef7329d67230706
MD5 07a80cb6e086e88afc525759a42029db
BLAKE2b-256 da3e86e83ed4f33f93363a6999944755e67143d3a775a9e10f615b628d86dcd6

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp39-cp39-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp39-cp39-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 8bf1d47b999c4879ff7dee0b7d65bce697c3d8ad57a5bbd6f2a4258bf1d72cfc
MD5 ea2f097f1f9ed45a8cd743e1998ce181
BLAKE2b-256 66a48e67a6d21de668b21169eeebdd986e67dfbebbab129610128092c6d8af78

See more details on using hashes here.

File details

Details for the file graphembed_rs-0.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for graphembed_rs-0.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 51167f2aac0cb870e0ac883be8c333bc36e33765b80354f6eef9a1265b3fa8de
MD5 552efe0cbfddcc907f44be76152adce5
BLAKE2b-256 b719a52e7e95ea3998232e0a2375df5277e7b3600c86d6fee424cac59ffeaf50

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page