Implementation of the HeNHoE-2vec algorithm by Valentini et al. (2021).
Project description
HeNHoE-2vec
A Python implementation of the HeNHoE-2vec algorithm by Valentini et al. for the embedding of networks with heterogeneous nodes and homogeneous edges (HeNHoE).
Note: HeNHoE networks are analogous to multilayer networks: in HeNHoE networks, each node has a distinct node type, and in multilayer networks, each node belongs to a distinct layer. The terms type
and layer
may therefore be regarded synonymous. Throughout the code and for the remainder of this documentation, we will use the terms multilayer network
and layer
as opposed to HeNHoE network
and type
.
Installation
Install the package from PyPI by running the following command:
$ pip install henhoe2vec
Alternatively, clone this repository by running
$ git clone git@github.com:Bertr0/HeNHoE-2vec.git
and then install the package by running pip install .
from the root of the repository.
Usage
This package may be used as a Python script or as a package, allowing its modules to be imported by other Python projects. Both forms of use make it easy to run HeNHoE-2vec on multilayer networks.
As a Package
After installing the package using pip
, its modules may be imported using
import henhoe2vec
The many individual steps of HeNHoE-2vec are accumulated in a single run()
method in the henhoe2vec.henhoe2vec
module. HeNHoE-2vec can be run from start to finish as follows:
import henho2vec as hh2v
hh2v.henhoe2vec.run(input_csv, output_dir)
input_csv
is the path to the multilayer edge list of the network to be embedded (csv file with no index). output_dir
is the path to the output directory where the embedding files will be saved. The run()
method takes a bunch of other optional parameters which can be used to configure HeNHoE-2vec. A comprehensive overview of parameters can be found in the code documentation.
As a Python Script
To run HeNHoE-2vec as a script, clone this repository using
$ git clone git@github.com:Bertr0/HeNHoE-2vec.git
, install the requirements found in requirements.txt
and run the following command from the root of the repository:
$ python3 -m src.henhoe2vec --input <input_path> --output_dir <output_dir_path>
This will generate node embeddings for the nodes of the network specified by the multilayer edge list saved at <input_path>
and saves the embedding files in <output_dir>
.
Run python3 -m src.henhoe2vec --help
from the root of the repository to show an overview of all arguments taken by the script. The following table also shows an overview of all arguments:
Script Arguments
Argument | Type | Description | Default Value |
---|---|---|---|
--input |
str | Path to the multilayer edge list of the network to be embedded (csv file with no index). | - |
--sep |
str | Delimiter of the input csv edge list. | "\t" |
--header |
store_true | Pass this argument if the input csv edge list has a header. | - |
--output_name |
str | Name of the output .csv file (without suffix). | "embeddings" |
--is_directed |
store_true | Pass this argument if the network is directed. | - |
--edges_are_distances |
store_true | Pass this argument if edge weights indicate distance between nodes (opposed to weight/similarity). | - |
--output_dir |
str | Path of the output directory where the embedding files will be saved. | - |
--dimensions |
int | The dimensionality of the embeddings. | 128 |
--walk_length |
int | Length of each random walk. | 20 |
--num_walks |
int | Number of random walks to simulate for each node. | 10 |
--p |
float | Return parameter p from the node2vec algorithm. |
1.0 |
--q |
float | In-out parameter q from the node2vec algorithm. |
0.5 |
--s |
float | Default switching parameter for layer pairs which are not specified in the --s-dict argument. |
1.0 |
--s_dict |
list | Switching parameters for specific layer pairs in a dict-like manner. Pass the names of layer pairs followed by their switching parameters, separated by white spaces. E.g., if the switching parameter from layer1 to layer2 is 0.5 and the switching parameter from layer2 to layer1 is 0.7 , you would pass layer1 layer2 0.5 layer2 layer1 0.7 . Note that layer pairs are directed. For all layer pairs which are not specified here, the default parameter --s is adopted. |
empty list |
--window_size |
int | Context size for the word2vec optimization. | 10 |
--epochs |
int | Number of epochs in SGD. | 1 |
--workers |
int | Number of parallel workers (threads). | 8 |
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for henhoe2vec-1.0.4-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2cd46441216617a96b9fa00036f2d844ed94078e85b7964db239f8e0732c0945 |
|
MD5 | ec1dd28f9a4b86615ebf9ca41def1ca2 |
|
BLAKE2b-256 | 55ddd3b8e96c05c36357807c58fd2f2b727e097a9f65ac4c1aba6ea969bd6524 |