Word-of-Mouth cascades Generator
WoMG: Word of Mouth Generator
WoMG is a Python library for Word-of-Mouth Cascades Generation.
We propose a model for the synthetic generation of information cascades in social media. In our model the information “memes” propagating in the social network are characterized by a probability distribution in a topic space, accompanied by a textual description, i.e., a bag of keywords coherent with the topic distribution. Similarly, every person is described by a vector of interests defined over the same topic space. Information cascades are governed by the topic of the meme, its level of virality, the interests of each person, community pressure, and social influence.
This repository provides a reference implementation of WoMG as described in:
Generating realistic interest-driven information cascades.
Federico Cinus, Francesco Bonchi, André Panisson, Corrado Monti.
WoMG generates synthetic datasets of document cascades on a network. It starts from any (un)directed, (un)weighted graph and a collection of documents, and it outputs the propagation DAGs of the documents through the network.
Installation
Install using pip:
$ pip install womg-core
You can also download or clone the GitHub repository:
$ git clone https://github.com/FedericoCinus/WoMG.git
Quickstart
The WoMG package provides a Python module and a command-line tool. To run WoMG-core in demo mode, execute the following command from a terminal:
$ womgc
It loads 50 documents and their topic distributions located in /womgdata and spreads them over the default network (Les Miserables, http://konect.uni-koblenz.de/networks/moreno_lesmis).
Options
To list the other options available in WoMG, run:
$ womg --help
Input
[Network] The supported input format is an edgelist (txt extension):
node1_id_int node2_id_int <weight_float, optional>
You can specify the edgelist path using the graph argument:
$ womg --graph /this/is/an/example/path/Graph_Folder/edgelist.txt
If no path is given, the default network is the Les Miserables network.
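The edgelist format above can be checked before handing a file to WoMG. The following is a minimal sketch, not part of WoMG: the helper name `parse_edgelist` is hypothetical, and it assumes fields are separated by whitespace and that a missing weight should be reported as `None`.

```python
from typing import Iterable, List, Optional, Tuple

Edge = Tuple[int, int, Optional[float]]


def parse_edgelist(lines: Iterable[str]) -> List[Edge]:
    """Parse 'node1_id node2_id [weight]' lines into (u, v, weight) tuples.

    Hypothetical helper: assumes whitespace-separated fields.
    """
    edges: List[Edge] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) not in (2, 3):
            raise ValueError(f"expected 2 or 3 fields, got: {line!r}")
        u, v = int(parts[0]), int(parts[1])  # node ids must be integers
        weight = float(parts[2]) if len(parts) == 3 else None
        edges.append((u, v, weight))
    return edges


# Example with made-up edges: two unweighted rows would fail a weighted run,
# so mixing them as below is only for illustrating the parser.
sample = ["0 1", "1 2 0.5", "", "2 0 2"]
edges = parse_edgelist(sample)
```

To check a real file, pass an open file object instead of the `sample` list, since iterating over a file yields its lines.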
Output (default)
- [Propagations] The output format is:
time; item; node
- [Items descriptions]:
item; [topic-dim vector]
- [Topic descriptions]:
(topic_index, linear combination of words)
You can specify the output folder path:
$ womg --output /this/is/an/example/path/Output_Folder
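To work with the propagation records, it can help to group them by item so that each cascade is read in time order. This is a minimal sketch under stated assumptions: the helper `read_propagations` is hypothetical, fields are assumed to be separated by `;`, `time` is assumed to be an integer, and item and node ids are kept as strings. The sample rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def read_propagations(lines: Iterable[str]) -> Dict[str, List[Tuple[int, str]]]:
    """Group 'time; item; node' records into per-item cascades sorted by time.

    Hypothetical helper: assumes ';' separators and integer time stamps.
    """
    cascades = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        time, item, node = (field.strip() for field in line.split(";"))
        cascades[item].append((int(time), node))
    # Sort each cascade by (time, node) so activations read chronologically.
    return {item: sorted(events) for item, events in cascades.items()}


sample = ["1; 3; 40", "0; 3; 12", "0; 7; 5"]
cascades = read_propagations(sample)
```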
WoMG extended (TBD)
WoMG is an open source research project. More details about the software are reported below:
Input
- [Network] The supported input format is an edgelist (txt extension):
node1_id_int node2_id_int <weight_float, optional>
The graph is assumed to be undirected and unweighted by default. These options can be changed by setting the appropriate flags. You can specify the edgelist path using the graph argument:
python womg --graph /this/is/an/example/path/Graph_Folder/edgelist.txt
If no path is given, the default network is the Les Miserables network.
- [Documents] The supported input format for the document collection (corpus) is txt. You have to specify the path of the folder containing the documents using the docs_folder argument:
$ womg --docs_folder /this/is/an/example/path/Corpus_Folder
If no documents folder path is given, WoMG will be set to generative mode.
Output
Each class (or model) produces its own outputs:
- [Diffusion] The file can be written in one of two formats:
list (default):
time doc activating_node
dict :
{ time: { doc: [activating nodes] } }
- [Network] files:
[info] dict:
{'type': 'Graph', 'numb_nodes': '77', 'numb_edges': '254', 'aver_degree': '6.5974', 'directed': 'False'}
[graph] dict:
{(u, v): [1.3, 0.2, 0.8, ... , 0.91], ...}
Key: link-tuple. Value: weight vector
[interests and influence vectors] dict:
{(node, 'int'): [interest vector], (node, 'inlf'): [influence vector]}
- [Topic] files:
[topic distributions] dict:
{doc: [topic distribution]}
[viralities] dict:
{doc: virality}
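The two [Diffusion] formats carry the same information, so the list format can be folded into the nested dict shape. The sketch below is illustrative only: `diffusion_list_to_dict` is a hypothetical helper, rows are assumed to be whitespace-separated, `time` is assumed to be an integer, and doc and node ids are kept as strings. The sample rows are made up.

```python
from typing import Dict, Iterable, List


def diffusion_list_to_dict(lines: Iterable[str]) -> Dict[int, Dict[str, List[str]]]:
    """Turn 'time doc activating_node' rows into {time: {doc: [nodes]}}.

    Hypothetical helper: assumes whitespace-separated rows.
    """
    nested: Dict[int, Dict[str, List[str]]] = {}
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 3:
            raise ValueError(f"expected 3 fields, got: {raw!r}")
        time, doc, node = parts
        # Create the inner levels on first use, then record the activation.
        nested.setdefault(int(time), {}).setdefault(doc, []).append(node)
    return nested


rows = ["0 d1 4", "0 d1 9", "1 d1 7", "1 d2 4"]
nested = diffusion_list_to_dict(rows)
```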
You can change the file format of the outputs with the format argument:
python womg --format pickle
python womg --format txt
and specify the output folder path:
python womg --output /this/is/an/example/path/Output_Folder
Options
topics
number of topics to be considered in the topic distributions of the documents and in the interests of the nodes; it has to be less than the number of dimensions of the node space provided by node2vec
Graph
homophily
H, the degree of homophily. Node2vec is used as a baseline for generating the interest vectors of the nodes from the given graph. Its parameters p and q yield different degrees of homophily and structural equivalence (see paper). The best mix of the two can only be found through a deep analysis of the network and a grid search over the parameters. To stay general with respect to the input graph, we use three degrees of mixing: structural equivalence predominant, DeepWalk (p=1, q=1), and homophily predominant (these are not the best settings for representing the graph!). 1-H is the degree of social influence between nodes, i.e., the fraction of the average norm of the interest vectors that is assigned to the influence vectors.
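One possible reading of the last sentence is that every influence vector is rescaled so that its norm equals (1-H) times the average norm of the interest vectors. The sketch below illustrates that reading only; it is not taken from the WoMG code. The function `rescale_influence`, the Euclidean norm, and the assumption that H lies in [0, 1] are all hypothetical choices made for the example.

```python
import math
from typing import Dict, List


def norm(vec: List[float]) -> float:
    """Euclidean norm (an assumption; the source does not name the norm)."""
    return math.sqrt(sum(x * x for x in vec))


def rescale_influence(
    interests: Dict[int, List[float]],
    influence: Dict[int, List[float]],
    homophily: float,
) -> Dict[int, List[float]]:
    """Rescale influence vectors to norm (1 - H) * mean interest norm.

    Hypothetical illustration of the 1-H rule, assuming 0 <= H <= 1.
    """
    if not 0.0 <= homophily <= 1.0:
        raise ValueError("homophily is assumed to lie in [0, 1]")
    avg_interest_norm = sum(norm(v) for v in interests.values()) / len(interests)
    target = (1.0 - homophily) * avg_interest_norm
    rescaled = {}
    for node, vec in influence.items():
        n = norm(vec)
        # Zero vectors have no direction to rescale, so they are left as-is.
        rescaled[node] = [x * target / n for x in vec] if n > 0 else list(vec)
    return rescaled


# Made-up vectors: both interest vectors have norm 5, so with H = 0.5
# every influence vector is rescaled to norm 2.5.
interests = {0: [3.0, 4.0], 1: [0.0, 5.0]}
influence = {0: [1.0, 0.0], 1: [0.0, 2.0]}
scaled = rescale_influence(interests, influence, homophily=0.5)
```

Under this reading, H = 1 removes social influence entirely (all influence vectors shrink to zero), while H = 0 gives influence vectors the same average scale as the interests.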
Documents
docs
number of documents TO BE GENERATED by lda; when this parameter is given, lda is directly set to generative mode
virality
virality of the doc; if virality is high, exponent of the power law is high and threshold for activation is low.
Diffusion
steps
steps of the diffusion simulation
actives
percentage of active nodes with respect to the total number of nodes in the initial configuration (before diffusion) for each doc.
Node2Vec
dimensions
Number of dimensions for node2vec. Default 128
walk-length
length of walk per source. Default 80
num-walks
number of walks per source. Default 10
window-size
context size for optimization. Default 10
iter
number of epochs in SGD
workers
number of parallel workers. Default 8
p
manually set BFS parameter; else: it is set by H
q
manually set DFS parameter; else: it is set by H
Input and Output
- graph
Input path of the graph edgelist
- weighted
boolean specifying (un)weighted. Default unweighted
- unweighted
- directed
graph is (un)directed. Default undirected
- undirected
- docs-folder
Input path of the documents folder
- output
Outputs path
- format
Outputs format
- seed
Seed (int) for random distribution extraction
Citing
@inproceedings{,
author = {},
title = {},
booktitle = {Proceedings},
year = {2019}
}
Miscellaneous
Please feel free ..
Note: This is only a reference implementation; more details are provided in the thesis.