Build PubMed temporal graph dataset using data from the PubMed API.
PubMed-Temporal: A dynamic graph dataset with node-level features
Code to build and reproduce the temporal split for the PubMed/Planetoid graph dataset.
If you use this dataset in your research, please consider citing the paper that introduced it:
Passos, N.A.R.A., Carlini, E., Trani, S. (2024). Deep Community Detection in Attributed Temporal Graphs: Experimental Evaluation of Current Approaches. In Proceedings of the 3rd Graph Neural Networking Workshop 2024 (GNNet '24). Association for Computing Machinery, New York, NY, USA, 1–6.
Description
| Graph | Split | Nodes | Edges | Class 0 | Class 1 | Class 2 | Time steps | Interval (Years) |
|---|---|---|---|---|---|---|---|---|
| Full | None | 19717 | 44324 | 4103 | 7739 | 7875 | 42 | 1967 - 2010 |
| Transductive | Train | 11664 | 24645 | 2964 | 3508 | 5192 | 38 | 1967 - 2006 |
| Transductive | Validation | 3697 | 4535 | 524 | 1803 | 1370 | 1 | 2007 - 2007 |
| Transductive | Test | 9810 | 15144 | 1372 | 4795 | 3643 | 3 | 2008 - 2010 |
| Inductive | Train | 11664 | 24645 | 2964 | 3508 | 5192 | 38 | 1967 - 2006 |
| Inductive | Validation | 2093 | 2113 | 297 | 1123 | 673 | 1 | 2007 - 2007 |
| Inductive | Test | 5960 | 6928 | 842 | 3108 | 2010 | 3 | 2008 - 2010 |
The first citation is from a paper published in 1967 to another published in 1964.
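The counts in the table can be cross-checked with a few lines of Python. The sketch below copies the rows by hand (the `ROWS` list and the two helper functions are illustrative names, not part of the package) and checks that the class counts of each row add up to its node count, then totals nodes and edges per split type.

```python
# Rows copied from the table above: (graph, split, nodes, edges, class counts).
ROWS = [
    ("Full", "None", 19717, 44324, (4103, 7739, 7875)),
    ("Transductive", "Train", 11664, 24645, (2964, 3508, 5192)),
    ("Transductive", "Validation", 3697, 4535, (524, 1803, 1370)),
    ("Transductive", "Test", 9810, 15144, (1372, 4795, 3643)),
    ("Inductive", "Train", 11664, 24645, (2964, 3508, 5192)),
    ("Inductive", "Validation", 2093, 2113, (297, 1123, 673)),
    ("Inductive", "Test", 5960, 6928, (842, 3108, 2010)),
]


def class_counts_match(rows):
    """Return True if the class counts of every row sum to its node count."""
    return all(sum(classes) == nodes for _, _, nodes, _, classes in rows)


def split_totals(rows, graph):
    """Sum (nodes, edges) over the train/validation/test rows of one graph."""
    selected = [r for r in rows if r[0] == graph and r[1] != "None"]
    return sum(r[2] for r in selected), sum(r[3] for r in selected)


print(class_counts_match(ROWS))           # True
print(split_totals(ROWS, "Transductive"))  # (25171, 44324)
print(split_totals(ROWS, "Inductive"))     # (19717, 33686)
```

Read this way, the transductive splits partition the 44324 edges while sharing nodes across splits, and the inductive splits partition the 19717 nodes while keeping fewer edges overall. A plausible reading, not stated in the table itself, is that the inductive setting leaves out edges that connect nodes from different splits.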
Load dataset
PyTorch Geometric
from pubmed_temporal import Planetoid
# from torch_geometric.datasets import Planetoid # pytorch_geometric#9982
dataset = Planetoid(root=".", name="pubmed", split="temporal")
data = dataset[0]
print(data)
Data(x=[19717, 500], edge_index=[2, 88648], y=[19717], time=[88648],
train_mask=[88648], val_mask=[88648], test_mask=[88648])
PyTorch Geometric stores each undirected edge in both directions, so the number of edges is doubled (2 × 44324 = 88648).
NetworkX
import networkx as nx
G = nx.read_graphml("pubmed/temporal/graph/pubmed-temporal.graphml")
print(G)
DiGraph with 19717 nodes and 44335 edges
The directed graph contains 11 bidirectional edges from co-citing papers.
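The three edge counts reported in this section (44335 directed, 44324 undirected, 88648 in PyTorch Geometric) are consistent with each other if each of the 11 bidirectional edges is a pair of papers citing one another. The following is a minimal sketch of that bookkeeping in plain Python; the helper functions and the toy edge list are hypothetical and do not read the actual dataset.

```python
def count_reciprocal_pairs(edges):
    """Count unordered pairs {u, v} present as both (u, v) and (v, u)."""
    edge_set = set(edges)
    return sum(1 for u, v in edge_set if u < v and (v, u) in edge_set)


def to_undirected(edges):
    """Collapse directed edges into a set of unordered node pairs."""
    return {frozenset((u, v)) for u, v in edges}


# Toy directed citation graph (hypothetical): papers 0 and 1 cite each other.
toy = [(0, 1), (1, 0), (1, 2), (3, 2)]
undirected = to_undirected(toy)
print(len(toy), count_reciprocal_pairs(toy), len(undirected), 2 * len(undirected))
# 4 1 3 6

# The same arithmetic with the counts reported above.
directed_edges = 44335
bidirectional_pairs = 11
undirected_edges = directed_edges - bidirectional_pairs
print(undirected_edges, 2 * undirected_edges)  # 44324 88648
```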
Build dataset
The temporal split and edge masks for the train, validation, and test splits are already included in this repository.
In order to build it completely from scratch (requires pubmed-id), run:
python build_dataset.py --workers 1
To build the dataset, the following steps are taken, aside from obtaining the required data from PubMed:
- Download original PubMed graph dataset.
- Build NetworkX object from dataset.
- Obtain Planetoid node index map.
- Relabel nodes to match Planetoid's index map.
- Add weight vectors (x).
- Add classes (y).
- Add time steps (time).
- Verify if dataset matches Planetoid's.
- Save data with edge time steps starting from zero.
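The last step and the temporal split can be illustrated with a small sketch. It assumes that every edge carries a publication year, that time steps are obtained by shifting years so the earliest becomes zero, and that the split boundaries follow the table above (train up to 2006, validation in 2007, test from 2008). The function names and the sample years are hypothetical, and `build_dataset.py` may do this differently; for instance, the table reports 42 time steps over 1967 to 2010, so the actual script may re-rank years densely rather than shift them.

```python
def zero_based_time(years):
    """Shift years so the earliest one becomes time step 0 (assumed convention)."""
    first = min(years)
    return [year - first for year in years]


def temporal_masks(years, last_train_year=2006, val_year=2007):
    """Return boolean train/val/test masks with one entry per edge."""
    train = [year <= last_train_year for year in years]
    val = [year == val_year for year in years]
    test = [year > val_year for year in years]
    return train, val, test


# Hypothetical edge years, only for illustration.
edge_years = [1967, 1999, 2006, 2007, 2008, 2010]

print(zero_based_time(edge_years))  # [0, 32, 39, 40, 41, 43]

train_mask, val_mask, test_mask = temporal_masks(edge_years)
print(sum(train_mask), sum(val_mask), sum(test_mask))  # 3 1 2
```

Each edge falls into exactly one mask, which mirrors the edge-level `train_mask`, `val_mask`, and `test_mask` shown in the PyTorch Geometric output.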
Extras
To plot the figures and table displayed above:
python extra/build_extra.py
This requires the extra dependencies matplotlib and tabulate.
References
- Query-driven Active Surveying for Collective Classification (2012). Namata et al., Workshop on Mining and Learning with Graphs (MLG), Edinburgh, Scotland, UK, 2012.
- Revisiting Semi-Supervised Learning with Graph Embeddings (2016). Yang et al., Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA, 2016.