Skip to main content

Fast temporal negative edge sampler for graph neural networks

Project description

Temporal Negative Edge Sampler

PyPI Latest Release PyPI Downloads

The Problem:

Poursafaei et al. in NeurIPS 2022 described that random negative edges make temporal link prediction task easy. This is because random negatives are often between unrelated node pairs that have never interacted, making them trivially easy for models to distinguish from true positive edges. So they introduced the concept of historical negatives—edges that existed in the past but are inactive at the current time—making the prediction task more realistic and challenging.

Huang et al. in NeurIPS 2023 created a benchmark based on this called TGB (Temporal Graph Benchmark) where in the validation and test sets they included both historical and random negatives.

  • Historical negatives: Edges that existed in the past but are inactive at the current time between a pair of nodes.
  • Random negatives: Edges that never existed in the past between a pair of nodes.

TGB generated 10-20 negative edges per positive edge where 50% is historical and 50% is random.

However, they only released pre-generated pickle files with negatives for their benchmark datasets, and did not publish code to generate such negatives for new datasets or for training-time augmentation.

Negative edges are generated using the following steps:

  1. Group all edges by timestamp, sorted in chronological order.

  2. Iterate through each timestamp group (i.e., a batch of edges sharing the same timestamp):

    • Add the current batch to a streaming graph structure, which keeps track of all previously seen edges (the historical graph).
  3. For each source node in the current batch:

    • Sample n × p historical negative edges:
      • These are destination nodes that have been connected to the source in the past, but are not connected to the source in the current timestamp.
    • Sample n × (1 - p) random negative edges:
      • These are destination nodes that have never been connected to the source in the historical graph (including the current batch).

While the algorithm described above is correct and benchmark-aligned, it is computationally expensive if implemented naively in Python. A naive Python implementation will not scale to large datasets and may run indefinitely for even moderate-sized graphs.

Temporal Negative Edge Sampler is a high-performance library for generating negative edges in temporal link prediction tasks, supporting both historical and random negative sampling. It mirrors the evaluation strategy used in the Temporal Graph Benchmark (TGB), but extends it to training and custom datasets. The core implementation is written in parallelized C++ with Python bindings, making it scalable to large graphs with millions of edges.

API

collect_all_negatives_by_timestamp

Collects negative edges for temporal graph sampling, processing edges grouped by timestamp.

Signature

from temporal_negative_edge_sampler import collect_all_negatives_by_timestamp

collect_all_negatives_by_timestamp(
    sources: np.ndarray,
    targets: np.ndarray,
    timestamps: np.ndarray,
    is_directed: bool,
    num_negatives_per_positive: int,
    historical_negative_percentage: float = 0.5
) -> Tuple[np.ndarray, np.ndarray]

Parameters

  • sources (np.ndarray): Array of source node IDs
  • targets (np.ndarray): Array of target node IDs
  • timestamps (np.ndarray): Array of timestamps for each edge
  • is_directed (bool): Whether the graph is directed
  • num_negatives_per_positive (int): Number of negative edges to sample per positive edge
  • historical_negative_percentage (float, optional): Ratio of historical vs random negatives (default: 0.5)

References:

Poursafaei et al. (2022)
Towards Better Evaluation for Dynamic Link Prediction
Farimah Poursafaei, Shenyang Huang, Kris Pelrine, Reihaneh Rabbany
In NeurIPS 2022 Datasets and Benchmarks Track.
https://arxiv.org/pdf/2207.10128

Huang et al. (2023)
Temporal Graph Benchmark for Machine Learning on Temporal Graphs
Shenyang Huang, Farimah Poursafaei, Jacob Danovitch, Matthias Fey, Weihua Hu, Emanuele Rossi, Jure Leskovec, Michael Bronstein, Guillaume Rabusseau, Reihaneh Rabbany
In NeurIPS 2023 Datasets and Benchmarks Track.
https://arxiv.org/pdf/2307.01026

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

temporal_negative_edge_sampler-0.1.14-cp311-cp311-manylinux_2_34_x86_64.whl (214.4 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.34+ x86-64

temporal_negative_edge_sampler-0.1.14-cp310-cp310-manylinux_2_34_x86_64.whl (212.9 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.34+ x86-64

temporal_negative_edge_sampler-0.1.14-cp310-cp310-macosx_15_0_arm64.whl (73.6 kB view details)

Uploaded CPython 3.10macOS 15.0+ ARM64

temporal_negative_edge_sampler-0.1.14-cp39-cp39-manylinux_2_34_x86_64.whl (211.9 kB view details)

Uploaded CPython 3.9manylinux: glibc 2.34+ x86-64

temporal_negative_edge_sampler-0.1.14-cp38-cp38-manylinux_2_34_x86_64.whl (211.8 kB view details)

Uploaded CPython 3.8manylinux: glibc 2.34+ x86-64

File details

Details for the file temporal_negative_edge_sampler-0.1.14-cp311-cp311-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for temporal_negative_edge_sampler-0.1.14-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 8caa77c471e9bf63a817e192f671183e6cfeb3d72cfb111fccafc9f8557b0a5f
MD5 662bc0149592c5f5bc7108a118a2106a
BLAKE2b-256 7a49a52f00b2010e27ed1b78da188ceb9f07023cb8bb8ae9bd2b114dfc06feb2

See more details on using hashes here.

File details

Details for the file temporal_negative_edge_sampler-0.1.14-cp310-cp310-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for temporal_negative_edge_sampler-0.1.14-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 9a32ede770d55ebaed9ea426e6dd9e597cc7ad230e19687f64bce819e7e6cd80
MD5 3b0a462520a7d87ee7ddb6e42bf250bc
BLAKE2b-256 85b05f06f3d878fce65c36f675d7f182f55853c316af9957c7dc95e0c5f9049a

See more details on using hashes here.

File details

Details for the file temporal_negative_edge_sampler-0.1.14-cp310-cp310-macosx_15_0_arm64.whl.

File metadata

File hashes

Hashes for temporal_negative_edge_sampler-0.1.14-cp310-cp310-macosx_15_0_arm64.whl
Algorithm Hash digest
SHA256 76daa58a0664dfb7eb30d40d0dcf49d402135bd851e2273c8046f3aef8d8c637
MD5 499a29373291a53be6c1a93b3aa17437
BLAKE2b-256 0d861f479d507e96da4f8cd8b4f259ec99b9013c3ad019b627070e08f1ae1189

See more details on using hashes here.

File details

Details for the file temporal_negative_edge_sampler-0.1.14-cp39-cp39-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for temporal_negative_edge_sampler-0.1.14-cp39-cp39-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 c36bb671419be1376cc8d8085a2d79692cac136dabe1c4f6a008be87e7290f7d
MD5 5d523ecee2d1d50d8d70f55f798f3fe8
BLAKE2b-256 526e4c7e85021e863d24f278ee6571a709112a2ca0fbc843b8d9f2dd13842a33

See more details on using hashes here.

File details

Details for the file temporal_negative_edge_sampler-0.1.14-cp38-cp38-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for temporal_negative_edge_sampler-0.1.14-cp38-cp38-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1a76c23486d2308dee461ce60af09bbe538cdff7c810de23dbdd518f88567a02
MD5 a59d2c20107dc212af203a96ec0f7131
BLAKE2b-256 68a7878ba11a76a7eb905c78c1c640aa205db6b860c759d0f19b299968dbdd24

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page