Skip to main content

A graph embedding library with PyTorch and RAPIDS acceleration

Project description

graphem rapids logo

GraphEm Rapids: High-Performance Graph Embedding

License: MIT Python 3.8+ PyTorch 2.0+ RAPIDS cuVS Pepy Total Downloads

High-performance GraphEm implementation using PyTorch and RAPIDS cuVS. Force-directed layout with geometric intersection detection produces embeddings that correlate strongly with centrality measures.

Features

  • Unified API: Scipy sparse adjacency matrices, sklearn-style parameters (n_components, n_neighbors)
  • Multiple Backends: PyTorch (1K-100K vertices), RAPIDS cuVS (100K+ vertices), automatic selection
  • GPU Acceleration: CUDA support, memory-efficient chunking, automatic CPU fallback
  • Graph Generators: Erdős-Rényi, scale-free, SBM, bipartite, Delaunay, and more
  • Influence Maximization: Fast embedding-based seed selection

Installation

pip install graphem-rapids              # PyTorch backend
pip install graphem-rapids[cuda]        # + CUDA support
pip install graphem-rapids[rapids]      # + RAPIDS cuVS
pip install graphem-rapids[all]         # Everything

Quick Start

import graphem_rapids as gr

# Generate graph (returns sparse adjacency matrix)
adjacency = gr.generate_er(n=1000, p=0.01)

# Create embedder (automatic backend selection)
embedder = gr.create_graphem(adjacency, n_components=3)

# Run layout
embedder.run_layout(num_iterations=50)

# Get positions and visualize
positions = embedder.get_positions()  # numpy array (n, d)
embedder.display_layout()             # 2D or 3D plot

Backend Selection

Automatic (Recommended)

embedder = gr.create_graphem(adjacency, n_components=3)

Explicit PyTorch

embedder = gr.GraphEmbedderPyTorch(
    adjacency, n_components=3, device='cuda',
    L_min=1.0, k_attr=0.2, k_inter=0.5, n_neighbors=10,
    batch_size=None  # Automatic (or manual: 1024)
)

Explicit RAPIDS cuVS

embedder = gr.GraphEmbedderCuVS(
    adjacency, n_components=3,
    index_type='auto',  # 'brute_force', 'ivf_flat', 'ivf_pq'
    sample_size=1024, batch_size=None
)

Index Types: brute_force (<100K), ivf_flat (100K-1M), ivf_pq (>1M vertices)

Check Backends

info = gr.get_backend_info()
print(f"CUDA: {info['cuda_available']}, Recommended: {info['recommended_backend']}")

Configuration

Environment Variables:

export GRAPHEM_BACKEND=pytorch        # Force backend
export GRAPHEM_PREFER_GPU=true        # Prefer GPU
export GRAPHEM_MEMORY_LIMIT=8         # GB
export GRAPHEM_VERBOSE=true

Programmatic:

from graphem_rapids.utils.backend_selection import BackendConfig, get_optimal_backend

config = BackendConfig(n_vertices=50000, force_backend='cuvs', memory_limit=16.0)
backend = get_optimal_backend(config)
embedder = gr.create_graphem(adjacency, backend=backend)

Graph Generators

All generators return scipy sparse adjacency matrices:

# Random
gr.generate_er(n=1000, p=0.01, seed=42)
gr.generate_random_regular(n=100, d=3, seed=42)

# Scale-free & small-world
gr.generate_ba(n=300, m=3, seed=42)             # Barabási-Albert
gr.generate_ws(n=1000, k=6, p=0.3, seed=42)     # Watts-Strogatz
gr.generate_scale_free(n=100, seed=42)

# Community structures
gr.generate_sbm(n_per_block=75, num_blocks=4, p_in=0.15, p_out=0.01, seed=42)
gr.generate_caveman(l=10, k=10)
gr.generate_relaxed_caveman(l=10, k=10, p=0.1, seed=42)

# Bipartite
gr.generate_bipartite_graph(n_top=50, n_bottom=100, p=0.2, seed=42)
gr.generate_complete_bipartite_graph(n_top=50, n_bottom=100)

# Geometric
gr.generate_geometric(n=100, radius=0.2, dim=2, seed=42)
gr.generate_delaunay_triangulation(n=100, seed=42)
gr.generate_road_network(width=30, height=30)   # 2D grid

# Trees
gr.generate_balanced_tree(r=2, h=10)

Influence Maximization

adjacency = gr.generate_er(n=1000, p=0.01)
embedder = gr.create_graphem(adjacency, n_components=3)
embedder.run_layout(num_iterations=50)

# Fast: embedding-based selection
seeds = gr.graphem_seed_selection(embedder, k=10)

# Evaluate with Independent Cascade model
import networkx as nx
G = nx.from_scipy_sparse_array(adjacency)
influence, _ = gr.ndlib_estimated_influence(G, seeds, p=0.1, iterations_count=100)

# Compare with greedy (slow, optimal)
greedy_seeds, _ = gr.greedy_seed_selection(G, k=10, p=0.1)

Advanced

Memory Management

from graphem_rapids.utils.memory_management import MemoryManager, get_gpu_memory_info

mem_info = get_gpu_memory_info()
print(f"GPU: {mem_info['free']:.1f}GB free / {mem_info['total']:.1f}GB total")

adjacency = gr.generate_er(n=1000, p=0.01)
with MemoryManager(cleanup_on_exit=True):
    embedder = gr.create_graphem(adjacency)
    embedder.run_layout(50)

Batch Size Tuning

from graphem_rapids.utils.memory_management import get_optimal_chunk_size

adjacency = gr.generate_er(n=1000, p=0.01)

# Automatic (recommended)
embedder = gr.GraphEmbedderPyTorch(adjacency, batch_size=None)

# Manual
embedder = gr.GraphEmbedderPyTorch(adjacency, batch_size=1024)

# Programmatic
optimal = get_optimal_chunk_size(n_vertices=1000000, n_components=3, backend='pytorch')
embedder = gr.GraphEmbedderPyTorch(adjacency, batch_size=optimal)

Testing & Benchmarking

pytest                                          # Run all tests
pytest tests/test_pytorch_backend.py            # Specific backend
python benchmarks/run_benchmarks.py             # Performance tests
python benchmarks/compare_backends.py --sizes 1000,10000,100000

Contributing

See CONTRIBUTING.md for development setup, testing, and contribution guidelines.

Citation

arXiv

@misc{kolpakov-rivin-2025fast,
  title={Fast Geometric Embedding for Node Influence Maximization},
  author={Kolpakov, Alexander and Rivin, Igor},
  year={2025},
  eprint={2506.07435},
  archivePrefix={arXiv},
  primaryClass={cs.SI},
  url={https://arxiv.org/abs/2506.07435}
}

License

MIT License - see LICENSE file.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

graphem_rapids-0.2.1.tar.gz (74.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

graphem_rapids-0.2.1-py3-none-any.whl (45.6 kB view details)

Uploaded Python 3

File details

Details for the file graphem_rapids-0.2.1.tar.gz.

File metadata

  • Download URL: graphem_rapids-0.2.1.tar.gz
  • Upload date:
  • Size: 74.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for graphem_rapids-0.2.1.tar.gz
Algorithm Hash digest
SHA256 9352649999b8bb1b5c743111d2924de6ab62d68278f41cae7fff0837a8acef19
MD5 f9f301ae92fbb86c4a90c503bd07b531
BLAKE2b-256 9323b38514c164722315cf1a47f7f007f5870ff7d3a618634b5a06f1918fb116

See more details on using hashes here.

File details

Details for the file graphem_rapids-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: graphem_rapids-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 45.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for graphem_rapids-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 296098d05cca77955273b622a3f6a5cf208a8045307176e1fab9dea2270dfa45
MD5 c628f16b775d3dc173ac1de90af5863a
BLAKE2b-256 5b1e5e8cc2403b9d0d2c0c57a195e3079a9a57b94d4775b3336ef6906ec8d8e4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page