A graph embedding library with PyTorch and RAPIDS acceleration
GraphEm Rapids: High-Performance Graph Embedding
GraphEm Rapids is a high-performance implementation of the GraphEm graph embedding library. It uses PyTorch and RAPIDS for GPU acceleration and to scale to larger graphs.
Key Features
- Multiple Backends: PyTorch, RAPIDS cuVS, and CPU fallback
- Automatic Backend Selection: Optimal backend chosen based on data size and hardware
- Large-Scale Support: Handles graphs with millions of vertices using RAPIDS
- Memory Efficient: Adaptive chunking and memory management
- GPU Accelerated: Full CUDA support with PyTorch and RAPIDS
Installation
Basic Installation (PyTorch backend)
pip install graphem-rapids
With CUDA Support
pip install "graphem-rapids[cuda]"
With Full RAPIDS Support
pip install "graphem-rapids[rapids]"
# Or install every optional dependency
pip install "graphem-rapids[all]"
Development Installation
git clone https://github.com/sashakolpakov/graphem-rapids.git
cd graphem-rapids
pip install -e .
Quick Start
Automatic Backend Selection
import graphem_rapids as gr
# Generate a graph
edges = gr.erdos_renyi_graph(n=10000, p=0.001)
# Create embedder with automatic backend selection
embedder = gr.create_graphem(edges, n_vertices=10000, dimension=3)
# Run layout
embedder.run_layout(num_iterations=50)
# Display
embedder.display_layout()
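For readers unfamiliar with the random graph model used above, here is a minimal pure-Python sketch of Erdős–Rényi G(n, p) sampling: every unordered vertex pair becomes an edge independently with probability p. The helper name `erdos_renyi_edges` is hypothetical and is not the library's `gr.erdos_renyi_graph`, whose return type is not documented in this README. The sketch visits all pairs, so it is only practical for small n.

```python
import itertools
import random


def erdos_renyi_edges(n, p, seed=None):
    """Return the edge list of a G(n, p) random graph as (u, v) pairs with u < v."""
    rng = random.Random(seed)
    # Visit every unordered vertex pair once and keep it with probability p.
    return [
        (u, v)
        for u, v in itertools.combinations(range(n), 2)
        if rng.random() < p
    ]


edges = erdos_renyi_edges(n=200, p=0.05, seed=0)
expected = 0.05 * 200 * 199 / 2  # p times the number of vertex pairs
print(f"{len(edges)} edges sampled, {expected:.0f} expected on average")
```

The expected edge count is p * n * (n - 1) / 2, which is a quick sanity check on any generated edge list.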
Explicit Backend Selection
# Force the PyTorch backend
embedder = gr.GraphEmbedderPyTorch(
    edges, n_vertices=10000, dimension=3,
    device='cuda'  # or 'cpu'
)
# Force the RAPIDS cuVS backend (for large graphs)
embedder = gr.GraphEmbedderCuVS(
    edges, n_vertices=100000, dimension=3,
    index_type='ivf_flat'
)
Backend Information
# Check available backends
info = gr.get_backend_info()
print(f"CUDA available: {info['cuda_available']}")
print(f"Recommended: {info['recommended_backend']}")
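To see which optional dependencies are installed before asking the library, a check like the following can help. It is a sketch using only the standard library. The function names and dictionary keys are hypothetical and this is not how `gr.get_backend_info()` is implemented. It reports only whether the packages can be found, not whether a GPU is usable. With PyTorch installed, `torch.cuda.is_available()` answers that question.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def sketch_backend_info():
    """Hypothetical stand-in for a backend report; not the library's implementation."""
    return {
        "torch_available": module_available("torch"),
        "cuvs_available": module_available("cuvs"),
        "cupy_available": module_available("cupy"),
    }


for key, value in sketch_backend_info().items():
    print(f"{key}: {value}")
```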
Architecture
GraphEm Rapids provides multiple computational backends:
PyTorch Backend
- Best for: Medium-scale graphs (1K-100K vertices)
- Features: CUDA acceleration, memory-efficient chunking
- Fallback: Automatic CPU mode when GPU unavailable
RAPIDS cuVS Backend
- Best for: Large-scale graphs (100K+ vertices)
- Features: Optimized KNN with cuVS indices, CuPy operations
- Index Types: Brute force, IVF-Flat, IVF-PQ (automatic selection)
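As background for the index types listed above: a brute-force index compares every point with every other point and returns exact neighbors, while IVF-style indices search only part of the data and return approximate ones. The sketch below shows exact k-nearest-neighbor search in plain Python. It illustrates the computation only and is not cuVS code, and the function name is hypothetical.

```python
import heapq
import math


def brute_force_knn(points, k):
    """Return, for each point, the indices of its k nearest other points (exact search)."""
    neighbors = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with that point's index.
        candidates = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in heapq.nsmallest(k, candidates)])
    return neighbors


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
print(brute_force_knn(points, k=1))
```

The cost grows quadratically with the number of points, which is why approximate indices become attractive for large graphs.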
Automatic Selection
The create_graphem() function automatically selects the optimal backend based on:
- Dataset size (number of vertices)
- Available hardware (CUDA, RAPIDS)
- Memory constraints
- User preferences
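The selection criteria above could be combined roughly as follows. This is a hypothetical sketch, not the logic inside `create_graphem()`. The function name, its parameters, and the returned labels are assumptions. The 100,000-vertex cutoff is taken from the "Best for" ranges in the Architecture section. Memory constraints are left out for brevity.

```python
def choose_backend(n_vertices, cuda_available, rapids_available, force_backend=None):
    """Hypothetical selection rule combining user preference, hardware, and graph size."""
    if force_backend is not None:
        return force_backend  # an explicit user preference wins
    if not cuda_available:
        return "cpu"  # no GPU: fall back to CPU execution
    if rapids_available and n_vertices >= 100_000:
        return "cuvs"  # large graph and RAPIDS installed
    return "pytorch"  # default GPU backend


print(choose_backend(5_000, cuda_available=True, rapids_available=True))
print(choose_backend(500_000, cuda_available=True, rapids_available=True))
```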
Configuration
Environment Variables
export GRAPHEM_BACKEND=pytorch # Force backend
export GRAPHEM_PREFER_GPU=true # Prefer GPU backends
export GRAPHEM_MEMORY_LIMIT=8 # Memory limit in GB
export GRAPHEM_VERBOSE=true # Verbose logging
export GRAPHEM_RAPIDS_QUIET=true # Suppress startup messages
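The sketch below shows one way settings like these can be read from the environment in Python. The variable names come from the list above. The parsing rules (which strings count as true, treating the memory limit as a float in GB) and the function names are assumptions, not the library's documented behavior.

```python
import os


def _as_bool(value, default=False):
    """Interpret common truthy strings; unset variables fall back to `default`."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def read_graphem_env(environ=os.environ):
    """Collect the GRAPHEM_* settings from a mapping such as os.environ."""
    limit = environ.get("GRAPHEM_MEMORY_LIMIT")
    return {
        "backend": environ.get("GRAPHEM_BACKEND"),
        "prefer_gpu": _as_bool(environ.get("GRAPHEM_PREFER_GPU")),
        "memory_limit_gb": float(limit) if limit else None,
        "verbose": _as_bool(environ.get("GRAPHEM_VERBOSE")),
        "quiet": _as_bool(environ.get("GRAPHEM_RAPIDS_QUIET")),
    }


print(read_graphem_env({"GRAPHEM_BACKEND": "pytorch", "GRAPHEM_MEMORY_LIMIT": "8"}))
```

Passing a plain dictionary instead of `os.environ` keeps the parsing easy to test.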
Programmatic Configuration
from graphem_rapids.utils import BackendConfig
config = BackendConfig(
    n_vertices=50000,
    dimension=3,
    force_backend='cuvs',
    memory_limit=16.0,  # GB
    prefer_gpu=True
)
embedder = gr.create_graphem(edges, n_vertices=50000, **config.__dict__)
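One general Python pitfall applies when a configuration object is unpacked with `**` next to an explicit keyword: if the unpacked mapping contains the same key, the call raises `TypeError`. Whether this affects the call above depends on the attributes `BackendConfig` actually stores, which this README does not list. The sketch uses hypothetical stand-ins (`DemoConfig`, `demo_factory`) to show the collision and one way to avoid it.

```python
from dataclasses import asdict, dataclass


@dataclass
class DemoConfig:
    """Hypothetical stand-in; BackendConfig's real fields may differ."""
    n_vertices: int
    dimension: int = 3


def demo_factory(edges, n_vertices, dimension=3):
    """Hypothetical stand-in for a factory that takes n_vertices as a keyword."""
    return (len(edges), n_vertices, dimension)


config = DemoConfig(n_vertices=4)
edges = [(0, 1), (1, 2), (2, 3)]

try:
    demo_factory(edges, n_vertices=4, **asdict(config))
    collided = False
except TypeError:
    collided = True  # n_vertices was supplied twice

# Dropping the duplicated key before unpacking avoids the collision.
kwargs = {k: v for k, v in asdict(config).items() if k != "n_vertices"}
result = demo_factory(edges, n_vertices=config.n_vertices, **kwargs)
print(collided, result)
```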
Influence Maximization
GraphEm Rapids maintains full compatibility with influence maximization algorithms:
# Select influential nodes using embedding-based method
seeds = gr.graphem_seed_selection(embedder, k=10)
# Compare with traditional methods
import networkx as nx
G = nx.from_edgelist(edges)
influence, _ = gr.ndlib_estimated_influence(G, seeds, p=0.1)
print(f"Estimated influence: {influence} nodes")
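To make the idea of "estimated influence" concrete, here is a pure-Python Monte Carlo sketch of the independent cascade model, in which each newly activated node gets one chance to activate each inactive neighbor with probability p. The README does not state which diffusion model `gr.ndlib_estimated_influence` runs, so the choice of independent cascade is an assumption, and both function names below are hypothetical.

```python
import random


def independent_cascade(adjacency, seeds, p, rng):
    """Run one independent-cascade simulation and return the set of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                # A newly active node tries once to activate each inactive neighbor.
                if neighbor not in active and rng.random() < p:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def estimate_influence(edges, seeds, p, trials=200, seed=0):
    """Average number of activated nodes over `trials` simulations on an undirected graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    rng = random.Random(seed)
    total = sum(len(independent_cascade(adjacency, seeds, p, rng)) for _ in range(trials))
    return total / trials


path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(estimate_influence(path, seeds=[0], p=0.1))
```

The estimate always lies between the number of seeds and the number of nodes reachable from them.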
Testing
Run the test suite:
pytest tests/ -v
Test specific backends:
pytest tests/test_pytorch_backend.py
pytest tests/test_cuvs_backend.py
Benchmarking
Run performance benchmarks:
python benchmarks/run_benchmarks.py
Compare backends:
python benchmarks/compare_backends.py --sizes 1000,10000,100000
Advanced Usage
Custom Memory Management
from graphem_rapids.utils import MemoryManager
with MemoryManager(cleanup_on_exit=True):
    embedder = gr.create_graphem(edges, n_vertices=50000)
    embedder.run_layout(50)
# Resources are cleaned up automatically when the block exits
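The pattern behind a cleanup-on-exit context manager can be sketched with the standard library alone. The helper below is a hypothetical stand-in, not the library's `MemoryManager`. It only triggers Python garbage collection, whereas a GPU-aware manager would presumably also release device memory (for example with `torch.cuda.empty_cache()` when PyTorch is in use).

```python
import gc
from contextlib import contextmanager


@contextmanager
def cleanup_scope(cleanup_on_exit=True):
    """Hypothetical stand-in: run garbage collection when the block exits."""
    stats = {"collected": None}
    try:
        yield stats
    finally:
        # The finally clause runs on normal exit and when the block raises.
        if cleanup_on_exit:
            stats["collected"] = gc.collect()


with cleanup_scope() as stats:
    scratch = [list(range(1000)) for _ in range(100)]
    del scratch

print(f"Unreachable objects found by the collector: {stats['collected']}")
```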
Chunked Processing for Large Graphs
from graphem_rapids.utils import get_optimal_chunk_size
chunk_size = get_optimal_chunk_size(n_vertices=1000000, dimension=3)
embedder = gr.GraphEmbedderPyTorch(
    edges, n_vertices=1000000,
    batch_size=chunk_size,
    memory_efficient=True
)
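As a rough illustration of how a chunk size can be derived from a memory budget, the sketch below assumes the dominant allocation is a block of pairwise differences of shape (rows, n_vertices, dimension) stored as 32-bit floats. That memory model, the function names, and the budget parameter are all assumptions. The formula used by `get_optimal_chunk_size` is not described in this README.

```python
def chunk_size_for_budget(n_vertices, dimension, budget_bytes, bytes_per_value=4):
    """Largest row count whose (rows, n_vertices, dimension) block fits in the budget."""
    per_row = n_vertices * dimension * bytes_per_value
    return max(1, min(n_vertices, budget_bytes // per_row))


def chunk_ranges(n_vertices, chunk_size):
    """Yield half-open (start, stop) row ranges that cover all vertices."""
    for start in range(0, n_vertices, chunk_size):
        yield start, min(start + chunk_size, n_vertices)


size = chunk_size_for_budget(n_vertices=1_000_000, dimension=3, budget_bytes=2 * 1024**3)
print(f"Rows per chunk under a 2 GiB budget: {size}")
print(list(chunk_ranges(10, 4)))
```

Processing the ranges one at a time keeps peak memory near the budget at the cost of more passes over the data.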
cuVS Index Configuration
embedder = gr.GraphEmbedderCuVS(
    edges, n_vertices=500000,
    index_type='ivf_pq',  # Options: 'brute_force', 'ivf_flat', 'ivf_pq'
    sample_size=2048,     # Larger samples for better accuracy
    batch_size=8192       # Larger batches for better throughput
)
License
MIT License. See the LICENSE file for details.
Citation
If you use GraphEm Rapids in your research, please cite:
@misc{kolpakov-rivin-2025fast,
  title={Fast Geometric Embedding for Node Influence Maximization},
  author={Kolpakov, Alexander and Rivin, Igor},
  year={2025},
  eprint={2506.07435},
  archivePrefix={arXiv},
  primaryClass={cs.SI},
  url={https://arxiv.org/abs/2506.07435}
}