A graph embedding library with PyTorch and RAPIDS acceleration

GraphEm Rapids: High-Performance Graph Embedding

License: MIT | Python 3.8+ | PyTorch 2.0+ | RAPIDS cuVS

GraphEm Rapids is a high-performance implementation of the GraphEm graph embedding library. It builds on PyTorch and RAPIDS to scale to larger graphs and to run on the GPU.

Key Features

  • Multiple Backends: PyTorch, RAPIDS cuVS, and CPU fallback
  • Automatic Backend Selection: Optimal backend chosen based on data size and hardware
  • Large-Scale Support: Handles graphs with millions of vertices using RAPIDS
  • Memory Efficient: Adaptive chunking and memory management
  • GPU Accelerated: Full CUDA support with PyTorch and RAPIDS

Installation

Basic Installation (PyTorch backend)

pip install graphem-rapids

With CUDA Support

# Quote the extras so shells such as zsh do not expand the brackets
pip install "graphem-rapids[cuda]"

With Full RAPIDS Support

pip install "graphem-rapids[rapids]"
# or for everything
pip install "graphem-rapids[all]"

Development Installation

git clone https://github.com/sashakolpakov/graphem-rapids.git
cd graphem-rapids
pip install -e .

Quick Start

Automatic Backend Selection

import graphem_rapids as gr

# Generate a graph
edges = gr.erdos_renyi_graph(n=10000, p=0.001)

# Create embedder with automatic backend selection
embedder = gr.create_graphem(edges, n_vertices=10000, dimension=3)

# Run layout
embedder.run_layout(num_iterations=50)

# Display
embedder.display_layout()
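
For readers who want to see what an Erdős–Rényi edge list looks like without installing anything, here is a minimal pure-Python sketch. The helper name `sample_erdos_renyi_edges` is hypothetical, and the assumption that edges are a list of `(i, j)` index pairs is ours; the actual return type of `gr.erdos_renyi_graph` is not documented in this README.

```python
import random


def sample_erdos_renyi_edges(n, p, seed=None):
    """Return undirected edges (i, j) with i < j, each kept with probability p.

    Illustrative only: this is not the library's generator.
    """
    rng = random.Random(seed)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Each of the n*(n-1)/2 possible edges is included independently.
            if rng.random() < p:
                edges.append((i, j))
    return edges


edges = sample_erdos_renyi_edges(n=200, p=0.05, seed=0)
print(len(edges), edges[:3])
```

The double loop is quadratic in `n`, so it is only suitable for small illustrations, not for the graph sizes the library targets.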

Explicit Backend Selection

# Force PyTorch backend
embedder = gr.GraphEmbedderPyTorch(
    edges, n_vertices=10000, dimension=3,
    device='cuda'  # or 'cpu'
)

# Force RAPIDS cuVS backend (for large graphs)
embedder = gr.GraphEmbedderCuVS(
    edges, n_vertices=100000, dimension=3,
    index_type='ivf_flat'
)

Backend Information

# Check available backends
info = gr.get_backend_info()
print(f"CUDA available: {info['cuda_available']}")
print(f"Recommended: {info['recommended_backend']}")

Architecture

GraphEm Rapids provides multiple computational backends:

PyTorch Backend

  • Best for: Medium-scale graphs (1K-100K vertices)
  • Features: CUDA acceleration, memory-efficient chunking
  • Fallback: Automatic CPU mode when GPU unavailable

RAPIDS cuVS Backend

  • Best for: Large-scale graphs (100K+ vertices)
  • Features: Optimized KNN with cuVS indices, CuPy operations
  • Index Types: Brute force, IVF-Flat, IVF-PQ (automatic selection)
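
To make the index types concrete: a brute-force search compares the query against every point and is exact, while IVF-style indices restrict the search to a few clusters and trade some accuracy for speed. The following pure-Python sketch shows only the brute-force baseline. It does not use cuVS, and `brute_force_knn` is a hypothetical name chosen for illustration.

```python
import heapq
import math


def brute_force_knn(points, query, k):
    """Return the indices of the k points nearest to query (Euclidean)."""
    # Pair each distance with its index so ties are broken by index.
    distances = ((math.dist(p, query), idx) for idx, p in enumerate(points))
    return [idx for _, idx in heapq.nsmallest(k, distances)]


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
print(brute_force_knn(points, query=(0.9, 0.1, 0.0), k=2))  # prints [1, 0]
```

Every query costs one distance computation per point, which is why approximate indices become attractive as the number of vertices grows.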

Automatic Selection

The create_graphem() function automatically selects the optimal backend based on:

  • Dataset size (number of vertices)
  • Available hardware (CUDA, RAPIDS)
  • Memory constraints
  • User preferences
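
The criteria above could be combined roughly as follows. This is a sketch under stated assumptions, not the logic inside `create_graphem()`: the function `pick_backend` is hypothetical, the 100K-vertex threshold is taken from the "Best for" ranges in this README, and memory constraints are left out for brevity.

```python
def pick_backend(n_vertices, cuda_available, rapids_available, forced=None):
    """Return a (backend, device) pair using the README's rough size ranges.

    Hypothetical heuristic for illustration only.
    """
    if forced is not None:
        # An explicit user preference wins over any heuristic.
        return forced, "cuda" if cuda_available else "cpu"
    if not cuda_available:
        # Without a GPU, fall back to PyTorch in CPU mode.
        return "pytorch", "cpu"
    if rapids_available and n_vertices >= 100_000:
        return "cuvs", "cuda"
    return "pytorch", "cuda"


print(pick_backend(50_000, cuda_available=True, rapids_available=True))
print(pick_backend(500_000, cuda_available=True, rapids_available=True))
print(pick_backend(500_000, cuda_available=False, rapids_available=False))
```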

Configuration

Environment Variables

export GRAPHEM_BACKEND=pytorch     # Force backend
export GRAPHEM_PREFER_GPU=true     # Prefer GPU backends
export GRAPHEM_MEMORY_LIMIT=8      # Memory limit in GB
export GRAPHEM_VERBOSE=true        # Verbose logging
export GRAPHEM_RAPIDS_QUIET=true   # Suppress startup messages
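
For scripts that want to inspect these settings themselves, the variables can be read with the standard library. The sketch below is not how the package parses them; `read_graphem_env`, the accepted truthy spellings, and the defaults are all assumptions made for illustration.

```python
import os

# Assumed set of truthy spellings; the package may accept a different set.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def read_graphem_env(environ=None):
    """Collect the GRAPHEM_* variables into a plain dict (hypothetical helper)."""
    env = os.environ if environ is None else environ

    def flag(name, default):
        raw = env.get(name)
        return default if raw is None else raw.strip().lower() in _TRUE_VALUES

    limit = env.get("GRAPHEM_MEMORY_LIMIT")
    return {
        "backend": env.get("GRAPHEM_BACKEND"),
        "prefer_gpu": flag("GRAPHEM_PREFER_GPU", True),
        "memory_limit_gb": float(limit) if limit else None,
        "verbose": flag("GRAPHEM_VERBOSE", False),
        "quiet": flag("GRAPHEM_RAPIDS_QUIET", False),
    }


settings = read_graphem_env({"GRAPHEM_BACKEND": "pytorch", "GRAPHEM_MEMORY_LIMIT": "8"})
print(settings)
```

Passing a dict instead of relying on `os.environ` keeps the helper easy to test.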

Programmatic Configuration

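import graphem_rapids as gr
from graphem_rapids.utils import BackendConfig

config = BackendConfig(
    n_vertices=50000,
    dimension=3,
    force_backend='cuvs',
    memory_limit=16.0,  # GB
    prefer_gpu=True
)

# Pass n_vertices only once: if the config stores it as an attribute,
# repeating it as an explicit keyword would raise a TypeError.
options = dict(vars(config))
n_vertices = options.pop('n_vertices', 50000)
embedder = gr.create_graphem(edges, n_vertices=n_vertices, **options)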

Influence Maximization

GraphEm Rapids remains compatible with the influence maximization tools from GraphEm:

# Select influential nodes using embedding-based method
seeds = gr.graphem_seed_selection(embedder, k=10)

# Estimate the influence spread of the selected seeds
import networkx as nx
G = nx.from_edgelist(edges)
influence, _ = gr.ndlib_estimated_influence(G, seeds, p=0.1)
print(f"Estimated influence: {influence} nodes")
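
To illustrate what an "estimated influence" can mean, here is a pure-Python Monte Carlo sketch of the independent cascade model, in which each newly activated node gets one chance to activate each inactive neighbor with probability `p`. That `ndlib_estimated_influence` uses this model is our assumption; the function names below are hypothetical and do not come from the library.

```python
import random


def simulate_independent_cascade(adjacency, seeds, p, rng):
    """Run one cascade and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                # Each active node tries each inactive neighbor exactly once.
                if neighbor not in active and rng.random() < p:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(active)


def estimate_influence(edges, seeds, p, runs=200, seed=0):
    """Average the cascade size over several independent runs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    rng = random.Random(seed)
    total = sum(
        simulate_independent_cascade(adjacency, seeds, p, rng) for _ in range(runs)
    )
    return total / runs


path = [(0, 1), (1, 2), (2, 3)]
print(estimate_influence(path, seeds=[0], p=0.5))
```

With `p=1.0` the cascade reaches the whole connected component of the seeds, and with `p=0.0` only the seeds themselves stay active, which gives two easy sanity checks.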

Testing

Run the test suite:

pytest tests/ -v

Test specific backends:

pytest tests/test_pytorch_backend.py
pytest tests/test_cuvs_backend.py

Benchmarking

Run performance benchmarks:

python benchmarks/run_benchmarks.py

Compare backends:

python benchmarks/compare_backends.py --sizes 1000,10000,100000

Advanced Usage

Custom Memory Management

import graphem_rapids as gr
from graphem_rapids.utils import MemoryManager

with MemoryManager(cleanup_on_exit=True):
    embedder = gr.create_graphem(edges, n_vertices=50000)
    embedder.run_layout(num_iterations=50)
    # Automatic cleanup on exit
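
The general pattern behind a cleanup-on-exit context manager can be sketched with the standard library. This is not the implementation of `MemoryManager`; `gpu_cleanup_scope` is a hypothetical name, and the cleanup steps (a garbage collection pass followed by `torch.cuda.empty_cache()` when PyTorch and CUDA are present) are an assumption about what such a manager would reasonably do.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gpu_cleanup_scope():
    """Release Python and cached GPU memory when the block exits."""
    try:
        yield
    finally:
        # Runs on normal exit and when the block raises.
        gc.collect()
        try:
            import torch
        except ImportError:
            torch = None
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()


with gpu_cleanup_scope():
    scratch = [0.0] * 1_000_000  # stand-in for a large temporary buffer
    total = sum(scratch)
print(total)
```

Putting the cleanup in a `finally` clause means it also runs if the layout step fails partway through.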

Chunked Processing for Large Graphs

import graphem_rapids as gr
from graphem_rapids.utils import get_optimal_chunk_size

chunk_size = get_optimal_chunk_size(n_vertices=1000000, dimension=3)
embedder = gr.GraphEmbedderPyTorch(
    edges, n_vertices=1000000,
    batch_size=chunk_size,
    memory_efficient=True
)
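
As a rough idea of how a chunk size can follow from a memory budget, suppose each chunk row needs a block of `n_vertices * dimension` float32 values (for example, coordinate differences against every other vertex). The sketch below uses that cost model; it is an assumption for illustration, `estimate_chunk_size` is a hypothetical helper, and `get_optimal_chunk_size` may use a different formula.

```python
def estimate_chunk_size(n_vertices, dimension, memory_limit_gb, bytes_per_value=4):
    """Return how many rows fit in the budget, clamped to [1, n_vertices]."""
    budget_bytes = int(memory_limit_gb * 1024**3)
    # Assumed cost model: one row holds n_vertices * dimension values.
    bytes_per_row = n_vertices * dimension * bytes_per_value
    return max(1, min(n_vertices, budget_bytes // bytes_per_row))


print(estimate_chunk_size(n_vertices=1_000_000, dimension=3, memory_limit_gb=1.0))  # prints 89
```

For one million vertices in three dimensions a row costs 12 MB, so a 1 GiB budget holds 89 rows at a time; small graphs are simply processed in a single chunk.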

cuVS Index Configuration

embedder = gr.GraphEmbedderCuVS(
    edges, n_vertices=500000,
    index_type='ivf_pq',  # Options: 'brute_force', 'ivf_flat', 'ivf_pq'
    sample_size=2048,     # Larger samples for better accuracy
    batch_size=8192       # Larger batches for better throughput
)

License

MIT License - see LICENSE file.

Citation

If you use GraphEm Rapids in your research, please cite:

@misc{kolpakov-rivin-2025fast,
  title={Fast Geometric Embedding for Node Influence Maximization},
  author={Kolpakov, Alexander and Rivin, Igor},
  year={2025},
  eprint={2506.07435},
  archivePrefix={arXiv},
  primaryClass={cs.SI},
  url={https://arxiv.org/abs/2506.07435}
}
