Given an input graph, the library generates graph embeddings using a low-code framework built on top of PyG.

fastgraphml

Given an input graph, fastgraphml generates Graph Embeddings using a low-code framework built on top of PyG. The package supports training on both GPU- and CPU-enabled machines. Compared to CPUs, training on GPUs results in much faster execution and better performance when handling large graphs. In addition, the framework provides tight integration with ArangoDB, a scalable, fully managed graph database, document store, and search engine in one place. Once Graph Embeddings are generated, they can be used for various downstream machine learning tasks such as Node Classification, Link Prediction, Visualisation, Community Detection, Similarity Search, and Recommendation.

Installation

Required Dependencies

  1. PyTorch 1.12.* is required.
    • Install the previous PyTorch version that matches your CUDA version from the PyTorch website.
      • To find your installed CUDA version, run nvidia-smi in your terminal.
  2. PyG
  3. FAISS
    • Note: FAISS-CPU requires numba==0.53.0.
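
The requirements above can be checked before installing fastgraphml. The following is a minimal sketch using only the standard library; the distribution names (torch, torch-geometric, faiss-cpu, faiss-gpu, numba) are assumed to be the PyPI names in your environment, and the helper functions are hypothetical, not part of fastgraphml.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names; adjust them if your environment differs.
REQUIRED = ("torch", "torch-geometric", "numba")
FAISS_VARIANTS = ("faiss-cpu", "faiss-gpu")


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def torch_matches(ver, wanted="1.12."):
    """True when a version string such as '1.12.1+cu113' starts with '1.12.'."""
    return ver is not None and ver.startswith(wanted)


if __name__ == "__main__":
    report = installed_versions(REQUIRED + FAISS_VARIANTS)
    for name, ver in report.items():
        print(f"{name:16} {ver or 'not installed'}")
    if not torch_matches(report["torch"]):
        print("PyTorch 1.12.* not found; install the build matching your CUDA version.")
```

Only one of the two FAISS variants needs to be present, and the numba pin applies to the CPU variant only.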

Latest Release

pip install fastgraphml

Quickstart: Graph Embedding Generation

Use Case 1: Generate Graph Embeddings from graphs stored inside ArangoDB:

Example Homogeneous Graphs

from fastgraphml.graph_embeddings import SAGE, GAT
from fastgraphml.graph_embeddings import downstream_tasks
from fastgraphml import Datasets 
from arango import ArangoClient

# Initialize the ArangoDB client.
client = ArangoClient("http://127.0.0.1:8529")
db = client.db('_system', username='root', password='')

# Loading Amazon Computer Products dataset into ArangoDB
Datasets(db).load("AMAZON_COMPUTER_PRODUCTS")

# Optionally use an existing ArangoDB graph
# arango_graph = db.graph('product_graph')

# metadata describing the ArangoDB graph
metagraph = {
    "vertexCollections": {
        "Computer_Products": {"x": "features", "y": "label"},
    },
    "edgeCollections": {
        "bought_together": {},
    },
}

# generating graph embeddings with 3 lines of code
model = SAGE(db, 'product_graph', metagraph, embedding_size=64) # define graph embedding model
model._train(epochs=10) # train
embeddings = model.get_embeddings() # get embeddings
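
The metagraph in the example above appears to follow the convention that each key under a vertex collection is a PyG attribute name (x for features, y for labels) and each value is the document attribute that holds the data in ArangoDB. Under that assumption, a small check like the following sketch can catch a malformed metagraph before training; describe_metagraph is a hypothetical helper, not part of fastgraphml.

```python
def describe_metagraph(metagraph):
    """Return (collection, pyg_key, arango_attribute) triples for vertex collections."""
    for section in ("vertexCollections", "edgeCollections"):
        if section not in metagraph:
            raise KeyError(f"metagraph is missing '{section}'")
    triples = []
    for collection, mapping in metagraph["vertexCollections"].items():
        for pyg_key, attribute in mapping.items():
            triples.append((collection, pyg_key, attribute))
    return triples


metagraph = {
    "vertexCollections": {
        "Computer_Products": {"x": "features", "y": "label"},
    },
    "edgeCollections": {
        "bought_together": {},
    },
}

for collection, pyg_key, attribute in describe_metagraph(metagraph):
    print(f"{collection}.{attribute} -> data.{pyg_key}")
# Computer_Products.features -> data.x
# Computer_Products.label -> data.y
```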

Example Heterogeneous Graphs

from fastgraphml.graph_embeddings import METAPATH2VEC, DMGI
from fastgraphml.graph_embeddings import downstream_tasks 
from fastgraphml import Datasets 

from arango import ArangoClient

# Initialize the ArangoDB client.
client = ArangoClient("http://127.0.0.1:8529")
db = client.db('_system', username='root')

# Loading IMDB Dataset into ArangoDB
Datasets(db).load("IMDB_X")

# Optionally use ArangoDB Graph
# arango_graph = db.graph("IMDB")

metagraph = {
    "vertexCollections": {
        "movie": {"x": "x", "y": "y"},
        "director": {"x": "x"},
        "actor": {"x": "x"},
    },
    "edgeCollections": {
        "to": {},
    },
}
metapaths = [('movie', 'to', 'actor'),
             ('actor', 'to', 'movie')]  # MAM: co-actor relationship

# generating graph embeddings with 3 lines of code
model = METAPATH2VEC(db, "IMDB_X", metagraph, metapaths, key_node='movie', embedding_size=128,
                     walk_length=5, context_size=6, walks_per_node=5, num_negative_samples=5,
                     sparse=True) # define model
model._train(epochs=10, lr=0.03) # train
embeddings = model.get_embeddings() # get embeddings
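
A metapath such as the one above is a list of (source, edge, destination) hops in which each hop starts at the node type where the previous one ended. The sketch below checks that property against the metagraph and derives the short label used in the comment (MAM). It is an illustration only; metapath_label is a hypothetical helper and says nothing about how the library validates its input.

```python
def metapath_label(metapaths, vertex_collections, edge_collections):
    """Validate that the hops chain together and return a label such as 'MAM'."""
    if not metapaths:
        raise ValueError("metapath is empty")
    for src, edge, dst in metapaths:
        if src not in vertex_collections or dst not in vertex_collections:
            raise ValueError(f"unknown node type in hop {(src, edge, dst)}")
        if edge not in edge_collections:
            raise ValueError(f"unknown edge type in hop {(src, edge, dst)}")
    # Each hop must start where the previous hop ended.
    for (_, _, dst), (src, _, _) in zip(metapaths, metapaths[1:]):
        if dst != src:
            raise ValueError(f"hop ends at '{dst}' but next hop starts at '{src}'")
    # First letter of every hop's source, plus the final destination.
    letters = [hop[0][0] for hop in metapaths] + [metapaths[-1][2][0]]
    return "".join(letters).upper()


vertex_collections = {"movie", "director", "actor"}
edge_collections = {"to"}
metapaths = [("movie", "to", "actor"), ("actor", "to", "movie")]

print(metapath_label(metapaths, vertex_collections, edge_collections))  # MAM
```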

Use Case 2: Generate Graph Embeddings from PyG graphs:

from fastgraphml.graph_embeddings import SAGE, GAT
from fastgraphml.graph_embeddings import downstream_tasks 
from torch_geometric.datasets import Planetoid

# load pyg dataset
dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]

# generating graph embeddings with 3 lines of code
model = SAGE(pyg_graph=data, embedding_size=64) # define graph embedding model
model._train(epochs=10) # train
embeddings = model.get_embeddings() # get embeddings

Models Supported

Model | Homogeneous | Heterogeneous | Node Features
GraphSage | ✔️ | | ✔️
GAT | ✔️ | | ✔️
Metapath2Vec | | ✔️ |
DMGI | | ✔️ | ✔️

Quickstart: Downstream Tasks

In addition, the library provides low-code helper methods for a number of downstream tasks, such as visualisation, similarity search (recommendation), and link prediction (to be added soon).

Downstream Task 1: Graph Embedding Visualisation

This method helps visualise the generated Graph Embeddings by reducing them to 2 dimensions using UMAP.

Example

# amazon computers dataset
class_names = {0: 'Desktops',1: 'Data Storage',2: 'Laptops',3: 'Monitors',4: 'Computer Components',
 5: 'Video Projectors',6: 'Routers',7: 'Tablets',8: 'Networking Products',9: 'Webcams'}
# with one line of code
downstream_tasks.visualize_embeddings(model.G, embeddings, class_mapping=class_names, emb_percent=0.1) # model.G is PyG data object

Downstream Task 2: Scalable Similarity Search with Faiss

Faiss is a tool developed by Facebook that performs similarity search in sets of vectors of any size, up to ones that possibly do not fit in RAM. We support two types of search for now:

  1. exact search: For precise similarity search but at the cost of scalability.
  2. approx search: For scalable similarity search but at the cost of some precision loss.
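
To make the trade-off concrete, exact search amounts to comparing the query against every stored vector, which is precise but grows linearly with the number of embeddings per query. The following pure-Python sketch illustrates that brute-force idea with cosine similarity. It is a stand-in for illustration and does not use Faiss; cosine and exact_top_k are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def exact_top_k(embeddings, query_index, k):
    """Return the k most similar (index, score) pairs by scanning every vector."""
    query = embeddings[query_index]
    scored = [
        (cosine(query, vec), idx)
        for idx, vec in enumerate(embeddings)
        if idx != query_index  # skip the query node itself
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy 2-dimensional embeddings for four nodes.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
for idx, score in exact_top_k(toy, query_index=0, k=2):
    print(idx, round(score, 3))
```

Approximate indexes avoid this full scan by searching only part of the vector set, which is where the precision loss comes from.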

Example 1

downstream_tasks.similarity_search(embeddings, top_k_nbors=10, nlist=10, search_type='exact')

Example 2

If nearest_nbors_search=True, the store_embeddings method saves the generated Graph Embeddings in ArangoDB along with the top_k nearest neighbors (node ids with similar embeddings) and their corresponding similarity scores (i.e. cosine distance).

model.graph_util.store_embeddings(embeddings, collection_name=None, batch_size=100, class_mapping=None, 
        nearest_nbors_search=False, top_k_nbors=10, nlist=10, search_type='exact')
