NebulaGraph Data Intelligence Suite

Project description

NebulaGraph Data Intelligence (ngdi) Suite

NebulaGraph Data Intelligence Suite for Python (ngdi) is a Python library that gives data scientists a set of APIs to read, write, analyze, and compute on data in NebulaGraph. The same operations can run on a single machine using NetworkX or in a distributed computing environment using Spark, through one unified and intuitive API. With ngdi, data scientists can access and process NebulaGraph data directly from Python and run graph analytics on it.

        ┌───────────────────────────────────────────────────┐            
        │   Spark Cluster                                   │            
        │    .─────.    .─────.    .─────.    .─────.       │            
     ┌─▶│   :       ;  :       ;  :       ;  :       ;      │            
     │  │     `───'      `───'      `───'      `───'        │            
Algorithm                                                   │            
  Spark └───────────────────────────────────────────────────┘            
 Engine ┌────────────────────────────────────────────────────────────────┐
     └──┤                                                                │
        │   NebulaGraph Data Intelligence Suite(ngdi)                    │
        │     ┌────────┐    ┌──────┐    ┌────────┐   ┌─────┐             │
        │     │ Reader │    │ Algo │    │ Writer │   │ GNN │             │
        │     └────────┘    └──────┘    └────────┘   └─────┘             │
        │          ├────────────┴───┬────────┴─────┐    └──────┐         │
        │          ▼                ▼              ▼           ▼         │
        │   ┌─────────────┐ ┌──────────────┐ ┌──────────┐┌───────────┐   │
     ┌──┤   │ SparkEngine │ │ NebulaEngine │ │ NetworkX ││ DGLEngine │   │
     │  │   └─────────────┘ └──────────────┘ └──────────┘└───────────┘   │
     │  └──────────┬─────────────────────────────────────────────────────┘
     │             │        Spark                                        
     │             └────────Reader ────────────┐                         
Spark Reader              Query Mode           │                         
Scan Mode                                      ▼                         
     │  ┌───────────────────────────────────────────────────┐            
     │  │  NebulaGraph Graph Engine         Nebula-GraphD   │            
     │  ├──────────────────────────────┬────────────────────┤            
     │  │  NebulaGraph Storage Engine  │                    │            
     └─▶│  Nebula-StorageD             │    Nebula-Metad    │            
        └──────────────────────────────┴────────────────────┘
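One way to read the diagram: Reader, Algo, and Writer expose a single interface and hand the work to whichever engine was selected by name. The sketch below illustrates only that dispatch idea. Every class and method name in it is hypothetical, and none of it is ngdi's actual code.

```python
class SparkEngineStub:
    """Hypothetical stand-in for a distributed backend."""

    def read(self, edge):
        return f"spark engine read {edge}"


class NebulaEngineStub:
    """Hypothetical stand-in for a single-machine backend."""

    def read(self, edge):
        return f"nebula engine read {edge}"


# Registry mapping the engine name given by the caller to a backend class.
ENGINES = {"spark": SparkEngineStub, "nebula": NebulaEngineStub}


class ReaderSketch:
    """Illustrative front end: same call, different backend."""

    def __init__(self, engine):
        if engine not in ENGINES:
            raise ValueError(f"unknown engine: {engine!r}")
        self._engine = ENGINES[engine]()

    def read(self, edge):
        # Delegate to whichever backend was selected at construction time.
        return self._engine.read(edge)


print(ReaderSketch("spark").read("follow"))
print(ReaderSketch("nebula").read("follow"))
```

The caller's code stays the same when the engine name changes, which is the property the pagerank.py examples later rely on.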

Installation

pip install ngdi
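After installing, it can be handy to confirm which version ended up in the active environment. A minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is illustrative and not part of ngdi:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed ngdi version, or None when ngdi is not installed.
print(installed_version("ngdi"))
```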

Spark Engine Prerequisites

NebulaGraph Engine Prerequisites

Run on PySpark Jupyter Notebook (Spark Engine)

This assumes nebula-spark-connector.jar and nebula-algo.jar have been placed in /opt/nebulagraph/ngdi/package/.

export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=0.0.0.0 --port=8888 --no-browser"

# A repeated flag keeps only its last value, so each flag lists both jars once:
# ":"-separated for --driver-class-path, ","-separated for --jars.
pyspark --driver-class-path /opt/nebulagraph/ngdi/package/nebula-spark-connector.jar:/opt/nebulagraph/ngdi/package/nebula-algo.jar \
    --jars /opt/nebulagraph/ngdi/package/nebula-spark-connector.jar,/opt/nebulagraph/ngdi/package/nebula-algo.jar

We can then open Jupyter Notebook with PySpark and follow examples/spark_engine.ipynb.
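Since the launch command depends on both jars being present, a quick pre-flight check can save a confusing startup failure. This is a small sketch, assuming the directory and file names used above; the helper name missing_jars is illustrative:

```python
from pathlib import Path

REQUIRED_JARS = ("nebula-spark-connector.jar", "nebula-algo.jar")


def missing_jars(package_dir, required=REQUIRED_JARS):
    """Return the names in `required` that are not regular files in `package_dir`."""
    base = Path(package_dir)
    return [name for name in required if not (base / name).is_file()]


missing = missing_jars("/opt/nebulagraph/ngdi/package")
if missing:
    print("Missing jars:", ", ".join(missing))
else:
    print("All required jars found.")
```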

Submit Algorithm job to Spark Cluster (Spark Engine)

This assumes nebula-spark-connector.jar, nebula-algo.jar, and ngdi-py3-env.zip have all been placed in /opt/nebulagraph/ngdi/package/, and that the following algorithm job is saved as pagerank.py:

from ngdi import NebulaGraphConfig
from ngdi import NebulaReader

# set NebulaGraph config
# (graphd listens on 9669 by default; metad's default port is 9559)
config_dict = {
    "graphd_hosts": "graphd:9669",
    "metad_hosts": "metad0:9559,metad1:9559,metad2:9559",
    "user": "root",
    "password": "nebula",
    "space": "basketballplayer",
}
config = NebulaGraphConfig(**config_dict)

# read data with spark engine, query mode
reader = NebulaReader(engine="spark")
query = """
    MATCH ()-[e:follow]->()
    RETURN e LIMIT 100000
"""
reader.query(query=query, edge="follow", props="degree")
df = reader.read()

# run pagerank algorithm
pr_result = df.algo.pagerank(reset_prob=0.15, max_iter=10)

Note: in production, submitting this job could be handled by Airflow or another job scheduler.

Then we can submit the job to the Spark cluster:

# A repeated flag keeps only its last value, so each flag lists both jars once.
spark-submit --master spark://master:7077 \
    --driver-class-path /opt/nebulagraph/ngdi/package/nebula-spark-connector.jar:/opt/nebulagraph/ngdi/package/nebula-algo.jar \
    --jars /opt/nebulagraph/ngdi/package/nebula-spark-connector.jar,/opt/nebulagraph/ngdi/package/nebula-algo.jar \
    --py-files /opt/nebulagraph/ngdi/package/ngdi-py3-env.zip \
    pagerank.py

Run ngdi algorithm job from Python script (Spark Engine)

With everything prepared as above, including pagerank.py, the same submission can be launched from Python:

import subprocess

pkg = "/opt/nebulagraph/ngdi/package"
jars = [f"{pkg}/nebula-spark-connector.jar", f"{pkg}/nebula-algo.jar"]

# A repeated spark-submit flag keeps only its last value, so both jars go in one value.
subprocess.run(["spark-submit", "--master", "spark://master:7077",
                "--driver-class-path", ":".join(jars),  # classpath syntax
                "--jars", ",".join(jars),  # comma-separated list
                "--py-files", f"{pkg}/ngdi-py3-env.zip",
                "pagerank.py"],
               check=True)  # raise CalledProcessError if spark-submit exits non-zero

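When a job is launched from a script, the caller usually wants the exit status and the log output back. The following is a minimal sketch of that pattern; the helper name run_job is illustrative, and a harmless Python one-liner stands in for the spark-submit argument list so the sketch runs without a Spark installation:

```python
import subprocess
import sys


def run_job(cmd, timeout=None):
    """Run `cmd` (a list of arguments) and return (exit_code, combined_output)."""
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so logs stay in order
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stdout


# Stand-in command; replace it with the spark-submit argument list.
code, output = run_job([sys.executable, "-c", "print('job finished')"])
print(code, output.strip())
```

Returning the exit code instead of raising leaves the retry or alerting decision to the caller, which is often what a scheduler wrapper needs.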
Run on single machine (NebulaGraph Engine)

This assumes a NebulaGraph cluster is up and running, and that the algorithm job is saved as pagerank_nebula_engine.py. This file is the same as pagerank.py except for the following line:

- reader = NebulaReader(engine="spark")
+ reader = NebulaReader(engine="nebula")

Then we can run the job on a single machine:

python3 pagerank_nebula_engine.py
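Because the two scripts differ only in the engine name, one option is to keep a single script and choose the engine on the command line. This is a sketch under that assumption; the --engine flag and the parse_engine helper are hypothetical and not something ngdi provides:

```python
import argparse


def parse_engine(argv):
    """Parse a hypothetical --engine flag and return the chosen engine name."""
    parser = argparse.ArgumentParser(description="Run the PageRank job on a chosen engine.")
    parser.add_argument(
        "--engine",
        choices=["spark", "nebula"],
        default="spark",
        help="engine name passed on to NebulaReader",
    )
    return parser.parse_args(argv).engine


# In a real script this would be parse_engine(sys.argv[1:]).
engine = parse_engine(["--engine", "nebula"])
# reader = NebulaReader(engine=engine)  # then continue exactly as in pagerank.py
print(engine)
```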

Documentation

API Reference

Usage

Spark Engine Examples

NebulaGraph Engine Examples (not yet implemented)

from ngdi import NebulaReader

# read data with nebula engine, query mode
reader = NebulaReader(engine="nebula")
reader.query("""
    MATCH ()-[e:follow]->()
    RETURN e.src, e.dst, e.degree LIMIT 100000
""")
df = reader.read() # this will take some time
df.show(10)

# read data with nebula engine, scan mode
reader = NebulaReader(engine="nebula")
reader.scan(edge_types=["follow"])
df = reader.read() # this will take some time
df.show(10)

# convert dataframe to NebulaGraphObject
graph = reader.to_graph() # this will take some time
graph.nodes.show(10)
graph.edges.show(10)

# run pagerank algorithm
pr_result = graph.algo.pagerank(reset_prob=0.15, max_iter=10) # this will take some time

Project details

Release history

Version	Release date
0.2.6	Mar 17, 2023
0.2.5	Mar 17, 2023
0.2.4	Mar 17, 2023
0.2.3	Mar 1, 2023
0.2.2	Mar 1, 2023
0.2.1	Mar 1, 2023
0.1.9 (this version)	Mar 1, 2023
0.1.8	Feb 28, 2023
0.1.7	Feb 28, 2023
0.1.6	Feb 28, 2023
0.1.5	Feb 28, 2023
0.1.1	Feb 27, 2023
0.1.0	Feb 24, 2023

Download files

Source Distribution

ngdi-0.1.9.tar.gz (16.8 kB)

Uploaded Mar 1, 2023 Source

Built Distribution

ngdi-0.1.9-py3-none-any.whl (17.9 kB)

Uploaded Mar 1, 2023 Python 3

File details

Details for the file ngdi-0.1.9.tar.gz.

File metadata

Download URL: ngdi-0.1.9.tar.gz
Upload date: Mar 1, 2023
Size: 16.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: pdm/2.4.6 CPython/3.8.10

File hashes

Hashes for ngdi-0.1.9.tar.gz
Algorithm	Hash digest
SHA256	`228a39df18ba8df1fefd28deb99a1b058cdf8263bd5b7aa63951c99a98d68010`
MD5	`5f020ce52131cda84ff8672c8a870fc5`
BLAKE2b-256	`0e5fe63a802f857fc7fd5e0c9181454200d1e0ce52fecf375258e804b7888d40`


File details

Details for the file ngdi-0.1.9-py3-none-any.whl.

File metadata

Download URL: ngdi-0.1.9-py3-none-any.whl
Upload date: Mar 1, 2023
Size: 17.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: pdm/2.4.6 CPython/3.8.10

File hashes

Hashes for ngdi-0.1.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b1202afe14ee8fb449e27273060d920e06be015286532a3faf971844befeb072`
MD5	`c9dda4f39b8642cf0217132263ce176a`
BLAKE2b-256	`f57bf0f637034ea1c6c51b4dad2490f805d0935c038a98149dfa97c216bd4f7a`
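The SHA256 digests listed above can be checked against a downloaded file before installing it. A minimal sketch using only the standard library; the helper names sha256_of and verify are illustrative, and the expected values are copied from the tables above:

```python
import hashlib

# SHA256 digests as published in the file details above.
EXPECTED_SHA256 = {
    "ngdi-0.1.9.tar.gz": "228a39df18ba8df1fefd28deb99a1b058cdf8263bd5b7aa63951c99a98d68010",
    "ngdi-0.1.9-py3-none-any.whl": "b1202afe14ee8fb449e27273060d920e06be015286532a3faf971844befeb072",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA256 digest matches `expected_hex`."""
    return sha256_of(path) == expected_hex.lower()


# Example, assuming the sdist was downloaded into the current directory:
# verify("ngdi-0.1.9.tar.gz", EXPECTED_SHA256["ngdi-0.1.9.tar.gz"])
```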

