A Python client for the Neo4j Graph Data Science (GDS) library

Neo4j Graph Data Science Client

graphdatascience is a Python client for operating and working with the Neo4j Graph Data Science (GDS) library. It lets users write pure Python code to project graphs, run algorithms, and define and use machine learning pipelines in GDS.

The API is designed to mimic the GDS Cypher procedure API in Python code. It abstracts the necessary operations of the Neo4j Python driver to offer a simpler surface. Additionally, the client-specific graph, model, and pipeline objects offer convenient functions that heavily reduce the need to use Cypher to access and operate these GDS resources.
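To make the "mimic the Cypher procedure API" idea concrete, here is a toy sketch of how chained attribute access could be turned into a procedure call. The ProcedureCall class and the generated query shape are hypothetical illustrations, not the client's actual implementation, and nothing here talks to a database.

```python
from typing import Any, Dict, Tuple


class ProcedureCall:
    """Toy stand-in (hypothetical): collects attribute names into a dotted procedure name."""

    def __init__(self, name: str = "gds") -> None:
        self._name = name

    def __getattr__(self, part: str) -> "ProcedureCall":
        # Only reached for unknown attributes; each access extends the dotted name.
        return ProcedureCall(f"{self._name}.{part}")

    def __call__(self, graph_name: str, **config: Any) -> Tuple[str, Dict[str, Any]]:
        # Keyword arguments become the procedure's configuration map.
        query = f"CALL {self._name}($graph_name, $config)"
        return query, {"graph_name": graph_name, "config": config}


toy_gds = ProcedureCall()
query, params = toy_gds.pageRank.mutate("cora", tolerance=0.5, mutateProperty="pagerank")
print(query)
print(params)
```

The point of the sketch is only the correspondence: a Python call such as gds.pageRank.mutate(G, ...) lines up with the Cypher procedure gds.pageRank.mutate, with keyword arguments playing the role of the configuration map.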

graphdatascience is only guaranteed to work with GDS versions 2.0+.
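Because of this version requirement, it can be useful to check the server's GDS version before running anything else. The helper below is a minimal sketch: is_supported_gds_version is a hypothetical name, and it assumes the version is available as a "major.minor..." string (with a live connection, such a string would typically come from gds.version()).

```python
import re
from typing import Tuple


def is_supported_gds_version(version: str, minimum: Tuple[int, int] = (2, 0)) -> bool:
    """Return True if a 'major.minor[...]' version string is at least `minimum`."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuples compare element by element, so (2, 5) >= (2, 0) and (1, 8) < (2, 0).
    return (major, minor) >= minimum


# Sample strings only; no server is contacted here.
for candidate in ("1.8.4", "2.0.0", "2.5.3"):
    print(candidate, is_supported_gds_version(candidate))
```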

Please leave any feedback as issues on the source repository. Happy coding!

Installation

To install the latest deployed version of graphdatascience, simply run:

pip install graphdatascience
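To confirm that the installation succeeded, one option is to ask Python's packaging metadata which version is installed. This is a small sketch using only the standard library; installed_version is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the client is installed, otherwise None.
print(installed_version("graphdatascience"))
```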

Getting started

To use the GDS Python Client, we need to instantiate a GraphDataScience object. Then, we can project graphs, create pipelines, train models, and run algorithms.

from graphdatascience import GraphDataScience

# Configure the driver with AuraDS-recommended settings
gds = GraphDataScience("neo4j+s://my-aura-ds.databases.neo4j.io:7687", auth=("neo4j", "my-password"), aura_ds=True)

# Import the Cora common dataset to GDS
G = gds.graph.load_cora()
assert G.node_count() == 2708

# Run PageRank in mutate mode on G
pagerank_result = gds.pageRank.mutate(G, tolerance=0.5, mutateProperty="pagerank")
assert pagerank_result["nodePropertiesWritten"] == G.node_count()

# Create a Node Classification pipeline
pipeline = gds.nc_pipe("myPipe")
assert pipeline.type() == "Node classification training pipeline"

# Add a Degree Centrality feature to the pipeline
pipeline.addNodeProperty("degree", mutateProperty="rank")
pipeline.selectFeatures("rank")
features = pipeline.feature_properties()
assert len(features) == 1
assert features[0]["feature"] == "rank"

# Add a training method
pipeline.addLogisticRegression(penalty=(0.1, 2))

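# Train a model on G; targetProperty must name a node property that exists on G
# (the Cora nodes carry their class label in "subject")
model, train_result = pipeline.train(G, modelName="myModel", targetProperty="subject", metrics=["ACCURACY"])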
assert model.metrics()["ACCURACY"]["test"] > 0
assert train_result["trainMillis"] >= 0

# Compute predictions in stream mode
predictions = model.predict_stream(G)
assert len(predictions) == G.node_count()

The example here assumes using an AuraDS instance. For additional examples and extensive documentation of all capabilities, please refer to the GDS Python Client Manual.
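For a self-managed Neo4j instance rather than AuraDS, the connection step might look like the sketch below. The connect_self_managed helper is a hypothetical wrapper, and the URI and credentials in the comments are placeholders; the only client call used is the GraphDataScience constructor with an endpoint and auth tuple, leaving aura_ds at its default.

```python
from typing import Any


def connect_self_managed(uri: str, user: str, password: str) -> Any:
    """Return a GraphDataScience object for a non-Aura Neo4j instance."""
    # Imported lazily so this module loads even where the client is not installed.
    from graphdatascience import GraphDataScience

    # aura_ds is left at its default because this is not an AuraDS instance.
    return GraphDataScience(uri, auth=(user, password))


# Placeholder endpoint and credentials; replace with your own before calling:
# gds = connect_self_managed("bolt://localhost:7687", "neo4j", "my-password")
# print(gds.version())
```

The rest of the walkthrough (loading Cora, running PageRank, training the pipeline) should not depend on which of the two connection styles is used.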

Full end-to-end examples, provided as ready-to-run Jupyter notebooks, can be found in the examples directory of the source repository.

Documentation

The primary source for learning everything about the GDS Python Client is the manual, hosted at https://neo4j.com/docs/graph-data-science-client/current/. The manual is versioned to cover all GDS Python Client versions, so make sure to read the version that matches your installed client.

Known limitations

Operations known to not yet work with graphdatascience:

License

graphdatascience is licensed under the Apache Software License version 2.0. All content is copyright © Neo4j Sweden AB.

Acknowledgements

This work has been inspired by the great work done in the following libraries:
