Graph analytics for data lake sources using Neptune Analytics

nx_neptune

nx-neptune is a Python library that brings graph analytics to your data lake. Project your data from Amazon S3 Tables, S3 Vectors, Databricks, Snowflake, OpenSearch, and other sources into Neptune Analytics for graph analysis — with results exported back to S3 or persisted as Iceberg tables.

Graph Over Data Lake

Run graph algorithms on data that lives in your data lake — without moving it permanently into a graph database. nx-neptune projects your data into a Neptune Analytics graph using SQL queries (via Amazon Athena), so you can run graph algorithms, explore relationships with openCypher queries, and visualize connections that are invisible in tabular form. When you're done, export the results back to S3 or destroy the graph.

What previously required complex ETL pipelines to get data into a graph database is now a streamlined workflow: define your Athena query, give the module the right permissions, and it handles the projection for you.

Supported data sources:

  • Amazon S3 Tables — query Iceberg tables via Athena SQL (demo)
  • Amazon S3 Vectors — project vector embeddings via a custom Athena connector (connector, demo)
  • Databricks Unity Catalog — query Databricks tables via a JDBC-based Athena connector (connector, demo)
  • Snowflake — query tables via the Athena Snowflake connector (demo)
  • Amazon OpenSearch — project embeddings from OpenSearch indices (demo)
  • Any source accessible through Athena federated queries (25+ connectors available)

For data already in S3-compatible formats (CSV, Parquet), Neptune Analytics also supports native S3 import without Athena.

Use cases demonstrated in the notebooks:

  • Fraud detection — project financial transactions as a graph, run community detection (Louvain) to identify fraud rings (S3 Tables demo, Databricks demo)
  • Product recommendation — project product catalogs with vector embeddings, run similarity search to find related items (S3 Vectors demo, OpenSearch demo)

Vector embeddings are a natural add-on to graph analytics — import them alongside your graph data to combine structural traversal with semantic similarity search.

Session management:

The SessionManager API manages the full lifecycle of a Neptune Analytics graph: create, import, analyze, export, and destroy. See the session manager demo and instance lifecycle demo.

NetworkX Backend

nx-neptune also serves as a NetworkX-compatible backend for Neptune Analytics, enabling you to offload graph algorithm workloads to AWS with no code changes. Use familiar NetworkX APIs to seamlessly scale graph computations on-demand. This combines the simplicity of local development with the performance and scalability of a fully managed AWS graph analytics service. For more on NetworkX backends, see the NetworkX backends documentation.

import networkx as nx

G = nx.Graph()
G.add_edge("Bill", "John")
r = nx.pagerank(G, backend="neptune")

Supported Algorithms

For details of all supported NetworkX algorithms, see algorithms.md.

Preview Status

This project is in Alpha Preview. We welcome questions, suggestions, and contributions. It is recommended for testing purposes only — we're tracking production readiness on the roadmap.

Installation

Install it from PyPI

pip install nx_neptune

Build and install from package wheel

# Package the project from source:
python -m pip wheel -w dist .

# Install from the built wheel:
pip install "dist/nx_neptune-0.6.0-py3-none-any.whl"

Install from source

git clone git@github.com:awslabs/nx-neptune.git
cd nx-neptune

# install from source directly
make install

Dependencies are pinned in lock files (requirements.txt, requirements-dev.txt, requirements-jupyter.txt) generated by pip-tools. If you update pyproject.toml, regenerate them with:

make lock

Note: make lock must be run with a Python 3.11 interpreter to match CI. If your default Python is a different version, create a 3.11 venv:

python3.11 -m venv .venv-lock
source .venv-lock/bin/activate
pip install pip-tools
make lock

CI only verifies requirements.txt and requirements-dev.txt. The Jupyter lock file (requirements-jupyter.txt) is not checked because it may contain platform-specific packages (e.g., appnope on macOS) that differ between local and CI environments.

Prerequisites

Before using this backend, ensure the following prerequisites are met:

AWS IAM Permissions

The IAM role or user accessing Neptune Analytics must have the following permissions:

These permissions are required to read, write, and manage graph data via queries on Neptune Analytics:

  • neptune-graph:ReadDataViaQuery
  • neptune-graph:WriteDataViaQuery
  • neptune-graph:DeleteDataViaQuery

These permissions are required to start/stop a Neptune Analytics graph:

  • neptune-graph:StartGraph
  • neptune-graph:StopGraph

These permissions are required to save/restore a Neptune Analytics snapshot:

  • neptune-graph:CreateGraphSnapshot (for save)
  • neptune-graph:RestoreGraphFromSnapshot (for restore)
  • neptune-graph:DeleteGraphSnapshot (for delete)
  • neptune-graph:TagResource

These permissions are required to import/export between S3 and Neptune Analytics:

  • s3:GetObject (for import)
  • s3:PutObject (for export)
  • s3:ListBucket (for export)
  • s3:DeleteBucket (for delete)
  • kms:Decrypt
  • kms:GenerateDataKey
  • kms:DescribeKey

In addition to the S3 import/export permissions, the following are required to read from or write to an existing S3 Tables data lake:

  • athena:StartQueryExecution
  • athena:GetQueryExecution
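
To show what the two Athena permissions cover, here is a minimal sketch of starting a query and polling it to completion. It is not nx-neptune's own API. The function name `run_athena_query`, its parameters, and the S3 results location are assumptions for illustration; `client` is expected to behave like `boto3.client("athena")`.

```python
import time

# Terminal states reported by Athena's GetQueryExecution API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql, output_location, poll_seconds=1.0, max_polls=60):
    """Start an Athena query and poll until it reaches a terminal state.

    client          -- object with the boto3 Athena client interface
    sql             -- the SQL text to run
    output_location -- S3 URI for query results (hypothetical value)
    """
    # Requires athena:StartQueryExecution.
    started = client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Requires athena:GetQueryExecution.
    for _ in range(max_polls):
        response = client.get_query_execution(QueryExecutionId=query_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)

    raise TimeoutError(f"Athena query {query_id} did not finish after {max_polls} polls")
```

Passing the client in as an argument keeps the sketch free of AWS credentials, so it can be exercised with a stub before pointing it at a real account.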

The ARN of the IAM role with the above permissions must be added to your environment variables (NETWORKX_ARN_IAM_ROLE, described under Running locally).
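
The permission lists above can be collected into a single IAM policy document. The following is a sketch only: the statement IDs and the `build_policy` helper are hypothetical, and the `"*"` resource is a placeholder that you would normally narrow to your own graph, bucket, and key ARNs.

```python
import json

# Actions copied from the permission lists above, grouped by purpose.
# The group names (used as statement IDs) are illustrative.
PERMISSIONS = {
    "GraphQueries": [
        "neptune-graph:ReadDataViaQuery",
        "neptune-graph:WriteDataViaQuery",
        "neptune-graph:DeleteDataViaQuery",
    ],
    "GraphLifecycle": [
        "neptune-graph:StartGraph",
        "neptune-graph:StopGraph",
    ],
    "GraphSnapshots": [
        "neptune-graph:CreateGraphSnapshot",
        "neptune-graph:RestoreGraphFromSnapshot",
        "neptune-graph:DeleteGraphSnapshot",
        "neptune-graph:TagResource",
    ],
    "S3ImportExport": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket",
        "s3:DeleteBucket",
        "kms:Decrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey",
    ],
    "AthenaQueries": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
    ],
}


def build_policy(resource="*"):
    """Return an IAM policy document with one Allow statement per group."""
    statements = [
        {"Sid": sid, "Effect": "Allow", "Action": actions, "Resource": resource}
        for sid, actions in PERMISSIONS.items()
    ]
    return {"Version": "2012-10-17", "Statement": statements}


print(json.dumps(build_policy(), indent=2))
```

Keeping one statement per group makes it easy to drop the groups you do not need, for example the Athena statement when you only use native S3 import.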

Python Runtime

  • Python 3.11 is required.
  • Ensure your environment uses Python 3.11 to maintain compatibility with dependencies and API integrations.

Note: As part of the preview status, we are recommending that you run the library using Python 3.11.
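
If you want a script to fail early on an unexpected interpreter, a small guard along these lines is one option. The helper name `is_supported` is hypothetical and not part of nx-neptune.

```python
import sys

# The interpreter version named in the Python Runtime section.
REQUIRED = (3, 11)


def is_supported(version_info=None):
    """Return True when the major.minor version matches REQUIRED."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) == REQUIRED


if not is_supported():
    print(
        f"Warning: running Python {sys.version_info[0]}.{sys.version_info[1]}, "
        f"but {REQUIRED[0]}.{REQUIRED[1]} is expected."
    )
```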

Usage

import networkx as nx

G = nx.Graph()
G.add_node("Bill")
G.add_node("John")
G.add_edge("Bill", "John")

r = nx.pagerank(G, backend="neptune")

Save the example as nx_example.py and run it with:

# Set the NETWORKX_GRAPH_ID environment variable
export NETWORKX_GRAPH_ID=your-neptune-analytics-graph-id
python ./nx_example.py

Alternatively, you can pass the NETWORKX_GRAPH_ID directly:

NETWORKX_GRAPH_ID=your-neptune-analytics-graph-id python ./nx_example.py

Without a valid NETWORKX_GRAPH_ID, the examples will fail to connect to your Neptune Analytics instance. Make sure your AWS credentials are properly configured and your IAM role/user has the required permissions (ReadDataViaQuery, WriteDataViaQuery, DeleteDataViaQuery).

Running tests

Unit tests can be run with make, which runs all tests in the test folder:

make test

Integration tests live in the integ_test folder and run the examples against an existing Neptune Analytics instance. Pass the identifier of a graph available in your AWS account:

export NETWORKX_GRAPH_ID=g-test12345
make integ-test

You can set BACKEND=False to run the test suite using NetworkX without nx-neptune as the backend.
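
A similar switch can be useful in your own scripts when you want to compare results with and without the backend. The sketch below is an assumption about how such a flag might be parsed. It does not describe how the nx-neptune test suite reads BACKEND, and `backend_kwargs` is a hypothetical helper.

```python
import os


def backend_kwargs(env=None):
    """Return keyword arguments that select the backend for a NetworkX call.

    An unset BACKEND, or any value other than a false-like string, selects
    the "neptune" backend; a false-like value falls back to plain NetworkX.
    """
    env = os.environ if env is None else env
    flag = env.get("BACKEND", "True").strip().lower()
    if flag in {"false", "0", "no"}:
        return {}
    return {"backend": "neptune"}


# Intended use (requires networkx and nx_neptune to be installed):
#   r = nx.pagerank(G, **backend_kwargs())
```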

CloudFormation Deployment

A CloudFormation template is provided to deploy a complete Neptune Analytics + SageMaker notebook environment with a single command. The stack creates a Neptune Analytics graph, a SageMaker notebook instance with nx_neptune pre-installed, an S3 staging bucket with KMS encryption, and all required IAM permissions.

Quick deploy

By default, the stack installs nx_neptune from PyPI:

./cloudformation-templates/deploy.sh                        # defaults: nx-neptune-demo, us-west-1
./cloudformation-templates/deploy.sh my-stack us-east-1     # custom stack name and region

To deploy with a locally built wheel instead, pass true as the third argument:

./cloudformation-templates/deploy.sh nx-neptune-demo us-west-1 true

Teardown

./cloudformation-templates/teardown.sh                      # defaults: nx-neptune-demo, us-west-1
./cloudformation-templates/teardown.sh my-stack us-east-1

For full parameter reference, manual deploy steps, and environment variable details, see cloudformation-templates/README.md.

Jupyter Notebook Integration

For interactive exploration and visualization, you can use the Jupyter notebook integration. To deploy a pre-configured SageMaker notebook environment, see CloudFormation Deployment above.

Notebooks

The notebooks directory contains interactive demonstrations covering data lake integration, session and lifecycle management, and algorithm demos.

Running locally

A full tutorial is available to run in Neptune Jupyter Notebooks.

To install nx_neptune together with its Jupyter dependencies:

pip install "nx_neptune[jupyter]"

To run the Jupyter notebooks:

  1. Set your Neptune Analytics Graph ID as an environment variable:

    export NETWORKX_GRAPH_ID=your-neptune-analytics-graph-id
    
  2. You will also need to specify the IAM role that will execute S3 import or export, along with the S3 bucket paths:

    export NETWORKX_ARN_IAM_ROLE=arn:aws:iam::AWS_ACCOUNT:role/IAM_ROLE_NAME
    export NETWORKX_S3_IMPORT_BUCKET_PATH=s3://S3_BUCKET_PATH
    export NETWORKX_S3_EXPORT_BUCKET_PATH=s3://S3_BUCKET_PATH
    
  3. Launch Jupyter Notebook:

    jupyter notebook notebooks/
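
Before launching the notebooks, it can help to confirm that the variables from the steps above are present and roughly well formed. This is a minimal sketch: `missing_settings` is a hypothetical helper, and the format checks are loose assumptions based on the example values shown above.

```python
import os

# Environment variables named in the steps above.
REQUIRED_VARS = (
    "NETWORKX_GRAPH_ID",
    "NETWORKX_ARN_IAM_ROLE",
    "NETWORKX_S3_IMPORT_BUCKET_PATH",
    "NETWORKX_S3_EXPORT_BUCKET_PATH",
)


def missing_settings(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    # Loose shape check: an IAM role ARN such as arn:aws:iam::ACCOUNT:role/NAME.
    role = env.get("NETWORKX_ARN_IAM_ROLE", "")
    if role and not (role.startswith("arn:") and ":role/" in role):
        problems.append("NETWORKX_ARN_IAM_ROLE does not look like an IAM role ARN")

    # Both bucket paths are expected to be S3 URIs.
    for name in ("NETWORKX_S3_IMPORT_BUCKET_PATH", "NETWORKX_S3_EXPORT_BUCKET_PATH"):
        value = env.get(name, "")
        if value and not value.startswith("s3://"):
            problems.append(f"{name} should start with s3://")

    return problems


for problem in missing_settings():
    print(f"Configuration problem: {problem}")
```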
    

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.


