Python package for easy integration with Neo4j's GraphRAG features

Neo4j GraphRAG package for Python

This repository contains the official Neo4j GraphRAG features for Python.

This package gives developers a first-party library for GraphRAG on Neo4j. Neo4j commits to maintaining it long term and aims to ship new features and high-performing patterns and methods quickly.

Documentation: https://neo4j.com/docs/neo4j-graphrag-python/

Python versions supported:

  • Python 3.12 supported.
  • Python 3.11 supported.
  • Python 3.10 supported.
  • Python 3.9 supported.

Usage

Installation

This package requires Python (>=3.9).

To install the latest stable version, use:

pip install neo4j-graphrag

Optional dependencies

pygraphviz

pygraphviz is used for visualizing pipelines. Follow the pygraphviz installation instructions to set it up.

Examples

Knowledge graph construction

NOTE: The APOC core library must be installed in your Neo4j instance to use this feature.

Assumption: Neo4j running

from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm.openai_llm import OpenAILLM

# Connect to Neo4j database
URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
driver = GraphDatabase.driver(URI, auth=AUTH)

# Instantiate Entity and Relation objects
entities = ["Person", "House", "Planet"]
relations = ["PARENT_OF", "HEIR_OF", "RULES"]
potential_schema = [
    ("Person", "PARENT_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet")
]

# Instantiate an Embedder object
embedder = OpenAIEmbeddings(model="text-embedding-3-large")

# Instantiate the LLM
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={
        "max_tokens": 2000,
        "response_format": {"type": "json_object"},
        "temperature": 0,
    },
)

# Instantiate the SimpleKGPipeline
kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=driver,
    embedder=embedder,
    entities=entities,
    relations=relations,
    on_error="CONTINUE",
    from_pdf=False,
)

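# run_async is a coroutine. Top-level `await` works in a notebook or async REPL;
# in a plain script, wrap the call in asyncio.run(...) instead.
await kg_builder.run_async(
    text="""The son of Duke Leto Atreides and the Lady Jessica, Paul is the heir of
        House Atreides, an aristocratic family that rules the planet Caladan."""
)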
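
Because entities, relations, and potential_schema are declared separately, a typo in one of them is easy to miss. The following is a minimal sketch of a consistency check that could run before the pipeline is built. It is not part of the neo4j-graphrag API; the helper name validate_potential_schema is hypothetical.

```python
from typing import List, Tuple


def validate_potential_schema(
    entities: List[str],
    relations: List[str],
    potential_schema: List[Tuple[str, str, str]],
) -> List[str]:
    """Return a list of problems found; an empty list means the schema is consistent."""
    problems = []
    known_entities = set(entities)
    known_relations = set(relations)
    for source, relation, target in potential_schema:
        if source not in known_entities:
            problems.append(f"unknown source entity: {source}")
        if relation not in known_relations:
            problems.append(f"unknown relation: {relation}")
        if target not in known_entities:
            problems.append(f"unknown target entity: {target}")
    return problems


entities = ["Person", "House", "Planet"]
relations = ["PARENT_OF", "HEIR_OF", "RULES"]
potential_schema = [
    ("Person", "PARENT_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet"),
]

# An empty list means every triple refers to a declared entity and relation.
print(validate_potential_schema(entities, relations, potential_schema))
```

Returning a list of problems instead of raising on the first one lets you report every mismatch at once.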

[Figure: example knowledge graph created using the above code]

Creating a vector index

When creating a vector index, make sure you match the number of dimensions in the index with the number of dimensions the embeddings have.

Assumption: Neo4j running

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import create_vector_index

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

INDEX_NAME = "vector-index-name"

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Creating the index
create_vector_index(
    driver,
    INDEX_NAME,
    label="Document",
    embedding_property="vectorProperty",
    dimensions=1536,
    similarity_fn="euclidean",
)
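
The index above is created with dimensions=1536, so every vector written to it should have exactly 1536 elements. As a minimal sketch, a length check before writing can surface a mismatch early. The helper name check_dimensions is hypothetical and not part of neo4j-graphrag. Note that, at the time of writing, OpenAI's text-embedding-3-large model returns 3072-dimensional vectors by default, so an index meant for that model would need dimensions=3072.

```python
from typing import Sequence

# Assumption: this mirrors the `dimensions` value passed to create_vector_index.
INDEX_DIMENSIONS = 1536


def check_dimensions(vector: Sequence[float], expected: int = INDEX_DIMENSIONS) -> None:
    """Raise ValueError if the vector length does not match the index dimensions."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected}"
        )


# A vector of the right size passes silently.
check_dimensions([0.0] * 1536)

# A vector of the wrong size is rejected before it reaches the database.
try:
    check_dimensions([0.0] * 3072)
except ValueError as exc:
    print(exc)
```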

Populating the Neo4j Vector Index

The example below is not the only way to upsert data into your Neo4j database. You could also use the Neo4j Python driver directly.

Assumption: Neo4j running with a defined vector index

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import upsert_vector

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Upsert the vector
vector = ...
upsert_vector(
    driver,
    node_id=1,
    embedding_property="vectorProperty",
    vector=vector,
)
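
For comparison, here is a rough sketch of the driver-only route mentioned above. It assumes nodes carry the Document label and can be looked up by an application-level id property (that property is an assumption, not something this package defines), and it relies on the db.create.setNodeVectorProperty procedure available in recent Neo4j 5 releases. The function name upsert_vector_with_driver is hypothetical.

```python
from typing import Any, List

# Assumption: Document nodes have an `id` property that identifies them.
UPSERT_QUERY = """
MATCH (n:Document {id: $doc_id})
WITH n
CALL db.create.setNodeVectorProperty(n, $embedding_property, $vector)
RETURN elementId(n) AS element_id
"""


def upsert_vector_with_driver(
    driver: Any,
    doc_id: str,
    embedding_property: str,
    vector: List[float],
) -> Any:
    """Set a vector property on one node using only the Neo4j Python driver."""
    # `driver` is expected to be a neo4j.Driver; execute_query runs the
    # statement in a managed transaction and returns the eager result.
    return driver.execute_query(
        UPSERT_QUERY,
        parameters_={
            "doc_id": doc_id,
            "embedding_property": embedding_property,
            "vector": vector,
        },
    )
```

With the driver from the example above, a call would look like upsert_vector_with_driver(driver, "doc-1", "vectorProperty", vector).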

Performing a similarity search

Assumption: Neo4j running with populated vector index in place.

Limitation: The query over the vector index is an approximate nearest neighbor search and may not give exact results. See this reference for more details.

While the library has more retrievers than shown here, the following examples should be able to get you started.

The following example uses a simple vector search as the retriever, which performs a similarity search over the vector-index-name vector index in Neo4j.

from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import VectorRetriever
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.embeddings import OpenAIEmbeddings

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

INDEX_NAME = "vector-index-name"

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Create Embedder object
embedder = OpenAIEmbeddings(model="text-embedding-3-large")

# Initialize the retriever
retriever = VectorRetriever(driver, INDEX_NAME, embedder)

# Initialize the LLM
# Note: An OPENAI_API_KEY environment variable is required here
llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})

# Initialize the RAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)

# Query the graph
query_text = "How do I do similarity search in Neo4j?"
response = rag.search(query_text=query_text, retriever_config={"top_k": 5})
print(response.answer)
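
The example above needs an OPENAI_API_KEY environment variable. One way to fail early with a clear message, instead of on the first API call, is a small guard like the sketch below. The helper name require_env is hypothetical and not part of neo4j-graphrag.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value
```

Calling require_env("OPENAI_API_KEY") before creating the embedder and the LLM stops the script immediately when the key is absent.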

Development

Install dependencies

poetry install

Getting started

Issues

If you have a bug to report or feature to request, first search to see if an issue already exists. If a related issue doesn't exist, please raise a new issue using the relevant issue form.

If you're a Neo4j Enterprise customer, you can also reach out to Customer Support.

If you don't have a bug to report or a feature to request but need a hand with the library, community support is available via the Neo4j Online Community and/or Discord.

Make changes

  1. Fork the repository.
  2. Install Python and Poetry.
  3. Create a working branch from main and start with your changes!

Pull request

When you're finished with your changes, create a pull request, also known as a PR.

  • Ensure that you have signed the CLA.
  • Ensure that the base of your PR is set to main.
  • Don't forget to link your PR to an issue if you are solving one.
  • Enable the checkbox to allow maintainer edits so that maintainers can make any necessary tweaks and update your branch for merge.
  • Reviewers may ask for changes to be made before a PR can be merged, either using suggested changes or normal pull request comments. You can apply suggested changes directly through the UI, and any other changes can be made in your fork and committed to the PR branch.
  • As you update your PR and apply changes, mark each conversation as resolved.
  • Update the CHANGELOG.md if you have made significant changes to the project. Changes fall into two groups:
    • Major changes:
      • New features
      • Bug fixes with high impact
      • Breaking changes
    • Minor changes:
      • Documentation improvements
      • Code refactoring without functional impact
      • Minor bug fixes
  • Keep CHANGELOG.md changes brief and focus on the most important changes.

Updating the CHANGELOG.md

  1. When opening a PR, you can generate an edit suggestion by commenting on the GitHub PR using CodiumAI:
     @CodiumAI-Agent /update_changelog
  2. Use this as a suggestion and update the CHANGELOG.md content under 'Next'.
  3. Commit the changes.

Run tests

Unit tests

These should run out of the box once the dependencies are installed.

poetry run pytest tests/unit

E2E tests

To run the e2e tests, you need the following services running locally:

  • neo4j
  • weaviate
  • weaviate-text2vec-transformers

The easiest way to get them up and running is via Docker Compose:

docker compose -f tests/e2e/docker-compose.yml up

(Pro tip: if you suspect something in the databases is cached, run docker compose -f tests/e2e/docker-compose.yml down to remove them completely.)

Once the services are running, execute the following command to run the e2e tests.

poetry run pytest tests/e2e
