
Python package to allow easy integration to Neo4j's GraphRAG features


Neo4j GraphRAG Package for Python

The official Neo4j GraphRAG package for Python enables developers to build graph retrieval augmented generation (GraphRAG) applications using the power of Neo4j and Python. As a first-party library, it offers a robust, feature-rich, and high-performance solution, with the added assurance of long-term support and maintenance directly from Neo4j.

📄 Documentation

Documentation can be found here

Resources

A series of blog posts demonstrating how to use this package:

A list of Neo4j GenAI-related features can also be found at Neo4j GenAI Ecosystem.

🐍 Python Version Support

Version Supported?
3.12
3.11
3.10
3.9
3.8

📦 Installation

To install the latest stable version, run:

pip install neo4j-graphrag

Optional Dependencies

This package has some optional features that can be enabled using the extra dependencies described below:

  • LLM providers (at least one is required for RAG and KG Builder Pipeline):
    • ollama: LLMs from Ollama
    • openai: LLMs from OpenAI (including AzureOpenAI)
    • google: LLMs from Vertex AI
    • cohere: LLMs from Cohere
    • anthropic: LLMs from Anthropic
    • mistralai: LLMs from MistralAI
  • sentence-transformers : to use embeddings from the sentence-transformers Python package
  • Vector database (to use the External Retrievers):
    • weaviate: store vectors in Weaviate
    • pinecone: store vectors in Pinecone
    • qdrant: store vectors in Qdrant
  • experimental: experimental features such as the Knowledge Graph creation pipelines.
    • Warning: this dependency group requires pygraphviz. See below for installation instructions.

Install package with optional dependencies with (for instance):

pip install "neo4j-graphrag[openai]"
# or
pip install "neo4j-graphrag[openai, experimental]"
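To see which optional integrations are already importable in the current environment, a small standard-library check can help. This is only a sketch: the mapping from extra names to import names is an assumption based on each provider's usual module name, and installed_extras is a hypothetical helper, not part of neo4j-graphrag.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to the top-level module it installs.
EXTRA_MODULES = {
    "ollama": "ollama",
    "openai": "openai",
    "cohere": "cohere",
    "anthropic": "anthropic",
    "mistralai": "mistralai",
    "sentence-transformers": "sentence_transformers",
    "weaviate": "weaviate",
    "pinecone": "pinecone",
    "qdrant": "qdrant_client",
}


def installed_extras(modules=EXTRA_MODULES):
    """Return {extra_name: bool} telling whether each module can be found."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


if __name__ == "__main__":
    for extra, available in installed_extras().items():
        print(f"{extra}: {'installed' if available else 'missing'}")
```

A missing entry only means the module was not found under the assumed name; it does not prove the extra was never installed.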

pygraphviz

pygraphviz is used for visualizing pipelines. Installation instructions can be found here.

💻 Example Usage

The scripts below demonstrate how to get started with the package and make use of its key features. To run these examples, ensure that you have a Neo4j instance up and running and update the NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD variables in each script with the details of your Neo4j instance. For the examples, make sure to export your OpenAI key as an environment variable named OPENAI_API_KEY. Additional examples are available in the examples folder.
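If you would rather not edit the connection variables in every script, one option is to read them from the environment as well. The sketch below assumes you export NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD yourself (the scripts in this README hard-code them); load_settings is a hypothetical helper, and the defaults simply mirror the values used in the examples.

```python
import os


def load_settings(env=os.environ):
    """Read connection settings from the environment, with local defaults."""
    return {
        "uri": env.get("NEO4J_URI", "neo4j://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", "password"),
        # Only reports whether the key is set; it is never copied or printed.
        "has_openai_key": bool(env.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    settings = load_settings()
    if not settings["has_openai_key"]:
        print("OPENAI_API_KEY is not set; the OpenAI examples will fail.")
```

The returned values can then be passed to GraphDatabase.driver in place of the constants defined at the top of each script.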

Knowledge Graph Construction

NOTE: The APOC core library must be installed in your Neo4j instance to use this feature.

This package offers two methods for constructing a knowledge graph.

The Pipeline class provides extensive customization options, making it ideal for advanced use cases. See the examples/pipeline folder for examples of how to use this class.

For a more streamlined approach, the SimpleKGPipeline class offers a simplified abstraction layer over the Pipeline, making it easier to build knowledge graphs. Both classes support working directly with text and PDFs.

import asyncio

from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm.openai_llm import OpenAILLM

NEO4J_URI = "neo4j://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "password"

# Connect to the Neo4j database
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

# List the entities and relations the LLM should look for in the text
entities = ["Person", "House", "Planet"]
relations = ["PARENT_OF", "HEIR_OF", "RULES"]
potential_schema = [
    ("Person", "PARENT_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet"),
]

# Create an Embedder object
embedder = OpenAIEmbeddings(model="text-embedding-3-large")

# Instantiate the LLM
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={
        "max_tokens": 2000,
        "response_format": {"type": "json_object"},
        "temperature": 0,
    },
)

# Instantiate the SimpleKGPipeline
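kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=driver,
    embedder=embedder,
    entities=entities,
    relations=relations,
    # Pass the schema defined above so the triples actually constrain extraction
    potential_schema=potential_schema,
    on_error="IGNORE",
    from_pdf=False,
)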

# Run the pipeline on a piece of text
text = (
    "The son of Duke Leto Atreides and the Lady Jessica, Paul is the heir of House "
    "Atreides, an aristocratic family that rules the planet Caladan."
)
asyncio.run(kg_builder.run_async(text=text))
driver.close()

Example knowledge graph created using the above script:

Example knowledge graph
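Before running the pipeline, it can be worth checking that every triple in potential_schema only uses labels declared in entities and relations. The following is a minimal sketch under the assumption that entities and relations are plain strings, as in the script above; validate_schema is a hypothetical helper and not part of the package.

```python
def validate_schema(entities, relations, potential_schema):
    """Return a list of problems found in potential_schema (empty if consistent)."""
    problems = []
    for source, relation, target in potential_schema:
        if source not in entities:
            problems.append(f"unknown source entity: {source}")
        if relation not in relations:
            problems.append(f"unknown relation: {relation}")
        if target not in entities:
            problems.append(f"unknown target entity: {target}")
    return problems


entities = ["Person", "House", "Planet"]
relations = ["PARENT_OF", "HEIR_OF", "RULES"]
potential_schema = [
    ("Person", "PARENT_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet"),
]
print(validate_schema(entities, relations, potential_schema))  # prints []
```

An empty list means the schema is internally consistent; it says nothing about whether the LLM will find those entities in the text.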

Creating a Vector Index

When creating a vector index, make sure you match the number of dimensions in the index with the number of dimensions your embeddings have.

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import create_vector_index

NEO4J_URI = "neo4j://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "password"
INDEX_NAME = "vector-index-name"

# Connect to the Neo4j database
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

# Create the index
create_vector_index(
    driver,
    INDEX_NAME,
    label="Chunk",
    embedding_property="embedding",
    dimensions=3072,
    similarity_fn="euclidean",
)
driver.close()
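The dimension requirement can be enforced in code before any vector is written. This is a sketch, assuming embeddings are plain Python lists of floats; check_dimensions is a hypothetical helper, and 3072 is the value passed to create_vector_index above.

```python
def check_dimensions(vector, index_dimensions):
    """Raise ValueError if the embedding length differs from the index dimensions."""
    if len(vector) != index_dimensions:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {index_dimensions}"
        )
    return vector


# Typical use (hypothetical): check_dimensions(embedder.embed_query(text), 3072)
```

Failing early like this is usually easier to debug than a mismatch that only surfaces when the index is queried.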

Populating a Vector Index

This example demonstrates one method for upserting data in your Neo4j database. It's important to note that there are alternative approaches, such as using the Neo4j Python driver.

Ensure that your vector index is created prior to executing this example.

from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.indexes import upsert_vector

NEO4J_URI = "neo4j://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "password"

# Connect to the Neo4j database
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

# Create an Embedder object
embedder = OpenAIEmbeddings(model="text-embedding-3-large")

# Generate an embedding for some text
text = (
    "The son of Duke Leto Atreides and the Lady Jessica, Paul is the heir of House "
    "Atreides, an aristocratic family that rules the planet Caladan."
)
vector = embedder.embed_query(text)

# Upsert the vector
upsert_vector(
    driver,
    node_id=0,
    embedding_property="embedding",
    vector=vector,
)
driver.close()
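As one illustration of the alternative mentioned above, the vector can also be written with a Cypher query sent through the Neo4j Python driver. This is a rough sketch, not the library's implementation: it assumes a driver version that provides execute_query and a Neo4j version that provides the db.create.setNodeVectorProperty procedure, and set_node_vector is a hypothetical helper.

```python
# Cypher that stores a vector on an existing node, looked up by its internal id.
SET_VECTOR_QUERY = (
    "MATCH (n) WHERE id(n) = $node_id "
    "WITH n "
    "CALL db.create.setNodeVectorProperty(n, $embedding_property, $vector) "
    "RETURN id(n) AS node_id"
)


def set_node_vector(driver, node_id, embedding_property, vector):
    """Run the query through any object exposing the driver's execute_query method."""
    return driver.execute_query(
        SET_VECTOR_QUERY,
        parameters_={
            "node_id": node_id,
            "embedding_property": embedding_property,
            "vector": vector,
        },
    )
```

With a real driver this would be called as set_node_vector(driver, 0, "embedding", vector). Check the query against the Cypher documentation for your Neo4j version before relying on it.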

Performing a Similarity Search

Please note that querying a Neo4j vector index uses approximate nearest neighbor search, which may not always return the exact nearest neighbors. For more information, refer to the Neo4j documentation on limitations and issues of vector indexes.

In the example below, we perform a simple vector search using a retriever that conducts a similarity search over the vector-index-name vector index.

This library provides more retrievers beyond just the VectorRetriever. See the examples folder for examples of how to use these retrievers.

Before running this example, make sure your vector index has been created and populated.

from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.retrievers import VectorRetriever

NEO4J_URI = "neo4j://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "password"
INDEX_NAME = "vector-index-name"

# Connect to the Neo4j database
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

# Create an Embedder object
embedder = OpenAIEmbeddings(model="text-embedding-3-large")

# Initialize the retriever
retriever = VectorRetriever(driver, INDEX_NAME, embedder)

# Instantiate the LLM
llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})

# Instantiate the RAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)

# Query the graph
query_text = "Who is Paul Atreides?"
response = rag.search(query_text=query_text, retriever_config={"top_k": 5})
print(response.answer)
driver.close()
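Because the index uses approximate nearest neighbor search, it can be useful to compare its results against an exact baseline on a small sample. The sketch below is plain Python and independent of the package: exact_top_k is a hypothetical helper that ranks vectors by Euclidean distance, matching the similarity_fn used when the index was created above, and the toy vectors are made up for illustration.

```python
import math


def exact_top_k(query, vectors, k):
    """Return the ids of the k vectors closest to query, nearest first."""
    ranked = sorted(vectors, key=lambda node_id: math.dist(query, vectors[node_id]))
    return ranked[:k]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
print(exact_top_k([1.0, 0.0], vectors, k=2))  # prints ['a', 'c']
```

A brute-force scan like this is only practical for small samples, which is why vector indexes trade exactness for speed.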

🤝 Contributing

You must sign the contributors license agreement in order to make contributions to this project.

Install Dependencies

Our Python dependencies are managed using Poetry. If Poetry is not yet installed on your system, you can follow the instructions here to set it up. To begin development on this project, start by cloning the repository and then install all necessary dependencies, including the development dependencies, with the following command:

poetry install --with dev

Reporting Issues

If you have a bug to report or feature to request, first search to see if an issue already exists. If a related issue doesn't exist, please raise a new issue using the issue form.

If you're a Neo4j Enterprise customer, you can also reach out to Customer Support.

If you don't have a bug to report or a feature to request but need a hand with the library, community support is available via the Neo4j Online Community and/or Discord.

Workflow for Contributions

  1. Fork the repository.
  2. Install Python and Poetry.
  3. Create a working branch from main and start with your changes!

Code Formatting and Linting

Our codebase follows strict formatting and linting standards using Ruff for code quality checks and Mypy for type checking. Before contributing, ensure that all code is properly formatted, free of linting issues, and includes accurate type annotations.

  • To install Ruff, follow the instructions here.
  • To set up Mypy, follow the steps outlined here.

Adherence to these standards is required for contributions to be accepted.

Using Pre-commit

We recommend setting up pre-commit to automate code quality checks. This ensures your changes meet our guidelines before committing.

  1. Install pre-commit by following the installation guide.

  2. Set up the pre-commit hooks by running:

    pre-commit install
    
  3. To manually check if a file meets the quality requirements, run:

    pre-commit run --file path/to/file
    

Pull Requests

When you're finished with your changes, create a pull request (PR) using the following workflow.

  • Ensure you have formatted and linted your code.
  • Ensure that you have signed the CLA.
  • Ensure that the base of your PR is set to main.
  • Don't forget to link your PR to an issue if you are solving one.
  • Check the checkbox to allow maintainer edits so that maintainers can make any necessary tweaks and update your branch for merge.
  • Reviewers may ask for changes to be made before a PR can be merged, either using suggested changes or normal pull request comments. You can apply suggested changes directly through the UI. Any other changes can be made in your fork and committed to the PR branch.
  • As you update your PR and apply changes, mark each conversation as resolved.
  • Update the CHANGELOG.md if you have made significant changes to the project. These include:
    • Major changes:
      • New features
      • Bug fixes with high impact
      • Breaking changes
    • Minor changes:
      • Documentation improvements
      • Code refactoring without functional impact
      • Minor bug fixes
  • Keep CHANGELOG.md changes brief and focus on the most important changes.

Updating the CHANGELOG.md

  1. You can automatically generate a changelog suggestion for your PR by commenting on it using CodiumAI:
@CodiumAI-Agent /update_changelog
  2. Edit the suggestion if necessary and update the appropriate subsection in the CHANGELOG.md file under 'Next'.
  3. Commit the changes.

🧪 Tests

Unit Tests

Install the project dependencies then run the following command to run the unit tests locally:

poetry run pytest tests/unit

E2E tests

To execute end-to-end (e2e) tests, you need the following services to be running locally:

  • neo4j
  • weaviate
  • weaviate-text2vec-transformers

The simplest way to set these up is by using Docker Compose:

docker compose -f tests/e2e/docker-compose.yml up

(tip: If you encounter any caching issues within the databases, you can completely remove them by running docker compose -f tests/e2e/docker-compose.yml down)

Once all the services are running, execute the following command to run the e2e tests:

poetry run pytest tests/e2e

ℹ️ Additional Information
