Integration of Neo4j graph database with Haystack

neo4j-haystack

A Haystack Document Store for Neo4j.

Overview

An integration of the Neo4j graph database with Haystack by deepset. A Neo4j Vector search index is used to store document embeddings and to serve dense retrieval.

The library allows using Neo4j as a DocumentStore and provides a drop-in replacement for any other vector embedding store. In most cases an existing application should keep working after its document store is switched to Neo4jDocumentStore.

The key difference between Neo4jDocumentStore and other types of stores is that each Document is stored as a graph node, with the Document attributes as node properties. Embeddings are also stored as a property of the Document node, but indexing and querying of vector embeddings using approximate nearest neighbor search is managed by a dedicated Vector Index.

                                   +-----------------------------+
                                   |       Neo4j Database        |
                                   +-----------------------------+
                                   |                             |
                                   |      +----------------+     |
                                   |      |    Document    |     |
                write_documents    |      +----------------+     |
          +------------------------+----->|   properties   |     |
          |                        |      |                |     |
+---------+----------+             |      |   embedding    |     |
|                    |             |      +--------+-------+     |
| Neo4jDocumentStore |             |               |             |
|                    |             |               |index/query  |
+---------+----------+             |               |             |
          |                        |      +--------+--------+    |
          |                        |      |  Vector Index   |    |
          +----------------------->|      |                 |    |
               query_embeddings    |      | (for embedding) |    |
                                   |      +-----------------+    |
                                   |                             |
                                   +-----------------------------+

In the above diagram:

  • Document is a Neo4j node (with the "Document" label).
  • properties are Document attributes stored as part of the node.
  • embedding is also a property of the Document node (shown separately in the diagram only for clarity); it is a vector of type LIST[FLOAT].
  • Vector Index is where Neo4j indexes the embeddings (as soon as they are updated in Document nodes).
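
To make the node layout above concrete, here is a minimal in-memory sketch. It is not the library's code and does not talk to Neo4j: the `nodes` list, the `id` and `content` property names, and both helper functions are hypothetical. It also performs an exact cosine-similarity scan, whereas the Vector Index uses approximate nearest neighbor search; the point is only to show the kind of lookup that the index answers.

```python
import math

# Hypothetical in-memory stand-ins for Document nodes. In Neo4j these would be
# nodes with the "Document" label and an "embedding" property (LIST[FLOAT]).
nodes = [
    {"label": "Document", "properties": {"id": "a", "content": "graphs", "embedding": [1.0, 0.0]}},
    {"label": "Document", "properties": {"id": "b", "content": "vectors", "embedding": [0.0, 1.0]}},
    {"label": "Document", "properties": {"id": "c", "content": "both", "embedding": [1.0, 1.0]}},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_documents(nodes, query_embedding, top_k):
    """Exact (brute-force) search: score every node, keep the top_k best."""
    scored = [
        (cosine_similarity(n["properties"]["embedding"], query_embedding), n["properties"]["id"])
        for n in nodes
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Documents "a" and "c" point closest to the query direction [1.0, 0.0].
print(nearest_documents(nodes, [1.0, 0.0], top_k=2))
```

A brute-force scan like this touches every node on every query, which is why a dedicated index is used once the number of documents grows.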

The neo4j-haystack library uses the Neo4j python-driver and Cypher queries to implement the DocumentStore API methods, hiding that complexity from the caller.
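
As a rough illustration of the kind of Cypher involved, the sketch below builds two parameterized queries around the `db.index.vector.createNodeIndex` and `db.index.vector.queryNodes` procedures available in Neo4j 5.x. This is an assumption about what such queries can look like, not a copy of what neo4j-haystack actually runs; the function names and parameter keys are hypothetical.

```python
def create_vector_index_query(index, node_label, embedding_field, embedding_dim, similarity="cosine"):
    """Build a query that creates a vector index over one node property."""
    query = "CALL db.index.vector.createNodeIndex($index, $label, $field, $dim, $similarity)"
    params = {
        "index": index,
        "label": node_label,
        "field": embedding_field,
        "dim": embedding_dim,
        "similarity": similarity,
    }
    return query, params


def vector_search_query(index, query_embedding, top_k=10):
    """Build a query that asks the vector index for the top_k nearest nodes."""
    query = (
        "CALL db.index.vector.queryNodes($index, $top_k, $embedding) "
        "YIELD node, score "
        "RETURN node, score"
    )
    params = {"index": index, "top_k": top_k, "embedding": list(query_embedding)}
    return query, params


# Values mirror the Neo4jDocumentStore settings used in this README.
index_query, index_params = create_vector_index_query(
    "document-embeddings", "Document", "embedding", 384
)
search_query, search_params = vector_search_query(
    "document-embeddings", [0.0] * 384, top_k=5
)
print(index_query)
print(search_query)
```

Passing values as query parameters rather than formatting them into the Cypher text keeps the embedding out of the query string and avoids injection issues. With a running database, each query/parameter pair could be handed to the driver, for example through a session's `run` method.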

Installation

neo4j-haystack can be installed like any other Python library, using pip:

pip install --upgrade pip # optional
pip install neo4j-haystack

Usage

Once installed, you can start using Neo4jDocumentStore like any other document store that supports embeddings.

from neo4j_haystack import Neo4jDocumentStore

document_store = Neo4jDocumentStore(
    url="bolt://localhost:7687",
    username="neo4j",
    password="passw0rd",
    database="neo4j",
    embedding_dim=384,
    embedding_field="embedding",
    index="document-embeddings", # The name of the Vector Index in Neo4j
    node_label="Document", # Providing a label to Neo4j nodes which store Documents
)

The full list of parameters accepted by Neo4jDocumentStore can be found in the API documentation.

Please note that you will need a running instance of the Neo4j database (the in-memory version of Neo4j is not supported). There are several options available:

  • Docker (other installation options are described in the same Operations Manual)
  • AuraDB - a fully managed Cloud Instance of Neo4j
  • Neo4j Desktop client application

The simplest way to start the database locally is with a Docker container:

docker run \
    --restart always \
    --publish=7474:7474 --publish=7687:7687 \
    --env NEO4J_AUTH=neo4j/passw0rd \
    neo4j:5.15.0
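
Once the container is up, it can be useful to confirm that the database is reachable before creating a Neo4jDocumentStore. The sketch below is one possible way to do that with the Neo4j Python driver. The environment variable names (`NEO4J_URL`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`, `NEO4J_DATABASE`) and both helper functions are hypothetical conventions for this example; the defaults simply repeat the values used earlier in this README.

```python
import os


def neo4j_settings(env=os.environ):
    """Collect connection settings, falling back to the values used in this README."""
    return {
        "url": env.get("NEO4J_URL", "bolt://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", "passw0rd"),
        "database": env.get("NEO4J_DATABASE", "neo4j"),
    }


def check_connection(settings):
    """Raise an exception if the Neo4j instance cannot be reached."""
    # Imported lazily so neo4j_settings() works even without the driver installed.
    from neo4j import GraphDatabase

    auth = (settings["username"], settings["password"])
    with GraphDatabase.driver(settings["url"], auth=auth) as driver:
        driver.verify_connectivity()


settings = neo4j_settings()
print(settings["url"])

# Requires a running Neo4j instance, for example the Docker container above:
# check_connection(settings)
```

The same `settings` dictionary can then be unpacked into the constructor alongside the other arguments, since its keys match the `url`, `username`, `password` and `database` parameters shown in the Usage section.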

License

neo4j-haystack is distributed under the terms of the MIT license.
