
Integration of Neo4j graph database with Haystack

Project description

neo4j-haystack

A Haystack Document Store for Neo4j.




Overview

An integration of the Neo4j graph database with Haystack by deepset. In Neo4j, a vector search index is used for storing document embeddings and for dense retrieval.

The library allows using Neo4j as a DocumentStore and provides a drop-in replacement for any other vector embeddings store. Thus, you should expect an existing application to keep working after simply switching its document store provider to Neo4jDocumentStore.

The key difference between Neo4jDocumentStore and other types of stores is that Documents are stored as graph nodes, with their attributes kept as node properties. Embeddings are also stored as properties of a Document node, but indexing and querying of vector embeddings using approximate nearest neighbor (ANN) search is managed by a dedicated Vector Index.

                                   +-----------------------------+
                                   |       Neo4j Database        |
                                   +-----------------------------+
                                   |                             |
                                   |      +----------------+     |
                                   |      |    Document    |     |
                write_documents    |      +----------------+     |
          +------------------------+----->|   properties   |     |
          |                        |      |                |     |
+---------+----------+             |      |   embedding    |     |
|                    |             |      +--------+-------+     |
| Neo4jDocumentStore |             |               |             |
|                    |             |               |index/query  |
+---------+----------+             |               |             |
          |                        |      +--------+--------+    |
          |                        |      |  Vector Index   |    |
          +----------------------->|      |                 |    |
               query_embeddings    |      | (for embedding) |    |
                                   |      +-----------------+    |
                                   |                             |
                                   +-----------------------------+

In the above diagram:

  • Document is a Neo4j node (with "Document" label)
  • properties are Document attributes stored as part of the node.
  • embedding is also a property of the Document node (shown separately in the diagram for clarity); it is a vector of type LIST[FLOAT]
  • Vector Index is where Neo4j indexes embeddings (as soon as they are updated in Document nodes)
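To make the node layout concrete, here is a minimal sketch of how one document could be turned into a Cypher write that stores its attributes and its embedding as properties of a Document node. It assumes a plain dict with hypothetical `id`, `content`, `meta` and `embedding` keys and a hypothetical helper `build_write_query`; the query shape is an illustration, not the exact Cypher that neo4j-haystack issues.

```python
from typing import Any, Dict, List, Tuple

# Upserts one Document node. Hypothetical query shape, not necessarily the
# library's own; `{label}` and `{embedding_field}` are filled in by the helper.
WRITE_TEMPLATE = (
    "MERGE (doc:`{label}` {{id: $id}}) "
    "SET doc += $props, doc.`{embedding_field}` = $embedding"
)


def build_write_query(
    document: Dict[str, Any],
    node_label: str = "Document",
    embedding_field: str = "embedding",
    embedding_dim: int = 384,
) -> Tuple[str, Dict[str, Any]]:
    """Return a (cypher, parameters) pair that stores `document` as a node."""
    embedding: List[float] = [float(x) for x in document["embedding"]]
    if len(embedding) != embedding_dim:
        # The vector index is built for a fixed dimension, so reject mismatches early.
        raise ValueError(f"expected {embedding_dim} dimensions, got {len(embedding)}")

    # Flatten metadata into top-level node properties: Neo4j property values
    # cannot be maps, only primitives or homogeneous lists.
    props: Dict[str, Any] = {"content": document.get("content")}
    for key, value in document.get("meta", {}).items():
        props[key] = value

    cypher = WRITE_TEMPLATE.format(label=node_label, embedding_field=embedding_field)
    return cypher, {"id": document["id"], "props": props, "embedding": embedding}
```

With a real connection, the returned pair would be passed to the Python driver, e.g. `driver.execute_query(cypher, params, database_="neo4j")`. Using `MERGE` on `id` keeps repeated writes of the same document idempotent.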

The neo4j-haystack library uses the Neo4j Python driver and Cypher queries to implement the DocumentStore API methods, hiding that complexity under the hood.
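As an illustration of the driver-plus-Cypher approach, the sketch below creates a vector index and runs a nearest-neighbor query through `Driver.execute_query` from the neo4j Python driver 5.x. It assumes Neo4j 5.11 or later (the Docker image used below is 5.15.0), where the `db.index.vector.createNodeIndex` and `db.index.vector.queryNodes` procedures exist. The helper names and the choice of cosine similarity are assumptions for illustration, not the library's documented internals.

```python
from typing import Any, List, Tuple

# Assumed procedure calls (Neo4j 5.11+); not necessarily what neo4j-haystack runs.
CREATE_INDEX = (
    "CALL db.index.vector.createNodeIndex("
    "$index_name, $label_name, $property_key, $dimension, $similarity)"
)
QUERY_INDEX = (
    "CALL db.index.vector.queryNodes($index_name, $top_k, $embedding) "
    "YIELD node, score "
    "RETURN node.id AS id, node.content AS content, score"
)


def create_vector_index(driver: Any, database: str = "neo4j",
                        index_name: str = "document-embeddings",
                        label: str = "Document", prop: str = "embedding",
                        dimension: int = 384, similarity: str = "cosine") -> None:
    """Create the ANN index; Neo4j raises an error if the name is already taken."""
    driver.execute_query(
        CREATE_INDEX,
        {"index_name": index_name, "label_name": label, "property_key": prop,
         "dimension": dimension, "similarity": similarity},
        database_=database,
    )


def query_by_embedding(driver: Any, embedding: List[float], top_k: int = 10,
                       database: str = "neo4j",
                       index_name: str = "document-embeddings") -> List[Tuple[str, float]]:
    """Return (id, score) pairs for the top_k nearest Document nodes."""
    # execute_query returns (records, summary, keys).
    records, _summary, _keys = driver.execute_query(
        QUERY_INDEX,
        {"index_name": index_name, "top_k": top_k, "embedding": embedding},
        database_=database,
    )
    return [(record["id"], record["score"]) for record in records]
```

With a live database, the `driver` argument would come from `neo4j.GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "passw0rd"))`. Keeping the driver as a parameter also makes the helpers easy to exercise with a stub.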

Installation

neo4j-haystack can be installed as any other Python library, using pip:

pip install --upgrade pip # optional
pip install neo4j-haystack

Usage

Once installed, you can start using Neo4jDocumentStore as you would any other document store that supports embeddings.

from neo4j_haystack import Neo4jDocumentStore

document_store = Neo4jDocumentStore(
    url="bolt://localhost:7687",
    username="neo4j",
    password="passw0rd",
    database="neo4j",
    embedding_dim=384,
    embedding_field="embedding",
    index="document-embeddings", # The name of the Vector Index in Neo4j
    node_label="Document", # Providing a label to Neo4j nodes which store Documents
)
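Hard-coding credentials as above is fine for a local experiment. A minimal sketch of reading the connection settings from environment variables instead is shown here; the variable names (`NEO4J_URL`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`, `NEO4J_DATABASE`, `NEO4J_EMBEDDING_DIM`) and the helper `store_kwargs_from_env` are hypothetical conventions, and only the keyword argument names come from the example above.

```python
import os
from typing import Any, Dict, Mapping, Optional


def store_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build Neo4jDocumentStore keyword arguments from (hypothetical) env vars."""
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        # Fail fast with a clear message rather than connecting without a password.
        raise RuntimeError("NEO4J_PASSWORD is not set")
    return {
        "url": env.get("NEO4J_URL", "bolt://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": password,
        "database": env.get("NEO4J_DATABASE", "neo4j"),
        "embedding_dim": int(env.get("NEO4J_EMBEDDING_DIM", "384")),
        "embedding_field": "embedding",
        "index": "document-embeddings",
        "node_label": "Document",
    }
```

Usage: `document_store = Neo4jDocumentStore(**store_kwargs_from_env())`. Accepting an optional mapping keeps the helper testable without touching the real environment.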

The full list of parameters accepted by Neo4jDocumentStore can be found in the API documentation.

Note that you need a running Neo4j database instance (an in-memory version of Neo4j is not supported). Several options are available:

  • Docker (see the Neo4j Operations Manual, which also describes other deployment options)
  • AuraDB - a fully managed Cloud Instance of Neo4j
  • Neo4j Desktop client application

The simplest way to start a database locally is with a Docker container:

docker run \
    --restart always \
    --publish=7474:7474 --publish=7687:7687 \
    --env NEO4J_AUTH=neo4j/passw0rd \
    neo4j:5.15.0
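The container may need a few seconds before the Bolt port (7687) accepts connections. A minimal standard-library sketch that polls the port before creating the document store follows; `wait_for_bolt` is a hypothetical helper. An open TCP port only shows that the server is listening, not that authentication will succeed; the neo4j Python driver's `Driver.verify_connectivity()` is a stronger check once the port is up.

```python
import socket
import time


def wait_for_bolt(host: str = "localhost", port: int = 7687,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening on the Bolt port
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_bolt()` right after `docker run` and only constructing `Neo4jDocumentStore` when it returns `True` avoids connection errors during container startup.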

License

neo4j-haystack is distributed under the terms of the MIT license.



Download files


Source Distribution

neo4j_haystack-1.0.0.tar.gz (570.8 kB)

Uploaded Source

Built Distribution


neo4j_haystack-1.0.0-py3-none-any.whl (32.7 kB)

Uploaded Python 3

File details

Details for the file neo4j_haystack-1.0.0.tar.gz.

File metadata

  • Download URL: neo4j_haystack-1.0.0.tar.gz
  • Upload date:
  • Size: 570.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for neo4j_haystack-1.0.0.tar.gz
Algorithm Hash digest
SHA256 e9a7088947fc4aff25d6198f95ff6fa150dedf5917f57d71479b31f2992b8dca
MD5 c7aebaf6e8d1ebdb58d602dd2113590c
BLAKE2b-256 210562802407d613bb0e5f47dbd33a3a5388c8aede9482e8a59d933368260560


File details

Details for the file neo4j_haystack-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: neo4j_haystack-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 32.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for neo4j_haystack-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9e2ad3124978fbf991d00804d6fde624468384829cd0133203f81100ef06c425
MD5 333c902b59eaed3e8166a496f2473d61
BLAKE2b-256 cc04476c2ff289f9a1f8e8dd43c7b318e48146499de342bcf042f3a9af62b602

