Integration of Neo4j graph database with Haystack
neo4j-haystack
A Haystack Document Store for Neo4j.
Overview
An integration of the Neo4j graph database with Haystack by deepset. The Neo4j Vector search index is used for storing document embeddings and for dense retrieval.
The library allows using Neo4j as a DocumentStore and provides a drop-in replacement for any other vector embeddings store. Thus, you should expect an existing application to work smoothly just by changing the provider to Neo4jDocumentStore.
The key difference between Neo4jDocumentStore and other types of stores is that Documents are stored as graph nodes, with Document attributes kept as node properties. Embeddings are stored as a property of a Document node, but indexing and querying of vector embeddings using approximate nearest neighbor search is managed by a dedicated Vector Index.
                                   +-----------------------------+
                                   |       Neo4j Database        |
                                   +-----------------------------+
                                   |                             |
                                   |      +----------------+     |
                                   |      |    Document    |     |
                write_documents    |      +----------------+     |
          +------------------------+----->|   properties   |     |
          |                        |      |                |     |
+---------+----------+             |      |   embedding    |     |
|                    |             |      +--------+-------+     |
| Neo4jDocumentStore |             |               |             |
|                    |             |               |index/query  |
+---------+----------+             |               |             |
          |                        |      +--------+--------+    |
          |                        |      |  Vector Index   |    |
          +----------------------->|      |                 |    |
                query_embeddings   |      | (for embedding) |    |
                                   |      +-----------------+    |
                                   |                             |
                                   +-----------------------------+
In the above diagram:
- Document is a Neo4j node (with the "Document" label).
- properties are Document attributes stored as part of the node.
- embedding is also a property of the Document node (shown separately in the diagram for clarity); it is a vector of type LIST[FLOAT].
- Vector Index is where embeddings are indexed by Neo4j (as soon as they are updated in Document nodes).
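The relationship between Document nodes and the Vector Index can be illustrated with the kind of Cypher such a setup typically relies on. The sketch below is not taken from the neo4j-haystack source: it only builds query strings and parameters, and it assumes the vector index procedures `db.index.vector.createNodeIndex` and `db.index.vector.queryNodes` available in Neo4j 5.11 and later. The helper names `create_vector_index` and `query_nearest` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# A Cypher statement together with its parameters.
Query = Tuple[str, Dict[str, Any]]

# Similarity functions accepted by Neo4j vector indexes.
SIMILARITIES = {"cosine", "euclidean"}


def create_vector_index(index: str, label: str, prop: str, dim: int,
                        similarity: str = "cosine") -> Query:
    """Build Cypher that registers a vector index over `label.prop`.

    Hypothetical helper: illustrates the idea, not the library's own code.
    """
    if dim <= 0:
        raise ValueError("embedding dimension must be positive")
    if similarity not in SIMILARITIES:
        raise ValueError(f"unsupported similarity function: {similarity}")
    cypher = (
        "CALL db.index.vector.createNodeIndex("
        "$index, $label, $prop, $dim, $similarity)"
    )
    params = {"index": index, "label": label, "prop": prop,
              "dim": dim, "similarity": similarity}
    return cypher, params


def query_nearest(index: str, embedding: List[float], top_k: int = 10) -> Query:
    """Build Cypher that asks the index for the `top_k` nearest nodes."""
    cypher = (
        "CALL db.index.vector.queryNodes($index, $top_k, $embedding) "
        "YIELD node, score "
        "RETURN node, score"
    )
    return cypher, {"index": index, "top_k": top_k, "embedding": embedding}
```

With the official Neo4j Python driver, each returned pair could be passed to `session.run(cypher, params)`. Passing the index name as a parameter avoids having to escape names such as "document-embeddings" inside the query text.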
The neo4j-haystack library uses the Neo4j Python driver and Cypher queries to implement the DocumentStore API methods, hiding the complexity under the hood.
Installation
neo4j-haystack can be installed like any other Python library, using pip:
pip install --upgrade pip # optional
pip install neo4j-haystack
Usage
Once installed, you can start using Neo4jDocumentStore like any other document store that supports embeddings.
from neo4j_haystack import Neo4jDocumentStore

document_store = Neo4jDocumentStore(
    url="bolt://localhost:7687",
    username="neo4j",
    password="passw0rd",
    database="neo4j",
    embedding_dim=384,
    embedding_field="embedding",
    index="document-embeddings",  # The name of the Vector Index in Neo4j
    node_label="Document",  # The label given to Neo4j nodes which store Documents
)
The full list of parameters accepted by Neo4jDocumentStore can be found in the API documentation.
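To keep credentials out of source code, the constructor arguments shown above could be assembled from environment variables. This is a minimal sketch, not part of the library: the helper `store_kwargs_from_env` and the variable names (`NEO4J_URL`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`, `NEO4J_DATABASE`, `NEO4J_EMBEDDING_DIM`, `NEO4J_INDEX`) are hypothetical, and the defaults simply mirror the example values used in this README.

```python
import os
from typing import Any, Dict, Mapping, Optional


def store_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Collect Neo4jDocumentStore keyword arguments from environment variables.

    Hypothetical helper; the variable names are an assumption of this sketch.
    """
    env = os.environ if env is None else env
    return {
        "url": env.get("NEO4J_URL", "bolt://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        # No default for the password: a missing value raises KeyError.
        "password": env["NEO4J_PASSWORD"],
        "database": env.get("NEO4J_DATABASE", "neo4j"),
        # Environment values are strings, so the dimension is converted.
        "embedding_dim": int(env.get("NEO4J_EMBEDDING_DIM", "384")),
        "embedding_field": "embedding",
        "index": env.get("NEO4J_INDEX", "document-embeddings"),
        "node_label": "Document",
    }
```

The resulting dictionary could then be unpacked into the constructor, for example `Neo4jDocumentStore(**store_kwargs_from_env())`.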
Please note that you need a running instance of the Neo4j database (an in-memory version of Neo4j is not supported). There are several options available:
- Docker; other options are described in the same Operations Manual
- AuraDB - a fully managed Cloud Instance of Neo4j
- Neo4j Desktop client application
The simplest way to start the database locally is with a Docker container:
docker run \
    --restart always \
    --publish=7474:7474 --publish=7687:7687 \
    --env NEO4J_AUTH=neo4j/passw0rd \
    neo4j:5.15.0
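The container may take a while before it accepts connections, so a script that creates a Neo4jDocumentStore right after `docker run` may want to wait for the Bolt port first. The following is a small sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that Neo4j has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Retries every `interval` seconds and returns False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the port is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the Docker command above, the call would be `wait_for_port("localhost", 7687)`, since 7687 is the published Bolt port.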
License
neo4j-haystack is distributed under the terms of the MIT license.