DataStax RAGStack Graph Store
RAGStack Graph Store
Hybrid Graph Store combining vector similarity and edges between chunks.
Usage
- Pre-process your documents to populate metadata information.
- Create a Hybrid GraphStore and add your LangChain Documents.
- Retrieve documents from the GraphStore.
Populate Metadata
The Graph Store makes use of the following metadata fields on each Document:
- content_id: If assigned, this specifies the unique ID of the Document. If not assigned, one will be generated. Set this if you may re-ingest the same document, so that it is overwritten rather than duplicated.
- link_tags: A set of LinkTags indicating how this node should be linked to other nodes.
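The overwrite-on-re-ingest behavior depends on content_id being stable across runs. The following is a minimal sketch of one way to assign such an ID before ingestion. The Doc class is a hypothetical stand-in for a LangChain Document, and assign_content_id is an illustrative helper, not part of the library. It assumes a "source" metadata key when one is available and falls back to a hash of the text otherwise.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def assign_content_id(doc: Doc) -> str:
    """Set a deterministic content_id on the document and return it.

    Assumption: "source" uniquely identifies the document when present.
    Without it, a hash of the text is used, so identical text maps to
    the same ID on every run.
    """
    if "content_id" not in doc.metadata:
        source = doc.metadata.get("source")
        if source:
            doc.metadata["content_id"] = source
        else:
            digest = hashlib.sha256(doc.page_content.encode("utf-8")).hexdigest()
            doc.metadata["content_id"] = digest[:16]
    return doc.metadata["content_id"]


first = Doc("Hello graph store", {"source": "https://example.com/a"})
edited = Doc("Hello graph store (edited)", {"source": "https://example.com/a"})
no_source = Doc("Standalone note")
no_source_again = Doc("Standalone note")
```

With this scheme, an edited copy of a page keeps the ID of the original because both share a source, which is what allows a re-ingest to replace the earlier version.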
Hyperlinks
To connect nodes based on hyperlinks, you can use the HtmlLinkEdgeExtractor as shown below:
from ragstack_knowledge_store.langchain.extractors import HtmlLinkEdgeExtractor

html_link_extractor = HtmlLinkEdgeExtractor()

for doc in documents:
    doc.metadata["content_id"] = doc.metadata["source"]

    # Add link tags from the page_content to the metadata.
    # Should be passed the HTML content as a string or BeautifulSoup.
    html_link_extractor.extract_one(doc, doc.page_content)
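To give a rough idea of what hyperlink-based linking involves, here is a conceptual sketch using only the Python standard library. It is not the implementation of HtmlLinkEdgeExtractor. The extract_hyperlinks function and its filtering rules (resolve relative URLs, drop fragments, skip self-links) are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hyperlinks(html: str, base_url: str) -> set:
    """Return the absolute http(s) URLs linked from the page, excluding itself."""
    collector = _HrefCollector()
    collector.feed(html)
    links = set()
    for href in collector.hrefs:
        # Resolve relative links against the page URL, then drop "#fragment".
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")) and absolute != base_url:
            links.add(absolute)
    return links


page = (
    '<p><a href="/docs/b">B</a> '
    '<a href="#top">top</a> '
    '<a href="https://example.com/c#x">C</a></p>'
)
links = extract_hyperlinks(page, "https://example.com/docs/a")
```

In this sketch, two documents would be connected when one page's outgoing URL matches the other page's own URL, which is why using the source URL as content_id is a natural fit.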
Store
import cassio
from langchain_openai import OpenAIEmbeddings
from ragstack_knowledge_store import GraphStore
cassio.init(auto=True)
graph_store = GraphStore(embeddings=OpenAIEmbeddings())
# Store the documents
graph_store.add_documents(documents)
Retrieve
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")
# Retrieve and generate using the relevant snippets of the blog.
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
# Depth 0 - don't traverse edges. Equivalent to vector-only search.
# Depth 1 - vector search plus 1 level of edges.
retriever = graph_store.as_retriever(k=4, depth=1)
template = """You are a helpful technical support bot. You should provide complete answers explaining the options the user has available to address their problem. Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
def format_docs(docs):
    formatted = "\n\n".join(
        f"From {doc.metadata['content_id']}: {doc.page_content}" for doc in docs
    )
    return formatted

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
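The depth parameter used by the retriever above can be pictured as a bounded breadth-first expansion from the vector-search hits. The sketch below is a toy model of that idea on a plain dictionary graph. It is not the library's traversal code, and the names traverse, seeds, and edges are hypothetical.

```python
def traverse(seeds, edges, depth):
    """Return the seeds plus every node within `depth` edge hops.

    seeds: node IDs standing in for the vector-search results.
    edges: dict mapping a node ID to the IDs it links to.
    depth: 0 returns only the seeds; 1 adds their direct neighbors; and so on.
    """
    visited = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    seen = set(visited)
    frontier = list(visited)
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in edges.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    visited.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return visited


toy_edges = {"a": ["b"], "b": ["c"], "c": ["a"]}
vector_only = traverse(["a"], toy_edges, depth=0)
one_hop = traverse(["a"], toy_edges, depth=1)
two_hops = traverse(["a"], toy_edges, depth=2)
```

Here depth=0 yields only the seed, matching the vector-only case, while each extra level pulls in nodes one more link away. Larger depths return more context at the cost of a longer prompt.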
Development
poetry install --with=dev
# Run Tests
poetry run pytest