DataStax RAGStack Knowledge Store

RAGStack Knowledge Store

A hybrid knowledge store that combines vector similarity search with edges between chunks.

Usage

  1. Pre-process your documents to populate metadata information.
  2. Create a Hybrid KnowledgeStore and add your LangChain Documents.
  3. Retrieve documents from the KnowledgeStore.

Populate Metadata

The Knowledge Store makes use of the following metadata fields on each Document:

  • content_id: The unique ID of the Document. If not assigned, one will be generated. Set it explicitly if you may re-ingest the same document, so that the existing entry is overwritten rather than duplicated.
  • link_tags: A set of LinkTags indicating how this node should be linked to other nodes.
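
To illustrate why a stable content_id matters on re-ingestion, here is a minimal sketch that uses a plain dictionary as a stand-in for the store. FakeDocument and ingest are hypothetical names invented for this illustration and are not part of ragstack_knowledge_store. Using the source URL as the ID is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document (illustration only)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def ingest(store: dict, docs: list) -> None:
    """Toy ingest: key each document by content_id so re-ingestion overwrites."""
    for doc in docs:
        # Assumption: the source URL is stable and unique, so reuse it as the ID.
        doc.metadata.setdefault("content_id", doc.metadata["source"])
        store[doc.metadata["content_id"]] = doc


store = {}
first = FakeDocument("old text", {"source": "https://example.com/a"})
second = FakeDocument("new text", {"source": "https://example.com/a"})

ingest(store, [first])
ingest(store, [second])  # same content_id, so the earlier entry is replaced

print(len(store), store["https://example.com/a"].page_content)
```

Because both documents resolve to the same content_id, the toy store ends up with a single entry holding the newer text.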

Hyperlinks

To connect nodes based on hyperlinks, you can use the HtmlLinkEdgeExtractor as shown below:

from ragstack_knowledge_store.langchain.extractors import HtmlLinkEdgeExtractor

html_link_extractor = HtmlLinkEdgeExtractor()

for doc in documents:
    doc.metadata["content_id"] = doc.metadata["source"]

    # Extract link tags from the HTML and add them to the document's metadata.
    # The second argument must be the HTML content, as a string or a BeautifulSoup object.
    html_link_extractor.extract_one(doc, doc.page_content)
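
As a rough picture of what hyperlink-based linking involves, the sketch below collects the outgoing links of an HTML page using only the Python standard library. It is not how HtmlLinkEdgeExtractor is implemented. HrefCollector and outgoing_links are hypothetical helper names, and the example URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collects absolute URLs found in <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.add(urljoin(self.base_url, value))


def outgoing_links(html, base_url):
    """Return the set of absolute URLs that the page links to."""
    collector = HrefCollector(base_url)
    collector.feed(html)
    return collector.links


html = (
    '<p>See <a href="/docs/install">install</a> and '
    '<a href="https://example.com/faq">FAQ</a>.</p>'
)
links = outgoing_links(html, "https://example.com/docs/")
print(sorted(links))
```

Two pages can then be treated as connected when one page's outgoing link matches the other page's own URL, which is why a URL-based content_id is convenient here.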

Store

import cassio
from langchain_openai import OpenAIEmbeddings
from ragstack_knowledge_store import KnowledgeStore

# Initialize the database connection from environment variables.
cassio.init(auto=True)

knowledge_store = KnowledgeStore(embeddings=OpenAIEmbeddings())

# Store the documents
knowledge_store.add_documents(documents)

Retrieve

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

# Retrieve and generate using the relevant snippets.

# depth=0: don't traverse edges; equivalent to vector-only search.
# depth=1: vector search plus one level of edges.
retriever = knowledge_store.as_retriever(k=4, depth=1)

template = """You are a helpful technical support bot. You should provide complete answers explaining the options the user has available to address their problem. Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    # Prefix each snippet with its content_id so the prompt shows where it came from.
    return "\n\n".join(
        f"From {doc.metadata['content_id']}: {doc.page_content}" for doc in docs
    )


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
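
The effect of the depth parameter can be pictured with a small, self-contained sketch: start from the chunks returned by vector search, then follow edges outward for a fixed number of hops. This is only a conceptual model on a toy graph. The traverse function and the edge dictionary are hypothetical, and the library's actual retrieval may rank, limit, or deduplicate results differently.

```python
def traverse(seeds, edges, depth):
    """Return the seeds plus every node reachable within `depth` hops."""
    found = list(seeds)  # preserves discovery order
    seen = set(seeds)
    frontier = list(seeds)
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in edges.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    found.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return found


# Toy graph: each key links to the listed chunks.
edges = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
seeds = ["a"]  # pretend the vector search returned only chunk "a"

print(traverse(seeds, edges, depth=0))  # ['a']
print(traverse(seeds, edges, depth=1))  # ['a', 'b', 'c']
print(traverse(seeds, edges, depth=2))  # ['a', 'b', 'c', 'd']
```

With depth=0 only the vector-search hits are returned. Each additional level pulls in chunks one more edge away from those hits.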

Development

poetry install --with=dev

# Run Tests
poetry run pytest
