haystack_threshold_node

This component filters retrieved documents by score, passing only those that meet a threshold percentage down the pipeline. This lets you query your document store with a larger top_k and then narrow the results to the documents above a set confidence score.
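The filtering idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's actual implementation: ScoredDoc and filter_by_threshold are hypothetical names, scores are assumed to be normalized to the range 0.0 to 1.0, and the comparison is assumed to be inclusive (a document scoring exactly at the threshold is kept).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a retrieved document with a relevance score."""
    content: str
    score: Optional[float]  # assumed to be normalized to the range 0.0-1.0


def filter_by_threshold(docs: List[ScoredDoc], threshold: int) -> List[ScoredDoc]:
    """Keep documents whose score, read as a percentage, is at least `threshold`."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be a whole number between 0 and 100")
    cutoff = threshold / 100
    # Assumption: documents without a score are dropped; the real node may differ.
    return [d for d in docs if d.score is not None and d.score >= cutoff]


# Simulated retriever output: a larger result set with mixed scores.
retrieved = [
    ScoredDoc("Colossus of Rhodes", 0.93),
    ScoredDoc("Lighthouse of Alexandria", 0.81),
    ScoredDoc("Hanging Gardens", 0.42),
    ScoredDoc("Unscored passage", None),
]

kept = filter_by_threshold(retrieved, threshold=80)
print([d.content for d in kept])
```

With a threshold of 80, only the two documents scoring 0.80 or higher remain, so a generous top_k on the retriever does not flood the downstream node with weak matches.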

Installation

pip install haystack-threshold-node

Usage

Add the node to your pipeline, for example:

import logging

from datasets import load_dataset
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import PromptNode, PromptTemplate, AnswerParser, BM25Retriever
from haystack.pipelines import Pipeline
# Module name assumed from the package name; DocumentThreshold is the class used below.
from haystack_threshold_node import DocumentThreshold


logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
logging.getLogger("haystack").setLevel(logging.INFO)

document_store = InMemoryDocumentStore(use_bm25=True)

dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
document_store.write_documents(dataset)

retriever = BM25Retriever(document_store=document_store, top_k=10)

lfqa_prompt = PromptTemplate(
    name="lfqa",
    prompt_text="Given the context please answer the question using your own words. Generate a comprehensive, summarized answer. If the information is not included in the provided context, reply with 'Provided documents didn't contain the necessary information to provide the answer'\n\nContext: {documents}\n\nQuestion: {query} \n\nAnswer:",
    output_parser=AnswerParser(),
)

prompt_node = PromptNode(
    model_name_or_path="text-davinci-003",
    default_prompt_template=lfqa_prompt,
    max_length=500,
    api_key="sk-OPENAIKEY",
)

# threshold is the lowest percentage score you will accept (whole numbers only).
# In this example, the threshold is set to 80%.
threshold = DocumentThreshold(threshold=80)

pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=threshold, name="Threshold", inputs=["Retriever"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["Threshold"])

query = "What does the Rhodes Statue look like?"
  
output = pipe.run(query)

print(output['answers'][0].answer)
