haystack_threshold_node
This component filters documents based on a threshold percentage, ensuring only the documents above the threshold get passed down the pipeline. This allows you to query your document store for a larger top_k, but then filter the results down to those which are above a set confidence score.
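The filtering idea described above can be illustrated with a minimal, library-free sketch. It is not the package's actual implementation: `ScoredDoc` and `filter_by_threshold` are hypothetical names, and the sketch assumes that scores are scaled to the range 0.0 to 1.0 and that a score exactly equal to the threshold is kept.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a retrieved document with a relevance score."""
    content: str
    score: Optional[float]  # assumed to be scaled to the range 0.0-1.0


def filter_by_threshold(docs: List[ScoredDoc], threshold: int) -> List[ScoredDoc]:
    """Keep documents whose score is at least `threshold` percent (assumed inclusive)."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be a whole number between 0 and 100")
    cutoff = threshold / 100
    # Documents without a score cannot be compared, so this sketch drops them.
    return [d for d in docs if d.score is not None and d.score >= cutoff]


# A larger result set, as if retrieved with a generous top_k.
retrieved = [
    ScoredDoc("Colossus of Rhodes", 0.91),
    ScoredDoc("Lighthouse of Alexandria", 0.80),
    ScoredDoc("Hanging Gardens", 0.42),
    ScoredDoc("Unscored passage", None),
]
kept = filter_by_threshold(retrieved, threshold=80)
print([d.content for d in kept])
```

With these made-up scores, only the first two documents pass an 80% threshold. A very high threshold can leave no documents at all, so downstream steps should tolerate an empty list.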
Installation
pip install haystack-threshold-node
Usage
Add the node to your pipeline between the retriever and the prompt node, as in the following example:
import logging

from datasets import load_dataset
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import PromptNode, PromptTemplate, AnswerParser, BM25Retriever
from haystack.pipelines import Pipeline
# Import path assumed from the package name; adjust it if the installed module differs.
from haystack_threshold_node import DocumentThreshold
logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING)
logging.getLogger("haystack").setLevel(logging.INFO)
document_store = InMemoryDocumentStore(use_bm25=True)
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
document_store.write_documents(dataset)
retriever = BM25Retriever(document_store=document_store, top_k=10)
lfqa_prompt = PromptTemplate(
    name="lfqa",
    prompt_text="Given the context please answer the question using your own words. Generate a comprehensive, summarized answer. If the information is not included in the provided context, reply with 'Provided documents didn't contain the necessary information to provide the answer'\n\nContext: {documents}\n\nQuestion: {query} \n\nAnswer:",
    output_parser=AnswerParser(),
)
prompt_node = PromptNode(
    model_name_or_path="text-davinci-003",
    default_prompt_template=lfqa_prompt,
    max_length=500,
    api_key="sk-OPENAIKEY",
)
# The value you pass for threshold is the lowest % score you will accept. Whole numbers only.
# In this example, the threshold is set to 80%.
threshold = DocumentThreshold(threshold=80)
pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=threshold, name="Threshold", inputs=["Retriever"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["Threshold"])
query = "What does the Rhodes Statue look like?"
output = pipe.run(query)
print(output['answers'][0].answer)
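Because a strict threshold can filter out every retrieved document, indexing `output['answers'][0]` directly may be fragile. The following is a small defensive sketch, not part of the package: `first_answer_text` is a hypothetical helper, and whether the pipeline ever returns an empty or missing `answers` list is an assumption. A `SimpleNamespace` stands in for the answer object so the snippet runs without Haystack installed.

```python
from types import SimpleNamespace
from typing import Any, Dict


def first_answer_text(output: Dict[str, Any], default: str = "No answer returned.") -> str:
    """Return the text of the first answer, or `default` if there is none."""
    answers = output.get("answers") or []
    if not answers:
        return default
    # Answer objects are assumed to expose their text through an `answer` attribute,
    # as in the example above.
    return getattr(answers[0], "answer", None) or default


# Stand-in outputs shaped like the pipeline result used above.
with_answer = {"answers": [SimpleNamespace(answer="A bronze statue of Helios.")]}
without_answer = {"answers": []}

print(first_answer_text(with_answer))
print(first_answer_text(without_answer))
```

In the example pipeline this would replace the final `print` call with `print(first_answer_text(output))`.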
File details
Details for the file haystack_threshold_node-0.0.1.tar.gz.
File metadata
- Download URL: haystack_threshold_node-0.0.1.tar.gz
- Upload date:
- Size: 2.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 39d14457171184e15ffbb8f49a8c03783794e7627d34151b47ac0daf5b9ade16
MD5 | 62501c52ea5143e2fcbbe2da0c9f5221
BLAKE2b-256 | 271a20b5c579e7684ade0ae9956f4b64af594ee2505e408530c271ee5f12165f
File details
Details for the file haystack_threshold_node-0.0.1-py3-none-any.whl.
File metadata
- Download URL: haystack_threshold_node-0.0.1-py3-none-any.whl
- Upload date:
- Size: 3.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 74fe1cac6b771a466ac31a2f0a68917a5f68e90155bc1d934f9ef54fdba1a021
MD5 | e4beb2c12d1357bcd3a8bcdf2c68b775
BLAKE2b-256 | 0329316905d81030f5fc926f65f8f5fd6f32555232bfebe1e4c981ed87b12fbe