🎾 ReadmeDocsFetcher Node for Haystack
This custom component for Haystack is designed to fetch documentation pages from the ReadMe documentation you have access to. It uses a `MarkdownConverter` to convert all of your documentation pages to a list of Haystack `Documents`. You can use this node as a standalone node or within an indexing pipeline.
Installation
pip install readmedocs-fetcher-haystack
Usage in Haystack
- To initialize a `ReadmeDocsFetcher`, you have to provide an `api_key` parameter. This is your ReadMe Docs API key.
- There are also 4 optional parameters to initialize the `ReadmeDocsFetcher`:
  - `slugs`: To fetch a list of specific pages from your documentation. E.g. if you want to fetch 'https://docs.haystack.deepset.ai/docs/installation', the slug would be `installation`. If not set, all of the available pages will be fetched.
  - `base_url`: Optionally provide this to add the full URL of a documentation page to the `meta` of the created document. For example, `base_url="https://docs.haystack.deepset.ai"`.
  - `version`: If not set, the latest stable version of your docs will be fetched.
  - `markdown_converter`: When documents are fetched from ReadMe, temporary `.md` files are created and a `MarkdownConverter` is used to create a list of Haystack `Documents`. If not provided at initialization, a `MarkdownConverter` with the default parameters is used.
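The relationship between a page URL, its slug, and `base_url` can be sketched as follows. This is an illustration only: `slug_from_url` and `page_url` are hypothetical helpers, not part of `readmedocs-fetcher-haystack`, and the `/docs/<slug>` layout is an assumption taken from the example URL above.

```python
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Return the last path segment of a docs page URL (hypothetical helper)."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def page_url(base_url: str, slug: str) -> str:
    """Rebuild a full page URL, assuming pages live under /docs/<slug>."""
    return f"{base_url.rstrip('/')}/docs/{slug}"


slug = slug_from_url("https://docs.haystack.deepset.ai/docs/installation")
print(slug)
print(page_url("https://docs.haystack.deepset.ai", slug))
```

The slug obtained this way is what you would pass in the `slugs` list, while the base URL is what you would pass as `base_url`.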
Standalone
import os
from dotenv import load_dotenv
from haystack.nodes import MarkdownConverter
from readmedocs_fetcher_haystack import ReadmeDocsFetcher
load_dotenv()
README_API_KEY = os.getenv('README_API_KEY')
converter = MarkdownConverter(remove_code_snippets=False)
readme_fetcher = ReadmeDocsFetcher(api_key=README_API_KEY, markdown_converter=converter, base_url="https://docs.haystack.deepset.ai")
readme_fetcher.fetch_docs()
To fetch a single doc from a specific version:
readme_fetcher.fetch_docs(slugs=["nodes_overview"], version="v1.18")
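Because `os.getenv` returns `None` when a variable is unset, it can help to fail early before constructing the fetcher. A minimal sketch, where `require_env` is a hypothetical helper and not part of this package:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Placeholder value so the sketch runs on its own; in practice the key
# would come from your environment or .env file.
os.environ.setdefault("README_API_KEY", "placeholder-key")
README_API_KEY = require_env("README_API_KEY")
```

The returned value can then be passed as `api_key` when initializing the `ReadmeDocsFetcher`.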
In a Pipeline
import os
from dotenv import load_dotenv
from haystack import Pipeline
from haystack.nodes import MarkdownConverter, PreProcessor
from haystack.document_stores import InMemoryDocumentStore
from readmedocs_fetcher_haystack import ReadmeDocsFetcher
load_dotenv()
README_API_KEY = os.getenv('README_API_KEY')
converter = MarkdownConverter(remove_code_snippets=False)
readme_fetcher = ReadmeDocsFetcher(api_key=README_API_KEY, markdown_converter=converter, base_url="https://docs.haystack.deepset.ai")
preprocessor = PreProcessor()
doc_store = InMemoryDocumentStore()
pipe = Pipeline()
pipe.add_node(component=readme_fetcher, name="ReadmeFetcher", inputs=["File"])
pipe.add_node(component=preprocessor, name="Preprocessor", inputs=["ReadmeFetcher"])
pipe.add_node(component=doc_store, name="DocumentStore", inputs=["Preprocessor"])
pipe.run()
To fetch a single documentation page:
pipe.run(params={"ReadmeFetcher":{"slugs": ["nodes_overview"]}})
File details
Details for the file `readmedocs_fetcher_haystack-0.0.2.tar.gz`.
File metadata
- Download URL: readmedocs_fetcher_haystack-0.0.2.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.24.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5c1ba4c4815982c2f5de71dca5a9847991423e839f1a4ea25345998b649576f7
MD5 | f37746a867c571245df6c264e66e08d9
BLAKE2b-256 | 0e02edcc8d61650e4ab8f44cb3b26baf48265fbddf928318f7808a4bee7ea131
File details
Details for the file `readmedocs_fetcher_haystack-0.0.2-py3-none-any.whl`.
File metadata
- Download URL: readmedocs_fetcher_haystack-0.0.2-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.24.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | ae4a6bf3c133d6b5faccec602b35864ece0e52f434740aab78b020210b366dce
MD5 | 21cceb269c13c30c5367a98872575429
BLAKE2b-256 | 03fdb98d79bec08bdd7f469feb0c7b87f1b7153ecd89998f9898b9984b648d4e