🎾 ReadmeDocsFetcher Node for Haystack
This custom component for Haystack is designed to fetch documentation pages from the ReadMe documentation you have access to. It uses a MarkdownConverter to convert all of your documentation pages to a list of Haystack Documents. You can use this node as a standalone node or within an indexing pipeline.
Installation
pip install readmedocs-fetcher-haystack
Usage in Haystack
- To initialize a ReadmeDocsFetcher, you have to provide an api_key parameter. This is your ReadMe Docs API key.
- There are 3 optional parameters to initialize the ReadmeDocsFetcher:
  - slug: To fetch a single defined page from your documentation. E.g. if you want to fetch 'https://docs.haystack.deepset.ai/docs/installation', the slug would be installation. If not set, all of the available pages will be fetched.
  - version: If not set, the latest stable version of your docs will be fetched.
  - markdown_converter: When documents are fetched from ReadMe, temporary .md files are created and a MarkdownConverter is used to create a list of Haystack Documents. If not provided at initialization, a MarkdownConverter with the default parameters is used.
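The slug rule described above (the last path segment of the page URL) can be sketched with the standard library. The helper name slug_from_docs_url is hypothetical and not part of this package; it only illustrates how a slug such as installation relates to the page URL.

```python
from urllib.parse import urlparse


def slug_from_docs_url(url: str) -> str:
    """Return the last non-empty path segment of a docs page URL.

    Hypothetical helper for illustration; not part of ReadmeDocsFetcher.
    """
    path = urlparse(url).path
    # Drop empty segments so a trailing slash does not yield an empty slug
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"No path segment found in URL: {url!r}")
    return segments[-1]


slug = slug_from_docs_url("https://docs.haystack.deepset.ai/docs/installation")
print(slug)  # installation
```

The resulting string is what you would pass as the slug parameter.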
Standalone
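import os
from dotenv import load_dotenv
from haystack.nodes import MarkdownConverter
# Assumption: the import path mirrors the package name; adjust it if the module is named differently.
from readmedocs_fetcher_haystack import ReadmeDocsFetcher

# Load README_API_KEY from a local .env file
load_dotenv()
README_API_KEY = os.getenv('README_API_KEY')

# Keep code snippets when converting the fetched Markdown pages
converter = MarkdownConverter(remove_code_snippets=False)
readme_fetcher = ReadmeDocsFetcher(api_key=README_API_KEY, markdown_converter=converter)
readme_fetcher.fetch_docs()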
To fetch a single doc from a specific version:
readme_fetcher.fetch_docs(slug="nodes_overview", version="v1.18")
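The standalone example reads the key with os.getenv, which returns None when the variable is missing. A minimal sketch of a fail-fast check is shown here; require_env is a hypothetical helper, not something this package provides.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of ReadmeDocsFetcher.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage (after load_dotenv() has run):
# README_API_KEY = require_env("README_API_KEY")
```

Failing at startup makes a missing key visible before any request to ReadMe is attempted.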
In a Pipeline
import os
from dotenv import load_dotenv
from haystack import Pipeline
from haystack.nodes import MarkdownConverter, PreProcessor
from haystack.document_stores import InMemoryDocumentStore
# Assumption: the import path mirrors the package name; adjust it if the module is named differently.
from readmedocs_fetcher_haystack import ReadmeDocsFetcher

# Load README_API_KEY from a local .env file
load_dotenv()
README_API_KEY = os.getenv('README_API_KEY')

converter = MarkdownConverter(remove_code_snippets=False)
readme_fetcher = ReadmeDocsFetcher(api_key=README_API_KEY, markdown_converter=converter)
preprocessor = PreProcessor()
doc_store = InMemoryDocumentStore()

# Indexing pipeline: fetch docs, preprocess them, write them to the document store
pipe = Pipeline()
pipe.add_node(component=readme_fetcher, name="ReadmeFetcher", inputs=["File"])
pipe.add_node(component=preprocessor, name="Preprocessor", inputs=["ReadmeFetcher"])
pipe.add_node(component=doc_store, name="DocumentStore", inputs=["Preprocessor"])
pipe.run()
To fetch a single documentation page:
pipe.run(params={"ReadmeFetcher":{"slug": "nodes_overview"}})