🎾 ReadmeDocsFetcher Node for Haystack

This custom component for Haystack is designed to fetch documentation pages from the ReadMe documentation you have access to. It uses a MarkdownConverter to convert all of your documentation pages to a list of Haystack Documents. You can use this node as a standalone node or within an indexing pipeline.

Installation

pip install readmedocs-fetcher-haystack

Usage in Haystack

  1. To initialize a ReadmeDocsFetcher, you have to provide an api_key parameter. This is your ReadMe Docs API key.
  2. There are also 4 optional parameters to initialize the ReadmeDocsFetcher:
    • slugs: A list of specific pages to fetch from your documentation. E.g. if you want to fetch 'https://docs.haystack.deepset.ai/docs/installation', the slug would be installation. If not set, all of the available pages are fetched.
    • base_url: Optionally provide this to add the full URL of a documentation page to the meta of the created document. For example, base_url='https://docs.haystack.deepset.ai'
    • version: If not set, the latest stable version of your docs is fetched.
    • markdown_converter: When documents are fetched from ReadMe, temporary .md files are created and a MarkdownConverter is used to turn them into a list of Haystack Documents. If not provided at initialization, a MarkdownConverter with the default parameters is used.
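The relationship between a page URL, its slug, and base_url can be sketched with the standard library alone. The helpers slug_from_url and page_url below are hypothetical and not part of readmedocs-fetcher-haystack; the <base_url>/docs/<slug> layout is an assumption taken from the example URL above, and the fetcher may build the URL it stores in the document meta differently.

```python
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Return the last path segment of a docs URL (hypothetical helper)."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def page_url(base_url: str, slug: str) -> str:
    """Join base_url and slug, assuming the <base_url>/docs/<slug> layout."""
    return f"{base_url.rstrip('/')}/docs/{slug}"


slug = slug_from_url("https://docs.haystack.deepset.ai/docs/installation")
print(slug)
print(page_url("https://docs.haystack.deepset.ai", slug))
```

A list of slugs produced this way is the kind of value the slugs parameter expects.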

Standalone

import os
from dotenv import load_dotenv
from haystack.nodes import MarkdownConverter
from readmedocs_fetcher_haystack import ReadmeDocsFetcher

load_dotenv()
README_API_KEY = os.getenv('README_API_KEY')

converter = MarkdownConverter(remove_code_snippets=False)
readme_fetcher = ReadmeDocsFetcher(api_key=README_API_KEY, markdown_converter=converter, base_url="https://docs.haystack.deepset.ai")
readme_fetcher.fetch_docs()

To fetch a single doc from a specific version:

readme_fetcher.fetch_docs(slugs=["nodes_overview"], version="v1.18")
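The examples read the API key with os.getenv, which returns None when the variable is not set. A small guard can surface that problem before the fetcher is created. This is a minimal sketch; require_env is a hypothetical helper, not something the package provides.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


try:
    # README_API_KEY is the variable name used in the examples in this README.
    api_key = require_env("README_API_KEY")
except RuntimeError as err:
    print(err)
```

The returned value can then be passed as api_key when initializing the ReadmeDocsFetcher.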

In a Pipeline

import os
from dotenv import load_dotenv
from haystack import Pipeline
from haystack.nodes import MarkdownConverter, PreProcessor
from haystack.document_stores import InMemoryDocumentStore
from readmedocs_fetcher_haystack import ReadmeDocsFetcher

load_dotenv()
README_API_KEY = os.getenv('README_API_KEY')

converter = MarkdownConverter(remove_code_snippets=False)
readme_fetcher = ReadmeDocsFetcher(api_key=README_API_KEY, markdown_converter=converter, base_url="https://docs.haystack.deepset.ai")

preprocessor = PreProcessor()
doc_store = InMemoryDocumentStore()

pipe = Pipeline()
pipe.add_node(component=readme_fetcher, name="ReadmeFetcher", inputs=["File"])
pipe.add_node(component=preprocessor, name="Preprocessor", inputs=["ReadmeFetcher"])
pipe.add_node(component=doc_store, name="DocumentStore", inputs=["Preprocessor"])
pipe.run()

To fetch a single documentation page:

pipe.run(params={"ReadmeFetcher":{"slugs": ["nodes_overview"]}})
