llama-index readers remote_depth integration

Remote Page/File Loader

This loader extracts the text from a webpage and from the pages and files it links to, following the links present on each page. It is based on RemoteReader (which reads a single page), which is in turn based on SimpleDirectoryReader (which parses the document if the file is a PDF, etc.). It is an all-in-one tool for (almost) any group of URLs.

You can try it with this MIT lecture link; it will be able to extract the syllabus, the PDFs, etc.: https://ocw.mit.edu/courses/5-05-principles-of-inorganic-chemistry-iii-spring-2005/pages/syllabus/

Usage

Use the depth parameter to set how many levels of links to follow. For example, to extract the links in the page and also the links found in those linked pages, specify depth=2.

from llama_index import download_loader

RemoteDepthReader = download_loader("RemoteDepthReader")

loader = RemoteDepthReader()
documents = loader.load_data(
    url="https://ocw.mit.edu/courses/5-05-principles-of-inorganic-chemistry-iii-spring-2005/pages/syllabus/"
)
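To illustrate what a depth of 1 versus 2 means, here is a minimal sketch of depth-limited link following. It is not the loader's actual implementation: the function `collect_links`, the helper `fake_get_links`, and the in-memory `PAGES` link graph are all hypothetical stand-ins, used so the example runs without network access.

```python
from collections import deque

# Hypothetical link graph standing in for real web pages (an assumption,
# not data fetched by the loader).
PAGES = {
    "syllabus": ["readings", "notes.pdf"],
    "readings": ["paper.pdf"],
    "paper.pdf": ["appendix"],
}


def fake_get_links(url):
    """Return the links found on `url` in the hypothetical graph."""
    return PAGES.get(url, [])


def collect_links(start, get_links, depth):
    """Breadth-first walk that follows links at most `depth` levels from `start`."""
    seen = {start}
    order = [start]
    frontier = deque([(start, 0)])
    while frontier:
        url, level = frontier.popleft()
        if level >= depth:
            # Pages at the depth limit are kept but their links are not followed.
            continue
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                order.append(link)
                frontier.append((link, level + 1))
    return order


if __name__ == "__main__":
    print(collect_links("syllabus", fake_get_links, depth=1))
    print(collect_links("syllabus", fake_get_links, depth=2))
```

In this sketch, depth=1 reaches only the links on the starting page, while depth=2 also reaches the links found on those linked pages. Whether the real loader counts levels in exactly this way is an assumption.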

This loader is designed to load data into LlamaIndex and/or to be used subsequently as a Tool in a LangChain Agent. See here for examples.
