
llama-index readers remote_depth integration

Project description

Remote Page/File Loader

pip install llama-index-readers-remote-depth

This loader makes it easy to extract the text from a webpage URL and from the links present on that page. It is built on RemoteReader, which reads a single page, and RemoteReader in turn relies on SimpleDirectoryReader to parse the downloaded document (for example, when the file is a PDF). The result is an all-in-one tool for (almost) any group of URLs.
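To illustrate the link-extraction step, here is a minimal sketch that uses only Python's standard library (`html.parser` and `urllib.parse`). It is not this package's implementation: `LinkExtractor` and `extract_links` are hypothetical names, and the sketch assumes that links come only from the `href` attribute of `<a>` tags, resolved against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "b.html" or "/a.pdf" against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    # Drop duplicate links while keeping the order in which they first appear.
    return list(dict.fromkeys(parser.links))


html = '<a href="/a.pdf">A</a><a href="b.html">B</a><a href="/a.pdf">again</a>'
print(extract_links(html, "https://example.com/x/"))
# ['https://example.com/a.pdf', 'https://example.com/x/b.html']
```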

You can try it with this MIT lecture page, from which the loader can extract the syllabus, the PDFs, and more: https://ocw.mit.edu/courses/5-05-principles-of-inorganic-chemistry-iii-spring-2005/pages/syllabus/
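Before a linked resource can be parsed, a crawler has to tell downloadable files (such as the course PDFs) apart from ordinary HTML pages. The sketch below uses a simple file-extension heuristic. `is_file_link` and the `FILE_EXTENSIONS` set are hypothetical, and the actual reader may decide differently (for example, from the HTTP `Content-Type` header).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical set of extensions treated as downloadable files rather than HTML pages.
FILE_EXTENSIONS = {".pdf", ".docx", ".pptx", ".csv", ".txt"}


def is_file_link(url: str) -> bool:
    """Guess whether `url` points to a file, based only on the extension of its path."""
    # urlparse separates the query string and fragment, so "?download=1" does not affect the suffix.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in FILE_EXTENSIONS


print(is_file_link("https://example.com/notes/lecture1.PDF"))  # True
print(is_file_link("https://example.com/pages/syllabus/"))  # False
```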

Usage

Use the depth parameter to specify how many levels of links you want to extract. For example, to extract the links in the page and also the links found in those linked pages, set depth=2.

from llama_index.readers.remote_depth import RemoteDepthReader

loader = RemoteDepthReader()
# Load the syllabus page and the links found on it into a list of LlamaIndex Documents
documents = loader.load_data(
    url="https://ocw.mit.edu/courses/5-05-principles-of-inorganic-chemistry-iii-spring-2005/pages/syllabus/"
)
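The depth parameter can be pictured as a breadth-first traversal that stops following links after a fixed number of hops. The sketch below assumes the start page counts as level 0, so `depth=2` returns the page, the pages it links to, and the pages those link to. `crawl` and its `get_links` callback are hypothetical, and the package's own traversal and counting convention may differ.

```python
from collections import deque
from typing import Callable, List


def crawl(start_url: str, get_links: Callable[[str], List[str]], depth: int) -> List[str]:
    """Return URLs reachable from start_url in at most `depth` link hops, in BFS order."""
    seen = {start_url}
    order = [start_url]
    queue = deque([(start_url, 0)])  # (url, level); the start page is level 0
    while queue:
        url, level = queue.popleft()
        if level >= depth:
            continue  # links found on this page would exceed the requested depth
        for link in get_links(url):
            if link not in seen:  # visit each URL once, even if several pages link to it
                seen.add(link)
                order.append(link)
                queue.append((link, level + 1))
    return order


# Hypothetical in-memory "site": each key maps a page to the links it contains.
site = {"root": ["a", "b"], "a": ["c"], "b": ["a", "d"], "c": ["e"], "d": []}
print(crawl("root", lambda u: site.get(u, []), depth=2))  # ['root', 'a', 'b', 'c', 'd']
```

Passing a stub over an in-memory dictionary makes it possible to check the depth behavior without network access. In a real crawler, `get_links` would fetch the page and run a link extractor on its HTML.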

This loader is designed to load data into LlamaIndex.

Download files

Source Distribution

llama_index_readers_remote_depth-0.4.1.tar.gz (4.7 kB)

Built Distribution

llama_index_readers_remote_depth-0.4.1-py3-none-any.whl

File details

Details for the file llama_index_readers_remote_depth-0.4.1.tar.gz.

Hashes for llama_index_readers_remote_depth-0.4.1.tar.gz
Algorithm Hash digest
SHA256 38e4cb4901de7ad6e6bfd5d8b2527445ead034b2728042f6b316316227f6e144
MD5 073c5c22cac7bfcea09363f25d525975
BLAKE2b-256 0aea21884577dde614aa1b3509726bfca893f1d735b23afe512446616456efc8

File details

Details for the file llama_index_readers_remote_depth-0.4.1-py3-none-any.whl.

Hashes for llama_index_readers_remote_depth-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 31af793affab648f5a614b4c9e5c795576efe437f96cac356bfe74ec33dae984
MD5 fa1154cd851416f654f1cae3a8292b1b
BLAKE2b-256 bd1378efaa07f14d0017431c989c4b25d57edaf0c8748c611f7854fa08737e77
