llama-index readers remote_depth integration
Project description
Remote Page/File Loader
```
pip install llama-index-readers-remote-depth
```
This loader makes it easy to extract the links present in a webpage and the text behind those links. It is built on RemoteReader (which reads a single page), which is in turn built on SimpleDirectoryReader (which parses the document when the file is a PDF, etc.). It is an all-in-one tool for (almost) any group of URLs.
You can try it with this MIT lecture link; it will extract the syllabus, the PDFs, and more:
https://ocw.mit.edu/courses/5-05-principles-of-inorganic-chemistry-iii-spring-2005/pages/syllabus/
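The link-extraction step can be pictured with a small standard-library sketch. This is not RemoteDepthReader's actual implementation: `LinkCollector` and `extract_links` are hypothetical names used only for illustration, and the sample HTML is made up. It resolves relative hrefs against the page URL and keeps only unique http(s) links.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs (e.g. "/courses/...") against the page URL.
                absolute = urljoin(self.base_url, value)
                # Keep only web links (skip mailto:, javascript:, ...) and drop duplicates.
                if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return the unique absolute links found in `html`, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

For example, feeding it `<a href="/courses/x/lecture1.pdf">` with the base URL `https://ocw.mit.edu/courses/x/pages/syllabus/` yields `https://ocw.mit.edu/courses/x/lecture1.pdf`.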
Usage
Use the depth parameter to specify how many levels of links to extract. For example, to extract the links in the page and the links in those linked pages, set depth=2.
```python
from llama_index.readers.remote_depth import RemoteDepthReader

# depth=2: the links in the page, plus the links in those linked pages
loader = RemoteDepthReader(depth=2)
documents = loader.load_data(
    url="https://ocw.mit.edu/courses/5-05-principles-of-inorganic-chemistry-iii-spring-2005/pages/syllabus/"
)
```
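To make the depth parameter concrete, here is a minimal breadth-first sketch over a hypothetical, hard-coded link graph, assuming depth counts link hops from the starting page as described above. It is not the library's code; `collect_urls` and the `site` graph are illustrative stand-ins for fetching pages, and the reader's internal counting convention may differ.

```python
from collections import deque


def collect_urls(start: str, graph: dict[str, list[str]], depth: int) -> list[str]:
    """Return URLs reachable from `start` within `depth` link hops, in BFS order.

    `graph` is a hypothetical stand-in for "fetch the page and extract its links".
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, level = queue.popleft()
        if level == depth:
            continue  # do not follow links beyond the requested depth
        for link in graph.get(url, []):
            if link not in seen:  # visit each URL only once
                seen.add(link)
                order.append(link)
                queue.append((link, level + 1))
    return order


# Hypothetical site: the syllabus links to a PDF and a readings page,
# and the readings page links to another PDF.
site = {
    "syllabus": ["lecture1.pdf", "readings"],
    "readings": ["paper.pdf"],
}
```

With this graph, `collect_urls("syllabus", site, 1)` stops at the syllabus's own links, while depth 2 also reaches `paper.pdf` through the readings page.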
This loader is designed to load data into LlamaIndex.
Download files
Source Distribution
Built Distribution
File details
Details for the file llama_index_readers_remote_depth-0.4.1.tar.gz.
File metadata
- Download URL: llama_index_readers_remote_depth-0.4.1.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 38e4cb4901de7ad6e6bfd5d8b2527445ead034b2728042f6b316316227f6e144 |
| MD5 | 073c5c22cac7bfcea09363f25d525975 |
| BLAKE2b-256 | 0aea21884577dde614aa1b3509726bfca893f1d735b23afe512446616456efc8 |
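If you download one of these files manually, a short standard-library sketch like the following can check it against the listed digests. `file_digests` and `EXPECTED_SHA256` are hypothetical names; BLAKE2b-256 here means BLAKE2b with a 32-byte (256-bit) digest.

```python
import hashlib
from pathlib import Path


def file_digests(path: str) -> dict[str, str]:
    """Compute the SHA256, MD5 and BLAKE2b-256 hex digests of a file."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # BLAKE2b truncated to 256 bits
    with Path(path).open("rb") as fh:
        # Read in 64 KiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            for h in (sha256, md5, blake2b_256):
                h.update(chunk)
    return {
        "SHA256": sha256.hexdigest(),
        "MD5": md5.hexdigest(),
        "BLAKE2b-256": blake2b_256.hexdigest(),
    }


# SHA256 listed for llama_index_readers_remote_depth-0.4.1.tar.gz
EXPECTED_SHA256 = "38e4cb4901de7ad6e6bfd5d8b2527445ead034b2728042f6b316316227f6e144"
```

For an intact download, `file_digests("llama_index_readers_remote_depth-0.4.1.tar.gz")["SHA256"] == EXPECTED_SHA256` should be `True`.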
File details
Details for the file llama_index_readers_remote_depth-0.4.1-py3-none-any.whl.
File metadata
- Download URL: llama_index_readers_remote_depth-0.4.1-py3-none-any.whl
- Upload date:
- Size: 4.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 31af793affab648f5a614b4c9e5c795576efe437f96cac356bfe74ec33dae984 |
| MD5 | fa1154cd851416f654f1cae3a8292b1b |
| BLAKE2b-256 | bd1378efaa07f14d0017431c989c4b25d57edaf0c8748c611f7854fa08737e77 |