llama-index readers file integration
LlamaIndex Readers Integration: File
This is the default integration for the file loaders used within SimpleDirectoryReader.
It provides support for the following loaders:
- DocxReader
- HWPReader
- PDFReader
- EpubReader
- FlatReader
- HTMLTagReader
- ImageCaptionReader
- ImageReader
- ImageVisionLLMReader
- IPYNBReader
- MarkdownReader
- MboxReader
- PptxReader
- PandasCSVReader
- VideoAudioReader
- UnstructuredReader
- PyMuPDFReader
- ImageTabularChartReader
- XMLReader
- PagedCSVReader
- CSVReader
- RTFReader
Installation
pip install llama-index-readers-file
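To confirm that the install succeeded, the installed version can be queried from Python. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not something the package provides.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "llama-index-readers-file") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


version = installed_version()
if version is None:
    print("llama-index-readers-file is not installed")
else:
    print(f"llama-index-readers-file {version} is installed")
```

Looking up the distribution name rather than importing the module keeps the check cheap and avoids pulling in optional dependencies.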
Usage
Once installed, you can import any of the loaders. Here are usage examples for several of them.
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import (
    DocxReader,
    HWPReader,
    PDFReader,
    EpubReader,
    FlatReader,
    HTMLTagReader,
    ImageCaptionReader,
    ImageReader,
    ImageVisionLLMReader,
    IPYNBReader,
    MarkdownReader,
    MboxReader,
    PptxReader,
    PandasCSVReader,
    VideoAudioReader,
    UnstructuredReader,
    PyMuPDFReader,
    ImageTabularChartReader,
    XMLReader,
    PagedCSVReader,
    CSVReader,
    RTFReader,
)
# PDF Reader with `SimpleDirectoryReader`
parser = PDFReader()
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# Docx Reader example
parser = DocxReader()
file_extractor = {".docx": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# HWP Reader example
parser = HWPReader()
file_extractor = {".hwp": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# Epub Reader example
parser = EpubReader()
file_extractor = {".epub": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# Flat Reader example
parser = FlatReader()
file_extractor = {".txt": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# HTML Tag Reader example
parser = HTMLTagReader()
file_extractor = {".html": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# Image Reader example
parser = ImageReader()
file_extractor = {
    ".jpg": parser,
    ".jpeg": parser,
    ".png": parser,
}  # Add other image formats as needed
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# IPYNB Reader example
parser = IPYNBReader()
file_extractor = {".ipynb": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# Markdown Reader example
parser = MarkdownReader()
file_extractor = {".md": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# Mbox Reader example
parser = MboxReader()
file_extractor = {".mbox": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# Pptx Reader example
parser = PptxReader()
file_extractor = {".pptx": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# Pandas CSV Reader example
parser = PandasCSVReader()
file_extractor = {".csv": parser}  # Add other CSV formats as needed
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# PyMuPDF Reader example
parser = PyMuPDFReader()
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# XML Reader example
parser = XMLReader()
file_extractor = {".xml": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# Paged CSV Reader example
parser = PagedCSVReader()
file_extractor = {".csv": parser}  # Add other CSV formats as needed
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# CSV Reader example
parser = CSVReader()
file_extractor = {".csv": parser}  # Add other CSV formats as needed
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
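Each example above builds its own `file_extractor`, and a single dictionary can hold only one reader per extension (the examples map `.pdf` to both `PDFReader` and `PyMuPDFReader`, and `.csv` to three different readers). The following is a minimal sketch of combining several readers in one mapping. `build_file_extractor` and `load_directory` are hypothetical helper names, not part of the package, and the lowercase normalization assumes that `SimpleDirectoryReader` matches on lowercased suffixes such as `.pdf`.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_file_extractor(
    pairs: Iterable[Tuple[Any, Iterable[str]]]
) -> Dict[str, Any]:
    """Build one extension-to-reader mapping from (reader, extensions) pairs.

    Extensions are lowercased and given a leading dot. A ValueError is raised
    if the same extension is assigned to more than one reader.
    """
    extractor: Dict[str, Any] = {}
    for reader, extensions in pairs:
        for ext in extensions:
            key = ext.lower()
            if not key.startswith("."):
                key = "." + key
            if key in extractor:
                raise ValueError(
                    f"Extension {key!r} is mapped to more than one reader"
                )
            extractor[key] = reader
    return extractor


def load_directory(input_dir: str = "./data") -> List[Any]:
    """Load a directory with several readers at once (hypothetical helper)."""
    # Imported lazily so the helper above works without the package installed.
    from llama_index.core import SimpleDirectoryReader
    from llama_index.readers.file import DocxReader, MarkdownReader, PDFReader

    file_extractor = build_file_extractor(
        [
            (PDFReader(), [".pdf"]),
            (DocxReader(), [".docx"]),
            (MarkdownReader(), [".md"]),
        ]
    )
    return SimpleDirectoryReader(
        input_dir, file_extractor=file_extractor
    ).load_data()
```

Raising on a duplicate extension makes the choice between, say, `PDFReader` and `PyMuPDFReader` explicit instead of letting the later entry silently win.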
These loaders are designed to load data into LlamaIndex.
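A reader can also be called on a single file without going through `SimpleDirectoryReader`. The sketch below assumes the reader exposes `load_data` and accepts the file path as its first argument, as `PDFReader` does. `load_single_file` and `load_pdf` are hypothetical helpers, and the path is passed positionally because the parameter name may differ between readers.

```python
from pathlib import Path
from typing import Any, List, Union


def load_single_file(reader: Any, path: Union[str, Path]) -> List[Any]:
    """Call a reader directly on one file and return whatever it produces."""
    # Passed positionally: the first parameter is not named the same
    # way in every reader.
    return reader.load_data(Path(path))


def load_pdf(path: Union[str, Path]) -> List[Any]:
    """Load one PDF with PDFReader (hypothetical convenience wrapper)."""
    # Imported lazily so load_single_file works without the package installed.
    from llama_index.readers.file import PDFReader

    return load_single_file(PDFReader(), path)
```

Calling `load_pdf("./data/report.pdf")` with an existing file would return a list of documents; the file name here is only an illustration.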
Hashes for llama_index_readers_file-0.1.23.tar.gz
Algorithm | Hash digest
---|---
SHA256 | fde8ecb588e703849e51dc0f075f56d1f5db3bc1479dd00c21b42e93b81b6267
MD5 | 21519429414a45a398d19f4d1ab8038e
BLAKE2b-256 | 49b94d4b6aa92b45b89286403c11cd6c2d1aa465fda77297a28ea6cdbb80f0a2
Hashes for llama_index_readers_file-0.1.23-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 32450d0a3edc6ef6af575f814beec39cd3a3351eaf0e3c97045bdd72a7a7b38d
MD5 | 8b3c1a9021f57c159f082c0ee0a4b213
BLAKE2b-256 | ffe137bc2e995ca2c416f74a82b7fe9df5b8682af7c94e8792a4eb7ddf5d1ae8