llama-index readers file integration
Project description
LlamaIndex Readers Integration: File
pip install llama-index-readers-file
This is the default integration for different loaders that are used within SimpleDirectoryReader
.
Provides support for the following loaders:
- DocxReader
- HWPReader
- PDFReader
- EpubReader
- FlatReader
- HTMLTagReader
- ImageCaptionReader
- ImageReader
- ImageVisionLLMReader
- IPYNBReader
- MarkdownReader
- MboxReader
- PptxReader
- PandasCSVReader
- VideoAudioReader
- UnstructuredReader
- PyMuPDFReader
- ImageTabularChartReader
- XMLReader
- PagedCSVReader
- CSVReader
- RTFReader
Installation
pip install llama-index-readers-file
Usage
Once installed, You can import any of the loader. Here's an example usage of one of the loader.
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import (
DocxReader,
HWPReader,
PDFReader,
EpubReader,
FlatReader,
HTMLTagReader,
ImageCaptionReader,
ImageReader,
ImageVisionLLMReader,
IPYNBReader,
MarkdownReader,
MboxReader,
PptxReader,
PandasCSVReader,
VideoAudioReader,
UnstructuredReader,
PyMuPDFReader,
ImageTabularChartReader,
XMLReader,
PagedCSVReader,
CSVReader,
RTFReader,
)
# PDF Reader with `SimpleDirectoryReader`
parser = PDFReader()
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# Docx Reader example
parser = DocxReader()
file_extractor = {".docx": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# HWP Reader example
parser = HWPReader()
file_extractor = {".hwp": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# Epub Reader example
parser = EpubReader()
file_extractor = {".epub": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# Flat Reader example
parser = FlatReader()
file_extractor = {".txt": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# HTML Tag Reader example
parser = HTMLTagReader()
file_extractor = {".html": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# Image Reader example
parser = ImageReader()
file_extractor = {
".jpg": parser,
".jpeg": parser,
".png": parser,
} # Add other image formats as needed
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# IPYNB Reader example
parser = IPYNBReader()
file_extractor = {".ipynb": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# Markdown Reader example
parser = MarkdownReader()
file_extractor = {".md": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# Mbox Reader example
parser = MboxReader()
file_extractor = {".mbox": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# Pptx Reader example
parser = PptxReader()
file_extractor = {".pptx": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# Pandas CSV Reader example
parser = PandasCSVReader()
file_extractor = {".csv": parser} # Add other CSV formats as needed
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# PyMuPDF Reader example
parser = PyMuPDFReader()
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# XML Reader example
parser = XMLReader()
file_extractor = {".xml": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# Paged CSV Reader example
parser = PagedCSVReader()
file_extractor = {".csv": parser} # Add other CSV formats as needed
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
# CSV Reader example
parser = CSVReader()
file_extractor = {".csv": parser} # Add other CSV formats as needed
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
This loader is designed to be used as a way to load data into LlamaIndex.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Close
Hashes for llama_index_readers_file-0.1.27.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 04c9bc3063a9409fcd877f77d9b2abfd502e072763aa6dd788e0c637418b8664 |
|
MD5 | 307e6b1e325cea31bb5aba9dffc64a2d |
|
BLAKE2b-256 | c76041b7e7ab11381b8fcc6812b2a35b14bc82d1a675d6b7a05c112727b8b623 |
Close
Hashes for llama_index_readers_file-0.1.27-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8ead32dfbd328642c20ab8bd434a61c616b5a955642583a3bedbe454ed0b0dda |
|
MD5 | 8abd3f2b5654afdf13ee8a1d64d10f5f |
|
BLAKE2b-256 | dad5f9a4f9cea7d24902e7177c12eb6cb110989c6c09e7e31d9b29498f5bc5db |