This is a document loader.
Project description
docsloader
What is this?
- by: axiner
- docsloader
- This is a document loader.
Installation
This package can be installed using pip (Python>=3.11):
pip install docsloader
Usage
The docsloader package provides asynchronous document loaders for various file suffixes. It includes dedicated loaders
for specific file types and an AutoLoader that automatically selects the appropriate loader based on file suffix.
Supported File Suffixes
The package supports loading documents from the following file suffixes:
- Text Files: .txt
- CSV Files: .csv
- Markdown Files: .md
- HTML Files: .html, .htm
- Excel Files: .xlsx, .xls
- PowerPoint Files: .pptx, .ppt
- Word Files: .docx, .doc
- PDF Files: .pdf
- Image Files: .jpg, .jpeg, .png
Available Loaders
The package provides the following loader classes:
- TxtLoader: For text files
- CsvLoader: For CSV files
- MdLoader: For Markdown files
- HtmlLoader: For HTML files
- XlsxLoader: For Excel files
- PptxLoader: For PowerPoint files
- DocxLoader: For Word files
- PdfLoader: For PDF files
- ImgLoader: For image files
- AutoLoader: Automatically selects the appropriate loader based on file suffix
All loader classes implement asynchronous load methods for efficient document processing.
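To illustrate what suffix-based selection means in practice, here is a minimal sketch that maps a path or URL to one of the loader class names listed above. It is not docsloader's actual implementation: `SUFFIX_TO_LOADER` and `pick_loader_name` are hypothetical names introduced for this example, and pairing legacy suffixes such as .doc, .xls and .ppt with the Docx/Xlsx/Pptx loaders is an assumption based on how the suffixes are grouped in this README.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Suffix groups taken from "Supported File Suffixes"; loader names taken from
# "Available Loaders". The pairing itself is an assumption, not package code.
SUFFIX_TO_LOADER = {
    ".txt": "TxtLoader",
    ".csv": "CsvLoader",
    ".md": "MdLoader",
    ".html": "HtmlLoader",
    ".htm": "HtmlLoader",
    ".xlsx": "XlsxLoader",
    ".xls": "XlsxLoader",
    ".pptx": "PptxLoader",
    ".ppt": "PptxLoader",
    ".docx": "DocxLoader",
    ".doc": "DocxLoader",
    ".pdf": "PdfLoader",
    ".jpg": "ImgLoader",
    ".jpeg": "ImgLoader",
    ".png": "ImgLoader",
}


def pick_loader_name(path_or_url: str) -> str:
    """Hypothetical helper: return the loader class name for a path or URL."""
    parsed = urlparse(path_or_url)
    # For http(s) URLs, drop the query string and fragment before looking
    # at the suffix; anything else is treated as a local path.
    target = parsed.path if parsed.scheme in ("http", "https") else path_or_url
    # Normalize Windows separators so the suffix is found either way.
    suffix = PurePosixPath(target.replace("\\", "/")).suffix.lower()
    try:
        return SUFFIX_TO_LOADER[suffix]
    except KeyError:
        raise ValueError(f"unsupported file suffix: {suffix!r}") from None


if __name__ == "__main__":
    print(pick_loader_name("E:/NewFolder/测试.docx"))
    print(pick_loader_name("https://example.com/files/report.PDF?download=1"))
```

The lookup is case-insensitive and raises `ValueError` for suffixes outside the supported list; how AutoLoader itself reports an unsupported suffix is not documented here.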
Example
import asyncio

from docsloader import AutoLoader
from toollib.log import init_logger

logger = init_logger(__name__)


async def main(path_or_url: str):
    loader = AutoLoader(
        path_or_url=path_or_url,
        rm_tmpfile=False,
    )
    async for doc in loader.load():
        logger.info(doc)


if __name__ == "__main__":
    asyncio.run(main(path_or_url=r"E:/NewFolder/测试.docx"))
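Because `load()` is consumed with `async for`, several documents can be processed concurrently on one event loop. The sketch below shows that pattern with a stand-in class so it runs without docsloader installed. `StubLoader`, `collect` and `load_many` are hypothetical names, and the dict-shaped documents are an assumption; this README does not specify what type the real loaders yield.

```python
import asyncio
from typing import AsyncIterator


class StubLoader:
    """Hypothetical stand-in for a docsloader loader; not part of the package."""

    def __init__(self, path_or_url: str) -> None:
        self.path_or_url = path_or_url

    async def load(self) -> AsyncIterator[dict]:
        # Yield two fake documents, handing control back to the event loop
        # before each one, as a real loader would while waiting on I/O.
        for idx in range(2):
            await asyncio.sleep(0)
            yield {"source": self.path_or_url, "idx": idx}


async def collect(loader) -> list:
    """Drain an async-generator load() into a list."""
    return [doc async for doc in loader.load()]


async def load_many(paths: list[str]) -> list[list]:
    # Run one collect() per path concurrently; gather keeps the input order.
    return await asyncio.gather(*(collect(StubLoader(p)) for p in paths))


if __name__ == "__main__":
    for docs in asyncio.run(load_many(["a.txt", "b.md"])):
        print(docs)
```

With the real package, `StubLoader(p)` would presumably be replaced by `AutoLoader(path_or_url=p)` as in the example above, assuming its `load()` behaves as an ordinary async generator.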
License
This project is released under the MIT License (MIT). See LICENSE
Built Distribution
File details
Details for the file docsloader-0.0.4-py3-none-any.whl.
File metadata
- Download URL: docsloader-0.0.4-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ad5dc08c6e243dc2610a64e268f8cfd539a9b72d228eb66756a15178b61b9f2 |
| MD5 | b958ea235ac5b5c7b82eb81b8a815a69 |
| BLAKE2b-256 | 5c9d1e996a43aa9e9cb51bdedfe6958c3446d3168311f6d742948449df4580c6 |