
This is a document loader.


docsloader

What is this?

  • by: axiner
  • docsloader
  • This is a document loader.

Installation

This package can be installed using pip (Python>=3.11):

pip install docsloader

  • To install all optional dependencies: pip install docsloader[all]
  • To install the dependencies for a specific file type:
    • txt: pip install docsloader[txt]
    • csv: pip install docsloader[csv]
    • md: pip install docsloader[md]
    • xlsx: pip install docsloader[xlsx]
    • pptx: pip install docsloader[pptx]
    • docx: pip install docsloader[docx]
    • pdf: pip install docsloader[pdf]
    • img: pip install docsloader[img]
    • auto: pip install docsloader[auto]
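After installing, it can be useful to confirm that the interpreter meets the Python>=3.11 requirement and that the package is visible to it. The sketch below uses only the standard library; `installed_version` and `python_is_supported` are illustrative helper names, not part of docsloader.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def python_is_supported(minimum: Tuple[int, int] = (3, 11)) -> bool:
    """Check the running interpreter against the documented minimum version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("docsloader version:", installed_version("docsloader"))
```

A result of None from `installed_version("docsloader")` suggests the package was installed into a different environment than the one running the script.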

Usage

The docsloader package provides asynchronous document loaders for a range of file types. It includes dedicated loaders for specific file types and an AutoLoader that selects the appropriate loader automatically, based on the file suffix.

Supported File Suffixes

The package can load documents with the following file suffixes:

  • Text Files: .txt
  • CSV Files: .csv
  • Markdown Files: .md
  • HTML Files: .html, .htm
  • Excel Files: .xlsx, .xls
  • PowerPoint Files: .pptx, .ppt
  • Word Files: .docx, .doc
  • PDF Files: .pdf
  • Image Files: .jpg, .jpeg, .png
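The suffix list above can be turned into a quick pre-check before handing files to a loader. This is a minimal sketch, not docsloader code: `SUPPORTED_SUFFIXES`, `is_supported`, and `filter_supported` are hypothetical names, and the case-insensitive comparison is an assumption about how suffixes should be matched.

```python
from pathlib import PurePath
from typing import Iterable, List

# Suffixes copied from the list above.
SUPPORTED_SUFFIXES = frozenset({
    ".txt", ".csv", ".md", ".html", ".htm",
    ".xlsx", ".xls", ".pptx", ".ppt", ".docx", ".doc",
    ".pdf", ".jpg", ".jpeg", ".png",
})


def is_supported(path: str) -> bool:
    """True if the final suffix of `path` is in the documented list (case-insensitive)."""
    return PurePath(path).suffix.lower() in SUPPORTED_SUFFIXES


def filter_supported(paths: Iterable[str]) -> List[str]:
    """Keep only the paths whose suffix is in the documented list, preserving order."""
    return [p for p in paths if is_supported(p)]
```

Only the last suffix is inspected, so a name such as archive.tar.gz is judged by .gz alone.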

Available Loaders

The package provides the following loader classes:

  • TxtLoader: For Text files
  • CsvLoader: For CSV files
  • MdLoader: For Markdown files
  • HtmlLoader: For HTML files
  • XlsxLoader: For Excel files
  • PptxLoader: For PowerPoint files
  • DocxLoader: For Word files
  • PdfLoader: For PDF files
  • ImgLoader: For image files
  • AutoLoader: Automatically selects the appropriate loader based on file suffix
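To illustrate what suffix-based selection involves, the sketch below maps each documented suffix to the name of a loader class from the list above. It is not AutoLoader's actual implementation: `LOADER_BY_SUFFIX`, `suffix_of`, and `loader_name_for` are hypothetical, and pairing the legacy suffixes (.xls, .ppt, .doc) with XlsxLoader, PptxLoader, and DocxLoader is an assumption drawn from the two lists.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed suffix-to-loader pairing, inferred from the documented lists.
LOADER_BY_SUFFIX = {
    ".txt": "TxtLoader",
    ".csv": "CsvLoader",
    ".md": "MdLoader",
    ".html": "HtmlLoader",
    ".htm": "HtmlLoader",
    ".xlsx": "XlsxLoader",
    ".xls": "XlsxLoader",
    ".pptx": "PptxLoader",
    ".ppt": "PptxLoader",
    ".docx": "DocxLoader",
    ".doc": "DocxLoader",
    ".pdf": "PdfLoader",
    ".jpg": "ImgLoader",
    ".jpeg": "ImgLoader",
    ".png": "ImgLoader",
}


def suffix_of(path_or_url: str) -> str:
    """Return the lower-cased file suffix of a local path or an http(s) URL."""
    parts = urlsplit(path_or_url)
    if parts.scheme in ("http", "https"):
        target = parts.path  # drops the query string and fragment
    else:
        target = path_or_url.replace("\\", "/")  # normalise Windows separators
    return PurePosixPath(target).suffix.lower()


def loader_name_for(path_or_url: str) -> str:
    """Look up the loader class name for a path or URL, or raise ValueError."""
    suffix = suffix_of(path_or_url)
    try:
        return LOADER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file suffix: {suffix!r}") from None
```

Returning class names as strings keeps the sketch runnable without docsloader installed; a real dispatcher would map to the classes themselves.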

All loader classes implement asynchronous load methods for efficient document processing.
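Because `load()` is consumed with `async for`, a caller that wants all items at once has to gather them inside a coroutine. The following is a small sketch of that pattern. `StubLoader` is a stand-in written for this example so the code runs without docsloader, and `collect_docs` is a hypothetical helper; it only assumes the loader exposes a `load()` method that can be iterated asynchronously.

```python
import asyncio
from typing import Any, AsyncIterator, Iterable, List


class StubLoader:
    """Stand-in loader (not part of docsloader) that yields preset items."""

    def __init__(self, items: Iterable[Any]) -> None:
        self._items = list(items)

    async def load(self) -> AsyncIterator[Any]:
        for item in self._items:
            await asyncio.sleep(0)  # give control back to the event loop
            yield item


async def collect_docs(loader: Any) -> List[Any]:
    """Drain loader.load() into a list, preserving the order of the items."""
    return [doc async for doc in loader.load()]


if __name__ == "__main__":
    print(asyncio.run(collect_docs(StubLoader(["page 1", "page 2"]))))
```

Collecting into a list trades memory for convenience; for large documents, iterating with `async for` and handling each item as it arrives may be preferable.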

Example

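import asyncio

from docsloader import AutoLoader
from toollib.log import init_logger

logger = init_logger(__name__)


async def main(path_or_url: str):
    # AutoLoader selects the concrete loader from the file suffix.
    loader = AutoLoader(
        path_or_url=path_or_url,
        rm_tmpfile=False,
    )
    # load() is consumed with `async for`; each iteration yields one item.
    async for doc in loader.load():
        logger.info(doc)


if __name__ == "__main__":
    # Example path only ("测试" means "test"); replace it with your own path or URL.
    asyncio.run(main(path_or_url=r"E:/NewFolder/测试.docx"))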
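The example above loads a single file. For several files, one option is to run the loads concurrently with a cap on how many are active at once. The sketch below shows that pattern under stated assumptions: `load_many`, `load_one`, and `EchoLoader` are hypothetical names, the loader is created through a caller-supplied factory, and nothing here is confirmed behaviour of docsloader beyond the `async for doc in loader.load()` usage shown in the example.

```python
import asyncio
from typing import Any, AsyncIterator, Callable, Dict, Iterable, List


class EchoLoader:
    """Stand-in loader (not part of docsloader) that yields two fake items per path."""

    def __init__(self, path_or_url: str) -> None:
        self.path_or_url = path_or_url

    async def load(self) -> AsyncIterator[str]:
        for part in ("head", "body"):
            await asyncio.sleep(0)  # simulate asynchronous work
            yield f"{self.path_or_url}:{part}"


async def load_one(path: str, make_loader: Callable[[str], Any],
                   limit: asyncio.Semaphore) -> List[Any]:
    """Load one path while holding a slot in the concurrency limit."""
    async with limit:
        loader = make_loader(path)
        return [doc async for doc in loader.load()]


async def load_many(paths: Iterable[str], make_loader: Callable[[str], Any],
                    max_concurrency: int = 4) -> Dict[str, List[Any]]:
    """Load all paths concurrently, at most `max_concurrency` at a time."""
    paths = list(paths)
    limit = asyncio.Semaphore(max_concurrency)
    results = await asyncio.gather(
        *(load_one(p, make_loader, limit) for p in paths)
    )
    return dict(zip(paths, results))


if __name__ == "__main__":
    print(asyncio.run(load_many(["a.txt", "b.pdf"], EchoLoader)))
```

With docsloader installed, `make_loader` could be `lambda p: AutoLoader(path_or_url=p)`, mirroring the constructor call in the example. Whether running several loaders at once is safe or faster depends on the library's internals, which this sketch does not assume.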

License

This project is released under the MIT License (MIT). See the LICENSE file.
