This is a document loader.

docsloader

What is this?

  • by: axiner
  • docsloader
  • This is a document loader.

Installation

This package can be installed using pip (Python>=3.11):

pip install docsloader

  • To install all optional dependencies: pip install docsloader[all]
  • To install the dependencies for specific file types only:
    • txt: pip install docsloader[txt]
    • csv: pip install docsloader[csv]
    • md: pip install docsloader[md]
    • xlsx: pip install docsloader[xlsx]
    • pptx: pip install docsloader[pptx]
    • docx: pip install docsloader[docx]
    • pdf: pip install docsloader[pdf]
    • img: pip install docsloader[img]
    • auto: pip install docsloader[auto]
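
After installing, it can be useful to confirm which version ended up in the active environment. The sketch below uses only the standard library (importlib.metadata); the helper name installed_version is hypothetical and not part of docsloader.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # Prints a version string if docsloader is installed, otherwise None.
    print(installed_version("docsloader"))
```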

Usage

The docsloader package provides asynchronous document loaders for various file suffixes. It includes dedicated loaders for specific file types and an AutoLoader that automatically selects the appropriate loader based on file suffix.

Supported File Suffixes

The package supports loading documents from the following file suffixes:

  • Text Files: .txt
  • CSV Files: .csv
  • Markdown Files: .md
  • HTML Files: .html, .htm
  • Excel Files: .xlsx, .xls
  • PowerPoint Files: .pptx, .ppt
  • Word Files: .docx, .doc
  • PDF Files: .pdf
  • Image Files: .jpg, .jpeg, .png
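
When walking a directory it may help to skip unsupported files before handing anything to a loader. The following is a minimal sketch that hard-codes the suffixes listed above. The names SUPPORTED_SUFFIXES, is_supported, and filter_supported are hypothetical helpers, not part of docsloader, and the case-insensitive comparison is an assumption of this sketch.

```python
from pathlib import Path
from typing import Iterable, List

# Suffixes copied from the "Supported File Suffixes" list above.
SUPPORTED_SUFFIXES = frozenset({
    ".txt", ".csv", ".md", ".html", ".htm",
    ".xlsx", ".xls", ".pptx", ".ppt",
    ".docx", ".doc", ".pdf",
    ".jpg", ".jpeg", ".png",
})


def is_supported(path: str) -> bool:
    """Return True if the path ends in one of the listed suffixes."""
    # Assumption: suffix matching is done case-insensitively here.
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def filter_supported(paths: Iterable[str]) -> List[str]:
    """Keep only the paths whose suffix appears in the list above."""
    return [p for p in paths if is_supported(p)]


if __name__ == "__main__":
    print(filter_supported(["report.PDF", "archive.zip", "notes.md"]))
```

This check looks at local-style paths only; a URL with a query string would need its path component extracted first.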

Available Loaders

The package provides the following loader classes:

  • TxtLoader: For Text files
  • CsvLoader: For CSV files
  • MdLoader: For Markdown files
  • HtmlLoader: For HTML files
  • XlsxLoader: For Excel files
  • PptxLoader: For PowerPoint files
  • DocxLoader: For Word files
  • PdfLoader: For PDF files
  • ImgLoader: For image files
  • AutoLoader: Automatically selects the appropriate loader based on file suffix
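
Suffix-based selection of the kind AutoLoader is described as doing can be pictured as a lookup table from suffix to loader. The sketch below is an illustration only and is not docsloader's actual implementation: the table maps to class names as plain strings, the pairing of legacy suffixes (.xls, .ppt, .doc) with the Xlsx/Pptx/Docx loaders is an assumption drawn from the two lists above, and loader_name_for is a hypothetical helper.

```python
from pathlib import Path

# Assumed mapping, inferred from the suffix list and the loader list above.
LOADER_BY_SUFFIX = {
    ".txt": "TxtLoader",
    ".csv": "CsvLoader",
    ".md": "MdLoader",
    ".html": "HtmlLoader",
    ".htm": "HtmlLoader",
    ".xlsx": "XlsxLoader",
    ".xls": "XlsxLoader",
    ".pptx": "PptxLoader",
    ".ppt": "PptxLoader",
    ".docx": "DocxLoader",
    ".doc": "DocxLoader",
    ".pdf": "PdfLoader",
    ".jpg": "ImgLoader",
    ".jpeg": "ImgLoader",
    ".png": "ImgLoader",
}


def loader_name_for(path: str) -> str:
    """Return the name of the loader class that would handle `path`."""
    suffix = Path(path).suffix.lower()
    try:
        return LOADER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file suffix: {suffix!r}") from None


if __name__ == "__main__":
    print(loader_name_for("slides.pptx"))
```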

All loader classes implement asynchronous load methods for efficient document processing.
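
To illustrate what an asynchronous load method looks like from the caller's side, here is a self-contained sketch built around a stand-in class. FakeLoader and collect are hypothetical and exist only for this illustration; the sketch assumes, as in the example further down, that load() returns an async iterator of documents.

```python
import asyncio
from typing import AsyncIterator, List


class FakeLoader:
    """Hypothetical stand-in for a loader; not part of docsloader."""

    def __init__(self, chunks: List[str]) -> None:
        self.chunks = chunks

    async def load(self) -> AsyncIterator[str]:
        for chunk in self.chunks:
            # Hand control back to the event loop between documents.
            await asyncio.sleep(0)
            yield chunk


async def collect(loader) -> List[str]:
    """Drain a loader's async iterator into a list."""
    return [doc async for doc in loader.load()]


if __name__ == "__main__":
    print(asyncio.run(collect(FakeLoader(["first", "second"]))))
```

Because load() is consumed with async for, other tasks on the same event loop can make progress while a document is being read.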

Example

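import asyncio

from docsloader import AutoLoader
from toollib.log import init_logger  # logging helper from the separate toollib package

logger = init_logger(__name__)


async def main(path_or_url: str):
    # AutoLoader selects the concrete loader from the file suffix.
    loader = AutoLoader(
        path_or_url=path_or_url,
        rm_tmpfile=False,
    )
    # load() is consumed with `async for`, one document at a time.
    async for doc in loader.load():
        logger.info(doc)


if __name__ == "__main__":
    # Replace this with your own local path or URL.
    asyncio.run(main(path_or_url=r"E:/NewFolder/测试.docx"))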
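
Since loading is asynchronous, several documents can be processed concurrently. The sketch below shows one way to do that with bounded concurrency. The names load_all, open_docs, and fake_docs are hypothetical; fake_docs stands in for a real loader so that the block runs without docsloader installed.

```python
import asyncio
from typing import AsyncIterator, Callable, Dict, Iterable, List


async def load_all(
    paths: Iterable[str],
    open_docs: Callable[[str], AsyncIterator[str]],
    max_concurrency: int = 4,
) -> Dict[str, List[str]]:
    """Load several documents concurrently and return {path: [docs]}."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def load_one(path: str):
        # At most `max_concurrency` documents are being read at once.
        async with semaphore:
            return path, [doc async for doc in open_docs(path)]

    results = await asyncio.gather(*(load_one(p) for p in paths))
    return dict(results)


async def fake_docs(path: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for a real loader's load() method.
    yield f"{path}: part 1"
    yield f"{path}: part 2"


if __name__ == "__main__":
    print(asyncio.run(load_all(["a.txt", "b.md"], fake_docs)))
```

With docsloader installed, open_docs could plausibly be something like lambda p: AutoLoader(path_or_url=p, rm_tmpfile=False).load(), mirroring the example above; treat that as an untested assumption.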

License

This project is released under the MIT License. See LICENSE.
