This is a document loader.

docsloader

What is this?

  • by: axiner
  • docsloader
  • This is an asynchronous document loader for common file types (text, CSV, Markdown, HTML, Office, PDF, images).

Installation

This package can be installed using pip (Python>=3.11):

pip install docsloader

  • if you want to install all dependencies: pip install docsloader[all]
  • if you want to install specific dependencies:
    • txt: pip install docsloader[txt]
    • csv: pip install docsloader[csv]
    • md: pip install docsloader[md]
    • xlsx: pip install docsloader[xlsx]
    • pptx: pip install docsloader[pptx]
    • docx: pip install docsloader[docx]
    • pdf: pip install docsloader[pdf]
    • img: pip install docsloader[img]
    • auto: pip install docsloader[auto]

Usage

The docsloader package provides asynchronous document loaders for various file suffixes. It includes dedicated loaders for specific file types and an AutoLoader that automatically selects the appropriate loader based on file suffix.

Supported File Suffixes

The package can load documents with the following file suffixes:

  • Text Files: .txt
  • CSV Files: .csv
  • Markdown Files: .md
  • HTML Files: .html, .htm
  • Excel Files: .xlsx, .xls
  • PowerPoint Files: .pptx, .ppt
  • Word Files: .docx, .doc
  • PDF Files: .pdf
  • Image Files: .jpg, .jpeg, .png

Available Loaders

The package provides the following loader classes:

  • TxtLoader: For Text files
  • CsvLoader: For CSV files
  • MdLoader: For Markdown files
  • HtmlLoader: For HTML files
  • XlsxLoader: For Excel files
  • PptxLoader: For PowerPoint files
  • DocxLoader: For Word files
  • PdfLoader: For PDF files
  • ImgLoader: For image files
  • AutoLoader: Automatically selects the appropriate loader based on file suffix

Every loader class exposes an asynchronous load() method, which is consumed with async for and yields the loaded documents one at a time.
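
To make the suffix-based selection concrete, here is a minimal sketch of how such a dispatch could work. It is not the library's implementation: the table is assembled from the two lists above, and LOADER_BY_SUFFIX and pick_loader_name are hypothetical names introduced only for illustration. It returns class names as strings so that it runs without docsloader installed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Suffix-to-loader table assembled from the lists above.
# Illustrative only; the library's real mapping may differ.
LOADER_BY_SUFFIX = {
    ".txt": "TxtLoader",
    ".csv": "CsvLoader",
    ".md": "MdLoader",
    ".html": "HtmlLoader",
    ".htm": "HtmlLoader",
    ".xlsx": "XlsxLoader",
    ".xls": "XlsxLoader",
    ".pptx": "PptxLoader",
    ".ppt": "PptxLoader",
    ".docx": "DocxLoader",
    ".doc": "DocxLoader",
    ".pdf": "PdfLoader",
    ".jpg": "ImgLoader",
    ".jpeg": "ImgLoader",
    ".png": "ImgLoader",
}


def pick_loader_name(path_or_url: str) -> str:
    """Return the loader class name for a path or URL (hypothetical helper)."""
    # urlsplit separates the path from any query string or fragment in a URL.
    path = urlsplit(path_or_url).path
    # Compare case-insensitively so that "report.PDF" matches ".pdf".
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return LOADER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported file suffix: {suffix!r}") from None
```

Under these assumptions, pick_loader_name("notes.md") returns "MdLoader", and a suffix outside the table raises ValueError. A local file name that contains "?" or "#" would be cut short by urlsplit, so a real implementation would need to tell paths and URLs apart first.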

Example

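import asyncio

from docsloader import AutoLoader
# init_logger comes from the separate toollib package.
from toollib.log import init_logger

logger = init_logger(__name__)


async def main(path_or_url: str):
    # AutoLoader selects the concrete loader from the file suffix.
    loader = AutoLoader(
        path_or_url=path_or_url,
        rm_tmpfile=False,
    )
    # load() is consumed with `async for`; each item is one loaded document.
    async for doc in loader.load():
        logger.info(doc)


if __name__ == "__main__":
    # Replace this with your own local path or URL.
    asyncio.run(main(path_or_url=r"E:/NewFolder/测试.docx"))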
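
Because load() is consumed with async for, its results can be collected into a list, and several sources can be processed concurrently. The sketch below shows that pattern with a stand-in class, so it runs without docsloader installed. StubLoader, collect and load_many are hypothetical names, and the stub yields plain strings, whereas the type of the documents that the real loaders yield is not stated here.

```python
import asyncio
from collections.abc import AsyncIterator


class StubLoader:
    """Stand-in for a real loader; it only mimics the async load() shape."""

    def __init__(self, path_or_url: str, pages: int = 2) -> None:
        self.path_or_url = path_or_url
        self.pages = pages

    async def load(self) -> AsyncIterator[str]:
        for i in range(self.pages):
            await asyncio.sleep(0)  # hand control back to the event loop
            yield f"{self.path_or_url}#{i}"


async def collect(loader) -> list:
    """Drain an async load() into a list."""
    return [doc async for doc in loader.load()]


async def load_many(paths: list[str]) -> dict[str, list]:
    """Load several sources concurrently; gather keeps the input order."""
    results = await asyncio.gather(*(collect(StubLoader(p)) for p in paths))
    return dict(zip(paths, results))
```

Assuming AutoLoader behaves like the example above, replacing StubLoader(p) with AutoLoader(path_or_url=p) should give the same structure with real documents.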

License

This project is released under the MIT License. See the LICENSE file.

Built Distribution

docsloader-0.0.11-py3-none-any.whl (19.5 kB)
