
This is a document loader.


docsloader

What is this?

  • Author: axiner
  • Package: docsloader
  • An asynchronous loader for text, CSV, Markdown, HTML, Office, PDF, and image documents.

Installation

Install the package with pip (requires Python >= 3.11):

pip install docsloader

  • To install all optional dependencies: pip install docsloader[all]
  • To install only the dependencies for specific file types:
    • txt: pip install docsloader[txt]
    • csv: pip install docsloader[csv]
    • md: pip install docsloader[md]
    • xlsx: pip install docsloader[xlsx]
    • pptx: pip install docsloader[pptx]
    • docx: pip install docsloader[docx]
    • pdf: pip install docsloader[pdf]
    • img: pip install docsloader[img]
    • auto: pip install docsloader[auto]

Usage

The docsloader package provides asynchronous document loaders for a range of file types. It includes a dedicated loader for each supported type and an AutoLoader that selects the appropriate loader based on the file suffix.

Supported File Suffixes

The package can load documents with the following file suffixes:

  • Text Files: .txt
  • CSV Files: .csv
  • Markdown Files: .md
  • HTML Files: .html, .htm
  • Excel Files: .xlsx, .xls
  • PowerPoint Files: .pptx, .ppt
  • Word Files: .docx, .doc
  • PDF Files: .pdf
  • Image Files: .jpg, .jpeg, .png

Available Loaders

The package provides the following loader classes:

  • TxtLoader: For Text files
  • CsvLoader: For CSV files
  • MdLoader: For Markdown files
  • HtmlLoader: For HTML files
  • XlsxLoader: For Excel files
  • PptxLoader: For PowerPoint files
  • DocxLoader: For Word files
  • PdfLoader: For PDF files
  • ImgLoader: For image files
  • AutoLoader: Automatically selects the appropriate loader based on file suffix
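The AutoLoader description above amounts to a lookup from file suffix to loader class. The sketch below illustrates that idea using the suffixes and class names listed in this section. It is not the package's actual implementation: mapping the legacy suffixes .xls, .ppt, and .doc to the same loaders as .xlsx, .pptx, and .docx is an assumption, and SUFFIX_TO_LOADER and pick_loader_name are hypothetical names. It returns class names as strings so it runs without docsloader installed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical suffix-to-loader table assembled from the lists above.
# The real AutoLoader's internal mapping may differ (assumption).
SUFFIX_TO_LOADER = {
    ".txt": "TxtLoader",
    ".csv": "CsvLoader",
    ".md": "MdLoader",
    ".html": "HtmlLoader",
    ".htm": "HtmlLoader",
    ".xlsx": "XlsxLoader",
    ".xls": "XlsxLoader",
    ".pptx": "PptxLoader",
    ".ppt": "PptxLoader",
    ".docx": "DocxLoader",
    ".doc": "DocxLoader",
    ".pdf": "PdfLoader",
    ".jpg": "ImgLoader",
    ".jpeg": "ImgLoader",
    ".png": "ImgLoader",
}


def pick_loader_name(path_or_url: str) -> str:
    """Return the loader class name for a local path or URL (illustrative only)."""
    parsed = urlparse(path_or_url)
    # For http(s) URLs, inspect only the path so query strings are ignored.
    if parsed.scheme in ("http", "https"):
        target = parsed.path
    else:
        target = path_or_url
    # Normalize Windows separators, then take the lowercased suffix.
    suffix = PurePosixPath(target.replace("\\", "/")).suffix.lower()
    try:
        return SUFFIX_TO_LOADER[suffix]
    except KeyError:
        raise ValueError(f"unsupported file suffix: {suffix or '(none)'}") from None
```

For example, pick_loader_name("https://example.com/report.PDF?dl=1") returns "PdfLoader", because only the URL path is inspected and the suffix is lowercased before the lookup.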

All loader classes provide an asynchronous load() method whose results are consumed with async for, so documents are yielded one at a time rather than returned all at once.
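Because load() is consumed with async for (as in the Example), several documents can be drained concurrently with asyncio.gather. In this sketch FakeLoader is a hypothetical stand-in with the same load() shape, and drain and load_all are illustrative helpers, not part of docsloader.

```python
import asyncio
from collections.abc import AsyncIterator, Callable
from typing import Any, Protocol


class SupportsLoad(Protocol):
    """Anything with an async-generator load() method, like the loaders above."""

    def load(self) -> AsyncIterator[Any]: ...


class FakeLoader:
    """Hypothetical stand-in for a docsloader class, used only for illustration."""

    def __init__(self, path_or_url: str) -> None:
        self.path_or_url = path_or_url

    async def load(self) -> AsyncIterator[str]:
        for i in range(2):
            await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
            yield f"{self.path_or_url}#chunk{i}"


async def drain(loader: SupportsLoad) -> list[Any]:
    """Collect every item yielded by one loader."""
    return [doc async for doc in loader.load()]


async def load_all(
    paths: list[str], make_loader: Callable[[str], SupportsLoad]
) -> dict[str, list[Any]]:
    """Run one loader per path concurrently and group the results by path."""
    results = await asyncio.gather(*(drain(make_loader(p)) for p in paths))
    return dict(zip(paths, results))


if __name__ == "__main__":
    docs = asyncio.run(load_all(["a.txt", "b.pdf"], FakeLoader))
    for path, items in docs.items():
        print(path, items)
```

With docsloader installed, make_loader could be lambda p: AutoLoader(path_or_url=p, rm_tmpfile=False), mirroring the Example. Whether the real loaders are safe to run concurrently is not documented here, so treat that as an assumption.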

Example

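import asyncio

from docsloader import AutoLoader
from toollib.log import init_logger  # logging helper from toollib (a separate package)

logger = init_logger(__name__)


async def main(path_or_url: str) -> None:
    # AutoLoader picks the concrete loader from the file suffix
    loader = AutoLoader(
        path_or_url=path_or_url,  # local file path or URL
        rm_tmpfile=False,  # keep temporary files instead of deleting them
    )
    # load() is iterated asynchronously; each iteration yields one document
    async for doc in loader.load():
        logger.info(doc)


if __name__ == "__main__":
    asyncio.run(main(path_or_url=r"E:/NewFolder/测试.docx"))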

License

This project is released under the MIT License. See LICENSE for details.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution


docsloader-0.0.9-py3-none-any.whl (19.4 kB)

Uploaded Python 3

File details

Details for the file docsloader-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: docsloader-0.0.9-py3-none-any.whl
  • Upload date:
  • Size: 19.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for docsloader-0.0.9-py3-none-any.whl:

  • SHA256: 7c02c40cc00af01ed500e3135f064431d073b3c08478f27434f596288bb4ea77
  • MD5: c71a5eb8e74f028dad70e4ffb3634741
  • BLAKE2b-256: 15857782d3547c6195e192e5247fc88ae110460799382f165845ccc284f82b3e

