Automated Document Parser

An automated document parser built on LangChain. The library detects each file's type from its extension and uses the matching loader to parse the file into LangChain-compatible documents, each exposing page_content and metadata.

Features

  • Automatic file type detection based on file extensions
  • Support for multiple document formats: PDF, TXT, CSV, JSON, DOCX, HTML, Markdown
  • Built on LangChain for seamless integration with RAG applications
  • Type-safe implementation with comprehensive error handling
  • Batch processing support for multiple documents
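
To illustrate the first feature, here is a minimal sketch of extension-based type detection. It is not the library's implementation: the EXTENSION_TO_TYPE table, the detect_file_type name, and the ValueError raised for unknown extensions are all assumptions made for this example.

```python
from pathlib import Path

# Hypothetical lookup table for illustration; the library's real mapping may differ.
EXTENSION_TO_TYPE = {
    ".pdf": "pdf",
    ".txt": "txt",
    ".csv": "csv",
    ".json": "json",
    ".docx": "docx",
    ".html": "html",
    ".htm": "html",
    ".md": "markdown",
}


def detect_file_type(path: str) -> str:
    """Return a file-type label based only on the path's extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive: ".PDF" matches ".pdf"
    try:
        return EXTENSION_TO_TYPE[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


print(detect_file_type("reports/Q3.PDF"))  # pdf
print(detect_file_type("notes.md"))        # markdown
```

Detection by extension never opens the file, so it is cheap, but a misnamed file (for example a PDF saved as .txt) would be routed to the wrong loader.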

Installation

Install from PyPI:

pip install automated-document-parser

Or using uv:

uv add automated-document-parser

Quick Start

Basic Usage

from automated_document_parser import DocumentParser

# Initialize the parser
parser = DocumentParser()

# Parse a single document
documents = parser.parse("path/to/document.pdf")

# Parse multiple documents
file_paths = ["doc1.pdf", "doc2.txt", "data.csv"]
parsed_docs = parser.parse_multiple(file_paths)
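
This README does not say how parse_multiple reports a file that fails to parse. If you need per-file results, one option is to loop over the paths yourself. The sketch below is a generic helper, not part of the library: parse_each and fake_parse are hypothetical names, and the broad except clause is an assumption made because the library's exception types are not documented here.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def parse_each(
    paths: Iterable[str], parse_fn: Callable[[str], T]
) -> tuple[dict[str, T], dict[str, Exception]]:
    """Apply parse_fn to every path, keeping successes and failures apart."""
    parsed: dict[str, T] = {}
    failed: dict[str, Exception] = {}
    for path in paths:
        try:
            parsed[path] = parse_fn(path)
        except Exception as exc:  # broad on purpose; narrow it once the real exceptions are known
            failed[path] = exc
    return parsed, failed


def fake_parse(path: str) -> list[str]:
    """Stand-in for a real parser so this sketch runs without the library."""
    if path.endswith(".xyz"):
        raise ValueError("unsupported file type")
    return [f"contents of {path}"]


parsed, failed = parse_each(["doc1.txt", "data.xyz"], fake_parse)
print(sorted(parsed))  # ['doc1.txt']
print(sorted(failed))  # ['data.xyz']
```

With the parser from Basic Usage, the same helper could be called as parse_each(file_paths, parser.parse), so one unreadable file does not discard the results for the others.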

Working with Parsed Documents

# Access document content and metadata
# (documents is the list returned by parser.parse() in Basic Usage)
for doc in documents:
    print(f"Content: {doc.page_content}")
    print(f"Source: {doc.metadata['source']}")
    print(f"Type: {doc.metadata['file_type']}")
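
When a batch mixes formats, it can help to group the parsed documents by their file_type metadata before further processing. The sketch below assumes only the page_content and metadata attributes shown above. StubDocument is a stand-in defined here so the example runs without LangChain, and group_by_file_type is a hypothetical helper, not part of the library.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StubDocument:
    """Stand-in with the two attributes used above; not the LangChain class."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def group_by_file_type(docs):
    """Bucket documents by metadata['file_type'], using 'unknown' if the key is absent."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc.metadata.get("file_type", "unknown")].append(doc)
    return dict(groups)


docs = [
    StubDocument("page one", {"source": "doc1.pdf", "file_type": "pdf"}),
    StubDocument("a,b,c", {"source": "data.csv", "file_type": "csv"}),
    StubDocument("page two", {"source": "doc1.pdf", "file_type": "pdf"}),
]

for file_type, items in group_by_file_type(docs).items():
    print(file_type, len(items))
```

Using metadata.get with a default keeps the helper from raising KeyError if some loader does not set file_type.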

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
