Automated Document Parser

An automated document parser built with LangChain. The library detects each file's type from its extension and selects the matching loader, returning the parsed content as LangChain documents that carry page_content and metadata.

Features

  • Automatic file type detection based on file extensions
  • Support for multiple document formats: PDF, TXT, CSV, JSON, DOCX, HTML, Markdown
  • Built on LangChain for seamless integration with RAG applications
  • Type-safe implementation with comprehensive error handling
  • Batch processing support for multiple documents
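
Extension-based detection can be pictured as a lookup from a file's suffix to a loader. The sketch below is only an illustration of that idea and is not the library's actual code: the LOADER_BY_EXTENSION table, the loader labels, and the detect_file_type function are hypothetical names, and the exact extensions the library accepts for each format are assumed.

```python
from pathlib import Path

# Hypothetical mapping from extension to a loader label. The library's real
# table and loader classes are not shown in this README.
LOADER_BY_EXTENSION = {
    ".pdf": "pdf",
    ".txt": "text",
    ".csv": "csv",
    ".json": "json",
    ".docx": "docx",
    ".html": "html",
    ".md": "markdown",
}


def detect_file_type(file_path: str) -> str:
    """Return the loader label for a path, based only on its extension."""
    # Lower-case the suffix so "REPORT.PDF" and "report.pdf" match the same entry.
    suffix = Path(file_path).suffix.lower()
    try:
        return LOADER_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(
            f"Unsupported file type: {suffix or '(no extension)'}"
        ) from None
```

Because only the extension is inspected, a file with a wrong or missing extension would not be recognized under this scheme.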

Installation

Install from PyPI:

pip install automated-document-parser

Or using uv:

uv add automated-document-parser

Quick Start

Basic Usage

from automated_document_parser import DocumentParser

# Initialize the parser
parser = DocumentParser()

# Parse a single document
documents = parser.parse("path/to/document.pdf")

# Parse multiple documents
file_paths = ["doc1.pdf", "doc2.txt", "data.csv"]
parsed_docs = parser.parse_multiple(file_paths)
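
This README does not say how parse_multiple behaves when one file in the batch fails. If you need per-file results, one option is to call a single-file parse function in a loop and record the failures yourself. The helper below is a sketch under that assumption: parse_each is a hypothetical name, and it accepts any callable that takes a path and returns a list.

```python
from typing import Callable, Iterable


def parse_each(
    parse: Callable[[str], list],
    file_paths: Iterable[str],
) -> tuple[dict[str, list], dict[str, Exception]]:
    """Parse paths one by one, separating successes from failures."""
    parsed: dict[str, list] = {}
    failed: dict[str, Exception] = {}
    for path in file_paths:
        try:
            parsed[path] = parse(path)
        except Exception as exc:
            # Broad on purpose: the library's exception types are not
            # documented here, so every failure is recorded per path.
            failed[path] = exc
    return parsed, failed
```

With the parser from the example above, this could be called as parse_each(parser.parse, file_paths).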

Working with Parsed Documents

# Access document content and metadata
for doc in documents:
    print(f"Content: {doc.page_content}")
    print(f"Source: {doc.metadata['source']}")
    print(f"Type: {doc.metadata['file_type']}")
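
Since each document carries its file type in metadata, a batch can be grouped by type before further processing. The sketch below uses a small stand-in class named Doc with the same two attributes so that it runs without the library installed. Doc and group_by_file_type are hypothetical names, and the fallback to "unknown" assumes the file_type key may be absent on some documents.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in with the two attributes used above; not the LangChain class."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def group_by_file_type(docs):
    """Bucket documents by metadata['file_type'], or 'unknown' if the key is absent."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc.metadata.get("file_type", "unknown")].append(doc)
    return dict(groups)


# Sample data for illustration only.
sample = [
    Doc("page one", {"source": "doc1.pdf", "file_type": "pdf"}),
    Doc("row data", {"source": "data.csv", "file_type": "csv"}),
    Doc("page two", {"source": "doc1.pdf", "file_type": "pdf"}),
]
groups = group_by_file_type(sample)
for file_type, docs in groups.items():
    print(f"{file_type}: {len(docs)} document(s)")
```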

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
