
AutoLoader for structured and unstructured documents using LangChain

Project description

AutoLoader

AutoLoader is a Python utility that automatically loads and processes both structured and unstructured documents using the LangChain framework.

It supports:

  • Structured files: CSV and Excel (.csv, .xlsx, .xls)
  • Unstructured files: PDFs, Word documents, PowerPoint presentations, emails, and more
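The structured/unstructured split above amounts to a dispatch on file type. The sketch below shows one way that might look, assuming detection by file extension alone. The helper `classify_file` and the exact set of unstructured extensions are hypothetical illustrations, not AutoLoader's actual API.

```python
from pathlib import Path

# Extensions listed above as "structured" (read as tables).
STRUCTURED_EXTENSIONS = {".csv", ".xlsx", ".xls"}

# Assumed examples of "unstructured" formats (PDF, Word, PowerPoint, email);
# the real package may recognize a different or larger set.
UNSTRUCTURED_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".eml", ".msg"}


def classify_file(path: str) -> str:
    """Return 'structured', 'unstructured', or 'unsupported' for a file path.

    Hypothetical helper: detection is by extension only, case-insensitive.
    """
    suffix = Path(path).suffix.lower()
    if suffix in STRUCTURED_EXTENSIONS:
        return "structured"
    if suffix in UNSTRUCTURED_EXTENSIONS:
        return "unstructured"
    return "unsupported"


if __name__ == "__main__":
    for name in ["sales.xlsx", "report.PDF", "data.bin"]:
        print(name, "->", classify_file(name))
```

Extension checks are cheap and predictable; a real loader could instead sniff file contents, which is more robust to misnamed files but slower.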

📦 Features

  • ✅ Loads a single file or the files in a directory
  • ✅ Automatically detects each file's type
  • ✅ Converts rows of structured files into langchain.schema.Document objects
  • ✅ Extracts metadata (source file, sheet name)
  • ✅ Logs loading progress and errors
  • ✅ Returns documents in a LangChain-compatible structure
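Converting table rows into documents with source and sheet metadata can be sketched as follows. This is a minimal illustration, not the package's code: a small dataclass stands in for `langchain.schema.Document` (which likewise has `page_content` and `metadata` fields) so the sketch runs without LangChain installed. The helpers `rows_to_documents` and `load_csv`, the `"column: value"` row rendering, and the `sheet_name` metadata key are all assumptions.

```python
import csv
import logging
from dataclasses import dataclass, field
from typing import Iterable, Mapping, Optional

# Messages are emitted at INFO/ERROR level; call logging.basicConfig() to see INFO output.
logger = logging.getLogger("autoloader_sketch")


@dataclass
class Document:
    """Stand-in for langchain.schema.Document: page_content plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_documents(
    rows: Iterable[Mapping[str, object]],
    source: str,
    sheet_name: Optional[str] = None,
) -> list:
    """Convert each row mapping into one Document (hypothetical helper).

    Assumption: a row is rendered as 'column: value' lines.
    """
    metadata = {"source": source}
    if sheet_name is not None:
        metadata["sheet_name"] = sheet_name  # only meaningful for Excel workbooks
    documents = [
        Document(
            page_content="\n".join(f"{col}: {val}" for col, val in row.items()),
            metadata=dict(metadata),  # copy so documents do not share one dict
        )
        for row in rows
    ]
    logger.info("Converted %d rows from %s", len(documents), source)
    return documents


def load_csv(path: str) -> list:
    """Read a CSV with the standard library, logging I/O and parse failures."""
    try:
        with open(path, newline="", encoding="utf-8") as f:
            # The comprehension inside rows_to_documents consumes the reader
            # before the file is closed.
            return rows_to_documents(csv.DictReader(f), source=path)
    except (OSError, csv.Error) as exc:
        logger.error("Failed to load %s: %s", path, exc)
        return []
```

For Excel workbooks, pandas can supply the rows: `pd.read_excel(path, sheet_name=None)` returns a dict mapping each sheet name to a DataFrame, and `df.to_dict(orient="records")` yields row dicts that could be passed in with `sheet_name=` set.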

🛠 Installation

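pip install auto_doc_loader
pip install pandas langchain langchain-unstructured

If pandas reads the Excel files (it is listed as a dependency), it also needs an engine package: openpyxl for .xlsx and xlrd for .xls.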

🚀 Usage

from autoloader import AutoLoader

# Load from a single file or directory
loader = AutoLoader(path="./data")

# Process all supported files
documents = loader.load()

# Prefix each document with its source file and join them into one string
combined_text = "\n\n".join(
    f"[Source: {doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
    for doc in documents
)

# Preview the first 1000 characters
print(combined_text[:1000])
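Because `path` accepts either a single file or a directory, the loader has to expand that argument into a list of files to load. A minimal sketch, assuming a recursive directory search filtered by extension; the helper `iter_supported_files`, its `recursive` flag, and the extension set are hypothetical and not part of AutoLoader's documented interface.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical extension set for illustration; the real package may differ.
SUPPORTED_EXTENSIONS = {".csv", ".xlsx", ".xls", ".pdf", ".docx", ".pptx", ".eml"}


def iter_supported_files(path: str, recursive: bool = True) -> List[Path]:
    """Expand a file or directory path into a sorted list of supported files."""
    root = Path(path)
    if root.is_file():
        candidates = [root]
    elif root.is_dir():
        # rglob("*") walks all subdirectories; iterdir() lists only the top level.
        candidates = root.rglob("*") if recursive else root.iterdir()
    else:
        raise FileNotFoundError(f"No such file or directory: {path}")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    # Build a throwaway directory tree and list the files that would be loaded.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "nested").mkdir()
        for name in ["a.csv", "nested/b.PDF", "notes.bin"]:
            (root / name).write_text("placeholder")
        for p in iter_supported_files(tmp):
            print(p.relative_to(root))
```

Sorting the result keeps the load order deterministic, which makes the combined output and any logs reproducible between runs.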
 



Download files


Source Distribution

auto_doc_loader-0.1.0.tar.gz (4.1 kB)

Uploaded Source

Built Distribution


auto_doc_loader-0.1.0-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file auto_doc_loader-0.1.0.tar.gz.

File metadata

  • Download URL: auto_doc_loader-0.1.0.tar.gz
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for auto_doc_loader-0.1.0.tar.gz

| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | 100ae0948121eedcdbea2ef69531cfbd0c5e0801e9f8f31e1846ebac43cd1313 |
| MD5         | 6862d399db448c01416512681c63ff71 |
| BLAKE2b-256 | 734747f275992573d29a25d0a5159aeae83a58f6f9a6c3597494c5d1ec3d3628 |
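The digests above let you confirm that a downloaded archive is intact. A minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `verify` are hypothetical, and the default expected value is the SHA256 digest listed for the source distribution.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest published above for auto_doc_loader-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "100ae0948121eedcdbea2ef69531cfbd0c5e0801e9f8f31e1846ebac43cd1313"


def verify(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA-256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("auto_doc_loader-0.1.0.tar.gz")` after downloading the sdist into the current directory. pip can also enforce digests itself through hash-checking mode, using `--hash=sha256:...` entries in a requirements file.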


File details

Details for the file auto_doc_loader-0.1.0-py3-none-any.whl.


File hashes

Hashes for auto_doc_loader-0.1.0-py3-none-any.whl

| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | cc39a1b34318eb91012cc85b97d107857e251232d30377f3df9ef31b19c22a9d |
| MD5         | 5e5f69eac28c8028e98c0db389b82f31 |
| BLAKE2b-256 | f963b8a8597aeb8338074a65b769f00040dedca7a4f8ebbfc88981d31d0964d4 |

