AutoLoader for structured and unstructured documents using LangChain
AutoLoader
AutoLoader is a Python utility that automatically loads and processes both structured and unstructured documents using the LangChain framework.
It supports:
- Structured files: CSV, Excel (.csv, .xlsx, .xls)
- Unstructured files: PDFs, Word docs, PowerPoints, Emails, and more
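Sorting files into these two groups can be sketched with a simple extension lookup. This is only an illustration, not AutoLoader's actual implementation: the `classify` helper and both extension sets are hypothetical, and the unstructured set is an illustrative guess rather than the package's real list.

```python
from pathlib import Path

# Hypothetical extension sets. The structured set mirrors the list above;
# the unstructured set is an illustrative guess, not AutoLoader's real list.
STRUCTURED = {".csv", ".xlsx", ".xls"}
UNSTRUCTURED = {".pdf", ".docx", ".pptx", ".eml"}


def classify(path: str) -> str:
    """Return 'structured', 'unstructured', or 'unsupported' for a file path."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    if suffix in STRUCTURED:
        return "structured"
    if suffix in UNSTRUCTURED:
        return "unstructured"
    return "unsupported"


if __name__ == "__main__":
    for name in ["report.XLSX", "slides.pptx", "notes.txt"]:
        print(name, "->", classify(name))
```

Lowercasing the suffix keeps the lookup tolerant of names such as `REPORT.XLSX`.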
📦 Features
- ✅ Load files from a single file or a directory
- ✅ Automatically detects file type
- ✅ Converts rows to langchain.schema.Document objects
- ✅ Extracts metadata (source file, sheet name)
- ✅ Logs file loading progress and errors
- ✅ Supports LangChain-compatible document structure
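The row-to-document conversion and progress logging listed above might look roughly like the sketch below. It uses only the standard library so it runs without extra dependencies: the `Document` dataclass is a stand-in that carries the same `page_content` and `metadata` fields as LangChain's document class, and `rows_to_documents` and the logger name are hypothetical names, not part of AutoLoader. How the real package formats each row is not documented here, so the `key: value` layout is an assumption.

```python
import csv
import io
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("autoloader_sketch")  # hypothetical logger name


@dataclass
class Document:
    """Stand-in with the same two fields as LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_documents(csv_text: str, source: str) -> list[Document]:
    """Turn each CSV row into one Document tagged with its source file."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed layout: one "column: value" line per cell.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        docs.append(Document(page_content=content, metadata={"source": source}))
    logger.info("Loaded %d rows from %s", len(docs), source)
    return docs


docs = rows_to_documents("name,qty\nbolt,4\nnut,9\n", source="parts.csv")
print(docs[0].page_content)
print(docs[0].metadata)
```

For an Excel workbook, the same idea would add the sheet name to `metadata` alongside `source`.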
🛠 Installation
pip install pandas langchain langchain-unstructured
🚀 Usage
from autoloader import AutoLoader

# Load from a single file or directory
loader = AutoLoader(path="./data")

# Process all supported files
documents = loader.load()

# Join and format documents into a single string
structured_docs = "\n\n".join(
    f"[Source: {doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
    for doc in documents
)

# Print the first 1000 characters
print(structured_docs[:1000])
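After loading a directory, it can help to check how many documents came from each file. The sketch below counts documents by their `source` metadata. The `documents` list here is made up of stand-in objects so the snippet runs on its own; with the real package you would use the list returned by `loader.load()` instead, assuming each item exposes a `metadata` dict as in the usage example above.

```python
from collections import Counter
from types import SimpleNamespace

# Stand-ins for the objects returned by loader.load(); only .metadata is used.
documents = [
    SimpleNamespace(page_content="a", metadata={"source": "data/sales.csv"}),
    SimpleNamespace(page_content="b", metadata={"source": "data/sales.csv"}),
    SimpleNamespace(page_content="c", metadata={}),  # no source recorded
]

# Count documents per source file, falling back to "unknown" like the usage example.
per_source = Counter(doc.metadata.get("source", "unknown") for doc in documents)

for source, count in per_source.most_common():
    print(f"{source}: {count} document(s)")
```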