docitup is a Python package designed to simplify document processing for LangChain. It provides various loaders to extract content from different file types and convert them into LangChain-compatible document classes, ready for storage in LangChain-supported vector stores.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
- Python :: 3.12

Project description

Docitup

This package provides several document loaders, each of which uses a different library to process and chunk documents. It loads documents in various formats into a structured form suitable for use with LangChain vector stores.

Overview

The package includes the following loaders (the Usage section also shows FitzPyMUPDFLoader, PyPdfLoader, and PyPdfLoader2):

PyMUPdf4LLMLoader: Loads and splits documents from files using the pymupdf4llm library.
MarkitdownLoader: Loads documents using the MarkItDown library.
LlamaparseLoader: Loads documents using the LlamaParse library and processes different file types.
DoclingPDFLoader: Converts PDF documents to text using the Docling library and splits the text into chunks.

Installation

To install this package, simply run:

pip install docitup
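
After installing, it can help to confirm which version pip actually resolved, since the release history lists several versions. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is illustrative and is not part of docitup.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None when docitup is not installed in the active environment.
    print("docitup version:", installed_version("docitup"))
```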

Usage

PyMUPdf4LLMLoader

from docitup import PyMUPdf4LLMLoader 
  
loader = PyMUPdf4LLMLoader(file_path='path/to/your/file.pdf')  
documents = loader.load()

MarkitdownLoader

from docitup import MarkitdownLoader

loader = MarkitdownLoader(file_path='path/to/your/file.md')
documents = loader.load()

LlamaparseLoader

from docitup import LlamaparseLoader
from llama_parse.utils import ResultType
  
loader = LlamaparseLoader(file_path='path/to/your/directory', result_type=ResultType.MD, api_key='your_api_key')  
documents = loader.load()

DoclingPDFLoader

from docitup import DoclingLoader
  
loader = DoclingLoader(file_path='path/to/your/file.pdf')  
documents = loader.load()

FitzPyMUPDFLoader

from docitup import FitzPyMUPDFLoader
  
loader = FitzPyMUPDFLoader(file_path='path/to/your/file.pdf')  
documents = loader.load()

PyPdfLoader

from docitup import PyPdfLoader
  
loader = PyPdfLoader(file_path='path/to/your/file.pdf')  
documents = loader.load()

PyPdfLoader2

from docitup import PyPdfLoader2

loader = PyPdfLoader2(file_path='path/to/your/file.pdf')
documents = loader.load()
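
Every loader shown above is constructed with a `file_path` and exposes `load()`, so loaders can be treated interchangeably. The sketch below illustrates that idea under stated assumptions: `SupportsLoad`, `load_all`, and the `FakeLoader` stand-in are hypothetical names that are not part of docitup. In real use you would pass docitup loader instances instead of the stand-in.

```python
from typing import Any, Iterable, List, Protocol


class SupportsLoad(Protocol):
    """Anything with a load() method that returns a list of documents."""

    def load(self) -> List[Any]: ...


def load_all(loaders: Iterable[SupportsLoad]) -> List[Any]:
    """Run each loader in order and collect every returned document into one list."""
    documents: List[Any] = []
    for loader in loaders:
        documents.extend(loader.load())
    return documents


class FakeLoader:
    """Stand-in used only to make this sketch runnable without docitup installed."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def load(self) -> List[str]:
        return [f"chunk from {self.file_path}"]


combined = load_all([FakeLoader("a.pdf"), FakeLoader("b.pdf")])
print(len(combined))
```

Collecting everything into one list keeps the later step of writing to a vector store independent of which loader produced each chunk.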

Configuration Options

Each loader can be configured with the following optional parameters:

splitter_type: The type of text splitter to use (for example, "recursive").

chunk_size: The size of each chunk (default is 1000).

chunk_overlap: The number of overlapping characters between chunks (default is 100).
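
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is not docitup's implementation: `chunk_text` is a hypothetical helper, and a "recursive" splitter would presumably also try to break on natural boundaries such as paragraphs, which this sketch ignores. Only the default values (1000 and 100) come from the list above.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so a larger overlap yields more chunks for the same text.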

Example Usage with all parameters

from docitup import LlamaparseLoader

# Initialize the loader
loader = LlamaparseLoader(
    file_path="example.pdf",
    api_key="your_api_key",
    splitter_type="recursive",
    chunk_size=500,
    chunk_overlap=50,
    extra_metadata={"category": "example"}
)

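# Load the documents and iterate over the resulting chunks
for document in loader.load():
    # LangChain Document objects store their text in page_content
    print("Text Chunk:", document.page_content)
    print("Metadata:", document.metadata)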
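
The `extra_metadata` argument in the example above is not described elsewhere on this page. One plausible reading, shown here purely as an assumption, is that the dictionary is merged into the metadata of every chunk. `Chunk` and `with_extra_metadata` are hypothetical stand-ins and are not docitup classes or functions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a loaded document chunk."""

    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def with_extra_metadata(chunks: List[Chunk], extra: Dict[str, Any]) -> List[Chunk]:
    """Return new chunks whose metadata also contains the extra key/value pairs.

    The input chunks are left unchanged; keys in `extra` win on conflict.
    """
    return [Chunk(c.content, {**c.metadata, **extra}) for c in chunks]


chunks = [Chunk("first chunk", {"source": "example.pdf"})]
tagged = with_extra_metadata(chunks, {"category": "example"})
print(tagged[0].metadata)
```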

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests for improvements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Acknowledgements

This package is made possible by the libraries it wraps, including pymupdf4llm, MarkItDown, LlamaParse, and Docling.

Release history

0.1.4: Jan 14, 2025
0.1.3: Jan 4, 2025 (this version)
0.1.2: Jan 4, 2025
0.1.1: Jan 1, 2025
0.1.0: Dec 31, 2024

Download files

Source Distribution

docitup-0.1.3.tar.gz (7.8 kB)

Uploaded Jan 4, 2025 Source

Built Distribution

docitup-0.1.3-py3-none-any.whl (10.7 kB)

Uploaded Jan 4, 2025 Python 3

File details

Details for the file docitup-0.1.3.tar.gz.

File metadata

Download URL: docitup-0.1.3.tar.gz
Upload date: Jan 4, 2025
Size: 7.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.0

File hashes

Hashes for docitup-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`85322810bae1fc9d065966cfbca46d72a08f85d8ea93523e1079a45d2923ee57`
MD5	`eca47723ba15ee4fb6140b1e6611f7fd`
BLAKE2b-256	`7ba91bb6ce2a078a8b8fa44640e3a08127af368ac7ee920dff85324f664f09cf`

File details

Details for the file docitup-0.1.3-py3-none-any.whl.

File metadata

Download URL: docitup-0.1.3-py3-none-any.whl
Upload date: Jan 4, 2025
Size: 10.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.0

File hashes

Hashes for docitup-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6fc653c093ba321b982e54e763c7337b3e63b717cd154f53fcf1fee4089492d4`
MD5	`19ebf9afd1fa935d68ae7c5f787a68c2`
BLAKE2b-256	`0503d30be16e571cc9cd7fa9cf77bf18817a0ab94ccc0c84ee420f28cd747078`
