A module to extract text from documents and chat with the content.
Project description
DOCK_BYTE
The DOCK_BYTE module extracts text from PDF and TXT documents and lets you explore the extracted content through an interactive chat backed by a language model. It uses PyMuPDF and Tesseract for document processing and Streamlit for the GUI.
Features
- Extract text from PDF documents using PyMuPDF.
- Perform OCR on PDF documents using Tesseract.
- Extract text from TXT files.
- Use a language model to chat with the content of the documents.
- GUI support with Streamlit for interactive usage.
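The extraction features listed above can be sketched roughly as follows. This is not DOCK_BYTE's own code: `extract_text`, `_ocr_page`, and the `ocr_fallback` parameter are hypothetical names chosen for illustration, and the OCR path assumes the `pytesseract` and `Pillow` packages plus a local Tesseract install. Only the PyMuPDF and pytesseract calls shown are real library APIs.

```python
from pathlib import Path


def extract_text(path: str, ocr_fallback: bool = False) -> str:
    """Return the text of a .txt or .pdf file (hypothetical helper)."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()

    if suffix == ".txt":
        # errors="replace" substitutes undecodable bytes instead of raising.
        return file_path.read_text(encoding="utf-8", errors="replace")

    if suffix == ".pdf":
        # PyMuPDF is imported lazily so TXT-only use needs no extra packages.
        import fitz

        pages = []
        with fitz.open(str(file_path)) as doc:
            for page in doc:
                text = page.get_text()
                # Scanned pages often have no text layer; optionally OCR them.
                if not text.strip() and ocr_fallback:
                    text = _ocr_page(page)
                pages.append(text)
        return "\n".join(pages)

    raise ValueError(f"Unsupported file type: {suffix!r}")


def _ocr_page(page) -> str:
    """Render one PyMuPDF page to an image and run Tesseract on it."""
    import io

    import pytesseract
    from PIL import Image

    pixmap = page.get_pixmap(dpi=300)  # higher DPI tends to help OCR accuracy
    image = Image.open(io.BytesIO(pixmap.tobytes("png")))
    return pytesseract.image_to_string(image)
```

Trying the embedded text layer first and falling back to OCR only for empty pages keeps the slow Tesseract step off documents that do not need it.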
Installation
pip install DOCK_BYTE
Usage
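# "my_module" appears to be a placeholder; substitute the package's actual import name.
from my_module import chat_with_doc

# Arguments: model name, path to the document, and whether to launch the Streamlit GUI.
chat_with_doc("gemma:2b", "data.txt", use_gui=True)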
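To make the idea of "chatting with a document" concrete, here is a minimal sketch of one common approach: place the extracted text in the prompt and pass it to a model. The source does not describe how `chat_with_doc` works internally, so `build_prompt`, `ask`, `max_chars`, and the `generate` callable are all assumptions introduced for illustration. The model backend is left as a plain function so no particular client library is implied.

```python
from typing import Callable


def build_prompt(document_text: str, question: str, max_chars: int = 4000) -> str:
    """Combine document text and a question into one prompt (hypothetical)."""
    # Truncate so the prompt stays within a small model's context window.
    context = document_text[:max_chars]
    return (
        "Answer the question using only the document below.\n\n"
        f"Document:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def ask(document_text: str, question: str, generate: Callable[[str], str]) -> str:
    """Send the prompt to any text-generation function and return its reply."""
    return generate(build_prompt(document_text, question))
```

In practice `generate` would wrap whatever client serves the chosen model (for example, one named `"gemma:2b"`); a stub function is enough for testing the prompt logic.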
License
This project is licensed under the MIT License - see the LICENSE file for details.
Repository
For more information and to contribute, please visit the GitHub repository.
Download files
File details
Details for the file DOCK_BYTE-0.1.tar.gz.
File metadata
- Download URL: DOCK_BYTE-0.1.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 432d25ecfa8a712d66cd43d2ac3649ee796927c98bc7cc4cab02af8e43cbed8c |
| MD5 | be47a71f4e5a5d76c33e5ab319520104 |
| BLAKE2b-256 | dba35c10421b5a58ad49b0cb4b710794a77fbdfa0c5df14cac9495bd2cbda0cb |
File details
Details for the file DOCK_BYTE-0.1-py3-none-any.whl.
File metadata
- Download URL: DOCK_BYTE-0.1-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 44701bfea598ab38fb7be3c58d8300b8f4aee877741b3da4545d795f46e2220d |
| MD5 | f1a33772fa499300f180b2b75b19b9f3 |
| BLAKE2b-256 | f633ce7fcddf042193052a9ecb894632d0f88081c1515b4f24eeaf3c2a3d3e7e |
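The published digests can be used to check a downloaded file before installing it. The following is a small sketch using only Python's standard `hashlib`; the helper names `sha256_of` and `matches_published_hash` are illustrative and not part of DOCK_BYTE.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_hash("DOCK_BYTE-0.1.tar.gz", "432d25ecfa8a712d66cd43d2ac3649ee796927c98bc7cc4cab02af8e43cbed8c")` should return `True` for an intact copy of the source distribution listed above.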