
A module to extract text from documents and chat with the content.

Project description

DOCK_BYTE

The DOCK_BYTE module extracts text from PDF and TXT documents and lets you explore the extracted content interactively by chatting with a language model. It uses PyMuPDF and Tesseract for document processing and Streamlit for a GUI-based interface.

Features

  • Extract text from PDF documents using PyMuPDF.
  • Perform OCR on PDF documents using Tesseract.
  • Extract text from TXT files.
  • Use a language model to chat with the content of the documents.
  • GUI support with Streamlit for interactive usage.
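The extraction features above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the function names `extract_text` and `ocr_page` are hypothetical, and falling back to OCR only for pages without a text layer is an assumption. It relies on PyMuPDF (imported as `fitz`), `pytesseract`, and Pillow, which are imported lazily so the TXT path works without them.

```python
from pathlib import Path


def ocr_page(page, dpi=300):
    """OCR a single PyMuPDF page with Tesseract (hypothetical helper)."""
    import pytesseract          # requires the Tesseract binary to be installed
    from PIL import Image

    pix = page.get_pixmap(dpi=dpi)  # render the page as an RGB raster image
    image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    return pytesseract.image_to_string(image)


def extract_text(path, use_ocr=False):
    """Return the text of a TXT or PDF file (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".txt":
        return path.read_text(encoding="utf-8", errors="replace")

    if suffix == ".pdf":
        import fitz  # PyMuPDF

        pages = []
        with fitz.open(str(path)) as doc:
            for page in doc:
                text = page.get_text()
                # Assumption: OCR is only needed when the page has no text layer.
                if use_ocr and not text.strip():
                    text = ocr_page(page)
                pages.append(text)
        return "\n".join(pages)

    raise ValueError(f"Unsupported file type: {suffix!r}")
```

Calling `extract_text("data.txt")` reads the file directly, while `extract_text("scan.pdf", use_ocr=True)` would run Tesseract on pages that yield no embedded text.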

Installation

pip install DOCK_BYTE

Usage

from my_module import chat_with_doc

# Arguments: model name, path to the document, and whether to launch the Streamlit GUI.
chat_with_doc("gemma:2b", "data.txt", use_gui=True)
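How `chat_with_doc` passes the document to the model is not described here. One common pattern is to place the extracted text and the user's question in a single prompt and send that to the model. The sketch below illustrates that pattern under stated assumptions: `build_prompt`, `answer_question`, the `max_chars` limit, and the injected `generate` callable are all hypothetical names and are not part of DOCK_BYTE.

```python
def build_prompt(document_text, question, max_chars=4000):
    """Combine document text and a question into one prompt string."""
    # Truncate the document so the prompt stays within a rough size budget.
    context = document_text[:max_chars]
    return (
        "Answer the question using only the document below.\n\n"
        f"Document:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_question(document_text, question, generate, max_chars=4000):
    """Ask a model about a document.

    `generate` is any callable that takes a prompt string and returns the
    model's reply as a string, so the model backend stays swappable.
    """
    prompt = build_prompt(document_text, question, max_chars)
    return generate(prompt).strip()
```

Passing the model call in as `generate` keeps the prompt logic independent of any particular model runtime, which also makes it easy to test with a stub function.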

License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository

For more information and to contribute, please visit the GitHub repository.


This version

0.1

