LLM powered Q&A over extracted PDF text
Project description
PDF Chatter
Question Answering over PDFs using Nougat-OCR and GPT-4.
Getting Started
Prerequisites
- Python 3.9 or later
- a NVIDIA GPU with CUDA support
- environment variable
OPENAI_API_KEY
set to your OpenAI API key
Installation
pip install pdf-chatter
Usage
pdf-chatter path/to/pdf
which opens a REPL where you can ask questions, and GPT-4 will answer them based on the content of the PDF.
Note: pdf-chatter will save a .mmd (multi-markdown) next to the target pdf. This contains the extracted text from the PDF, and is used as a cache so the same PDF doesn't need to be re-processed every time you run pdf-chatter.
Additionally you can run the summarize command to get a summary of the PDF before entering the REPL.
pdf-summarize path/to/pdf
Example
Tips & Notes
- Nougat-OCR doesn't extract images, so any questions about images in the document will not be answered
- Nougart-OCR works best on documents similar to scientific papers, reports, etc.
How it works
- Extract text from the PDF using Nougat-OCR
- The entire document is fed to GPT-4 as part of its chat history via the OpenAI API
- A simple REPL collects the user's questions and feeds them to GPT-4, which streams the answer back.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file pdf_chatter-0.1.5.tar.gz
.
File metadata
- Download URL: pdf_chatter-0.1.5.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.10.12 Linux/6.2.0-1019-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 025446db188651c4e087b26938bf94c5fbd2f0ee7558a5f2dc8605d7192a1f81 |
|
MD5 | eaf9ba7d7184f0d6b6615f90a55326ec |
|
BLAKE2b-256 | d3aa125e7589eee256e11593964a848fcacd8cc795b10e200ff2a27d4e5a2423 |
File details
Details for the file pdf_chatter-0.1.5-py3-none-any.whl
.
File metadata
- Download URL: pdf_chatter-0.1.5-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.10.12 Linux/6.2.0-1019-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4088750fdcced7f8961ed47d15fca4e14f4027edb1d845ba2eb5df796fac7b11 |
|
MD5 | 1738665bf97a5b288bc3ae11aa447a83 |
|
BLAKE2b-256 | e1140a865fb1f2225b27b011e69f5af66fde9dbaef4dbcdd5306b974265bb1cb |