Skip to main content

LLM powered Q&A over extracted PDF text

Project description

PDF Chatter

Question Answering over PDFs using Nougat-OCR and GPT-4.

Getting Started

Prerequisites

  • Python 3.9 or later
  • a NVIDIA GPU with CUDA support
  • environment variable OPENAI_API_KEY set to your OpenAI API key

Installation

pip install pdf-chatter

Usage

pdf-chatter path/to/pdf

which opens a REPL where you can ask questions, and GPT-4 will answer them based on the content of the PDF.

Note: pdf-chatter will save a .mmd (multi-markdown) next to the target pdf. This contains the extracted text from the PDF, and is used as a cache so the same PDF doesn't need to be re-processed every time you run pdf-chatter.

Additionally you can run the summarize command to get a summary of the PDF before entering the REPL.

pdf-summarize path/to/pdf

Example

Tips & Notes

  • Nougat-OCR doesn't extract images, so any questions about images in the document will not be answered
  • Nougart-OCR works best on documents similar to scientific papers, reports, etc.

How it works

  1. Extract text from the PDF using Nougat-OCR
  2. The entire document is fed to GPT-4 as part of its chat history via the OpenAI API
  3. A simple REPL collects the user's questions and feeds them to GPT-4, which streams the answer back.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pdf_chatter-0.1.5.tar.gz (5.0 kB view details)

Uploaded Source

Built Distribution

pdf_chatter-0.1.5-py3-none-any.whl (6.4 kB view details)

Uploaded Python 3

File details

Details for the file pdf_chatter-0.1.5.tar.gz.

File metadata

  • Download URL: pdf_chatter-0.1.5.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.10.12 Linux/6.2.0-1019-azure

File hashes

Hashes for pdf_chatter-0.1.5.tar.gz
Algorithm Hash digest
SHA256 025446db188651c4e087b26938bf94c5fbd2f0ee7558a5f2dc8605d7192a1f81
MD5 eaf9ba7d7184f0d6b6615f90a55326ec
BLAKE2b-256 d3aa125e7589eee256e11593964a848fcacd8cc795b10e200ff2a27d4e5a2423

See more details on using hashes here.

File details

Details for the file pdf_chatter-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: pdf_chatter-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 6.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.10.12 Linux/6.2.0-1019-azure

File hashes

Hashes for pdf_chatter-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 4088750fdcced7f8961ed47d15fca4e14f4027edb1d845ba2eb5df796fac7b11
MD5 1738665bf97a5b288bc3ae11aa447a83
BLAKE2b-256 e1140a865fb1f2225b27b011e69f5af66fde9dbaef4dbcdd5306b974265bb1cb

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page