Chat with your PDFs using local embedding search and OpenAI.
📄 Chat with PDF
Chat with your PDF documents easily using local embeddings and powerful LLMs through a unified SDK. Upload any PDF and ask natural language questions about its content — powered by semantic search and AI.
🛠️ Installation
pip install chat-with-pdf
Or using Poetry:
poetry add chat-with-pdf
✨ Quickstart Example
- Set your credentials and optionally choose a model/provider:
# Default provider key
export OPENAI_API_KEY="sk-your-openai-key"
# The model for the provider
export OPENAI_MODEL="gpt-4"
# Switch to another provider (e.g., perplexity, openai or deepseek)
export LLM_PROVIDER="perplexity"
- Use the SDK to chat with any PDF:
from chat_with_pdf import PDFChat
# Local PDF file
chat = PDFChat("path/to/your/document.pdf")
print(chat.ask("Summarize the introduction section."))
# Remote URL
chat = PDFChat("https://example.com/sample.pdf")
print(chat.ask("What is the main point of this document?"))
# PDF in memory
with open("path/to/your/document.pdf", "rb") as f:
    data = f.read()
chat = PDFChat(data)
print(chat.ask("Give me a brief overview."))
⚙️ Configuration Options
Configure via environment variables (in order of precedence):
| Variable | Purpose | Default |
|---|---|---|
| LLM_PROVIDER | Provider to use (openai, perplexity, deepseek) | openai |
| OPENAI_API_KEY | Your OpenAI API key | — |
| OPENAI_MODEL | GPT model name (OpenAI) | gpt-3.5-turbo |
| PERPLEXITY_API_KEY | Your Perplexity API key | — |
| PERPLEXITY_MODEL | Model name (Perplexity) | perplexity-v1 |
| DEEPSEEK_API_KEY | Your DeepSeek API key | — |
| DEEPSEEK_MODEL | Model name (DeepSeek) | deepseek-v1 |
| EMBEDDING_MODEL | Embedding model | all-MiniLM-L6-v2 |
| DEFAULT_CHUNK_SIZE | Characters per text chunk | 500 |
| TOP_K_RETRIEVAL | Number of chunks to retrieve per query | 5 |
💡 For local development, you can also create a .env file with these variables and the SDK will load it automatically.
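To make DEFAULT_CHUNK_SIZE and TOP_K_RETRIEVAL more concrete, here is a minimal sketch of fixed-size chunking followed by top-k retrieval. It is not the SDK's implementation: `read_int_setting`, `split_into_chunks`, and `top_k_chunks` are hypothetical helpers, and a toy word-count cosine similarity stands in for the real embedding model so the example runs with the standard library only.

```python
import math
import os
import re
from collections import Counter


def read_int_setting(name: str, default: int) -> int:
    """Read an integer environment variable, falling back to a default."""
    raw = os.environ.get(name)
    return int(raw) if raw else default


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Cut text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def _word_counts(text: str) -> Counter:
    # Toy stand-in for an embedding: lowercase word frequencies.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(question: str, chunks: list[str], k: int) -> list[str]:
    """Return the k chunks most similar to the question, best first."""
    q = _word_counts(question)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _word_counts(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Defaults mirror the table above; the sample text is made up.
    chunk_size = read_int_setting("DEFAULT_CHUNK_SIZE", 500)
    top_k = read_int_setting("TOP_K_RETRIEVAL", 5)
    text = "Cats like milk. " * 40 + "Dogs chase cars. " * 40
    chunks = split_into_chunks(text, chunk_size)
    for chunk in top_k_chunks("What do cats like?", chunks, top_k):
        print(chunk[:60])
```

Under this reading, a smaller chunk size yields more, narrower chunks, while a larger TOP_K_RETRIEVAL passes more context to the LLM for each question.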
🔥 Advanced Usage
Override provider/model at runtime:
from chat_with_pdf import PDFChat
# Use GPT-4 on OpenAI
chat = PDFChat("doc.pdf")
print(chat.ask("What are the key findings?", provider="openai", model="gpt-4"))
# Use DeepSeek
print(chat.ask("Summarize", provider="deepseek"))
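Because the provider can be chosen per call, one possible pattern is to try several providers in order. The sketch below is an illustration only: `ask_with_fallback` is a hypothetical helper, not part of the SDK, and it assumes that `ask` raises an exception when a provider call fails, which this README does not confirm.

```python
from typing import Optional, Protocol, Sequence


class SupportsAsk(Protocol):
    """Anything with an ask(question, provider=...) method, such as PDFChat."""

    def ask(self, question: str, provider: Optional[str] = None) -> str: ...


def ask_with_fallback(chat: SupportsAsk, question: str, providers: Sequence[str]) -> str:
    """Try each provider in order and return the first successful answer."""
    if not providers:
        raise ValueError("providers must not be empty")
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return chat.ask(question, provider=provider)
        except Exception as exc:
            # Broad catch on purpose: the SDK's exception types are not documented here.
            last_error = exc
    raise RuntimeError(f"All providers failed: {', '.join(providers)}") from last_error
```

With a real document, this could be called as `ask_with_fallback(PDFChat("doc.pdf"), "Summarize", ["openai", "deepseek"])`, provided the API keys for both providers are set.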
📝 License
This project is licensed under the MIT License.
🌟 Acknowledgements