
Chat with your PDFs using local embedding search and OpenAI.


📄 Chat with PDF


Chat with your PDF documents easily using local embeddings and powerful LLMs through a unified SDK. Upload any PDF and ask natural language questions about its content — powered by semantic search and AI.


🛠️ Installation

pip install chat-with-pdf

Or using Poetry:

poetry add chat-with-pdf

✨ Quickstart Example

  1. Set your credentials and optionally choose a model/provider:
# Default provider key
export OPENAI_API_KEY="sk-your-openai-key"

# Model used by the selected provider
export OPENAI_MODEL="gpt-4"

# Switch to another provider (e.g., openai, perplexity, or deepseek)
export LLM_PROVIDER="perplexity"
  2. Use the SDK to chat with any PDF:
from chat_with_pdf import PDFChat

# Local PDF file
chat = PDFChat("path/to/your/document.pdf")
print(chat.ask("Summarize the introduction section."))

# Remote URL
chat = PDFChat("https://example.com/sample.pdf")
print(chat.ask("What is the main point of this document?"))

# PDF in memory
with open("path/to/your/document.pdf", "rb") as f:
    data = f.read()
chat = PDFChat(data)
print(chat.ask("Give me a brief overview."))
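Since the SDK reads credentials from the environment, it can help to fail fast with a clear message when a key is missing. `require_env` below is a hypothetical helper (not part of the SDK), shown as a minimal sketch:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, raising a clear error if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# For demonstration only: ensure the variable exists before checking it.
os.environ.setdefault("OPENAI_API_KEY", "sk-your-openai-key")
print(require_env("OPENAI_API_KEY"))
```

Calling this once at startup surfaces a configuration problem immediately instead of a deep traceback from inside the SDK.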

⚙️ Configuration Options

Configure the SDK via the following environment variables:

| Variable | Purpose | Default |
|---|---|---|
| LLM_PROVIDER | Provider to use (openai, perplexity, deepseek) | openai |
| OPENAI_API_KEY | Your OpenAI API key | — |
| OPENAI_MODEL | Model name (used for all providers) | gpt-3.5-turbo |
| EMBEDDING_MODEL | Embedding model for semantic search | all-MiniLM-L6-v2 |
| DEFAULT_CHUNK_SIZE | Characters per text chunk | 500 |
| TOP_K_RETRIEVAL | Number of chunks to retrieve per query | 5 |

💡 For local development, you can also create a .env file with these variables and the SDK will load it automatically.
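For example, a `.env` file might look like this (values shown are placeholders, not defaults you must use):

```shell
OPENAI_API_KEY=sk-your-openai-key
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-3.5-turbo
DEFAULT_CHUNK_SIZE=500
TOP_K_RETRIEVAL=5
```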


🔥 Advanced Usage

Override provider/model at runtime:

from chat_with_pdf import PDFChat

# Use GPT-4 on OpenAI
chat = PDFChat("doc.pdf")
print(chat.ask("What are the key findings?", provider="openai", model="gpt-4"))

# Use DeepSeek
print(chat.ask("Summarize", provider="deepseek", model="deepseek-chat"))

# Use Perplexity
print(chat.ask("Summarize", provider="perplexity", model="sonar"))
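To build intuition for what TOP_K_RETRIEVAL controls, here is an illustrative sketch of top-k retrieval. This is not the library's actual code: it uses toy bag-of-words "embeddings" and cosine similarity, whereas the SDK uses a real embedding model such as all-MiniLM-L6-v2.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words term counts (stand-in for a real model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, chunks: list[str], k: int = 5) -> list[str]:
    # Rank chunks by similarity to the query and keep the k best.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "the study measures glacier melt",
    "appendix with raw tables",
    "melt rates rose sharply",
]
print(top_k("glacier melt rates", chunks, k=2))
```

Only the retrieved chunks are sent to the LLM along with your question, which is why a larger TOP_K_RETRIEVAL gives the model more context at the cost of a bigger prompt.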

📝 License

This project is licensed under the MIT License.


🌟 Acknowledgements



