
Chat with your PDFs using local embedding search and OpenAI.


📄 Chat with PDF


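Chat with your PDF documents using local embeddings for retrieval and a choice of LLM providers (OpenAI, Perplexity, or DeepSeek) behind a single SDK. Load any PDF and ask natural-language questions about its content. The answers come from the model and draw on the passages that semantic search retrieves from the document.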
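The retrieval step behind this can be pictured roughly as follows. This is a minimal, self-contained sketch, not the package's code. The real SDK embeds chunks with a sentence-embedding model (all-MiniLM-L6-v2 by default). The hypothetical `toy_embed` below uses bag-of-words counts instead, so the example runs with the standard library alone. The chunk size and top-k mirror the DEFAULT_CHUNK_SIZE and TOP_K_RETRIEVAL settings listed under Configuration Options. `build_prompt` is likewise only an illustration of how retrieved context might be handed to an LLM.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(chunks: list[str], question: str, k: int = 5) -> list[str]:
    """Return the k chunks most similar to the question, best match first."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(toy_embed(c), q), reverse=True)
    return ranked[:k]


def build_prompt(chunks: list[str], question: str) -> str:
    """Illustrative only: join retrieved chunks and the question into one prompt."""
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

With the chunks "Cats purr when content.", "The introduction describes the method." and "Dogs bark at strangers.", asking `top_k_chunks(chunks, "What does the introduction describe?", k=1)` selects the introduction chunk, because it is the only one that shares words with the question.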


🛠️ Installation

pip install chat-with-pdf

Or using Poetry:

poetry add chat-with-pdf

✨ Quickstart Example

  1. Set your credentials and optionally choose a model/provider:
# Default provider key
export OPENAI_API_KEY="sk-your-openai-key"

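# Model to use with OpenAI (default: gpt-3.5-turbo)
export OPENAI_MODEL="gpt-4"

# Optional: switch to another provider (openai, perplexity or deepseek);
# each provider reads its own API key variable
export LLM_PROVIDER="perplexity"
export PERPLEXITY_API_KEY="your-perplexity-key"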
  2. Use the SDK to chat with any PDF:
from chat_with_pdf import PDFChat

# Local PDF file
chat = PDFChat("path/to/your/document.pdf")
print(chat.ask("Summarize the introduction section."))

# Remote URL
chat = PDFChat("https://example.com/sample.pdf")
print(chat.ask("What is the main point of this document?"))

# PDF in memory
with open("path/to/your/document.pdf", "rb") as f:
    data = f.read()
chat = PDFChat(data)
print(chat.ask("Give me a brief overview."))
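PDFChat accepts a local path, an http(s) URL, or raw bytes. One plausible way to normalize those three inputs into bytes before parsing is sketched below. `load_pdf_bytes` is a hypothetical helper written for illustration, not part of the chat_with_pdf API, and the package may handle these inputs differently.

```python
import os
import urllib.request
from typing import Union


def load_pdf_bytes(source: Union[str, bytes, bytearray, os.PathLike]) -> bytes:
    """Hypothetical helper: return PDF bytes from in-memory data, a URL, or a path."""
    if isinstance(source, (bytes, bytearray)):
        # Already in memory (e.g. the result of f.read()).
        data = bytes(source)
    elif isinstance(source, str) and source.startswith(("http://", "https://")):
        # Remote URL: download the whole file into memory.
        with urllib.request.urlopen(source, timeout=30) as resp:
            data = resp.read()
    else:
        # Anything else is treated as a local filesystem path.
        with open(source, "rb") as f:
            data = f.read()
    # PDF files normally begin with the "%PDF-" header; reject obvious non-PDFs early.
    if not data.startswith(b"%PDF-"):
        raise ValueError("input does not look like a PDF (missing %PDF- header)")
    return data
```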

⚙️ Configuration Options

Configure via environment variables:

| Variable | Purpose | Default |
| :--- | :--- | :--- |
| LLM_PROVIDER | Provider to use (openai, perplexity, deepseek) | openai |
| OPENAI_API_KEY | Your OpenAI API key | (none) |
| OPENAI_MODEL | Model name (OpenAI) | gpt-3.5-turbo |
| PERPLEXITY_API_KEY | Your Perplexity API key | (none) |
| PERPLEXITY_MODEL | Model name (Perplexity) | perplexity-v1 |
| DEEPSEEK_API_KEY | Your DeepSeek API key | (none) |
| DEEPSEEK_MODEL | Model name (DeepSeek) | deepseek-v1 |
| EMBEDDING_MODEL | Local embedding model | all-MiniLM-L6-v2 |
| DEFAULT_CHUNK_SIZE | Characters per text chunk | 500 |
| TOP_K_RETRIEVAL | Number of chunks retrieved per query | 5 |

💡 For local development, you can also put these variables in a .env file; the SDK loads it automatically.
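As a rough illustration of how these variables fit together, the sketch below resolves the active provider, its API key, and its model from the environment, falling back to the defaults in the table. `load_settings` and the `Settings` fields are hypothetical names. The package's internal configuration code may differ, and .env loading is left out here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Per-provider model defaults, as listed in the configuration table.
DEFAULT_MODELS = {
    "openai": "gpt-3.5-turbo",
    "perplexity": "perplexity-v1",
    "deepseek": "deepseek-v1",
}


@dataclass
class Settings:
    provider: str
    api_key: str
    model: str
    embedding_model: str
    chunk_size: int
    top_k: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Hypothetical: build settings from environment variables plus table defaults."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai").lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    prefix = provider.upper()  # OPENAI, PERPLEXITY or DEEPSEEK
    api_key = env.get(f"{prefix}_API_KEY")
    if not api_key:
        raise RuntimeError(f"{prefix}_API_KEY must be set when LLM_PROVIDER={provider}")
    return Settings(
        provider=provider,
        api_key=api_key,
        model=env.get(f"{prefix}_MODEL", DEFAULT_MODELS[provider]),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        chunk_size=int(env.get("DEFAULT_CHUNK_SIZE", "500")),
        top_k=int(env.get("TOP_K_RETRIEVAL", "5")),
    )
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.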


🔥 Advanced Usage

Override provider/model at runtime:

from chat_with_pdf import PDFChat

# Use GPT-4 on OpenAI
chat = PDFChat("doc.pdf")
print(chat.ask("What are the key findings?", provider="openai", model="gpt-4"))

# Use DeepSeek (requires DEEPSEEK_API_KEY)
print(chat.ask("Summarize", provider="deepseek"))
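The per-call `provider` and `model` arguments presumably take precedence over the environment. A minimal sketch of that resolution order follows: explicit argument first, then the provider's *_MODEL variable, then the table default. `resolve_model` is a hypothetical helper, and the precedence itself is an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional, Tuple

# Per-provider model defaults from the configuration table.
DEFAULT_MODELS = {"openai": "gpt-3.5-turbo", "perplexity": "perplexity-v1", "deepseek": "deepseek-v1"}


def resolve_model(provider: Optional[str] = None, model: Optional[str] = None,
                  env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Hypothetical: choose (provider, model) for a single ask() call.

    Assumed precedence: explicit argument > environment variable > table default.
    """
    env = os.environ if env is None else env
    provider = (provider or env.get("LLM_PROVIDER", "openai")).lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unsupported provider: {provider!r}")
    model = model or env.get(f"{provider.upper()}_MODEL") or DEFAULT_MODELS[provider]
    return provider, model
```

Under these assumptions, `resolve_model("deepseek", env={})` yields `("deepseek", "deepseek-v1")`, matching the second call in the example above when no DEEPSEEK_MODEL is set.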

📝 License

This project is licensed under the MIT License.


🌟 Acknowledgements



Download files


Source Distribution

chat_with_pdf-0.3.3.tar.gz (9.4 kB)

Built Distribution

chat_with_pdf-0.3.3-py3-none-any.whl (12.7 kB)

File details

Details for the file chat_with_pdf-0.3.3.tar.gz.

File metadata

  • Download URL: chat_with_pdf-0.3.3.tar.gz
  • Size: 9.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for chat_with_pdf-0.3.3.tar.gz
Algorithm Hash digest
SHA256 541efd725a2c9ce374aff548c18efa8039059ef78b2f728e25d985e86a371fe7
MD5 4859c6e366bb84ddbf7d4f02dc70ecfe
BLAKE2b-256 3a13dc9e4663261a0efd81888711546d7581f299fcb2eecf54cf4614019cb456


File details

Details for the file chat_with_pdf-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: chat_with_pdf-0.3.3-py3-none-any.whl
  • Size: 12.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for chat_with_pdf-0.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 77c0198c3634a55917be38e57841005b8e45440678189471c5707fe645d1bd2e
MD5 55558443186efbb06e633830d2fe16ca
BLAKE2b-256 acc544a554346db81983550f07d768cedcaa47ede0e9d7dacbe0d770e7213d82
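To check a downloaded file against the SHA256 digests listed above, Python's standard hashlib module is enough. `sha256_of` and `verify` are illustrative helper names, not part of the package.

```python
import hashlib


def sha256_of(path: str, block_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the wheel, call `verify("chat_with_pdf-0.3.3-py3-none-any.whl", "77c0198c3634a55917be38e57841005b8e45440678189471c5707fe645d1bd2e")` from the directory containing the download.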

