
Chat with your PDFs using local embedding search and OpenAI.


📄 Chat with PDF


Chat with your PDF documents easily using local embeddings and powerful LLMs through a unified SDK. Upload any PDF and ask natural language questions about its content — powered by semantic search and AI.


🛠️ Installation

pip install chat-with-pdf

Or using Poetry:

poetry add chat-with-pdf

✨ Quickstart Example

  1. Set your credentials and optionally choose a model/provider:
# Default provider key
export OPENAI_API_KEY="sk-your-openai-key"

# Model for the OpenAI provider
export OPENAI_MODEL="gpt-4"

# Switch to another provider (e.g., perplexity, openai or deepseek)
export LLM_PROVIDER="perplexity"
  2. Use the SDK to chat with any PDF:
from chat_with_pdf import PDFChat

# Local PDF file
chat = PDFChat("path/to/your/document.pdf")
print(chat.ask("Summarize the introduction section."))

# Remote URL
chat = PDFChat("https://example.com/sample.pdf")
print(chat.ask("What is the main point of this document?"))

# PDF in memory
with open("path/to/your/document.pdf", "rb") as f:
    data = f.read()
chat = PDFChat(data)
print(chat.ask("Give me a brief overview."))
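PDFChat accepts a local path, a remote URL, or raw bytes. A minimal sketch of how that input dispatching could work — a hypothetical helper for illustration, not the SDK's actual internals:

```python
from pathlib import Path

def resolve_pdf_source(source):
    """Return raw PDF bytes for the three input forms shown above.

    Hypothetical helper: the real PDFChat may resolve inputs differently.
    """
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)  # already in memory
    if isinstance(source, str) and source.startswith(("http://", "https://")):
        import urllib.request  # fetch remote PDFs over HTTP(S)
        with urllib.request.urlopen(source) as resp:
            return resp.read()
    return Path(source).read_bytes()  # treat everything else as a local path
```

Dispatching on type first, then on URL scheme, keeps the common in-memory case cheapest and avoids touching the filesystem or network unless needed.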

⚙️ Configuration Options

Configure via environment variables:

| Variable | Purpose | Default |
| :--- | :--- | :--- |
| LLM_PROVIDER | Provider to use (openai, perplexity, deepseek) | openai |
| OPENAI_API_KEY | Your OpenAI API key | — |
| OPENAI_MODEL | Model name (OpenAI) | gpt-3.5-turbo |
| PERPLEXITY_API_KEY | Your Perplexity API key | — |
| PERPLEXITY_MODEL | Model name (Perplexity) | perplexity-v1 |
| DEEPSEEK_API_KEY | Your DeepSeek API key | — |
| DEEPSEEK_MODEL | Model name (DeepSeek) | deepseek-v1 |
| EMBEDDING_MODEL | Embedding model | all-MiniLM-L6-v2 |
| DEFAULT_CHUNK_SIZE | Characters per text chunk | 500 |
| TOP_K_RETRIEVAL | Number of chunks to retrieve per query | 5 |

💡 For local development, you can also create a .env file with these variables and the SDK will load it automatically.
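The last two settings map onto the retrieval step: the document is split into DEFAULT_CHUNK_SIZE-character chunks, and the TOP_K_RETRIEVAL most relevant chunks are passed to the LLM. A rough sketch under those assumptions — with a naive word-overlap scorer standing in for the SDK's actual embedding similarity (all-MiniLM-L6-v2):

```python
def chunk_text(text, chunk_size=500):
    """Split text into fixed-size character chunks (mirrors DEFAULT_CHUNK_SIZE)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def top_k_chunks(query, chunks, k=5):
    """Rank chunks by word overlap with the query and keep the best k.

    Stand-in scorer for illustration: the SDK ranks by embedding similarity.
    """
    q_words = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]
```

Raising DEFAULT_CHUNK_SIZE gives each chunk more context but coarser retrieval; raising TOP_K_RETRIEVAL sends more chunks to the LLM at the cost of a larger prompt.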


🔥 Advanced Usage

Override provider/model at runtime:

from chat_with_pdf import PDFChat

# Use GPT-4 on OpenAI
chat = PDFChat("doc.pdf")
print(chat.ask("What are the key findings?", provider="openai", model="gpt-4"))

# Use DeepSeek
print(chat.ask("Summarize", provider="deepseek"))

📝 License

This project is licensed under the MIT License.




Download files

Download the file for your platform.

Source Distribution

chat_with_pdf-0.4.0.tar.gz (9.4 kB)


Built Distribution


chat_with_pdf-0.4.0-py3-none-any.whl (12.7 kB)


File details

Details for the file chat_with_pdf-0.4.0.tar.gz.

File metadata

  • Download URL: chat_with_pdf-0.4.0.tar.gz
  • Upload date:
  • Size: 9.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for chat_with_pdf-0.4.0.tar.gz

| Algorithm | Hash digest |
| :--- | :--- |
| SHA256 | 09f1e0a0ab93ffac7436344b96103163661e48fc60a9ac407975aee7139add9e |
| MD5 | ebb85b04e36a41b224b24ade7f38acce |
| BLAKE2b-256 | d9f4386a41902a85d3b9895c56e2ce9ea0f7592c2d19f3251983f51ad47a30bb |


File details

Details for the file chat_with_pdf-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: chat_with_pdf-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 12.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for chat_with_pdf-0.4.0-py3-none-any.whl

| Algorithm | Hash digest |
| :--- | :--- |
| SHA256 | 9b28cb925058638db1c036e69ec5fc9b626147fe07d642371559db78c4e96397 |
| MD5 | 42c291ea62052aa0d961e39829baf80e |
| BLAKE2b-256 | ecda9246c58d14681fe7ffdb968a2af35939790101cbf7848cd35e248d55a735 |

