Chat with your PDFs using local embedding search and OpenAI.

📄 Chat with PDF

Chat with your PDF documents easily using local embeddings and powerful LLMs like OpenAI's GPT models.

Load any PDF and ask natural-language questions about its content, powered by semantic search and an LLM.


🛠️ Installation

pip install chat-with-pdf

Or using Poetry:

poetry add chat-with-pdf

✨ Quickstart Example

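from chat_with_pdf import PDFChat

# Load the PDF; settings not passed here come from environment variables or defaults
chat = PDFChat('path/to/your/document.pdf')

# Ask a natural-language question about the document
response = chat.ask("Summarize the introduction section.")
print(response)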

PDFChat accepts the PDF as a file path, a URL, or raw bytes.

Example:

chat = PDFChat("path/to/file.pdf")
chat = PDFChat("https://example.com/file.pdf")
chat = PDFChat(binary_pdf_data)
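
How PDFChat tells these three input kinds apart is not documented here. The sketch below shows one plausible way to do it, using only the standard library. The helper name classify_source is hypothetical and is not part of chat-with-pdf.

```python
from urllib.parse import urlparse


def classify_source(source):
    """Return 'bytes', 'url', or 'path' for a PDF source.

    Hypothetical helper for illustration; not the library's actual API.
    """
    if isinstance(source, (bytes, bytearray)):
        return "bytes"
    if isinstance(source, str):
        # Treat only http(s) strings as URLs; everything else is a file path.
        scheme = urlparse(source).scheme.lower()
        if scheme in ("http", "https"):
            return "url"
        return "path"
    raise TypeError(f"Unsupported PDF source type: {type(source).__name__}")
```

Checking the type first keeps raw bytes from ever being parsed as text, and restricting URLs to http and https means a Windows path such as C:\docs\file.pdf is still treated as a path.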

⚙️ Configuration Options

You can configure chat-with-pdf through arguments or environment variables, or let it fall back to the library defaults.

Priority:

  1. Arguments passed to PDFChat
  2. Environment Variables
  3. Library defaults
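
The priority order above can be sketched as a small lookup function. This is an illustration of the precedence rule only, not the library's internal code: resolve_setting and DEFAULTS are hypothetical names, and the default values are the ones documented in this README.

```python
import os

# Defaults as documented in this README (assumed to match the library).
DEFAULTS = {
    "OPENAI_MODEL": "gpt-3.5-turbo",
    "EMBEDDING_MODEL": "all-MiniLM-L6-v2",
    "DEFAULT_CHUNK_SIZE": 500,
    "TOP_K_RETRIEVAL": 5,
}


def resolve_setting(name, explicit=None, env=None):
    """Resolve one setting: explicit argument, then environment, then default."""
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit
    raw = env.get(name, "")
    if raw != "":
        # Environment values are strings; coerce to the default's type.
        return type(DEFAULTS[name])(raw)
    return DEFAULTS[name]
```

For example, resolve_setting("TOP_K_RETRIEVAL", explicit=8) returns 8 regardless of the environment, while resolve_setting("TOP_K_RETRIEVAL") returns the environment value if one is set and 5 otherwise.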

Supported Environment Variables:

Variable             Purpose                                              Default
OPENAI_API_KEY       Your OpenAI API key                                  "" (empty)
OPENAI_MODEL         GPT model name to use                                "gpt-3.5-turbo"
EMBEDDING_MODEL      Embedding model for vector search                    "all-MiniLM-L6-v2"
DEFAULT_CHUNK_SIZE   Number of characters per text chunk                  500
TOP_K_RETRIEVAL      Number of similar chunks to retrieve per question    5
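
To make DEFAULT_CHUNK_SIZE and TOP_K_RETRIEVAL concrete, here is a minimal sketch of fixed-size character chunking and top-k retrieval by cosine similarity. It assumes the embeddings are already available as plain lists of floats. The function names are hypothetical, and the library's actual chunking and search may differ (for example, it may split on sentence boundaries or use a vector index).

```python
import math


def chunk_text(text, chunk_size=500):
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=5):
    """Return the indices of the k chunk vectors most similar to query_vec."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

In this model, a larger chunk size gives each retrieved chunk more context but makes matches coarser, and a larger k sends more text to the LLM per question.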

Example .env file:

OPENAI_API_KEY=sk-xxxxx
OPENAI_MODEL=gpt-4
DEFAULT_CHUNK_SIZE=600
TOP_K_RETRIEVAL=8
EMBEDDING_MODEL=all-mpnet-base-v2

If you have a .env file at your project root, chat-with-pdf will automatically load it.
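
The README does not say how the .env file is loaded. The sketch below is a minimal standard-library version of the usual behavior, in which values already set in the environment are not overwritten. The function name load_env_file is hypothetical. Many Python projects use the python-dotenv package for this, but whether chat-with-pdf does is an assumption.

```python
import os


def load_env_file(path=".env", environ=None):
    """Read KEY=VALUE lines from path into environ without overriding existing keys.

    Returns the key/value pairs found in the file. A missing file is ignored.
    """
    environ = os.environ if environ is None else environ
    loaded = {}
    try:
        with open(path, encoding="utf-8") as fh:
            for raw in fh:
                line = raw.strip()
                # Skip blank lines, comments, and lines without an assignment.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                key = key.strip()
                value = value.strip().strip("'\"")
                loaded[key] = value
                environ.setdefault(key, value)
    except FileNotFoundError:
        pass
    return loaded
```

Because existing keys win, a variable exported in your shell still takes precedence over the same variable in the .env file.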


🔥 Advanced Usage Example

Explicitly passing all settings:

from chat_with_pdf import PDFChat

chat = PDFChat(
    'path/to/your/document.pdf',
    openai_api_key="sk-your-openai-key",
    model="gpt-4",
    embedding_model="all-mpnet-base-v2",
    chunk_size=600,
    top_k_retrieval=8
)

response = chat.ask("Summarize the key points.")
print(response)

📝 License

This project is licensed under the MIT License.

