A Python library for document-based RAG
rag-kit
rag-kit is a simple, modular Python library for building PDF-based RAG applications with conversational memory and flexible LLM provider support.
It is designed to hide most of the LangChain complexity behind a clean API:
from ragkit import PDFRAG
rag = PDFRAG("data/sample.pdf")
print(rag.ask("What is LangChain?"))
Features
- PDF-based RAG
- Conversational chat with session memory
- Follow-up handling for queries such as:
  - "hindi m batao" ("tell me in Hindi")
  - "tell me in english"
  - "what did I ask earlier?"
- Query rewriting for better retrieval
- Source return support
- Configurable chunking and retrieval
- Multiple LLM provider support:
- Sarvam (default)
- OpenAI
- Anthropic / Claude
- Custom LangChain-compatible chat models
Installation
Basic install
pip install rag-kit
Optional provider extras
pip install "rag-kit[openai]"
pip install "rag-kit[anthropic]"
pip install "rag-kit[all]"
Local development install
pip install -e .
Environment Variables
Create a .env file in your project root:
SARVAM_API_KEY=
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
An example template is provided in .env.example.
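If you prefer not to add a dependency, a .env file of this shape can be loaded with a few lines of standard-library code. This is a minimal sketch, not part of rag-kit's API; python-dotenv is the more common choice:

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Minimal .env loader: KEY=VALUE lines, '#' comments, existing env vars win."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

if Path(".env").exists():
    load_env()  # after this, os.environ["SARVAM_API_KEY"] etc. are populated
```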
Quick Start
Stateless Q&A
from ragkit import PDFRAG
rag = PDFRAG("data/sample.pdf")
answer = rag.ask("What is memory?")
print(answer)
Chat with memory
from ragkit import PDFRAG
rag = PDFRAG("data/sample.pdf")
session_id = "user1"
print(rag.chat("What is memory?", session_id=session_id))
print(rag.chat("hindi m batao", session_id=session_id))
print(rag.chat("tell me in english", session_id=session_id))
Return sources
from ragkit import PDFRAG
rag = PDFRAG("data/sample.pdf")
result = rag.ask("What is memory?", return_sources=True)
print(result["answer"])
print(result["sources"])
Example shape:
{
"answer": "Memory in LangChain stores previous conversation turns...",
"sources": [
{
"content": "Memory in chat applications is created by storing earlier conversation turns...",
"page": 2,
"source": "data/sample.pdf",
"metadata": {
"page": 2,
"source": "data/sample.pdf"
}
}
]
}
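Given this shape, sources can be turned into short citations with plain dict access. A minimal sketch over the example result above (format_citations is a hypothetical helper, not part of rag-kit):

```python
result = {
    "answer": "Memory in LangChain stores previous conversation turns...",
    "sources": [
        {
            "content": "Memory in chat applications is created by storing earlier conversation turns...",
            "page": 2,
            "source": "data/sample.pdf",
            "metadata": {"page": 2, "source": "data/sample.pdf"},
        }
    ],
}

def format_citations(sources):
    """Render each source as 'path (p. N)' for display alongside the answer."""
    return [f'{s["source"]} (p. {s["page"]})' for s in sources]

print(format_citations(result["sources"]))  # ['data/sample.pdf (p. 2)']
```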
ask() vs chat()
| Method | Purpose |
|---|---|
| ask() | Stateless document Q&A |
| chat() | History-aware conversational interaction |
Use ask() when you want a direct answer from the document.
Use chat() when you want:
- follow-up questions
- translation of the previous answer
- history-based conversation
LLM Providers
Default: Sarvam
from ragkit import PDFRAG
rag = PDFRAG("file.pdf")
OpenAI
from ragkit import PDFRAG
rag = PDFRAG(
"file.pdf",
llm_provider="openai",
llm_config={
"model": "gpt-4o-mini",
"temperature": 0.1,
},
)
Claude
from ragkit import PDFRAG
rag = PDFRAG(
"file.pdf",
llm_provider="claude",
llm_config={
"model": "claude-3-5-haiku-latest",
"temperature": 0.2,
},
)
Custom LLM
from langchain_openai import ChatOpenAI
from ragkit import PDFRAG
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
rag = PDFRAG("file.pdf", llm=llm)
Configuration
from ragkit import PDFRAG, RAGConfig
config = RAGConfig(
chunk_size=800,
chunk_overlap=150,
top_k=5,
use_multi_query=True,
enable_query_rewrite=True,
)
rag = PDFRAG("file.pdf", config=config)
Configurable options currently include:
persist_directory, chunk_size, chunk_overlap, top_k, use_multi_query, enable_query_rewrite, collection_name, verbose, llm_provider, llm_model, llm_temperature, llm_kwargs
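A fuller configuration touching several of these fields might look like the following. The field names come from the list above; the specific values, and the persist directory name, are assumptions for illustration:

```python
from ragkit import PDFRAG, RAGConfig

config = RAGConfig(
    persist_directory="./chroma_db",  # where the vector store lives (assumed name)
    chunk_size=800,
    chunk_overlap=150,
    top_k=5,
    use_multi_query=True,
    enable_query_rewrite=True,
    collection_name="my_docs",
    verbose=True,
)
rag = PDFRAG("file.pdf", config=config)
```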
Add More Documents
rag.add_documents("data/another.pdf")
Reset Chat
rag.reset_chat("user1")
Project Structure
rag-kit/
├── .env.example
├── .gitignore
├── README.md
├── pyproject.toml
├── examples/
├── data/
├── src/
│ └── ragkit/
└── third_party/
Do You Need requirements.txt?
Not necessarily.
For modern Python packaging, pyproject.toml is enough and should be the main source of dependencies.
Use requirements.txt only if you want one of these:
- easier local setup for teammates
- pinned development environment
- quick install for people who do not use packaging workflows
Recommendation
Keep:
pyproject.toml as the main dependency file
Optional:
requirements-dev.txt for local development and testing
Example requirements-dev.txt:
pytest
black
ruff
build
twine
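With that split, a typical local setup (assuming the layout shown under Project Structure) would be:

```shell
# Install the package itself in editable mode, then the dev tools.
pip install -e .
pip install -r requirements-dev.txt

# Run the checks the dev file provides.
pytest
ruff check .
black --check .
```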
If you want, you can also generate a plain requirements.txt, but it should not replace pyproject.toml.
Current Limitations
- Primarily optimized for PDF-based RAG
- Sarvam support may depend on vendored or local integration setup
- No streaming support yet
- No FastAPI server or UI layer yet
- Agent support is planned, but not included in the current public API
Roadmap
- Better source citations
- Improved multi-file indexing isolation
- Streaming responses
- FastAPI server mode
- Playground / UI
- Agent support via ragkit.agent
Examples
Check the examples/ folder for runnable examples such as:
- basic_ask.py
- chat_example.py
- provider_openai.py
License
MIT License
Version
Current version: 0.1.0-beta
APIs may evolve in future releases.
Download files
Source Distribution
Built Distribution
File details
Details for the file nishant_ragkit-0.1.10.tar.gz.
File metadata
- Download URL: nishant_ragkit-0.1.10.tar.gz
- Upload date:
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a1b0aa06fa7567420073c06f11f22f19a62c850c65cfbad90c1211a87397d524 |
| MD5 | 166d77c2dd9b7d9a45de493bde89f8b9 |
| BLAKE2b-256 | 1ce5ea10acba10aade1081289ed2d4adf1ed33ff4069879714d9a17ac62dff8b |
File details
Details for the file nishant_ragkit-0.1.10-py3-none-any.whl.
File metadata
- Download URL: nishant_ragkit-0.1.10-py3-none-any.whl
- Upload date:
- Size: 14.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b572ba5e6ecd96355384bac5d885aafc755d054e4c32de594f0df2ef20f4b38 |
| MD5 | a3ccd99efbd937c9a143bda06fb9ba99 |
| BLAKE2b-256 | e4701ad402ff20deed1c5b4e2d1bb6387d1e00f6746c66256877c9c132329320 |