RAGforge
Local-first RAG pipeline — PDF & Markdown ingestion, Qdrant retrieval, bge reranking, and an answer-quality eval harness.
Pairs with turboquant-ml for quantized LLM serving.
Why RAGforge?
Most "RAG starter" repos are 30 lines of glue between LangChain and OpenAI that nobody can reproduce, because they hide retrieval quality, reranking, latency, and cost behind a single .invoke() call. RAGforge is the opposite: a small, readable, local-first pipeline that you can run end-to-end on your own laptop with open-source models, and that ships with an answer-quality eval harness so you can measure what changing a knob actually does.
Three opinions:
- Local-first. Default everywhere is open-source: BAAI/bge-small-en-v1.5 for embeddings, BAAI/bge-reranker-base for reranking, Qdrant in embedded mode (no Docker required), and any HuggingFace causal LM for generation. No OpenAI key required to try the project.
- Measurable. Every change should answer the question "did the answer get better?". RAGforge ships ragforge eval with built-in context_recall, answer_relevance, and faithfulness metrics: no RAGAS dependency required, but RAGAS-compatible.
- Composable, not framework-y. Each stage (ingest, embed, retrieve, rerank, generate, evaluate) is one short module behind a small interface. Swap the encoder, swap the vector store, swap the LLM, with no Runnable.invoke() magic to debug.
Features
| Stage | Default | Swappable for |
|---|---|---|
| Ingest | PDF (pypdf), Markdown (markdown-it-py) | Anything that yields (text, metadata) |
| Chunk | Recursive char splitter, ~512 tokens, 64 overlap | Token-aware splitter, sentence splitter |
| Embed | BAAI/bge-small-en-v1.5 (sentence-transformers) | Any sentence-transformers model |
| Vector store | Qdrant (embedded, no server required) | Qdrant remote, in-memory NumPy backend |
| Rerank | BAAI/bge-reranker-base | Any cross-encoder |
| LLM | Any HF causal LM | Same model, NF4-quantized via turboquant-ml |
| Eval | context_recall, answer_relevance, faithfulness | RAGAS, hand-rolled |
| Serve | FastAPI /ingest, /ask, /eval | - |
| CLI | ragforge ingest / ask / eval / serve | - |
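The Chunk row (about 512 tokens with a 64-token overlap) can be pictured as a sliding window. The following is a minimal sketch under stated assumptions, not RAGforge's actual splitter: whitespace-separated words stand in for tokens, the window ignores paragraph and sentence boundaries that a recursive splitter would respect, and the name chunk_words is hypothetical.

```python
from typing import List


def chunk_words(text: str, size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into windows of `size` words that share `overlap` words.

    Hypothetical helper: words approximate tokens, and boundaries are purely
    positional rather than recursive on separators.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With these defaults each chunk repeats the last 64 words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.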
Installation
The PyPI distribution is named ragforge-ml (the unsuffixed ragforge
name was taken by an unrelated project). The Python import is ragforge,
and the CLI is available as ragforge or rf:
pip install ragforge-ml # core
pip install "ragforge-ml[serve]" # + FastAPI
pip install "ragforge-ml[quantized]" # + turboquant-ml NF4 path
pip install "ragforge-ml[all]" # everything
60-second tour
from ragforge import Pipeline
rag = Pipeline.from_defaults(model_id="Qwen/Qwen2.5-3B-Instruct")
rag.ingest(["docs/policy.pdf", "notes/onboarding.md"])
answer = rag.ask("What is the maximum reimbursable amount for client lunches?")
print(answer.text)
for src in answer.sources:
    print(f" {src.score:.3f} {src.metadata['path']}#chunk{src.metadata['chunk']}")
CLI
rf ingest docs/ --collection company
rf ask "How do I rotate an API key?" --collection company --k 5
rf eval datasets/qa.jsonl --collection company --metrics context_recall,faithfulness
rf serve --collection company --host 0.0.0.0 --port 8080
Quantized LLM via TurboQuant
from ragforge import Pipeline
from ragforge.llm import QuantizedHFLLM
llm = QuantizedHFLLM("meta-llama/Llama-3.2-3B-Instruct", method="bnb-nf4")
rag = Pipeline.from_defaults(llm=llm)
Architecture
ragforge/
├── ingest/ # PDF + Markdown loaders, chunking
├── embed/ # sentence-transformers wrapper
├── vectorstore/ # Qdrant embedded + remote
├── rerank/ # bge-reranker-base
├── llm/ # HF causal LM + turboquant-ml integration
├── pipeline.py # The end-to-end orchestrator
├── eval/ # context_recall, answer_relevance, faithfulness
├── serve/ # FastAPI app
└── cli.py # ragforge / rf
Each module is short, readable, and replaceable through a small interface
(Encoder, VectorStore, Reranker, LLM). The pipeline calls them in
order — no DAG, no runnables, no callbacks.
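To make the "small interface" idea concrete, here is a sketch of how the four interfaces and the in-order call chain could look using typing.Protocol. Only the names Encoder, VectorStore, Reranker, and LLM come from the text above. The method names (encode, search, rerank, generate), the Hit record, the ask function, and the prompt layout are assumptions for illustration and may not match the real signatures.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass
class Hit:
    """Hypothetical retrieval result: a chunk of text and its score."""
    text: str
    score: float


class Encoder(Protocol):
    def encode(self, texts: Sequence[str]) -> List[List[float]]: ...


class VectorStore(Protocol):
    def search(self, vector: List[float], k: int) -> List[Hit]: ...


class Reranker(Protocol):
    def rerank(self, query: str, hits: List[Hit]) -> List[Hit]: ...


class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...


def ask(question: str, encoder: Encoder, store: VectorStore,
        reranker: Reranker, llm: LLM, k: int = 5) -> str:
    """Call each stage in order: embed, retrieve, rerank, generate."""
    query_vector = encoder.encode([question])[0]
    hits = reranker.rerank(question, store.search(query_vector, k))
    context = "\n\n".join(hit.text for hit in hits)
    # Assumed prompt layout, chosen only for this sketch.
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm.generate(prompt)
```

Because the protocols are structural, any object with a matching method can be passed in without inheriting from a base class, which is what makes swapping a stage a one-line change.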
Eval harness
The reason RAGforge exists. Most RAG projects ship without measuring whether their retrieval is any good. RAGforge ships three metrics in pure Python (no external API), all RAGAS-compatible:
| Metric | What it measures |
|---|---|
| context_recall | Of the gold-context tokens, what fraction were retrieved? |
| answer_relevance | Cosine similarity between the answer and synthetic questions back-generated from the answer (RAGAS recipe) |
| faithfulness | Fraction of answer claims that are entailed by the retrieved context (NLI-based, can fall back to embedding overlap) |
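As an illustration of the context_recall definition in the table, the sketch below computes the fraction of gold-context tokens found in the retrieved chunks. It is a simplified reading, not the shipped implementation: tokens are lowercased whitespace-separated words, duplicates are counted once, and the helper tokenize is hypothetical.

```python
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def context_recall(gold_context: str, retrieved_chunks: Iterable[str]) -> float:
    """Fraction of unique gold-context tokens present in any retrieved chunk."""
    gold = set(tokenize(gold_context))
    if not gold:
        return 0.0  # nothing to recall; avoid dividing by zero
    retrieved = set()
    for chunk in retrieved_chunks:
        retrieved.update(tokenize(chunk))
    return len(gold & retrieved) / len(gold)


# 4 of the 5 gold tokens ("max", "lunch", "is", "euros") are retrieved.
score = context_recall("max lunch is 50 euros", ["the max lunch", "is 40 euros"])
print(f"{score:.2f}")  # 0.80
```

A set-based overlap like this ignores word order and repetition, so it rewards retrieving the right vocabulary rather than the exact passage.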
rf eval datasets/qa.jsonl --collection company
+---------------+--------+
| metric | mean |
+---------------+--------+
| context_recall| 0.84 |
| answer_rel | 0.78 |
| faithfulness | 0.91 |
+---------------+--------+
n=120 · latency_p50=620ms · latency_p95=1.4s
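The mean column and the summary line (n, latency_p50, latency_p95) could be produced by an aggregation along the following lines. This is a sketch using only the standard library. The function name summarize, its inputs, and the choice of the "inclusive" percentile method are assumptions, not a description of how rf eval computes its report.

```python
import statistics
from typing import Dict, List


def summarize(scores: List[float], latencies_ms: List[float]) -> Dict[str, float]:
    """Aggregate per-example scores and latencies into report fields.

    Needs at least two latency samples, as statistics.quantiles requires.
    """
    # n=100 yields 99 cut points; index 49 is the 50th percentile
    # and index 94 is the 95th.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "n": len(scores),
        "mean": statistics.fmean(scores),
        "latency_p50": cuts[49],
        "latency_p95": cuts[94],
    }


report = summarize([0.8, 0.9, 1.0], [100, 200, 300, 400, 500])
print(report["n"], report["latency_p50"])
```

Reporting p95 next to p50 matters for a local pipeline because a few slow generations can dominate the user experience while leaving the median unchanged.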
Roadmap
- PDF + Markdown ingestion
- Recursive char chunker with overlap
- BGE embeddings + BGE reranker
- Qdrant embedded + remote
- FastAPI serve
- CLI: ingest, ask, eval, serve
- Eval: context_recall, answer_relevance, faithfulness
- turboquant-ml integration for NF4 LLM serving
- Hybrid retrieval (BM25 + dense)
- Streaming generation in /ask
- Notion / Confluence loaders (community PRs welcome)
- SQL agent for structured-data questions
Contributing
See docs/CONTRIBUTING.md.
git clone https://github.com/Ademo93/ragforge
cd ragforge
pip install -e ".[dev,serve,eval]"
pytest
License
MIT.