RAGforge

Local-first RAG pipeline — PDF & Markdown ingestion, Qdrant retrieval, bge reranking, and an answer-quality eval harness.
Pairs with turboquant-ml for quantized LLM serving.

Why RAGforge?

Most "RAG starter" repos are 30 lines of glue between LangChain and OpenAI that nobody can reproduce, because retrieval quality, reranking, latency, and cost are all hidden behind a single .invoke() call. RAGforge is the opposite: a small, readable, local-first pipeline that you can run end-to-end on your own laptop with open-source models. It ships with an answer-quality eval harness, so you can actually measure what changing a knob does.

Three opinions:

Local-first. Every default is open-source: BAAI/bge-small-en-v1.5 for embeddings, BAAI/bge-reranker-base for reranking, Qdrant in embedded mode (no Docker required), and any HuggingFace causal LM for generation. No OpenAI key is required to try the project.
Measurable. Every change should answer the question "did the answer get better?". RAGforge ships ragforge eval with built-in context_recall, answer_relevance, and faithfulness metrics — no RAGAS dependency required, but RAGAS-compatible.
Composable, not framework-y. Each stage (ingest, embed, retrieve, rerank, generate, evaluate) is one short module behind a small interface. Swap the encoder, swap the vector store, swap the LLM — no Runnable.invoke() magic to debug.

Features

Stage	Default	Swappable for
Ingest	PDF (pypdf), Markdown (markdown-it-py)	Anything that yields `(text, metadata)`
Chunk	Recursive char splitter, ~512 tokens, 64 overlap	Token-aware splitter, sentence splitter
Embed	`BAAI/bge-small-en-v1.5` (sentence-transformers)	Any sentence-transformers model
Vector store	Qdrant (embedded, no server required)	Qdrant remote, in-memory NumPy backend
Rerank	`BAAI/bge-reranker-base`	Any cross-encoder
LLM	Any HF causal LM	Same model, NF4-quantized via `turboquant-ml`
Eval	`context_recall`, `answer_relevance`, `faithfulness`	RAGAS, hand-rolled
Serve	FastAPI `/ingest`, `/ask`, `/eval`	—
CLI	`ragforge ingest / ask / eval / serve`	—
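The Ingest and Chunk rows can be illustrated with a minimal sketch, assuming a loader is just any iterable of `(text, metadata)` pairs. The names `split_recursive`, `chunk`, and `iter_chunks` are hypothetical and not taken from RAGforge, and sizes here are counted in characters rather than tokens to keep the example dependency-free.

```python
from typing import Iterable, Iterator

# Coarse-to-fine separators: paragraphs, then lines, then words.
SEPARATORS = ("\n\n", "\n", " ")


def split_recursive(text: str, max_len: int, seps: tuple[str, ...] = SEPARATORS) -> list[str]:
    """Split text into pieces of at most max_len characters."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not seps:
        # No separator left to try: fall back to a hard cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    pieces: list[str] = []
    for part in text.split(seps[0]):
        pieces.extend(split_recursive(part, max_len, seps[1:]))
    return pieces


def chunk(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Greedily pack pieces into chunks of at most `size` characters.

    Each new chunk starts with the last `overlap` characters of the previous one.
    Pieces are re-joined with single spaces, so whitespace is normalized.
    """
    if not 0 <= overlap < size - 1:
        raise ValueError("overlap must satisfy 0 <= overlap < size - 1")
    chunks: list[str] = []
    current = ""
    # Reserve room for the overlap tail plus one joining space.
    for piece in split_recursive(text, size - overlap - 1):
        candidate = f"{current} {piece}".strip()
        if len(candidate) <= size:
            current = candidate
        else:
            chunks.append(current)
            tail = current[-overlap:] if overlap else ""
            current = f"{tail} {piece}".strip()
    if current:
        chunks.append(current)
    return chunks


def iter_chunks(
    docs: Iterable[tuple[str, dict]], size: int = 512, overlap: int = 64
) -> Iterator[tuple[str, dict]]:
    """Turn (text, metadata) documents into (chunk_text, metadata) records."""
    for text, metadata in docs:
        for index, piece in enumerate(chunk(text, size, overlap)):
            yield piece, {**metadata, "chunk": index}
```

Any source that can produce `(text, metadata)` pairs, such as a list built from files on disk, can be passed straight to `iter_chunks`. The `chunk` index added to the metadata mirrors the `metadata['chunk']` key printed in the 60-second tour.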

Installation

The PyPI distribution is named ragforge-ml (the unsuffixed ragforge name was taken by an unrelated project). The Python import name is still ragforge, and the CLI is available as either ragforge or rf:

pip install ragforge-ml                       # core
pip install "ragforge-ml[serve]"              # + FastAPI
pip install "ragforge-ml[quantized]"          # + turboquant-ml NF4 path
pip install "ragforge-ml[all]"                # everything

60-second tour

from ragforge import Pipeline

rag = Pipeline.from_defaults(model_id="Qwen/Qwen2.5-3B-Instruct")
rag.ingest(["docs/policy.pdf", "notes/onboarding.md"])

answer = rag.ask("What is the maximum reimbursable amount for client lunches?")
print(answer.text)
for src in answer.sources:
    print(f"  {src.score:.3f}  {src.metadata['path']}#chunk{src.metadata['chunk']}")
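The tour above relies on an answer object with a `text` attribute and a list of `sources`, each carrying a `score` and a `metadata` dict with `path` and `chunk` keys. A minimal sketch of that shape follows. The class names `Answer` and `Source`, the `Source.text` field, and the helper `format_source` are assumptions for illustration, not RAGforge's actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """One retrieved chunk. Field names mirror the attributes used in the tour."""
    text: str  # assumed field: the chunk text itself
    score: float
    metadata: dict = field(default_factory=dict)


@dataclass
class Answer:
    """Generated text plus the chunks it was grounded on."""
    text: str
    sources: list[Source] = field(default_factory=list)


def format_source(src: Source) -> str:
    """Render a source the same way the tour's print loop does."""
    return f"  {src.score:.3f}  {src.metadata['path']}#chunk{src.metadata['chunk']}"


if __name__ == "__main__":
    answer = Answer(
        text="(example answer text)",
        sources=[Source("(chunk text)", 0.91234, {"path": "docs/policy.pdf", "chunk": 3})],
    )
    print(answer.text)
    for src in answer.sources:
        print(format_source(src))
```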

CLI

rf ingest docs/ --collection company
rf ask "How do I rotate an API key?" --collection company --k 5
rf eval datasets/qa.jsonl --collection company --metrics context_recall,faithfulness
rf serve --collection company --host 0.0.0.0 --port 8080

Quantized LLM via TurboQuant

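from ragforge import Pipeline
from ragforge.llm import QuantizedHFLLM

# Needs the quantized extra: pip install "ragforge-ml[quantized]"
# Load the model NF4-quantized, then pass it to the pipeline instead of a model_id.
llm = QuantizedHFLLM("meta-llama/Llama-3.2-3B-Instruct", method="bnb-nf4")
rag = Pipeline.from_defaults(llm=llm)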

Architecture

ragforge/
├── ingest/        # PDF + Markdown loaders, chunking
├── embed/         # sentence-transformers wrapper
├── vectorstore/   # Qdrant embedded + remote
├── rerank/        # bge-reranker-base
├── llm/           # HF causal LM + turboquant-ml integration
├── pipeline.py    # The end-to-end orchestrator
├── eval/          # context_recall, answer_relevance, faithfulness
├── serve/         # FastAPI app
└── cli.py         # ragforge / rf

Each module is short, readable, and replaceable through a small interface (Encoder, VectorStore, Reranker, LLM). The pipeline calls them in order — no DAG, no runnables, no callbacks.
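To make "small interface, called in order" concrete, here is a minimal sketch using `typing.Protocol`. Only the four interface names (Encoder, VectorStore, Reranker, LLM) come from this README. Every method name and signature, the toy stand-in classes, and the `ask` function are assumptions for illustration and may differ from RAGforge's real code.

```python
import math
from typing import Protocol, Sequence


class Encoder(Protocol):
    def encode(self, texts: Sequence[str]) -> list[list[float]]: ...


class VectorStore(Protocol):
    def add(self, vectors: Sequence[Sequence[float]], texts: Sequence[str]) -> None: ...
    def search(self, vector: Sequence[float], k: int) -> list[tuple[float, str]]: ...


class Reranker(Protocol):
    def rerank(self, query: str, passages: Sequence[str], top_n: int) -> list[str]: ...


class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...


# Toy stand-ins so the sketch runs without any model downloads.
class BagOfWordsEncoder:
    def __init__(self, vocab: Sequence[str]) -> None:
        self.vocab = list(vocab)

    def encode(self, texts: Sequence[str]) -> list[list[float]]:
        return [[float(t.lower().split().count(w)) for w in self.vocab] for t in texts]


class InMemoryStore:
    def __init__(self) -> None:
        self.rows: list[tuple[list[float], str]] = []

    def add(self, vectors: Sequence[Sequence[float]], texts: Sequence[str]) -> None:
        self.rows.extend((list(v), t) for v, t in zip(vectors, texts))

    def search(self, vector: Sequence[float], k: int) -> list[tuple[float, str]]:
        def cosine(a: Sequence[float], b: Sequence[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(cosine(vector, v), t) for v, t in self.rows]
        return sorted(scored, key=lambda row: row[0], reverse=True)[:k]


class OverlapReranker:
    def rerank(self, query: str, passages: Sequence[str], top_n: int) -> list[str]:
        q = set(query.lower().split())
        return sorted(passages, key=lambda p: len(q & set(p.lower().split())), reverse=True)[:top_n]


class EchoLLM:
    def generate(self, prompt: str) -> str:
        # Returns the first context line instead of calling a real model.
        return f"(echo) {prompt.splitlines()[1]}"


def ask(question: str, encoder: Encoder, store: VectorStore, reranker: Reranker,
        llm: LLM, k: int = 5, top_n: int = 2) -> tuple[str, list[str]]:
    """Embed, retrieve, rerank, generate: a plain sequence of calls."""
    query_vec = encoder.encode([question])[0]
    hits = store.search(query_vec, k)
    passages = reranker.rerank(question, [text for _, text in hits], top_n)
    prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {question}"
    return llm.generate(prompt), passages
```

Because the protocols are structural, swapping a stage only requires passing a different object with the same methods. No base class or registration step is involved in this sketch.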

Eval harness

The eval harness is the reason RAGforge exists. Most RAG projects ship without measuring whether their retrieval is any good. RAGforge ships three metrics in pure Python (no external API), all RAGAS-compatible:

Metric	What it measures
`context_recall`	Of the gold-context tokens, what fraction were retrieved?
`answer_relevance`	Cosine similarity between the answer and synthetic questions back-generated from the answer (RAGAS recipe)
`faithfulness`	Fraction of answer claims that are entailed by the retrieved context (NLI-based, can fall back to embedding overlap)
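As an illustration of the first row, token-level context recall can be sketched in a few lines. This is one plausible reading of "of the gold-context tokens, what fraction were retrieved", not RAGforge's actual implementation. The regex tokenizer and the multiset counting are assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens. A simple stand-in for a real tokenizer."""
    return re.findall(r"\w+", text.lower())


def context_recall(gold_context: str, retrieved: list[str]) -> float:
    """Fraction of gold-context tokens that appear in the retrieved chunks.

    Counts are treated as multisets, so a token that occurs twice in the gold
    context must be retrieved twice to get full credit.
    """
    gold = Counter(tokenize(gold_context))
    if not gold:
        return 0.0
    found: Counter = Counter()
    for chunk in retrieved:
        found.update(tokenize(chunk))
    hits = sum(min(count, found[token]) for token, count in gold.items())
    return hits / sum(gold.values())


if __name__ == "__main__":
    gold = "client lunches are capped at 50 dollars"
    chunks = ["Client lunches are reimbursable", "capped at 50"]
    print(round(context_recall(gold, chunks), 3))
```

In the example, six of the seven gold tokens are retrieved ("dollars" is missing), so the score is 6/7.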

rf eval datasets/qa.jsonl --collection company

+---------------+--------+
|  metric       |  mean  |
+---------------+--------+
| context_recall|  0.84  |
| answer_rel    |  0.78  |
| faithfulness  |  0.91  |
+---------------+--------+
n=120  ·  latency_p50=620ms  ·  latency_p95=1.4s
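The summary line reports a per-metric mean plus p50 and p95 latency. A minimal sketch of that aggregation follows, assuming each evaluated question yields a record with a `scores` dict and a `latency_ms` value. The record layout, the function names, and the nearest-rank percentile method are all assumptions, since the README does not say how RAGforge computes them.

```python
import math
from statistics import mean


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(rows: list[dict]) -> dict:
    """Aggregate per-question records into the figures shown in the eval report."""
    metric_names = rows[0]["scores"].keys()
    latencies = [row["latency_ms"] for row in rows]
    return {
        "n": len(rows),
        "mean": {name: mean(row["scores"][name] for row in rows) for name in metric_names},
        "latency_p50": percentile(latencies, 50),
        "latency_p95": percentile(latencies, 95),
    }


if __name__ == "__main__":
    rows = [
        {"scores": {"faithfulness": 1.0}, "latency_ms": 100},
        {"scores": {"faithfulness": 0.5}, "latency_ms": 200},
        {"scores": {"faithfulness": 1.0}, "latency_ms": 300},
        {"scores": {"faithfulness": 0.5}, "latency_ms": 400},
    ]
    print(summarize(rows))
```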

Roadmap

Done:

PDF + Markdown ingestion
Recursive char chunker with overlap
BGE embeddings + BGE reranker
Qdrant embedded + remote
FastAPI serve
CLI: ingest, ask, eval, serve
Eval: context_recall, answer_relevance, faithfulness
turboquant-ml integration for NF4 LLM serving

Planned:

Hybrid retrieval (BM25 + dense)
Streaming generation in /ask
Notion / Confluence loaders (community PRs welcome)
SQL agent for structured-data questions

Contributing

See docs/CONTRIBUTING.md.

git clone https://github.com/Ademo93/ragforge
cd ragforge
pip install -e ".[dev,serve,eval]"
pytest

License

MIT.

This version

0.1.0

Jun 19, 2026

Download files

Source Distribution

ragforge_ml-0.1.0.tar.gz (31.2 kB view details)

Uploaded Jun 19, 2026 Source

Built Distribution

ragforge_ml-0.1.0-py3-none-any.whl (28.5 kB view details)

Uploaded Jun 19, 2026 Python 3

File details

Details for the file ragforge_ml-0.1.0.tar.gz.

File metadata

Download URL: ragforge_ml-0.1.0.tar.gz
Upload date: Jun 19, 2026
Size: 31.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for ragforge_ml-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e9456d9d28f9c19279cbe9009c453daff9439781ac10e453044b2f7500f2ddcc`
MD5	`9d233cacfc9bdc135dab5c7140292530`
BLAKE2b-256	`7b0d574cd8a22feea2a56af7a9e283bfdca43b932b23c73e96f1b5a02b6a9720`

File details

Details for the file ragforge_ml-0.1.0-py3-none-any.whl.

File metadata

Download URL: ragforge_ml-0.1.0-py3-none-any.whl
Upload date: Jun 19, 2026
Size: 28.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for ragforge_ml-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1b259fa311034b8cf0b9af9975c278ba2b3659d5376575fd4b1214d6b0a83a0f`
MD5	`55cad2938eefc5b476c9a84247441c44`
BLAKE2b-256	`234c44684c6216444c7cce7397aef997aa08299b32397ff4c1f78d60626bcd69`
