RAGforge

Local-first RAG pipeline — PDF & Markdown ingestion, Qdrant retrieval, bge reranking, and an answer-quality eval harness.
Pairs with turboquant-ml for quantized LLM serving.

Why RAGforge?

Most "RAG starter" repos are about 30 lines of glue between LangChain and OpenAI. Nobody can reproduce their results, because retrieval quality, reranking, latency, and cost are all hidden behind a single .invoke() call. RAGforge is the opposite: a small, readable, local-first pipeline that runs end-to-end on your own laptop with open-source models, and that ships with an answer-quality eval harness so you can measure what changing a knob actually does.

Three opinions:

  1. Local-first. Every default is open-source: BAAI/bge-small-en-v1.5 for embeddings, BAAI/bge-reranker-base for reranking, Qdrant in embedded mode (no Docker required), and any HuggingFace causal LM for generation. You do not need an OpenAI key to try the project.
  2. Measurable. Every change should answer the question "did the answer get better?". RAGforge ships ragforge eval with built-in context_recall, answer_relevance, and faithfulness metrics — no RAGAS dependency required, but RAGAS-compatible.
  3. Composable, not framework-y. Each stage (ingest, embed, retrieve, rerank, generate, evaluate) is one short module behind a small interface. Swap the encoder, swap the vector store, swap the LLM — no Runnable.invoke() magic to debug.

Features

| Stage | Default | Swappable for |
|---|---|---|
| Ingest | PDF (pypdf), Markdown (markdown-it-py) | Anything that yields (text, metadata) |
| Chunk | Recursive char splitter, ~512 tokens, 64 overlap | Token-aware splitter, sentence splitter |
| Embed | BAAI/bge-small-en-v1.5 (sentence-transformers) | Any sentence-transformers model |
| Vector store | Qdrant (embedded, no server required) | Qdrant remote, in-memory NumPy backend |
| Rerank | BAAI/bge-reranker-base | Any cross-encoder |
| LLM | Any HF causal LM | Same model, NF4-quantized via turboquant-ml |
| Eval | context_recall, answer_relevance, faithfulness | RAGAS, hand-rolled |
| Serve | FastAPI /ingest, /ask, /eval | |
| CLI | ragforge ingest / ask / eval / serve | |
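
To make the Chunk row concrete, here is a minimal sketch of a recursive character splitter with overlap. It is not RAGforge's actual chunker: the function names split_recursive and chunk, the separator order, and the choice to measure size in characters rather than tokens are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical separator order, coarse to fine; "" means "cut anywhere".
SEPARATORS = ("\n\n", "\n", " ", "")


def split_recursive(text: str, max_len: int, seps: Sequence[str] = SEPARATORS) -> List[str]:
    """Break text into pieces of at most max_len characters, preferring coarse separators."""
    if len(text) <= max_len:
        return [text] if text else []
    sep, rest = seps[0], seps[1:]
    if sep == "":
        # Last resort: hard cut every max_len characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    pieces: List[str] = []
    for part in text.split(sep):
        pieces.extend(split_recursive(part, max_len, rest))
    return pieces


def chunk(text: str, size: int = 512, overlap: int = 64) -> List[str]:
    """Greedily pack pieces into chunks of at most `size` characters.

    Each chunk after the first starts with the last `overlap` characters
    of the previous chunk. Sizes are in characters here, not tokens.
    """
    if not 0 <= overlap < size - 1:
        raise ValueError("overlap must satisfy 0 <= overlap < size - 1")
    # Leave room for the carried-over tail plus one joining space.
    pieces = split_recursive(text, size - overlap - 1)
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= size:
            current = candidate
            continue
        chunks.append(current)
        tail = current[-overlap:] if overlap else ""
        current = f"{tail} {piece}" if tail else piece
    if current:
        chunks.append(current)
    return chunks
```

A token-aware splitter would follow the same shape, with len() replaced by a tokenizer-based length function.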

Installation

The PyPI distribution is named ragforge-ml, because the unsuffixed ragforge name was already taken by an unrelated project. The Python import name is still ragforge, and the CLI is installed as both ragforge and rf:

pip install ragforge-ml                       # core
pip install "ragforge-ml[serve]"              # + FastAPI
pip install "ragforge-ml[quantized]"          # + turboquant-ml NF4 path
pip install "ragforge-ml[all]"                # everything

60-second tour

from ragforge import Pipeline

rag = Pipeline.from_defaults(model_id="Qwen/Qwen2.5-3B-Instruct")
rag.ingest(["docs/policy.pdf", "notes/onboarding.md"])

answer = rag.ask("What is the maximum reimbursable amount for client lunches?")
print(answer.text)
for src in answer.sources:
    print(f"  {src.score:.3f}  {src.metadata['path']}#chunk{src.metadata['chunk']}")

CLI

rf ingest docs/ --collection company
rf ask "How do I rotate an API key?" --collection company --k 5
rf eval datasets/qa.jsonl --collection company --metrics context_recall,faithfulness
rf serve --collection company --host 0.0.0.0 --port 8080

Quantized LLM via TurboQuant

from ragforge import Pipeline
from ragforge.llm import QuantizedHFLLM

llm = QuantizedHFLLM("meta-llama/Llama-3.2-3B-Instruct", method="bnb-nf4")
rag = Pipeline.from_defaults(llm=llm)
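
For a rough sense of why the NF4 path matters on a laptop, the arithmetic below estimates weight memory at 16 and 4 bits per parameter. The 3e9 parameter count is a nominal round figure for a "3B" model, not a measured value, and the helper name weight_memory_gib is hypothetical. Quantization constants, activations, and the KV cache add overhead that this sketch ignores.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 3e9  # assumption: nominal size of a "3B" model

fp16_gib = weight_memory_gib(N_PARAMS, 16)
nf4_gib = weight_memory_gib(N_PARAMS, 4)

print(f"fp16 weights: ~{fp16_gib:.2f} GiB")
print(f"4-bit weights: ~{nf4_gib:.2f} GiB ({fp16_gib / nf4_gib:.0f}x smaller)")
```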

Architecture

ragforge/
├── ingest/        # PDF + Markdown loaders, chunking
├── embed/         # sentence-transformers wrapper
├── vectorstore/   # Qdrant embedded + remote
├── rerank/        # bge-reranker-base
├── llm/           # HF causal LM + turboquant-ml integration
├── pipeline.py    # The end-to-end orchestrator
├── eval/          # context_recall, answer_relevance, faithfulness
├── serve/         # FastAPI app
└── cli.py         # ragforge / rf

Each module is short, readable, and replaceable through a small interface (Encoder, VectorStore, Reranker, LLM). The pipeline calls them in order — no DAG, no runnables, no callbacks.
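
The "small interface, called in order" idea can be sketched as follows. The four Protocol names come from the paragraph above, but every method name and signature here is an assumption, and the toy implementations (HashingEncoder, InMemoryStore, OverlapReranker, EchoLLM) and the ask function are hypothetical stand-ins rather than RAGforge classes.

```python
import math
import zlib
from typing import List, Protocol, Sequence, Tuple

Vector = List[float]


class Encoder(Protocol):
    def encode(self, texts: Sequence[str]) -> List[Vector]: ...


class VectorStore(Protocol):
    def add(self, vectors: Sequence[Vector], texts: Sequence[str]) -> None: ...
    def search(self, query: Vector, k: int) -> List[str]: ...


class Reranker(Protocol):
    def rerank(self, query: str, candidates: Sequence[str]) -> List[str]: ...


class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...


class HashingEncoder:
    """Toy encoder: hashed bag-of-words, L2-normalised."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def encode(self, texts: Sequence[str]) -> List[Vector]:
        out: List[Vector] = []
        for text in texts:
            vec = [0.0] * self.dim
            for word in text.lower().split():
                vec[zlib.crc32(word.encode()) % self.dim] += 1.0
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            out.append([x / norm for x in vec])
        return out


class InMemoryStore:
    """Toy store: brute-force dot product over all stored vectors."""

    def __init__(self) -> None:
        self._rows: List[Tuple[Vector, str]] = []

    def add(self, vectors: Sequence[Vector], texts: Sequence[str]) -> None:
        self._rows.extend(zip(vectors, texts))

    def search(self, query: Vector, k: int) -> List[str]:
        def score(row: Tuple[Vector, str]) -> float:
            return sum(a * b for a, b in zip(query, row[0]))
        return [text for _, text in sorted(self._rows, key=score, reverse=True)[:k]]


class OverlapReranker:
    """Toy reranker: orders candidates by shared lowercase words."""

    def rerank(self, query: str, candidates: Sequence[str]) -> List[str]:
        q = set(query.lower().split())
        return sorted(candidates, key=lambda c: -len(q & set(c.lower().split())))


class EchoLLM:
    """Toy LLM: echoes the first context line instead of generating text."""

    def generate(self, prompt: str) -> str:
        return "stub answer: " + prompt.splitlines()[1]


def ask(question: str, encoder: Encoder, store: VectorStore,
        reranker: Reranker, llm: LLM, k: int = 3, top_n: int = 1) -> Tuple[str, List[str]]:
    """Embed, retrieve, rerank, generate: a straight line of calls."""
    query_vec = encoder.encode([question])[0]
    candidates = store.search(query_vec, k)
    sources = reranker.rerank(question, candidates)[:top_n]
    prompt = "Context:\n" + "\n".join(sources) + f"\n\nQuestion: {question}"
    return llm.generate(prompt), sources
```

Because the stages are only coupled through these signatures, swapping the toy encoder for a real embedding model would not require touching ask.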

Eval harness

The reason RAGforge exists. Most RAG projects ship without measuring whether their retrieval is any good. RAGforge ships three metrics in pure Python (no external API), all RAGAS-compatible:

| Metric | What it measures |
|---|---|
| context_recall | Of the gold-context tokens, what fraction were retrieved? |
| answer_relevance | Cosine similarity between the answer and synthetic questions back-generated from the answer (RAGAS recipe) |
| faithfulness | Fraction of answer claims that are entailed by the retrieved context (NLI-based, can fall back to embedding overlap) |

rf eval datasets/qa.jsonl --collection company

+---------------+--------+
|  metric       |  mean  |
+---------------+--------+
| context_recall|  0.84  |
| answer_rel    |  0.78  |
| faithfulness  |  0.91  |
+---------------+--------+
n=120  ·  latency_p50=620ms  ·  latency_p95=1.4s
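
The context_recall definition above (the fraction of gold-context tokens that were retrieved) can be sketched in a few lines. This is an illustration of the definition, not RAGforge's implementation: the regex tokenizer, the set-based counting that ignores repeated tokens, and the function names are assumptions, and the example sentences in the usage note are made up.

```python
import re
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a stand-in for whatever tokenizer is really used."""
    return re.findall(r"\w+", text.lower())


def context_recall(gold_context: str, retrieved_chunks: Iterable[str]) -> float:
    """Fraction of distinct gold-context tokens that appear in any retrieved chunk."""
    gold = set(tokenize(gold_context))
    if not gold:
        return 0.0  # assumption: an empty gold context scores zero
    retrieved = set()
    for chunk in retrieved_chunks:
        retrieved.update(tokenize(chunk))
    return len(gold & retrieved) / len(gold)
```

For example, with the gold context "Client lunches are reimbursed up to 50 euros" and the retrieved chunks "Lunches with a client are reimbursed." and "The limit is 50 euros per person.", six of the eight gold tokens are covered ("up" and "to" are missing), so the score is 0.75.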

Roadmap

Shipped:

  • PDF + Markdown ingestion
  • Recursive char chunker with overlap
  • BGE embeddings + BGE reranker
  • Qdrant embedded + remote
  • FastAPI serve
  • CLI: ingest, ask, eval, serve
  • Eval: context_recall, answer_relevance, faithfulness
  • turboquant-ml integration for NF4 LLM serving

Planned:

  • Hybrid retrieval (BM25 + dense)
  • Streaming generation in /ask
  • Notion / Confluence loaders (community PRs welcome)
  • SQL agent for structured-data questions

Contributing

See docs/CONTRIBUTING.md.

git clone https://github.com/Ademo93/ragforge
cd ragforge
pip install -e ".[dev,serve,eval]"
pytest

License

MIT.
