Vendor-neutral RAG + LLM serving infrastructure: swappable LLM protocol and vector store (FAISS/NumPy/Qdrant), cached embedding index, and observability.
RAG + LLM Serving Infrastructure
An installable, vendor-neutral foundation for retrieval-augmented LLM applications: a swappable vector store, a cached embedding index, a provider-agnostic LLM protocol, the observability around them, a FastAPI serving layer, and a retrieval-quality eval gate.
Distilled infrastructure layer — typed, tested, packaged, and runnable on its own.
Install
# from Git (works today):
pip install "git+https://github.com/MarwaBS/rag-llm-infra"
pip install "rag-llm-infra[faiss,qdrant,openai,serve] @ git+https://github.com/MarwaBS/rag-llm-infra"
# from a local clone, for development:
pip install -e ".[dev]"
A tagged PyPI release is planned; until then, install from Git as above.
Quickstart — end-to-end RAG (no API key, no network)
git clone https://github.com/MarwaBS/rag-llm-infra && cd rag-llm-infra
pip install -e .
python example.py
example.py runs the full pipeline: embed documents → index them in a VectorStore → retrieve the top-k for a query
→ build a grounded prompt → answer with an LLMProtocol backend.
It runs on the NumPy vector store and the deterministic mock LLM, so it needs no API key.
In production, swap the demo embedder for EmbeddingEngine and get_llm("mock")
for get_llm("openai").
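The pipeline steps can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the package's example.py: BagOfWordsEmbedder, NumpyStore, MockLLM, run_rag, and the prompt layout are all hypothetical names and choices made for this sketch, and the store assumes at least one document has been added before searching.

```python
import numpy as np


class BagOfWordsEmbedder:
    """Toy embedder (hypothetical): L2-normalised word counts over a fixed vocabulary."""

    def __init__(self, corpus: list[str]) -> None:
        vocab = sorted({tok for doc in corpus for tok in doc.lower().split()})
        self.index = {tok: i for i, tok in enumerate(vocab)}

    def embed(self, text: str) -> np.ndarray:
        vec = np.zeros(len(self.index), dtype=np.float32)
        for tok in text.lower().split():
            if tok in self.index:  # out-of-vocabulary words are ignored
                vec[self.index[tok]] += 1.0
        norm = float(np.linalg.norm(vec))
        return vec / norm if norm > 0 else vec


class NumpyStore:
    """Brute-force inner-product search over in-memory vectors (hypothetical)."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, doc: str, vector: np.ndarray) -> None:
        self.docs.append(doc)
        self.vectors.append(vector)

    def search(self, query_vector: np.ndarray, k: int) -> list[str]:
        scores = np.stack(self.vectors) @ query_vector
        top = np.argsort(-scores, kind="stable")[:k]  # highest score first
        return [self.docs[i] for i in top]


class MockLLM:
    """Deterministic stand-in for a model: repeats the first context line."""

    def generate(self, prompt: str) -> str:
        first_context = prompt.splitlines()[1]
        return f"Based on the context: {first_context}"


def run_rag(docs: list[str], query: str, k: int = 1) -> tuple[list[str], str]:
    embedder = BagOfWordsEmbedder(docs)                # 1. embed documents
    store = NumpyStore()
    for doc in docs:                                   # 2. index them
        store.add(doc, embedder.embed(doc))
    contexts = store.search(embedder.embed(query), k)  # 3. retrieve top-k
    prompt = "Context:\n" + "\n".join(contexts) + f"\n\nQuestion: {query}"  # 4. grounded prompt
    return contexts, MockLLM().generate(prompt)        # 5. answer


contexts, reply = run_rag(
    ["FAISS is in-process vector search", "Qdrant is a vector database"],
    "vector search",
)
print(contexts, reply)
```

Every stage sits behind a small class, so replacing the toy embedder or the mock model leaves run_rag unchanged.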
Serve it
pip install "rag-llm-infra[serve] @ git+https://github.com/MarwaBS/rag-llm-infra"
uvicorn rag_llm_infra.serve:app # or: docker build -t rag-llm-infra . && docker run -p 8000:8000 rag-llm-infra
curl -XPOST localhost:8000/index -d '{"documents":["FAISS is in-process vector search","Qdrant is a vector database"]}' -H 'content-type: application/json'
curl -XPOST localhost:8000/query -d '{"query":"vector search","k":1}' -H 'content-type: application/json'
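The same two calls can be made from Python with only the standard library. This is a minimal sketch: the paths and payload shapes are copied from the curl commands above, while BASE_URL, build_request, and post_json are hypothetical helper names, and the assumption that the service answers with a JSON body is not confirmed by this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: the port used in the curl commands


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the service endpoints."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the reply (assumes a JSON response body)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, post_json("/index", {"documents": ["FAISS is in-process vector search"]}) followed by post_json("/query", {"query": "vector search", "k": 1}) mirrors the curl session.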
What's inside
| Module | Responsibility |
|---|---|
| rag_llm_infra.llm_protocol | LLMProtocol: runtime_checkable Protocol over OpenAI / Anthropic-stub / Mock; factory get_llm() |
| rag_llm_infra.vector_store | VectorStoreProtocol: in-process FAISS IndexFlatIP, pure-NumPy fallback, real Qdrant (batched search) |
| rag_llm_infra.evidence_index | EmbeddingEngine: SentenceTransformers embeddings + adaptive, memory-pressure-aware LRU cache; reader/writer lock |
| rag_llm_infra.tracing | OpenTelemetry spans with console-exporter + no-op fallbacks |
| rag_llm_infra.log_config | structured JSON logging + an llm_call latency/token timer |
| rag_llm_infra.serve | FastAPI service (/index, /query, /health) wiring the parts together |
| rag_llm_infra.faithfulness | groundedness(answer, contexts): lexical faithfulness metric for RAG output |
| rag_llm_infra.fallback | FallbackLLM: budget-aware multi-provider routing; drop-in LLMProtocol |
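To make the protocol and fallback rows concrete, here is a rough sketch of how a runtime_checkable Protocol and a budget-aware router can fit together. It is not the package's code: the LLM protocol, its generate method, MockLLM, FailingLLM, BudgetFallback, and the flat per-call cost model are all assumptions made for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLM(Protocol):
    """Structural interface: any object with a matching generate() qualifies."""

    def generate(self, prompt: str) -> str: ...


class MockLLM:
    def generate(self, prompt: str) -> str:
        return f"mock answer ({len(prompt)} chars of prompt)"


class FailingLLM:
    """Simulates a provider outage."""

    def generate(self, prompt: str) -> str:
        raise RuntimeError("provider unavailable")


class BudgetFallback:
    """Tries providers in order, skipping any whose per-call cost exceeds the remaining budget."""

    def __init__(self, providers: list[tuple[LLM, float]], budget: float) -> None:
        self.providers = providers  # (backend, cost per call) pairs
        self.remaining = budget

    def generate(self, prompt: str) -> str:
        for backend, cost in self.providers:
            if cost > self.remaining:
                continue  # too expensive for what is left
            try:
                text = backend.generate(prompt)
            except RuntimeError:
                continue  # this sketch treats RuntimeError as "try the next provider"
            self.remaining -= cost
            return text
        raise RuntimeError("no provider available within budget")


router = BudgetFallback([(FailingLLM(), 0.0), (MockLLM(), 1.0)], budget=1.5)
print(isinstance(router, LLM), router.generate("hi"))
```

Because the router itself satisfies the protocol, callers can hold it wherever a single backend is expected.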
Quality gates
python -m eval.retrieval_eval # recall@1 / MRR on a labelled paraphrase corpus
python -m eval.generation_eval # groundedness (faithfulness) of generated answers
Both run in CI: a retrieval regression (recall@1 below 0.80 or MRR below 0.85) or a
faithfulness regression (a grounded answer scoring below threshold, or the metric failing
to flag a hallucinated control) fails the build and blocks the merge.
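The gate logic described above can be sketched as follows. The recall@1 and MRR thresholds come from this README; everything else is an assumption for illustration: the function names, the token-overlap definition of groundedness, and the 0.5 groundedness threshold are hypothetical and may differ from what eval.retrieval_eval and eval.generation_eval actually compute.

```python
import re

MIN_RECALL_AT_1 = 0.80  # from the README
MIN_MRR = 0.85          # from the README


def recall_at_1(ranked: list[list[str]], relevant: list[str]) -> float:
    """Share of queries whose top-ranked document is the labelled one."""
    hits = sum(1 for docs, gold in zip(ranked, relevant) if docs and docs[0] == gold)
    return hits / len(relevant)


def mean_reciprocal_rank(ranked: list[list[str]], relevant: list[str]) -> float:
    """Mean of 1/rank of the labelled document (0 when it is not retrieved)."""
    total = 0.0
    for docs, gold in zip(ranked, relevant):
        if gold in docs:
            total += 1.0 / (docs.index(gold) + 1)
    return total / len(relevant)


def lexical_groundedness(answer: str, contexts: list[str]) -> float:
    """Assumed definition: fraction of answer tokens that also occur in the contexts."""
    tokens = re.findall(r"[a-z0-9]+", answer.lower())
    if not tokens:
        return 0.0
    support = set(re.findall(r"[a-z0-9]+", " ".join(contexts).lower()))
    return sum(tok in support for tok in tokens) / len(tokens)


def gate(ranked, relevant, grounded, hallucinated, contexts, min_grounded=0.5):
    """Return a list of failure messages; an empty list means the build may pass."""
    failures = []
    if recall_at_1(ranked, relevant) < MIN_RECALL_AT_1:
        failures.append("recall@1 below threshold")
    if mean_reciprocal_rank(ranked, relevant) < MIN_MRR:
        failures.append("MRR below threshold")
    if lexical_groundedness(grounded, contexts) < min_grounded:
        failures.append("grounded answer scored too low")
    if lexical_groundedness(hallucinated, contexts) >= min_grounded:
        failures.append("hallucinated control was not flagged")
    return failures
```

In CI, a non-empty list from gate would translate into a non-zero exit code.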
Engineering principles demonstrated
- Swap by interface: LLMProtocol / VectorStoreProtocol make the model and the index runtime-swappable.
- Degrade, don't crash: FAISS / Qdrant / OpenTelemetry / SentenceTransformers are lazily imported with working fallbacks; missing infra never hard-fails import.
- Measured, not asserted: a retrieval eval gate, not just unit tests; packaged and CI-built end to end.
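The "degrade, don't crash" pattern can be illustrated with a deferred import that falls back to NumPy. This is a sketch under assumptions, not the package's vector_store module: InnerProductIndex and _try_import_faiss are hypothetical names, and only the well-known FAISS calls IndexFlatIP, add, and search are used.

```python
import numpy as np


def _try_import_faiss():
    """Import FAISS only when an index is built; return None if it is not installed."""
    try:
        import faiss
        return faiss
    except ImportError:
        return None


class InnerProductIndex:
    """Exact inner-product search: FAISS IndexFlatIP when available, NumPy otherwise."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        faiss = _try_import_faiss()
        self._faiss = faiss.IndexFlatIP(dim) if faiss is not None else None
        self._matrix = np.zeros((0, dim), dtype=np.float32)

    @property
    def backend(self) -> str:
        return "faiss" if self._faiss is not None else "numpy"

    def add(self, vectors) -> None:
        vectors = np.ascontiguousarray(vectors, dtype=np.float32).reshape(-1, self.dim)
        if self._faiss is not None:
            self._faiss.add(vectors)
        else:
            self._matrix = np.vstack([self._matrix, vectors])

    def search(self, query, k: int) -> list[int]:
        query = np.ascontiguousarray(query, dtype=np.float32).reshape(1, self.dim)
        if self._faiss is not None:
            _scores, ids = self._faiss.search(query, k)
            return ids[0].tolist()
        scores = self._matrix @ query[0]
        return np.argsort(-scores, kind="stable")[:k].tolist()


index = InnerProductIndex(2)
index.add([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
print(index.backend, index.search([0.1, 1.0], k=2))
```

Importing this module never fails for lack of FAISS; the choice of backend is made when an index is constructed, and both paths return the same ranking for exact search.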
Develop / test
pip install -e ".[dev]" # installs FAISS + Qdrant + serve extras too
ruff check . && pytest && python -m eval.retrieval_eval
CI installs the native backends, so the FAISS and Qdrant tests run there (they skip only when those libraries are absent).
License
MIT — see LICENSE.