Vendor-neutral RAG + LLM serving infrastructure: swappable LLM protocol and vector store (FAISS/NumPy/Qdrant), cached embedding index, and observability.

RAG + LLM Serving Infrastructure

An installable, vendor-neutral foundation for retrieval-augmented LLM applications: a swappable vector store, a cached embedding index, a provider-agnostic LLM protocol, the observability around them, a FastAPI serving layer, and a retrieval-quality eval gate.

Distilled infrastructure layer — typed, tested, packaged, and runnable on its own.

Install

# from Git (works today):
pip install "git+https://github.com/MarwaBS/rag-llm-infra"
pip install "rag-llm-infra[faiss,qdrant,openai,serve] @ git+https://github.com/MarwaBS/rag-llm-infra"
# from a local clone, for development:
pip install -e ".[dev]"

A tagged PyPI release is planned; until then, install from Git as above.

Quickstart — end-to-end RAG (no API key, no network)

git clone https://github.com/MarwaBS/rag-llm-infra && cd rag-llm-infra
pip install -e .
python example.py
embed documents → index in a VectorStore → retrieve top-k for a query
                → build a grounded prompt → answer with an LLMProtocol backend

Runs on the NumPy vector store + the deterministic mock LLM, so it needs no key. In production, swap the demo embedder for EmbeddingEngine and get_llm("mock") for get_llm("openai").
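
The pipeline above can be sketched in a few lines of standard-library Python. This is an illustration of the flow only, not the package's code: `embed`, `ToyVectorStore`, `build_prompt`, and `EchoLLM` are hypothetical stand-ins, and the toy bag-of-words embedding replaces a real embedder.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy bag-of-words embedding, L2-normalised (stand-in for a real embedder)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Inner product of two sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class ToyVectorStore:
    """Hypothetical in-memory store: add documents, search by inner product."""

    def __init__(self) -> None:
        self._docs: list[str] = []
        self._vecs: list[dict[str, float]] = []

    def add(self, docs: list[str]) -> None:
        for doc in docs:
            self._docs.append(doc)
            self._vecs.append(embed(doc))

    def search(self, query: str, k: int = 1) -> list[tuple[str, float]]:
        q = embed(query)
        scored = [(doc, dot(q, vec)) for doc, vec in zip(self._docs, self._vecs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Ground the question in the retrieved passages."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


class EchoLLM:
    """Deterministic stand-in for a mock LLM: returns the first context line."""

    def generate(self, prompt: str) -> str:
        for line in prompt.splitlines():
            if line.startswith("- "):
                return line[2:]
        return ""


store = ToyVectorStore()
store.add(["FAISS is in-process vector search", "Qdrant is a vector database"])
hits = store.search("in-process search", k=1)
prompt = build_prompt("in-process search", [doc for doc, _ in hits])
answer = EchoLLM().generate(prompt)
print(answer)
```

Because every stage is deterministic, the sketch needs no key and no network, which is the same property the quickstart relies on.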

Serve it

pip install "rag-llm-infra[serve] @ git+https://github.com/MarwaBS/rag-llm-infra"
uvicorn rag_llm_infra.serve:app          # or: docker build -t rag-llm-infra . && docker run -p 8000:8000 rag-llm-infra
curl -XPOST localhost:8000/index -d '{"documents":["FAISS is in-process vector search","Qdrant is a vector database"]}' -H 'content-type: application/json'
curl -XPOST localhost:8000/query -d '{"query":"vector search","k":1}'      -H 'content-type: application/json'
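
The same two calls can be made from Python with only the standard library. This is a minimal sketch: the payload fields mirror the curl commands above, the base URL assumes the server is listening on localhost:8000, and `build_request` and `post_json` are hypothetical helper names. The response schema is not documented here, so the body is returned as raw text.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: same address as the curl examples


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(path, payload), timeout=10) as resp:
        return resp.read().decode("utf-8")


# With the server running:
#   docs = ["FAISS is in-process vector search", "Qdrant is a vector database"]
#   print(post_json("/index", {"documents": docs}))
#   print(post_json("/query", {"query": "vector search", "k": 1}))
```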

What's inside

| Module | Responsibility |
| --- | --- |
| rag_llm_infra.llm_protocol | LLMProtocol: runtime_checkable Protocol over OpenAI / Anthropic-stub / Mock; factory get_llm() |
| rag_llm_infra.vector_store | VectorStoreProtocol: in-process FAISS IndexFlatIP, pure-NumPy fallback, real Qdrant (batched search) |
| rag_llm_infra.evidence_index | EmbeddingEngine: SentenceTransformers embeddings + adaptive, memory-pressure-aware LRU cache; reader/writer lock |
| rag_llm_infra.tracing | OpenTelemetry spans with console-exporter + no-op fallbacks |
| rag_llm_infra.log_config | structured JSON logging + an llm_call latency/token timer |
| rag_llm_infra.serve | FastAPI service (/index, /query, /health) wiring the parts together |
| rag_llm_infra.faithfulness | groundedness(answer, contexts): lexical faithfulness metric for RAG output |
| rag_llm_infra.fallback | FallbackLLM: budget-aware multi-provider routing; drop-in LLMProtocol |
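
To show what a runtime_checkable Protocol plus a factory buys you, here is a small sketch. It does not reproduce the package's definitions: `LLMLike`, `MockLLM`, `make_llm`, and the `generate(prompt)` method are hypothetical names chosen for the example, and the real LLMProtocol may expose a different interface.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMLike(Protocol):
    """Hypothetical stand-in for LLMProtocol: anything with generate(prompt)."""

    def generate(self, prompt: str) -> str: ...


class MockLLM:
    """Deterministic backend; note that it does not inherit from LLMLike."""

    def generate(self, prompt: str) -> str:
        return f"[mock] {prompt[:40]}"


class NotAnLLM:
    """Lacks generate(), so it fails the structural check."""


_REGISTRY = {"mock": MockLLM}  # hypothetical provider table


def make_llm(name: str) -> LLMLike:
    """Factory in the spirit of get_llm(); raises on unknown providers."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown LLM provider: {name!r}") from None


llm = make_llm("mock")
print(isinstance(llm, LLMLike))         # structural match, no inheritance needed
print(isinstance(NotAnLLM(), LLMLike))  # no generate() method
print(llm.generate("What is FAISS?"))
```

The isinstance check only verifies that the method exists, not its signature, so a protocol like this complements type checking and tests rather than replacing them.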

Quality gates

python -m eval.retrieval_eval      # recall@1 / MRR on a labelled paraphrase corpus
python -m eval.generation_eval     # groundedness (faithfulness) of generated answers

Both run in CI: a retrieval regression (recall@1 ≥ 0.80, MRR ≥ 0.85) or a faithfulness regression (grounded answer below threshold, or the metric failing to flag a hallucinated control) fails the build and cannot merge.
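
As a rough illustration of what such gates compute, the sketch below scores a toy labelled set. The retrieval thresholds are the ones quoted above; everything else is an assumption: the data is invented for the example, `lexical_groundedness` is a simple token-overlap stand-in rather than the package's groundedness(), and `GROUNDED_MIN` is a made-up threshold.

```python
import re


def recall_at_1(ranked: list[list[str]], gold: list[str]) -> float:
    """Fraction of queries whose top-ranked document is the labelled one."""
    hits = sum(1 for r, g in zip(ranked, gold) if r and r[0] == g)
    return hits / len(gold)


def mrr(ranked: list[list[str]], gold: list[str]) -> float:
    """Mean reciprocal rank of the labelled document (0 if it is not retrieved)."""
    total = 0.0
    for r, g in zip(ranked, gold):
        if g in r:
            total += 1.0 / (r.index(g) + 1)
    return total / len(gold)


def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_groundedness(answer: str, contexts: list[str]) -> float:
    """Share of answer tokens that also occur in the contexts (illustrative only)."""
    answer_tokens = _tokens(answer)
    context_tokens = {t for c in contexts for t in _tokens(c)}
    if not answer_tokens:
        return 0.0
    return sum(t in context_tokens for t in answer_tokens) / len(answer_tokens)


# Toy labelled data: ranked doc ids per query, plus the correct id for each.
ranked = [["d1", "d2"], ["d2", "d1"], ["d1", "d3"], ["d4", "d3"], ["d5", "d1"]]
gold = ["d1", "d2", "d3", "d4", "d5"]

r1, rr = recall_at_1(ranked, gold), mrr(ranked, gold)
retrieval_ok = r1 >= 0.80 and rr >= 0.85  # thresholds quoted above

contexts = ["FAISS is in-process vector search"]
grounded = lexical_groundedness("FAISS is in-process vector search", contexts)
control = lexical_groundedness("Qdrant was founded on the moon", contexts)
GROUNDED_MIN = 0.7  # assumed threshold; the project's real value is not stated
faithfulness_ok = grounded >= GROUNDED_MIN and control < GROUNDED_MIN

print(f"recall@1={r1:.2f} mrr={rr:.2f} retrieval_ok={retrieval_ok}")
print(f"grounded={grounded:.2f} control={control:.2f} faithfulness_ok={faithfulness_ok}")
# A CI entry point would exit non-zero when either flag is False.
```

Checking that the hallucinated control scores below the threshold guards against a metric that passes everything.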

Engineering principles demonstrated

  • Swap by interface: LLMProtocol / VectorStoreProtocol make the model and the index runtime-swappable.
  • Degrade, don't crash: FAISS / Qdrant / OpenTelemetry / SentenceTransformers are lazily imported with working fallbacks; missing infra never hard-fails import.
  • Measured, not asserted: a retrieval eval gate, not just unit tests; packaged and CI-built end to end.
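
The "degrade, don't crash" idea can be sketched as an import that happens on first use and falls back when the optional dependency is missing. The names `load_optional`, `make_index`, and `BruteForceIndex` are hypothetical, and the wiring to a native backend is deliberately omitted; this is not the package's implementation.

```python
import importlib


class BruteForceIndex:
    """Fallback index: exhaustive inner-product search over plain lists."""

    def __init__(self) -> None:
        self._vectors: list[list[float]] = []

    def add(self, vectors: list[list[float]]) -> None:
        self._vectors.extend(vectors)

    def search(self, query: list[float], k: int = 1) -> list[int]:
        scores = [sum(q * v for q, v in zip(query, vec)) for vec in self._vectors]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return order[:k]


def load_optional(module_name: str):
    """Import a module on first use; return None instead of raising if absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def make_index(backend_module: str = "faiss"):
    """Return (index, label); fall back to brute force when the module is missing."""
    module = load_optional(backend_module)
    if module is None:
        return BruteForceIndex(), "fallback"
    # A real adapter would wrap the native library here; that wiring is omitted,
    # so this sketch still returns the brute-force index under the backend's label.
    return BruteForceIndex(), backend_module


# Use a module name that is certainly absent to exercise the fallback path.
index, label = make_index("a_module_that_is_not_installed")
index.add([[0.0, 1.0], [1.0, 0.0]])
print(label, index.search([1.0, 0.0], k=1))
```

Keeping the import inside the function means importing the package itself never touches the optional dependency.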

Develop / test

pip install -e ".[dev]"     # installs FAISS + Qdrant + serve extras too
ruff check . && pytest && python -m eval.retrieval_eval

CI installs the native backends, so the FAISS and Qdrant tests run there (they skip only when those libraries are absent).
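
One common way to get that skip-when-absent behaviour is sketched below with the standard library's unittest. It is only an illustration: the project runs pytest and may well use a different mechanism, and the test names here are invented.

```python
import importlib.util
import unittest

# True only when the optional dependency can be imported in this environment.
HAS_FAISS = importlib.util.find_spec("faiss") is not None


class VectorStoreTests(unittest.TestCase):
    def test_fallback_always_runs(self) -> None:
        # Needs no optional dependency, so it runs everywhere.
        self.assertEqual(sorted([0.2, 0.9], reverse=True)[0], 0.9)

    @unittest.skipUnless(HAS_FAISS, "faiss is not installed")
    def test_faiss_backend(self) -> None:
        import faiss  # imported lazily, only when the test actually runs

        self.assertTrue(hasattr(faiss, "IndexFlatIP"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(VectorStoreTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(f"run={result.testsRun} skipped={len(result.skipped)}")
```

A skipped test is reported as skipped rather than passed, so a local run without the native backends stays green without hiding that coverage was reduced.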

License

MIT — see LICENSE.
