ragvault
The best-in-class RAG library. Five production-grade techniques in one clean package.
```bash
pip install ragvault
```
What's inside
| Stage | Technique | Model / Library |
|---|---|---|
| Chunking | Semantic (similarity-based splits) | BGE embeddings |
| Embeddings | BGE dense vectors | BAAI/bge-large-en-v1.5 |
| Retrieval | Hybrid = FAISS + BM25 fused via RRF | faiss-cpu + rank-bm25 |
| Reranking | Cross-encoder joint scoring | BAAI/bge-reranker-large |
| Compression | LLM token reduction | LLMLingua-2 |
| Evaluation | RAG quality metrics | RAGAS |
Quick start
```python
from ragvault import RagVault

vault = RagVault()

# Index a document
with open("my_doc.txt") as f:
    vault.index(f.read())

# Get compressed context ready for any LLM
context = vault.query("What is hybrid retrieval?")

# Or let ragvault call Claude and return the answer directly
answer = vault.ask("What is hybrid retrieval?")
print(answer)
```
Installation
```bash
pip install ragvault
```
Dependencies installed automatically:
```text
FlagEmbedding   # BGE embeddings + reranker
faiss-cpu       # vector index
rank-bm25       # sparse retrieval
llmlingua       # context compression
ragas           # evaluation
anthropic       # Claude integration
torch           # model inference
```
For GPU acceleration, replace `faiss-cpu` with `faiss-gpu` after install.
Pipeline explained
1. Semantic Chunking
Unlike fixed-size chunking, ragvault embeds every sentence and splits only when cosine similarity between adjacent sentences drops below a threshold. Each chunk covers a single coherent topic.
```python
from ragvault import SemanticChunker, BGEEmbedder

embedder = BGEEmbedder()
chunker = SemanticChunker(embedder=embedder, threshold=0.75)
chunks = chunker.chunk(long_text)
```
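The split rule itself fits in a few lines. A minimal pure-Python sketch, with precomputed sentence vectors standing in for BGE embeddings (`semantic_split` is illustrative, not the library's internal API):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_split(sentences, vectors, threshold=0.75):
    # Start a new chunk whenever similarity between adjacent
    # sentence embeddings drops below the threshold.
    chunks, current = [], [sentences[0]]
    for prev_vec, vec, sent in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

Adjacent sentences on the same topic stay together; a topic shift (low similarity) opens a new chunk.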
2. BGE Embeddings
State-of-the-art dense embeddings from BAAI, consistently top-ranked on the MTEB leaderboard. Separate encoding paths for queries vs passages apply the correct instruction prefix automatically.
```python
from ragvault import BGEEmbedder

embedder = BGEEmbedder(model_name="BAAI/bge-large-en-v1.5")
doc_vecs = embedder.embed(["passage one", "passage two"])
query_vec = embedder.embed_query("what is rag?")
```
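Two details make the separate paths matter: bge-en v1.5 models expect a fixed instruction prefix on the query side only, and the output vectors are L2-normalised, so inner product equals cosine similarity. A minimal sketch of both (the helpers `prepare_query` and `l2_normalize` are illustrative, not part of ragvault's API):

```python
import math

QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "

def prepare_query(text):
    # bge-en v1.5 prepends a retrieval instruction to queries only;
    # passages are encoded as-is.
    return QUERY_INSTRUCTION + text

def l2_normalize(vec):
    # With unit-length vectors, inner-product search equals cosine search.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]
```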
3. Hybrid Retrieval (FAISS + BM25 + RRF)
Dense vector search captures semantic similarity; BM25 captures exact keyword matches. Reciprocal Rank Fusion merges both ranked lists:
```text
score(doc) = Σ 1 / (60 + rank_in_system(doc))
```
No score normalisation needed — only ranks matter.
```python
from ragvault import BGEEmbedder, HybridRetriever

embedder = BGEEmbedder()
retriever = HybridRetriever(chunks, embeddings)

query_embedding = embedder.embed_query("my query")
results = retriever.retrieve("my query", query_embedding, top_n=20)
```
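The fusion step itself needs nothing but the rank lists. A minimal RRF sketch in plain Python (k = 60, ranks starting at 1; `rrf_fuse` is illustrative, not the library's internal function):

```python
def rrf_fuse(ranked_lists, k=60):
    # Reciprocal Rank Fusion: each system contributes 1/(k + rank)
    # for every document it returns.
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d2", "d1", "d3"]  # FAISS ranking
sparse = ["d1", "d4"]       # BM25 ranking
fused = rrf_fuse([dense, sparse])
```

Because only ranks enter the sum, a document ranked well by both systems outranks one that tops a single list, and the dense and sparse scores never need to be put on a common scale.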
4. Cross-Encoder Reranking
The hybrid retriever produces ~20 candidates fast. The cross-encoder then
scores each (query, chunk) pair jointly, giving much more precise relevance
scores. Only the top-N are kept.
```python
from ragvault import CrossEncoderReranker

reranker = CrossEncoderReranker(model_name="BAAI/bge-reranker-large")
top5 = reranker.rerank(query, candidates, top_n=5)
```
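Conceptually the reranker is "score every pair jointly, sort, truncate". A sketch of that selection step, with a hypothetical `score_pair` callable standing in for the cross-encoder forward pass:

```python
def rerank(query, candidates, score_pair, top_n=5):
    # Score every (query, candidate) pair jointly, then keep the
    # top_n candidates by descending relevance score.
    scored = [(score_pair(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]
```

The expensive part is the per-pair forward pass, which is why the hybrid retriever narrows the pool to ~20 candidates first.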
5. Context Compression
LLMLingua-2 removes redundant tokens from the retrieved context, typically achieving 50% compression with minimal quality loss. Reduces LLM API costs and latency.
```python
from ragvault import ContextCompressor

compressor = ContextCompressor()
compressed = compressor.compress(context, rate=0.5)

# With token stats
stats = compressor.compress_with_stats(context, rate=0.5)
print(stats["origin_tokens"], "→", stats["compressed_tokens"])
```
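The mechanism can be sketched independently of the model: rank tokens by an importance score (here a hypothetical `importance` callable; LLMLingua-2 uses a trained token classifier), keep the top fraction, and preserve original order:

```python
def compress_tokens(tokens, importance, rate=0.5):
    # Keep the round(rate * n) most important tokens, in original order.
    n_keep = max(1, round(len(tokens) * rate))
    ranked = sorted(range(len(tokens)),
                    key=lambda i: importance(tokens[i]), reverse=True)
    keep = sorted(ranked[:n_keep])
    return [tokens[i] for i in keep]
```

Low-information tokens (articles, filler) drop out first, so the surviving context stays readable to the LLM at half the token cost.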
6. LLM Answer Generation
`vault.ask()` wraps the full retrieval pipeline and calls Claude to generate an answer. Requires `ANTHROPIC_API_KEY` in your environment.
```python
vault = RagVault()
vault.index(document)

answer = vault.ask(
    "What is semantic chunking?",
    model="claude-sonnet-4-6",
    max_tokens=512,
)
```
7. RAGAS Evaluation
Evaluate your pipeline across four key dimensions:
| Metric | What it measures |
|---|---|
| Faithfulness | Does the answer contain only information from the context? |
| Answer Relevancy | Does the answer actually address the question? |
| Context Precision | Are the retrieved chunks useful (not noisy)? |
| Context Recall | Were all necessary chunks retrieved? |
```python
results = vault.evaluate(
    questions=["What is RRF?"],
    answers=["RRF combines ranked lists by scoring 1/(k+rank)."],
    contexts=[["RRF merges ranked lists without score normalisation."]],
    ground_truths=["RRF fuses retrieval systems using rank-based scores."],
)
print(results)
```
Multi-document indexing
```python
vault = RagVault()

# Index the first document
vault.index(doc1)

# Add more documents incrementally (no re-indexing from scratch)
vault.add_document(doc2)
vault.add_documents([doc3, doc4, doc5])

print(f"{len(vault)} chunks indexed")
```
Configuration
```python
vault = RagVault(
    embedding_model="BAAI/bge-large-en-v1.5",  # swap for bge-m3 for multilingual
    reranker_model="BAAI/bge-reranker-large",
    chunk_threshold=0.75,      # lower = fewer, larger chunks
    retrieval_candidates=20,   # candidates passed to cross-encoder
    rerank_top_n=5,            # chunks kept after reranking
    compression_rate=0.5,      # 0.5 = keep 50% of tokens
    use_compression=True,      # set False to skip LLMLingua
)
```
Running the demo
```bash
git clone <repo>
cd ragvault
pip install -e .

# Optional: set your API key for Claude + RAGAS
export ANTHROPIC_API_KEY=sk-ant-...

python example.py
```
Running tests
```bash
pip install -e ".[dev]"
pytest tests/ -v
```
All 26 tests run with lightweight NumPy mocks — no GPU or model download needed.
```text
26 passed in 2.46s
```
Architecture
```text
Document(s)
     │
     ▼
SemanticChunker      ← topic-shift detection via cosine similarity
     │ chunks
     ▼
BGEEmbedder          ← BAAI/bge-large-en-v1.5, normalised L2 vectors
     │ embeddings
     ▼
HybridRetriever
 ├── FaissVectorStore ← dense inner-product search
 ├── BM25Retriever    ← sparse keyword scoring
 └── RRF Fusion       ← score = Σ 1/(60 + rank)
     │ top-20 candidates
     ▼
CrossEncoderReranker ← BAAI/bge-reranker-large, joint (q, doc) scoring
     │ top-5 chunks
     ▼
ContextCompressor    ← LLMLingua-2, ~50% token reduction
     │ compressed context
     ▼
LLM (Claude)         ← anthropic SDK, grounded answer generation
     │ answer
     ▼
RAGASEvaluator       ← faithfulness · relevancy · precision · recall
```
License
MIT