Best-in-class RAG: semantic chunking + BGE embeddings + hybrid retrieval + cross-encoder reranking + context compression + RAGAS evaluation


ragvault

The best-in-class RAG library. Five production-grade techniques in one clean package.

pip install ragvault

What's inside

Stage        Technique                            Model / Library
Chunking     Semantic (similarity-based splits)   BGE embeddings
Embeddings   BGE dense vectors                    BAAI/bge-large-en-v1.5
Retrieval    Hybrid = FAISS + BM25 fused via RRF  faiss-cpu + rank-bm25
Reranking    Cross-encoder joint scoring          BAAI/bge-reranker-large
Compression  LLM token reduction                  LLMLingua-2
Evaluation   RAG quality metrics                  RAGAS

Quick start

from ragvault import RagVault

vault = RagVault()

# Index a document
with open("my_doc.txt") as f:
    vault.index(f.read())

# Get compressed context ready for any LLM
context = vault.query("What is hybrid retrieval?")

# Or let ragvault call Claude and return the answer directly
answer = vault.ask("What is hybrid retrieval?")
print(answer)

Installation

pip install ragvault

Dependencies installed automatically:

FlagEmbedding   # BGE embeddings + reranker
faiss-cpu       # vector index
rank-bm25       # sparse retrieval
llmlingua       # context compression
ragas           # evaluation
anthropic       # Claude integration
torch           # model inference

For GPU acceleration, replace faiss-cpu with faiss-gpu after installation.


Pipeline explained

1. Semantic Chunking

Unlike fixed-size chunking, ragvault embeds every sentence and splits only when cosine similarity between adjacent sentences drops below a threshold. Each chunk covers a single coherent topic.

from ragvault import SemanticChunker, BGEEmbedder

embedder = BGEEmbedder()
chunker  = SemanticChunker(embedder=embedder, threshold=0.75)
chunks   = chunker.chunk(long_text)
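The core idea is compact: compare each sentence embedding to the previous one and start a new chunk wherever similarity dips below the threshold. A minimal pure-Python sketch of that logic (the `cosine` and `semantic_split` functions here are illustrative, not ragvault's internals; the toy 2-d vectors stand in for real BGE embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_split(sentences, vectors, threshold=0.75):
    """Group consecutive sentences; split where adjacent similarity drops below threshold."""
    chunks, current = [], [sentences[0]]
    for sent, prev_vec, vec in zip(sentences[1:], vectors, vectors[1:]):
        if cosine(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks

# Two topics: the third sentence's vector points elsewhere, so a split occurs there.
sents = ["Cats purr.", "Cats nap.", "GPUs are fast."]
vecs  = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(semantic_split(sents, vecs))  # ['Cats purr. Cats nap.', 'GPUs are fast.']
```

This is also why a lower threshold yields fewer, larger chunks: splits only fire when similarity falls below it.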

2. BGE Embeddings

State-of-the-art dense embeddings from BAAI, consistently top-ranked on the MTEB leaderboard. Separate encoding paths for queries vs passages apply the correct instruction prefix automatically.

from ragvault import BGEEmbedder

embedder = BGEEmbedder(model_name="BAAI/bge-large-en-v1.5")
doc_vecs  = embedder.embed(["passage one", "passage two"])
query_vec = embedder.embed_query("what is rag?")
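The "instruction prefix" refers to BGE's documented retrieval instruction: the English v1.5 models expect queries to be prepended with a short instruction string, while passages are encoded as-is. A sketch of what the query path effectively does before encoding (the helper function names are illustrative, not ragvault's API):

```python
# bge-en v1.5 models expect this instruction on *queries only*;
# passages are encoded without any prefix.
BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "

def prepare_query(text: str) -> str:
    """Prepend the BGE retrieval instruction to a query string."""
    return BGE_QUERY_INSTRUCTION + text

def prepare_passage(text: str) -> str:
    """Passages go to the encoder unchanged."""
    return text

print(prepare_query("what is rag?"))
# Represent this sentence for searching relevant passages: what is rag?
```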

3. Hybrid Retrieval (FAISS + BM25 + RRF)

Dense vector search captures semantic similarity; BM25 captures exact keyword matches. Reciprocal Rank Fusion merges both ranked lists:

score(doc) = Σ  1 / (60 + rank_in_system)

No score normalisation needed — only ranks matter.
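The fusion step itself is only a few lines. A minimal pure-Python sketch of RRF with k=60, as in the formula above (the `rrf_fuse` function is illustrative, not ragvault's API):

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(doc) = sum over systems of 1 / (k + rank).

    ranked_lists: one ranked list of doc ids per retrieval system, best first
    (rank starts at 1). Returns doc ids sorted by fused score, best first.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d2"]   # e.g. FAISS ranking
sparse = ["d1", "d4", "d3"]   # e.g. BM25 ranking
print(rrf_fuse([dense, sparse]))  # ['d1', 'd3', 'd4', 'd2']
```

Documents ranked highly by both systems (d1, d3) float to the top, and raw scores from the two systems never need to be put on a common scale.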

from ragvault import HybridRetriever

retriever = HybridRetriever(chunks, embeddings)
query_embedding = embedder.embed_query("my query")   # embedder from step 2
results   = retriever.retrieve("my query", query_embedding, top_n=20)

4. Cross-Encoder Reranking

The hybrid retriever quickly produces ~20 candidates. The cross-encoder then scores each (query, chunk) pair jointly, yielding much more precise relevance scores than the bi-encoder can. Only the top-N chunks are kept.

from ragvault import CrossEncoderReranker

reranker = CrossEncoderReranker(model_name="BAAI/bge-reranker-large")
top5     = reranker.rerank(query, candidates, top_n=5)

5. Context Compression

LLMLingua-2 removes redundant tokens from the retrieved context, typically achieving 50% compression with minimal quality loss. Reduces LLM API costs and latency.

from ragvault import ContextCompressor

compressor = ContextCompressor()
compressed = compressor.compress(context, rate=0.5)

# With token stats
stats = compressor.compress_with_stats(context, rate=0.5)
print(stats["origin_tokens"], "→", stats["compressed_tokens"])
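The cost impact of the rate is direct arithmetic. A rough sketch of the savings (the `context_savings` helper is illustrative, not ragvault's API):

```python
def context_savings(origin_tokens: int, rate: float) -> dict:
    """Token savings from compressing a context at the given keep-rate."""
    compressed = int(origin_tokens * rate)
    return {
        "origin_tokens": origin_tokens,
        "compressed_tokens": compressed,
        "tokens_saved": origin_tokens - compressed,
    }

# A 4000-token context at rate=0.5 keeps 2000 tokens per query;
# over 1,000 queries that is 2,000,000 fewer input tokens billed.
stats = context_savings(4000, 0.5)
print(stats["tokens_saved"] * 1000)  # 2000000
```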

6. LLM Answer Generation

vault.ask() wraps the full retrieval pipeline and calls Claude to generate an answer. Requires ANTHROPIC_API_KEY in your environment.

vault = RagVault()
vault.index(document)

answer = vault.ask(
    "What is semantic chunking?",
    model="claude-sonnet-4-6",
    max_tokens=512,
)

7. RAGAS Evaluation

Evaluate your pipeline across four key dimensions:

Metric             What it measures
Faithfulness       Does the answer contain only information from the context?
Answer Relevancy   Does the answer actually address the question?
Context Precision  Are the retrieved chunks useful (not noisy)?
Context Recall     Were all necessary chunks retrieved?

results = vault.evaluate(
    questions=["What is RRF?"],
    answers=["RRF combines ranked lists by scoring 1/(k+rank)."],
    contexts=[["RRF merges ranked lists without score normalisation."]],
    ground_truths=["RRF fuses retrieval systems using rank-based scores."],
)
print(results)

Multi-document indexing

vault = RagVault()

# Index the first document
vault.index(doc1)

# Add more documents incrementally (no re-indexing from scratch)
vault.add_document(doc2)
vault.add_documents([doc3, doc4, doc5])

print(f"{len(vault)} chunks indexed")

Configuration

vault = RagVault(
    embedding_model="BAAI/bge-large-en-v1.5",   # swap for bge-m3 for multilingual
    reranker_model="BAAI/bge-reranker-large",
    chunk_threshold=0.75,      # lower = fewer, larger chunks
    retrieval_candidates=20,   # candidates passed to cross-encoder
    rerank_top_n=5,            # chunks kept after reranking
    compression_rate=0.5,      # 0.5 = keep 50% of tokens
    use_compression=True,      # set False to skip LLMLingua
)

Running the demo

git clone <repo>
cd ragvault
pip install -e .

# Optional: set your API key for Claude + RAGAS
export ANTHROPIC_API_KEY=sk-ant-...

python example.py

Running tests

pip install -e ".[dev]"
pytest tests/ -v

All 26 tests run with lightweight NumPy mocks — no GPU or model download needed.

26 passed in 2.46s

Architecture

Document(s)
    │
    ▼
SemanticChunker          ← topic-shift detection via cosine similarity
    │  chunks
    ▼
BGEEmbedder              ← BAAI/bge-large-en-v1.5, normalised L2 vectors
    │  embeddings
    ▼
HybridRetriever
  ├── FaissVectorStore   ← dense inner-product search
  ├── BM25Retriever      ← sparse keyword scoring
  └── RRF Fusion         ← score = Σ 1/(60 + rank)
    │  top-20 candidates
    ▼
CrossEncoderReranker     ← BAAI/bge-reranker-large, joint (q, doc) scoring
    │  top-5 chunks
    ▼
ContextCompressor        ← LLMLingua-2, ~50% token reduction
    │  compressed context
    ▼
LLM (Claude)             ← anthropic SDK, grounded answer generation
    │  answer
    ▼
RAGASEvaluator           ← faithfulness · relevancy · precision · recall

License

MIT
