BM25S + PyStemmer-powered lexical retrieval (with stemming) and routing layer for LLM tools, documents, and hybrid RAG
vrraj-bm25s-retriever
Interactive Demo UI:
The GitHub repo includes a FastAPI-powered Demo Web UI for testing retrieval behavior, inspecting ranked results, adding documents, and tuning search parameters. See Demo Web UI for setup instructions.
A lightweight BM25S-powered lexical retrieval package for Python applications, REST services, LLM systems, and MCP-based tool workflows.
Use it to search documents, route LLM tool calls, filter MCP-discovered tools, and build fast lexical retrieval layers without running a vector database.
Why this exists
LLM applications often have too much context available: too many tools, too many documents, too many chunks, and too many near-duplicate choices.
This matters most in agentic systems, where the LLM may have access to large tool registries. As the number of tools grows (20+), scaling problems appear: context size increases, token costs rise, and tool selection becomes less reliable.
vrraj-bm25s-retriever gives you a small, deterministic lexical retrieval layer that can sit before an LLM and narrow the candidate set before prompt assembly.
This package is designed for applications where many tools are available, but only a small subset is relevant for any given request.
Typical flow:
User Query → BM25S Retrieval → Filtered Tools / Documents → LLM Context → Execution
The pattern pays off most where user intent maps to a bounded set of actions: trading, customer support, CRM, finance workflows, operations, and other tool-driven systems.
In these domains, the retrieval problem is often not broad semantic discovery. It is selecting the right tool, command, document, or workflow from a known set of possibilities.
Clear action language matters: tool names, workflow names, order actions, support tasks, CRM operations, command phrases, and domain-specific vocabulary.
What you get
- Python retrieval library for programmatic lexical search
- REST service for remote retrieval and document management
- HTTP client for application integration
- YAML-backed document/tool registry support for LLM and MCP tool-routing workflows
- BM25S + PyStemmer for fast stemming-aware lexical matching
- Normalized response schema with scores, rankings, metadata, and settings
- Softmax relevance scoring with configurable temperature and cutoff filtering
- Demo Web UI for testing retrieval behavior during development
Install
pip install vrraj-bm25s-retriever
Links:
- PyPI: https://pypi.org/project/vrraj-bm25s-retriever/
- GitHub: https://github.com/vrraj/bm25s-retriever
- Documentation: https://vrraj.github.io/bm25s-retriever/
Quick start
Option A: Use directly in Python
For Python applications (most common)
Requires only the base package (no server extras):
pip install vrraj-bm25s-retriever
from bm25s_retriever import BM25SRetriever, Document
retriever = BM25SRetriever()
retriever.add_documents([
    Document(
        id="create_order",
        title="Create Order",
        content="Place a buy or sell order for a stock or equity trade.",
        keywords=["place order", "buy order", "sell order", "stock trade"],
        metadata={"category": "trading", "type": "tool"},
    ),
    Document(
        id="get_market_movers",
        title="Get Market Movers",
        content="Retrieve top gaining, losing, or most active market movers.",
        keywords=["market movers", "top gainers", "top losers", "most active"],
        metadata={"category": "trading", "type": "tool"},
    ),
])

results = retriever.retrieve_documents("place a limit buy order")

for doc in results["documents"]:
    print(doc["id"], doc["title"], doc["score_percentage"])
Option B: Use as a REST service
For shared services and web UI
Install with server dependencies (includes FastAPI, Uvicorn, Jinja2):
pip install "vrraj-bm25s-retriever[server]"
Start the server:
bm25s-server --config settings.yaml
Search documents:
curl -X POST http://localhost:9200/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "show open customer orders"}'
Use the Python HTTP client:
from bm25s_retriever import BM25SClient
client = BM25SClient("http://localhost:9200")
results = client.retrieve("show open customer orders")
print(f"Found {len(results['documents'])} matching tools/documents")
Option C: Run the example script
For quick testing (not production)
curl -L -O https://raw.githubusercontent.com/vrraj/bm25s-retriever/main/examples/bm25s_basic_usage.py
python bm25s_basic_usage.py
Primary use case: LLM and MCP tool routing
In LLM-driven systems, exposing every available tool to the model increases token usage, creates context bloat, and makes tool selection less reliable as registries grow.
This package works best when user intent maps to a bounded set of actions: quotes, market movers, order placement, customer order lookup, CRM updates, follow-up emails, escalations, and similar workflow-driven tasks.
BM25S can retrieve the most relevant tools before the LLM sees the tool list. This works with traditional tool registries, agent frameworks, and Model Context Protocol (MCP) clients.
With MCP, servers can standardize tool discovery, but tool discovery is not the same as tool selection. The MCP client, host application, or orchestrator still decides which discovered tools should be passed to the LLM. BM25S acts as the relevance layer between discovery and prompt assembly.
Mental model:
Discover / Load → Inject → Index → Filter → Focused LLM Context
In practice:
YAML Tool Registry + MCP-Discovered Tools + Internal Tool Definitions
→ Inject into BM25S Index (REST or in-process)
→ Query-Time Tool Filtering
→ Focused LLM Context
Tools can come from YAML, MCP discovery, or internal registries. The client or orchestration layer transforms these into BM25S documents and injects them into a unified in-memory index. At query time, BM25S filters the relevant subset before passing tools to the LLM.
Hybrid registry pattern:
YAML Tool Registry + MCP-Discovered Tools → Dynamic BM25S Index → Query-Time Tool Filtering
You can start with your own YAML-based tool registry and augment it at runtime. If an MCP server discovers additional tools, the client or orchestration layer can transform those tool definitions into BM25S documents and add them to the retriever index. This lets static tool definitions and newly discovered MCP tools participate in the same lexical search and ranking flow.
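The merge step above can be sketched in plain Python. The MCP-side field names ("name", "description") and the collision rule (YAML wins) are illustrative assumptions, not a fixed schema:

```python
# Sketch: merge a static YAML tool registry with MCP-discovered tools into
# one document list for BM25S indexing. MCP field names are illustrative.

def mcp_tool_to_document(tool: dict, server: str) -> dict:
    """Convert an MCP-style tool definition into a BM25S document dict."""
    return {
        "id": f"mcp_{tool['name']}",
        "title": tool["name"].replace("_", " ").title(),
        "content": tool.get("description", ""),
        "keywords": tool.get("keywords", []),
        "metadata": {"source": "mcp", "server": server, "type": "tool"},
    }

def merge_registries(yaml_docs: list, mcp_tools: list, server: str) -> list:
    """Combine both sources; YAML entries win on id collisions."""
    merged = {}
    for tool in mcp_tools:
        doc = mcp_tool_to_document(tool, server)
        merged[doc["id"]] = doc
    for doc in yaml_docs:
        merged[doc["id"]] = doc
    return list(merged.values())

yaml_docs = [{
    "id": "create_order", "title": "Create Order",
    "content": "Place a buy or sell order for a stock or equity trade.",
    "keywords": ["place order", "buy order"],
    "metadata": {"source": "yaml", "type": "tool"},
}]
mcp_tools = [{"name": "get_account_summary",
              "description": "Retrieve balances and positions."}]

documents = merge_registries(yaml_docs, mcp_tools, server="brokerage_tools")
# Each dict can then be wrapped in a Document and passed to
# retriever.add_documents(...).
```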
Useful for domains like:
- Trading and market data tools
- Customer support workflows
- CRM and sales operations
- Finance and account workflows
- Internal enterprise tools and MCP server tool catalogs
- Hybrid RAG pipelines
Benefits:
- Combine static YAML tool definitions, MCP-discovered tools, and internal tool definitions in the same BM25S retrieval index
- Filter MCP-discovered tools on demand before passing tool definitions to the LLM
- Reduce tool context from large registries to a small, relevant candidate set
- Lower token usage, latency, and cost by avoiding unnecessary tool definitions in the prompt
- Improve tool selection when tools have narrow, specific purposes
- Return metadata with retrieved tools/documents so the client or orchestrator can apply its own scope, policy, or routing logic
- Keep routing deterministic and explainable
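A minimal sketch of the filtering step, operating on the documented search response shape. The registry entries and score values below are illustrative:

```python
# Sketch: narrow a full tool registry to the subset the retriever ranked
# as relevant, before building the LLM prompt. The response dict mirrors
# the documented search response schema; values are illustrative.

full_registry = {
    "create_order": {"name": "create_order", "parameters": {"symbol": "str", "qty": "int"}},
    "get_market_movers": {"name": "get_market_movers", "parameters": {}},
    "update_crm_contact": {"name": "update_crm_contact", "parameters": {"contact_id": "str"}},
}

response = {
    "success": True,
    "documents": [
        {"id": "create_order", "score_percentage": 62.4, "rank": 1},
        {"id": "get_market_movers", "score_percentage": 21.1, "rank": 2},
    ],
}

# Keep only tools the retriever returned, preserving rank order.
llm_tools = [full_registry[doc["id"]] for doc in response["documents"]
             if doc["id"] in full_registry]

# Only 2 of the 3 registry tools reach the prompt.
```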
Example:
python examples/llm_tool_routing_example.py
Other use cases
Domain-constrained retrieval
Use BM25S to search curated document sets, tool registries, or MCP tool catalogs where the language is controlled and exact matches matter.
The tool catalog does not have to be static. Applications can load a YAML registry at startup, then add or refresh tool definitions discovered from MCP servers during runtime.
Examples:
- Trading actions and market-data tools
- Support case workflows
- CRM tasks and follow-up actions
- Internal process documentation
- Compliance or policy snippets
Hybrid RAG
BM25S works well alongside embeddings, especially when you want lexical precision before or alongside semantic search:
- Use BM25S for keyword precision
- Use embeddings for semantic recall
- Merge or rerank results before passing context to the LLM
This is helpful when semantic retrieval may miss exact tool names, workflow names, commands, abbreviations, or domain-specific terms.
Vector search is powerful for broad semantic discovery, but it can add latency and cost when embedding calls are required at runtime or when the system has to sort through many semantically similar matches. For bounded tool-selection problems, a lexical pass can be faster, cheaper, and easier to reason about.
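One common way to merge the two result lists is reciprocal rank fusion (RRF). The sketch below assumes each retriever returns document ids in rank order; the ids are illustrative:

```python
# Sketch: reciprocal rank fusion over a lexical (BM25S) ranking and a
# semantic (embedding) ranking. Inputs are lists of doc ids in rank order.

def rrf_merge(rankings: list, k: int = 60) -> list:
    """Score each doc by sum(1 / (k + rank)) across rankings, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["create_order", "get_customer_orders", "get_market_movers"]
semantic = ["get_customer_orders", "create_order", "escalate_case"]

fused = rrf_merge([lexical, semantic])
# Documents ranked highly by both retrievers rise to the top.
```

RRF needs no score normalization across retrievers, which makes it a convenient first choice when BM25 scores and cosine similarities live on different scales.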
Lightweight retrieval service
For small-to-medium document sets, BM25S can be enough by itself:
- No vector database required
- Fast in-memory retrieval
- Deterministic scoring
- Simple deployment
- Easy YAML-based configuration
Demo Web UI
The GitHub repository includes a FastAPI-powered demo UI for testing retrieval behavior, inspecting ranked results, adding documents, and tuning search parameters.
It also acts as an interactive tuning environment. You can load your own YAML documents or tool definitions, test retrieval parameters such as temperature, softmax scoring, and cutoff settings, and iteratively refine keywords and tool descriptions using the included UI.
This helps you visualize the ranking logic and see how tools or documents are prioritized before pushing retrieval settings into production.
Run locally:
git clone https://github.com/vrraj/bm25s-retriever.git
cd bm25s-retriever
pip install -e ".[dev]"
bm25s-server --config settings.yaml
Open:
http://localhost:9200/
Manual start:
uvicorn bm25s_retriever.main:app --reload --port 9200
Public API overview
Library API
- BM25SRetriever() - Create a retriever instance
- retriever.add_documents(...) -> None - Add documents to the index
- retriever.retrieve_documents(...) -> Dict - Search documents with BM25S scoring
- retriever.rebuild_index() -> None - Reload documents from YAML and rebuild the index
HTTP Client API
- BM25SClient(base_url) - Create an HTTP client
- client.retrieve(...) -> Dict - Search documents
- client.add_document(...) -> Dict - Add a document
- client.get_documents() -> Dict - List documents
- client.delete_document(doc_id) -> Dict - Delete a document
- client.get_settings() -> Dict - Read search settings
- client.update_settings(...) -> Dict - Update search settings
For complete method signatures and response details, see the documentation: https://vrraj.github.io/bm25s-retriever/
Search response schema
{
  "success": bool,
  "message": str,
  "documents": [
    {
      "id": str,
      "title": str,
      "content": str,
      "keywords": list[str],
      "metadata": dict,
      "bm25_score": float,
      "score_percentage": float,
      "rank": int,
    }
  ],
  "total_retrieved": int,
  "cutoff_percentage": float,
  "settings": {
    "temperature": float,
    "ignore_zero": bool,
    "llm_tools_cutoff": float,
  },
}
Document schema
{
  "id": str,
  "title": str,
  "content": str,
  "keywords": list[str],
  "metadata": dict,
}
Searchable fields:
title, content, keywords
Reference fields:
id, metadata, and parameters (when present in YAML tool definitions)
metadata is returned with each document/tool result so the client or orchestration layer can decide how to use it for routing, display, filtering, policy checks, or downstream logic.
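For example, a client could apply a scope check to retrieved results using the returned metadata. The category values and the allowed set below are illustrative:

```python
# Sketch: post-retrieval policy filtering using the metadata field returned
# with each result. Categories and the allowed set are illustrative.

results = [
    {"id": "create_order", "score_percentage": 45.0,
     "metadata": {"category": "trading", "type": "tool"}},
    {"id": "update_crm_contact", "score_percentage": 30.0,
     "metadata": {"category": "crm", "type": "tool"}},
]

allowed_categories = {"trading"}  # e.g. derived from the user's permissions

permitted = [r for r in results
             if r["metadata"].get("category") in allowed_categories]
```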
Configuration
settings.yaml
bm25s:
  temperature: 0.5        # Softmax temperature control
  ignore_zero: true       # Filter out zero-score results
  llm_tools_cutoff: 10.0  # Minimum softmax score percentage

documents:
  source: "source_files/tools_list.yaml"
  auto_reload: true

server:
  host: "0.0.0.0"
  port: 9200
  reload: false
tools_list.yaml
documents:
  - id: "get_customer_orders"
    title: "Get Customer Orders"
    content: "Retrieve open, closed, priority, delayed, or historical customer orders."
    keywords: ["orders", "customer orders", "open orders", "order history"]
    metadata:
      category: "customer_support"
      type: "tool"
Environment variables
# Server configuration
BM25S_HOST=0.0.0.0
BM25S_PORT=9200
BM25S_RELOAD=false
# Document configuration
BM25S_DOCUMENTS_PATH=./source_files/tools_list.yaml
BM25S_AUTO_RELOAD=true
# BM25S defaults
BM25S_TEMPERATURE=0.5
BM25S_IGNORE_ZERO=true
BM25S_CUTOFF=10.0
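A sketch of how an application might read these variables, falling back to the sample defaults above. The precedence shown (environment over defaults) is an assumption for illustration, not something the package mandates:

```python
import os

# Sketch: resolve BM25S settings from environment variables, with fallbacks
# mirroring the sample settings.yaml above. Env-over-default precedence is
# an illustrative assumption.

DEFAULTS = {"temperature": 0.5, "ignore_zero": True, "llm_tools_cutoff": 10.0}

def load_settings() -> dict:
    return {
        "temperature": float(os.environ.get("BM25S_TEMPERATURE", DEFAULTS["temperature"])),
        "ignore_zero": os.environ.get("BM25S_IGNORE_ZERO",
                                      str(DEFAULTS["ignore_zero"])).lower() == "true",
        "llm_tools_cutoff": float(os.environ.get("BM25S_CUTOFF", DEFAULTS["llm_tools_cutoff"])),
    }

settings = load_settings()
```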
Document loading
Load from a custom YAML file:
from bm25s_retriever import BM25SRetriever
retriever = BM25SRetriever(document_file="path/to/your/tools_list.yaml")
Or add documents programmatically:
from bm25s_retriever import BM25SRetriever, Document
retriever = BM25SRetriever()
retriever.add_documents([
    Document(
        id="custom_doc",
        title="Custom Document",
        content="Your searchable content here.",
        keywords=["tag1", "tag2"],
    )
])
After editing a YAML source file, reload the index manually:
retriever.rebuild_index()
Or create a new retriever instance:
retriever = BM25SRetriever()
Dynamic tool injection
You can also add tool definitions at runtime. This is useful when your application starts with a YAML registry but discovers additional tools from MCP servers or other tool providers and wants those tools to participate in lexical retrieval.
from bm25s_retriever import Document
retriever.add_documents([
    Document(
        id="mcp_get_account_summary",
        title="Get Account Summary",
        content="Retrieve account balances, buying power, positions, and account status from an MCP-discovered tool.",
        keywords=["account", "balances", "buying power", "positions"],
        metadata={
            "source": "mcp",
            "server": "brokerage_tools",
            "type": "tool",
        },
    )
])
Retrieved results include metadata, allowing the client or orchestrator to map the selected document back to the underlying tool provider, MCP server, or execution layer.
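A sketch of that mapping, using a dispatch table keyed on the metadata's "source"/"server" fields from the injection example above. The executor functions are hypothetical placeholders:

```python
# Sketch: route a selected result to the right execution layer using its
# metadata. The executor functions below are hypothetical placeholders.

def call_local_tool(doc_id: str) -> str:
    """Placeholder for invoking an in-process tool."""
    return f"local:{doc_id}"

def call_mcp_tool(server: str, doc_id: str) -> str:
    """Placeholder for invoking a tool on a named MCP server."""
    return f"mcp:{server}:{doc_id}"

def execute(result: dict) -> str:
    meta = result["metadata"]
    if meta.get("source") == "mcp":
        return call_mcp_tool(meta["server"], result["id"])
    return call_local_tool(result["id"])

selected = {"id": "mcp_get_account_summary",
            "metadata": {"source": "mcp", "server": "brokerage_tools", "type": "tool"}}

outcome = execute(selected)  # "mcp:brokerage_tools:mcp_get_account_summary"
```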
Search tuning
The GitHub repo is useful for hands-on retrieval tuning. Run the demo UI locally with your own data to test temperature, softmax scoring, and cutoff settings, then refine your keywords and tool descriptions based on the ranked results.
Stemming
The retriever uses PyStemmer to improve lexical recall across related word forms.
Examples:
- trade, trading, traded
- invest, investing, investment
- order, orders, ordering
Temperature
- 0.1 - 0.5: More focused and selective
- 0.5 - 1.5: Balanced retrieval
- 1.5+: Broader retrieval
Default: 0.5 in the sample configuration above. Tune based on your data and use case.
Cutoff percentage
- 5 - 15%: Typical range
- Lower values return more results
- Higher values return only stronger matches
Default: 10.0 in the sample configuration above. Tune based on your desired selectivity.
Score interpretation
- >20%: Strong match
- 8-20%: Good match
- <8%: Weak match
- 0%: No lexical relevance
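To make the percentages concrete, here is a sketch of softmax-with-temperature scoring over raw BM25 scores, followed by the cutoff filter. This illustrates the scoring described above; the package's exact implementation may differ in detail:

```python
import math

# Sketch: convert raw BM25 scores to softmax percentages using a
# temperature, then drop results below the cutoff. Raw scores are
# illustrative.

def softmax_percentages(scores: list, temperature: float) -> list:
    """Softmax over scores/temperature, scaled so values sum to 100."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [100.0 * e / total for e in exps]

raw = [3.2, 2.1, 0.4]
pcts = softmax_percentages(raw, temperature=0.5)

cutoff = 10.0
kept = [p for p in pcts if p >= cutoff]
# A low temperature sharpens the distribution: the top-scoring document
# takes most of the percentage mass, and weaker matches fall under the
# cutoff.
```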
Example scripts
YAML file usage
python examples/load_yaml_documents.py
Covers:
- Loading custom YAML documents
- Search configuration
- Document management patterns
REST API usage
bm25s-server --config settings.yaml
python examples/rest_api_examples.py
Covers:
- HTTP client operations
- REST-based document management
- Error handling patterns
curl examples
bm25s-server --config settings.yaml
./examples/curl_api_examples.sh
Covers:
- Command-line API operations
- Search, add, list, and delete endpoints
REST API examples
Add a document:
curl -X POST http://localhost:9200/documents \
  -H "Content-Type: application/json" \
  -d '{
    "id": "get_customer_orders",
    "title": "Get Customer Orders",
    "content": "Retrieve open, closed, priority, delayed, or historical customer orders.",
    "keywords": ["orders", "customer orders", "open orders", "order history"]
  }'
Search:
curl -X POST http://localhost:9200/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "show open customer orders", "temperature": 0.5}'
List documents:
curl http://localhost:9200/documents
Delete a document:
curl -X DELETE http://localhost:9200/documents/get_customer_orders
Performance notes
Approximate guidance:
- Small collections (<100 docs): sub-second indexing, instant search
- Medium collections (100-1,000 docs): 1-3 second indexing, usually <100ms search
- Larger collections (1,000+ docs): 3-10 second indexing, roughly 100-500ms search depending on content size
Documents and the BM25S index are stored in memory for fast access.
Optimization tips:
- Keep content focused and specific
- Add realistic keywords that match how users ask questions
- Use lower temperature for more selective tool routing
- Use cutoff filtering to reduce noisy matches
- Use returned metadata in the client or orchestration layer for filtering, routing, display, policy checks, or downstream decisions
Development
git clone https://github.com/vrraj/bm25s-retriever.git
cd bm25s-retriever
pip install -e ".[dev]"
bm25s-server --config settings.yaml
Run tests:
pytest
pytest -m integration
pytest -m "integration or unit"
Documentation
Full documentation: https://vrraj.github.io/bm25s-retriever/
License
MIT License.