A Modular Search Framework for Military Developers
Tew En Hao, Cheong Sik Feng, Aekas Singh Gulati, Dillion Lim, Nicholas Lee Wei Jun, Jaye Koh Bo Jay, Aloysius Han Keng Siew, Lim Yong Zhi
This repository contains all relevant codes and materials prepared for our paper, "Modular Search Framework for Military Developers", at the 2025 International Conference on Military Communication and Information Systems (ICMCIS).
📜 Abstract
Military developers often face unique challenges when searching for information due to the restrictive and specialized environments in which they operate. In recent years, Large Language Models (LLMs) have demonstrated exceptional capabilities in generating coherent, human-like text and answering complex queries across a range of natural language tasks. For such environments, a modular architecture is ideal: core LLM capabilities (e.g., code understanding, summarization, and retrieval) operate independently of the specific search engine. We propose a modular, adaptable information retrieval framework tailored for military use, which integrates LLMs as a core component. We developed a prototype based on the proposed framework and conducted a preliminary evaluation using a curated dataset, on which the prototype achieved a recall of 95.94%. This modular and adaptable approach underscores the importance of integrating advanced information retrieval techniques in military contexts, paving the way for secure, efficient, and context-aware development processes.
🛠️ Installation and Set-Up
Installing from PyPI
We have published our framework on PyPI. The easiest way to install Modular Search and all its dependencies is with pip, which is included by default in most Python installations. To install, run the following command in a terminal or Command Prompt / PowerShell:
$ pip install modular-search
Depending on the OS, you might need to use pip3 instead. If the command is not found, you can choose to use the following command too:
$ python -m pip install modular-search
Here too, python may need to be replaced with py or python3, and pip with pip3, depending on the OS and installation configuration. If you run into any issues, it is often helpful to consult Stack Overflow.
Installing from Source
Git is the simplest way to install this repository from source. It is not strictly necessary, as you can also download the zip file for this repository and extract it to a local drive manually. To install Git, follow this guide.
After you have successfully installed Git, you can run the following command in a terminal / Command Prompt:
$ git clone https://github.com/aether-raid/modular-search.git
This stores a copy in the folder modular-search. You can then navigate into it using cd modular-search. Then, you can run the following:
$ pip install .
This should install modular-search to your local Python instance.
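To confirm that the installation succeeded, one option is to ask Python's standard library for the installed version. This is a minimal sketch: the helper name installed_version is ours and purely illustrative, and the distribution name modular-search is the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("modular-search")
    print(found if found is not None else "modular-search is not installed")
```

If the script reports that the package is not installed, check that pip and python refer to the same Python installation.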
💻 Getting Started
Search Engines
Our framework supports a generic SearchEngine, which takes in a query and returns a list of results. We currently provide two built-in search engines: Google Search and Deep Google Search.
Google Search uses the googlesearch-python package to scrape Google Search results. An example of usage is as follows:
from modular_search.engines import GoogleSearchEngine
engine = GoogleSearchEngine(num_results = 5)
results = engine("How to train a LLM?")
for result in results:
    print(result)
# prints 5 lines of URLs
Deep Google Search is a modified version of the Google Search engine above: it visits each page returned by Google Search and extracts the links found on those pages. This process can be applied recursively up to a specified depth. An example of the usage is as follows:
from modular_search.engines import DeepGoogleSearchEngine
engine = DeepGoogleSearchEngine(num_results = 5, depth = 2)
results = engine("How to train a LLM?")
for result in results:
    print(result)
# prints a lot more than 5 lines of URLs
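The depth-limited recursion described above can be pictured as a breadth-first expansion of links. The sketch below works on a small in-memory link graph instead of fetching real pages. The names expand_links and LINK_GRAPH are hypothetical and not part of modular_search, and this is not necessarily how DeepGoogleSearchEngine is implemented internally.

```python
from typing import Dict, List

# Hypothetical link graph standing in for "links found on each page".
LINK_GRAPH: Dict[str, List[str]] = {
    "https://a.example": ["https://b.example", "https://c.example"],
    "https://b.example": ["https://d.example"],
    "https://c.example": ["https://a.example"],  # cycle back to the seed
    "https://d.example": [],
}


def expand_links(seeds: List[str], depth: int, graph: Dict[str, List[str]]) -> List[str]:
    """Follow links breadth-first for `depth` levels, keeping first-seen order."""
    seen = list(dict.fromkeys(seeds))  # de-duplicate seeds, preserve order
    visited = set(seen)
    frontier = list(seen)
    for _ in range(depth):
        next_frontier = []
        for url in frontier:
            for link in graph.get(url, []):
                if link not in visited:  # skip pages already collected
                    visited.add(link)
                    seen.append(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen
```

Tracking visited URLs keeps cycles between pages from inflating the result list, which matters because the number of links can grow quickly with depth.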
One can also develop their own engine with the abstract and generic SearchEngine class. For instance:
from typing import List
from pydantic import BaseModel
from modular_search.engines import SearchEngine
class MyCustomSearchEngineOutput(BaseModel):
    ...  # define the fields of a single search result here
class MyCustomSearchEngine(SearchEngine[MyCustomSearchEngineOutput]):
    def search(self, query: str) -> List[MyCustomSearchEngineOutput]:
        list_of_results = []
        # insert logic that fills list_of_results for the given query
        return list_of_results
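As a more concrete illustration, the sketch below implements a small keyword engine over in-memory documents, the kind of local source a restricted environment might rely on. To keep it runnable without the package installed, it defines a stand-in SearchEngine base class that mirrors only the interface shown above (a search method plus call syntax) and uses dataclasses in place of pydantic. LocalDocument and LocalKeywordSearchEngine are hypothetical names, not part of modular_search.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Generic, List, TypeVar

T = TypeVar("T")


class SearchEngine(ABC, Generic[T]):
    """Stand-in base class; an assumption that mirrors the interface shown above."""

    @abstractmethod
    def search(self, query: str) -> List[T]:
        ...

    def __call__(self, query: str) -> List[T]:
        return self.search(query)


@dataclass
class LocalDocument:
    path: str
    score: int  # number of distinct query terms found in the document


class LocalKeywordSearchEngine(SearchEngine[LocalDocument]):
    def __init__(self, documents: Dict[str, str], num_results: int = 5):
        self.documents = documents
        self.num_results = num_results

    def search(self, query: str) -> List[LocalDocument]:
        terms = set(query.lower().split())
        hits = []
        for path, text in self.documents.items():
            score = len(terms & set(text.lower().split()))
            if score > 0:
                hits.append(LocalDocument(path=path, score=score))
        # Highest score first; ties broken by path for a stable order.
        hits.sort(key=lambda doc: (-doc.score, doc.path))
        return hits[: self.num_results]
```

With the real package, the same class would subclass modular_search.engines.SearchEngine instead of the stand-in.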
Unit Search Blocks
Each unit search block is designed with modularity and search engine independence as core principles, enabling developers to easily customize the suite of search engines to align with their familiarity and mission-specific informational needs.
Within each unit search block, use case-specific submodules further process the results retrieved by the search engines. These submodules are abstracted within the framework and can be tailored to meet the needs of specific use cases. They also incorporate modular Large Language Model (LLM) components, designed to refine the initial search results.
The modular architecture of the unit search block facilitates seamless adaptation to a wide range of search requirements from general queries to highly specialized ones, while reducing the need for significant modifications to the core framework.
We support a generic UnitSearchBlock for defining basic search methods. To define a custom Unit Search Block, users need to define the abstract search function. Here is an example:
from typing import List
from pydantic import BaseModel
from modular_search.engines import GoogleSearchEngine
from modular_search.blocks import UnitSearchBlock
class MyCustomSearchResult(BaseModel):
    ...  # define the fields of a single block result here
class MyCustomSearchBlock(UnitSearchBlock[MyCustomSearchResult]):
    def __init__(self):
        self.engine = GoogleSearchEngine(num_results = 5)
    def search(self, query: str) -> List[MyCustomSearchResult]:
        results = []
        # pass the query to the engine to obtain the raw results
        search_results = self.engine.search(query)
        # insert logic that turns search_results into MyCustomSearchResult items
        return results
We also implement a CodebaseSearchBlock based on the proposed implementation in the paper. Here is a sample usage of this class:
from modular_search.engines import GoogleSearchEngine
from modular_search.blocks import CodebaseSearchBlock
engine = GoogleSearchEngine()
block = CodebaseSearchBlock(engine)
results = block("How to train a LLM?")
for result in results:
    print(result.url, result.occurrences)
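The occurrences field printed above suggests that results are tallied per URL. How CodebaseSearchBlock computes this is not described here, so the sketch below only shows one plausible submodule under that assumption: counting how often each URL appears in the raw engine output. UrlCount and count_occurrences are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class UrlCount:
    url: str
    occurrences: int


def count_occurrences(urls: List[str]) -> List[UrlCount]:
    """Tally URLs, treating a trailing slash as insignificant (an assumption)."""
    counts = Counter(url.rstrip("/") for url in urls)
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [UrlCount(url=url, occurrences=n) for url, n in ranked]
```

A tally like this gives later stages a simple signal of how strongly the search results agree on a given link.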
Search Controllers
The search controller serves three roles in our framework:
- It serves as the central management component for all unit search blocks within the framework. Each unit search block operates independently, allowing the search controller to orchestrate their concurrent utilization in a parallelized manner. In military operations, this capability is particularly advantageous, as it accelerates the retrieval of critical information during time-sensitive development phases.
- It provides military developers with a configurable user interface, enabling them to select specific search engines to employ based on the query at hand. This flexibility allows developers to tailor the search process to meet diverse operational requirements, development priorities, and stringent security constraints. For example, a developer tasked with retrieving documentation on encryption protocols might prioritize local search engines for classified materials while simultaneously querying web-based sources for publicly available algorithms. By offering centralized control, the search controller facilitates seamless coordination of the search process while ensuring strict adherence to military security protocols and operational standards.
- It also provides the capability to configure which unit search blocks are queried for a given developer request. This ensures that only the most relevant unit search blocks are utilized, minimizing the computational overhead and avoiding the inclusion of results from blocks that may not contribute meaningful outputs. By selectively engaging the appropriate unit search blocks, our framework enhances efficiency and ensures that the returned results are consistently aligned with the developer’s specific needs and context.
In other words, the search controller acts as a router to the various search blocks, much like the router in a Mixture-of-Experts (MoE) model. It dynamically selects search blocks based on the query and on the active blocks specified by the user. This design gives developers granular control over the search process, so they can tailor it to their specific needs and operational requirements.
We support a generic SearchController that is able to select blocks to activate, select from activated blocks and aggregate. To define a custom Search Controller, users need to provide a dictionary of unit blocks, and define the abstract select_blocks and aggregate functions. Here is an example:
from typing import List, Dict
from pydantic import BaseModel
from modular_search.controllers import SearchController
# from ... import XXXSearchBlock
class MyCustomSearchResult(BaseModel):
    ...  # define the fields of a single aggregated result here
class MyCustomSearchController(SearchController[MyCustomSearchResult]):
    def __init__(self, blocks: Dict[str, XXXSearchBlock]):
        super().__init__(blocks)
    def select_blocks(self, query: str) -> List[str]:
        active_blocks = []
        # insert logic that picks the names of the blocks to run for this query
        return active_blocks
    def aggregate(self, search_results: Dict[str, List[MyCustomSearchResult]]) -> List[MyCustomSearchResult]:
        results = []
        # insert logic that merges the per-block results into a single list
        return results
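To make the routing and parallel orchestration more tangible, the sketch below is a self-contained stand-in controller. It does not subclass the framework's SearchController: blocks are plain callables, the keyword-based routing rule is invented for illustration, and the use of a thread pool is our assumption about one way to run blocks concurrently. KeywordRoutingController is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Block = Callable[[str], List[str]]  # a block maps a query to a list of results


class KeywordRoutingController:
    """Stand-in controller: selects blocks, runs them concurrently, merges results."""

    def __init__(self, blocks: Dict[str, Block]):
        self.blocks = blocks

    def select_blocks(self, query: str) -> List[str]:
        # Hypothetical rule: always use "docs"; add "code" for code-related queries.
        names = ["docs"]
        if any(word in query.lower() for word in ("code", "implement", "library")):
            names.append("code")
        return [name for name in names if name in self.blocks]

    def aggregate(self, search_results: Dict[str, List[str]]) -> List[str]:
        merged: List[str] = []
        for name in sorted(search_results):  # fixed block order for determinism
            for item in search_results[name]:
                if item not in merged:  # drop duplicates across blocks
                    merged.append(item)
        return merged

    def __call__(self, query: str) -> List[str]:
        active = self.select_blocks(query)
        if not active:
            return []
        with ThreadPoolExecutor(max_workers=len(active)) as pool:
            futures = {name: pool.submit(self.blocks[name], query) for name in active}
            results = {name: future.result() for name, future in futures.items()}
        return self.aggregate(results)
```

Threads suit this case because search blocks mostly wait on network or disk I/O, and only the selected blocks are submitted, so unselected blocks cost nothing.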
We also implement a CodebaseSearchController based on the proposed implementation in the paper. Here is a sample usage of this class:
from modular_search.engines import GoogleSearchEngine
from modular_search.blocks import CodebaseSearchBlock
from modular_search.controllers import CodebaseSearchController
engine = GoogleSearchEngine()
block = CodebaseSearchBlock(engine)
controller = CodebaseSearchController(block)
results = controller("How to train a LLM?")
for result in results:
    print(result.url, result.occurrences)
Notably, the CodebaseSearchController only has one block.
Rerankers & Extractors
In information retrieval, results re-ranking is a critical post-processing step aimed at improving the relevance and accuracy of search results. By reorganizing the retrieved results, re-ranking ensures that the most pertinent information is prioritized, enabling developers to access the most relevant insights quickly and efficiently. This process is particularly valuable in contexts where the quality and order of information significantly impact decision-making, such as military operations.
Within the framework, re-ranking leverages additional contextual and evaluative data collected by the submodules within each unit search block. These submodules generate rich metadata such as content relevance, security classifications, and domain-specific metrics that are integral to refining the order and priority of search results.
The implementation of the re-ranking system is intentionally flexible, enabling developers to adopt methodologies aligned with their operational requirements. Potential implementations range from traditional rule-based approaches and heuristic algorithms to advanced neural networks or the integration of LLMs.
After re-ranking, the top $k$ results are filtered and returned to the developer with additional information extracted for the final analysis, where $k$ is a configurable parameter determined by the developer. This parameter allows developers to adjust the breadth of their results pool to balance comprehensiveness with operational efficiency. In scenarios requiring rapid decision-making, a narrower $k$ may be chosen to focus on highly relevant results, while broader values can support exploratory tasks where diverse information is critical.
Our framework supports generic Reranker and Extractor models that attempt to rerank, filter and extract relevant information. To implement a custom Reranker, users need to define the abstract rerank function. An example is shown below:
from typing import List
from pydantic import BaseModel
from modular_search.rerankers import Reranker
class MyCustomSearchResult(BaseModel):
    ...  # fields of the incoming candidates
class MyCustomSearchRerankerResult(BaseModel):
    ...  # fields of the re-ranked results
class MyCustomSearchReranker(Reranker[MyCustomSearchResult, MyCustomSearchRerankerResult]):
    def rerank(self, query: str, candidates: List[MyCustomSearchResult]) -> List[MyCustomSearchRerankerResult]:
        results = []
        # insert logic that scores and orders the candidates
        return results
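As an example of the rule-based end of the spectrum, the sketch below scores each candidate with a simple heuristic and keeps the top k. The scoring rule (occurrence count plus a bonus for query terms found in the URL) is invented for illustration and is not the method used by CodebaseSearchReranker. Candidate, RankedCandidate and rerank_top_k are hypothetical names, and dataclasses stand in for pydantic models so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    url: str
    occurrences: int


@dataclass
class RankedCandidate:
    url: str
    occurrences: int
    score: float


def rerank_top_k(query: str, candidates: List[Candidate], k: int) -> List[RankedCandidate]:
    """Score candidates heuristically and return the k best, highest score first."""
    terms = query.lower().split()
    ranked = []
    for cand in candidates:
        # Illustrative heuristic: each query term found in the URL is worth 2 points.
        overlap = sum(1 for term in terms if term in cand.url.lower())
        score = cand.occurrences + 2.0 * overlap
        ranked.append(RankedCandidate(cand.url, cand.occurrences, score))
    ranked.sort(key=lambda item: item.score, reverse=True)  # stable for ties
    return ranked[: max(k, 0)]
```

A small k keeps the output focused for time-sensitive tasks, while a larger k leaves more material for exploratory searches.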
To implement a custom Extractor, users need to define the abstract extract function. An example is shown below:
from typing import List
from pydantic import BaseModel
from modular_search.extractors import Extractor
class MyCustomSearchRerankerResult(BaseModel):
    ...  # fields of the re-ranked results
class MyCustomSearchExtractorResult(BaseModel):
    ...  # fields of the extracted results
class MyCustomSearchExtractor(Extractor[MyCustomSearchRerankerResult, MyCustomSearchExtractorResult]):
    def extract(self, candidates: List[MyCustomSearchRerankerResult]) -> List[MyCustomSearchExtractorResult]:
        results = []
        # insert logic that extracts additional information from each candidate
        return results
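One kind of information an extractor might pull out is the code found in a page or README. The sketch below extracts fenced code blocks from Markdown text with a regular expression. It is only an assumption about what such a step could look like, not a description of CodebaseSearchExtractor, and extract_code_blocks is a hypothetical name.

```python
import re
from typing import List

FENCE = "`" * 3  # triple backtick, built programmatically for easy embedding
# Opening fence with optional language tag, lazily captured body, closing fence.
CODE_BLOCK = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(markdown: str) -> List[str]:
    """Return the body of every fenced code block, in document order."""
    return [match.group(1).rstrip("\n") for match in CODE_BLOCK.finditer(markdown)]
```

The pattern ignores indented code blocks and nested fences, which is an acceptable simplification for a first pass over typical README files.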
We also implement a CodebaseSearchReranker and CodebaseSearchExtractor based on the proposed implementation in the paper. Here is a sample usage of these classes:
from modular_search.blocks import CodebaseSearchResult
from modular_search.rerankers import CodebaseSearchReranker
from modular_search.extractors import CodebaseSearchExtractor
def llm(query: str) -> str:
    # insert logic
    return ""
query = "How to train a LLM?"
results = [
    CodebaseSearchResult(url = "...", occurrences = 4),
    CodebaseSearchResult(url = "...", occurrences = 3),
    CodebaseSearchResult(url = "...", occurrences = 1),
]
reranker = CodebaseSearchReranker(llm)
reranked_results = reranker(query, results)
for result in reranked_results:
    print(result.url, result.occurrences, result.accuracy)
extractor = CodebaseSearchExtractor()
extracted_results = extractor(reranked_results)
for result in extracted_results:
    print(result.url, result.occurrences, result.accuracy, result.code_blocks)
Putting it all Together
We define our own flow for Codebase Search, which you can find below:
from modular_search.engines import GoogleSearchEngine
from modular_search.blocks import CodebaseSearchBlock
from modular_search.controllers import CodebaseSearchController
from modular_search.rerankers import CodebaseSearchReranker
from modular_search.extractors import CodebaseSearchExtractor
def llm(query: str) -> str:
    # insert logic
    return ""
query = "How to train a LLM?"
engine = GoogleSearchEngine()
block = CodebaseSearchBlock(engine)
controller = CodebaseSearchController(block)
results = controller(query)
for result in results:
    print(result.url, result.occurrences)
reranker = CodebaseSearchReranker(llm)
reranked_results = reranker(query, results)
extractor = CodebaseSearchExtractor()
extracted_results = extractor(reranked_results)
for result in extracted_results:
    print(result.url, result.occurrences, result.accuracy, result.code_blocks)
This should provide a well-supported list of codebase links.