A tool to search and retrieve relevant documents from a knowledge base using BERT/MiniLM embedding or a custom search implementation, then generate human-readable answers using OpenAI API.

These details have not been verified by PyPI

Project links

Homepage

Project description

Knowledge Base Search

This project provides an efficient and scalable solution to search and query a large knowledge base of documents. It allows users to search for information easily by leveraging advanced NLP techniques like BERT embeddings.

Features

Organized code structure following SOLID principles
BERT search for semantic similarity between queries and documents
Preprocessing using SpaCy for efficient text processing
Caching system to store preprocessed data and search algorithm instances for faster subsequent searches
Logging to track search-related information and potential issues

Methodology

The Knowledge Base Search tool employs a two-step process to find relevant documents and generate human-readable answers:

Semantic Search: The tool preprocesses and indexes the input documents using advanced NLP techniques like BERT/MiniLM embeddings or a custom search implementation. These embeddings capture the semantic meaning of the text, allowing the search algorithm to find documents that are not just textually similar, but also semantically related to the input query. This approach ensures a more accurate and context-aware selection of relevant documents.
Answer Generation: After retrieving the most relevant documents, the tool integrates with OpenAI's Chat GPT API to generate human-readable answers based on the provided context. By only sending the relevant context, we can reduce the cost and improve the performance of the API calls, while ensuring that the generated answers are accurate and contextually appropriate.

This methodology is designed to be easily extensible and customizable, allowing users to implement their own search algorithms or NLP models to tailor the solution to their specific use case.

Installation

To set up the project, follow these steps:

Clone the repository:

git clone https://github.com/your_username/knowledge_base_search.git

Change the directory:

cd knowledge_base_search

Create a virtual environment:

For Windows:

python -m venv venv

For Linux/Mac:

python3 -m venv venv

Activate the virtual environment:

For Windows:

venv\Scripts\activate

For Linux/Mac:

source venv/bin/activate

Install the required packages:

pip install -r requirements.txt

Create a .env file in the root of your project and add the openai_api_key variable. Replace <your_api_key> with your actual API key:

openai_api_key=<your_api_key>

Usage

Add your documents in JSON format to the data/raw_data/documents.json file.
Update the main.py file with your query and other necessary modifications.
Run the main.py script:

python main.py

This will load the documents, preprocess them, and index them using the specified search algorithm (e.g., BERT). Then, it will search for relevant documents based on your query and return the top matching results.

Contributing

Contributions are welcome! Please feel free to open issues or submit pull requests to improve the project.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.1.0

Apr 24, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

knowledge-base-search-0.1.0.tar.gz (8.0 kB view hashes)

Uploaded Apr 24, 2023 Source

Built Distribution

knowledge_base_search-0.1.0-py3-none-any.whl (10.2 kB view hashes)

Uploaded Apr 24, 2023 Python 3

Hashes for knowledge-base-search-0.1.0.tar.gz

Hashes for knowledge-base-search-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`c99981de73ddd15596f17bfcb6d537a3f778470366c610358d9f011f4ac8ea5f`
MD5	`544733a52e4d9d29232e806bbf7dc918`
BLAKE2b-256	`3e89c3ed744622b608d9528aeae580fa51cce9684b72b5868ee732c00037d00b`

Hashes for knowledge_base_search-0.1.0-py3-none-any.whl

Hashes for knowledge_base_search-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cc63fd372c2e4cb6192f787457ee2452f1d9dfc03a5fe4f7ddb6091cf7fb3fdd`
MD5	`3193e9b0c03f3a8b6569a94ff6bd33cb`
BLAKE2b-256	`47c3643f690bc463c3e3092d25bd893507c5d5f13a0f8e178e5c4aeb4ad70cf2`