A mock handler for simulating a vector database.
Project description
Mocker db
MockerDB is a Python module that provides a mock vector-database-like solution built around the Python dictionary data type. It contains the methods needed to interact with this 'database': embedding, searching, and persisting data.
Mocker DB
This class is a mock handler for simulating a vector database, designed primarily for testing and development scenarios. It offers functionalities such as text embedding, hierarchical navigable small world (HNSW) search, and basic data management within a simulated environment resembling a vector database.
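To make the idea of a dictionary-backed mock store concrete, here is a minimal sketch, not MockerDB's actual implementation. The names `toy_embed` and `TinyDictStore` are hypothetical, the letter-frequency embedder is only a stand-in for a real model, and the use of `pickle` for persistence is an assumption, since the on-disk format is not described here.

```python
import os
import pickle
import tempfile


def toy_embed(text):
    """Hypothetical stand-in for a real embedder: a 26-dim letter-frequency vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class TinyDictStore:
    """Hypothetical dictionary-backed record store with pickle persistence."""

    def __init__(self, file_path):
        self.file_path = file_path
        self.data = {}

    def insert(self, records, embed_key):
        # Store each record under a running integer id, with its embedding attached.
        for record in records:
            key = len(self.data)
            self.data[key] = {**record, "embedding": toy_embed(record[embed_key])}

    def save(self):
        with open(self.file_path, "wb") as fh:
            pickle.dump(self.data, fh)

    def load(self):
        with open(self.file_path, "rb") as fh:
            self.data = pickle.load(fh)


path = os.path.join(tempfile.mkdtemp(), "tiny_store.pkl")
store = TinyDictStore(path)
store.insert([{"text": "Sample text 1"}, {"text": "Sample text 2"}], "text")
store.save()

# A fresh instance recovers the same records from disk.
reloaded = TinyDictStore(path)
reloaded.load()
print(len(reloaded.data))  # 2
```

Keeping everything in one plain dictionary is what makes such a store easy to inspect in tests: the whole "database" can be printed, compared, or replaced in a single assignment.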
# import sys
# sys.path.append('../')
import numpy as np
from sentence_transformers import SentenceTransformer
from mocker_db import MockerDB, SentenceTransformerEmbedder, MockerSimilaritySearch
Usage examples
The examples contain:
- Inserting values into the database
- Searching and retrieving values from the database
- Removing values from the database
- Testing the HNSW Search Algorithm
1. Inserting values into the database
# Initialization
handler = MockerDB(
    # optional
    embedder_params = {'model_name_or_path' : 'paraphrase-multilingual-mpnet-base-v2',
                       'processing_type' : 'batch',
                       'tbatch_size' : 500,
                       'SentenceTransformer' : SentenceTransformer},
    use_embedder = True,
    embedder = SentenceTransformerEmbedder,
    ## optional/ for similarity search
    similarity_search = MockerSimilaritySearch,
    return_keys_list = None,
    search_results_n = 3,
    similarity_search_type = 'linear',
    similarity_params = {'space':'cosine'},
    ## optional/ inputs with defaults
    file_path = "./mock_persist",
    persist = True,
    embedder_error_tolerance = 0.0
)
# Initialize empty database
handler.establish_connection()
# Insert Data
values_list = [
    {"text": "Sample text 1",
     "text2": "Sample text 1"},
    {"text": "Sample text 2",
     "text2": "Sample text 2"}
]
handler.insert_values(values_list, "text")
print(f"Items in the database {len(handler.data)}")
Items in the database 2
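The `'processing_type' : 'batch'` and `'tbatch_size' : 500` settings in the initialization suggest that texts are embedded in fixed-size batches. How MockerDB uses these parameters internally is not shown here, so the following is only a sketch of the general batching pattern; `iter_batches`, `embed_in_batches`, and `fake_embed_batch` are hypothetical names.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(texts, embed_batch, batch_size=500):
    """Call `embed_batch` once per batch and concatenate the results in order."""
    embeddings = []
    for batch in iter_batches(texts, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings


def fake_embed_batch(batch):
    # Hypothetical embedder: a one-element vector holding the text length.
    return [[float(len(text))] for text in batch]


texts = [f"Sample text {i}" for i in range(1, 1202)]  # 1201 texts
batch_sizes = [len(batch) for batch in iter_batches(texts, 500)]
vectors = embed_in_batches(texts, fake_embed_batch, 500)
print(batch_sizes, len(vectors))  # [500, 500, 201] 1201
```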
2. Searching and retrieving values from the database
- get all keys
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1",
    }
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'text': 'Sample text 1...', 'text2': 'Sample text 1...'}]
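The call above narrows the result to records whose fields equal the values in `filter_criteria`. As a rough illustration of exact-match filtering over a dictionary of records (an assumption about the behaviour, not MockerDB's code; `matches` and `filter_records` are hypothetical helpers):

```python
def matches(record, filter_criteria):
    """True when every filter key is present in the record with an equal value."""
    return all(record.get(key) == value for key, value in filter_criteria.items())


def filter_records(data, filter_criteria):
    """Return the subset of `data` (id -> record) that satisfies the filter."""
    return {rid: rec for rid, rec in data.items() if matches(rec, filter_criteria)}


data = {
    "record1": {"text": "Sample text 1", "text2": "Sample text 1"},
    "record2": {"text": "Sample text 2", "text2": "Sample text 2"},
}
hits = filter_records(data, {"text": "Sample text 1"})
print(list(hits))  # ['record1']
```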
- get all keys with keyword search
results = handler.search_database(
    query = "text",
    # when keyword key is provided filter is used to pass keywords
    filter_criteria = {
        "text" : ["1"],
    },
    keyword_check_keys = ['text'],
    # percentage of filter keyword allowed to be different
    keyword_check_cutoff = 1,
    return_keys_list=['text']
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'text': 'Sample text 1...'}]
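The exact semantics of `keyword_check_cutoff` are not spelled out here. One plausible reading, used in the sketch below purely as an assumption, is a minimum similarity ratio between a keyword and some token of the field, so that a cutoff of 1 demands an exact token match. The helpers `keyword_hit` and `keyword_filter` are hypothetical, and `difflib.SequenceMatcher` is a swapped-in similarity measure, not necessarily the one MockerDB uses.

```python
from difflib import SequenceMatcher


def keyword_hit(text, keyword, cutoff):
    """True if any whitespace-separated token of `text` is at least `cutoff` similar."""
    keyword = keyword.lower()
    return any(
        SequenceMatcher(None, token, keyword).ratio() >= cutoff
        for token in text.lower().split()
    )


def keyword_filter(records, keywords_by_key, cutoff=1.0):
    """Keep records in which every keyword matches its field under the cutoff."""
    return [
        record
        for record in records
        if all(
            keyword_hit(str(record.get(key, "")), keyword, cutoff)
            for key, keywords in keywords_by_key.items()
            for keyword in keywords
        )
    ]


records = [{"text": "Sample text 1"}, {"text": "Sample text 2"}]
print(keyword_filter(records, {"text": ["1"]}, cutoff=1.0))
# [{'text': 'Sample text 1'}]
```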
- get all keys except text2
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1",
    },
    return_keys_list=["-text2"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'text': 'Sample text 1...'}]
- get all keys + distance
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["+&distance"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'text': 'Sample text 1...', 'text2': 'Sample text 1...', '&distance': '0.6744726...'}]
- get distance
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["&distance"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'&distance': '0.6744726...'}]
- get all keys + embeddings
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["+embedding"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'text': 'Sample text 1...', 'text2': 'Sample text 1...', 'embedding': '[-4.94665056e-02 -2.38676026e-...'}]
- get embeddings
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["embedding"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'embedding': '[-4.94665056e-02 -2.38676026e-...'}]
- get embeddings and embedded field
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["embedding", "+&embedded_field"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'&embedded_field': 'text...', 'embedding': '[-4.94665056e-02 -2.38676026e-...'}]
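The `return_keys_list` examples in this section follow a pattern: a plain key selects only that key, a `-` prefix drops a key from the default set, and a `+` prefix adds a key to whatever is already selected. The sketch below reproduces that pattern as inferred from the printed outputs; it is not taken from MockerDB's source, and `project_record` plus the rule that `embedding` and `&`-prefixed keys are hidden by default are assumptions.

```python
def project_record(record, return_keys_list=None):
    """Pick keys from `record` following the +key / -key / key convention."""

    def is_hidden(key):
        # Assumed: embeddings and '&'-prefixed metadata are not returned by default.
        return key == "embedding" or key.startswith("&")

    default_keys = [k for k in record if not is_hidden(k)]
    if not return_keys_list:
        return {k: record[k] for k in default_keys}

    selected = [k for k in return_keys_list if not k.startswith(("+", "-"))]
    added = [k[1:] for k in return_keys_list if k.startswith("+")]
    removed = {k[1:] for k in return_keys_list if k.startswith("-")}

    # Plain keys replace the default set; otherwise start from the defaults.
    base = selected if selected else default_keys
    keys = [k for k in base if k not in removed]
    keys += [k for k in added if k not in keys]
    return {k: record[k] for k in keys if k in record}


record = {
    "text": "Sample text 1",
    "text2": "Sample text 1",
    "embedding": [0.1, 0.2],
    "&distance": 0.67,
    "&embedded_field": "text",
}
print(project_record(record, ["-text2"]))      # {'text': 'Sample text 1'}
print(project_record(record, ["&distance"]))   # {'&distance': 0.67}
print(sorted(project_record(record, ["embedding", "+&embedded_field"])))
# ['&embedded_field', 'embedding']
```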
3. Removing values from the database
print(f"Items in the database {len(handler.data)}")
handler.remove_from_database(filter_criteria = {"text": "Sample text 1"})
print(f"Items left in the database {len(handler.data)}")
Items in the database 2
Items left in the database 1
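Removal by filter can be pictured as deleting matching entries from the underlying dictionary. The following is a small sketch under that assumption; `remove_matching` is a hypothetical helper, not part of the MockerDB API. Matching ids are collected first because a dictionary must not change size while it is being iterated.

```python
def remove_matching(data, filter_criteria):
    """Delete records whose fields equal all filter values; return how many were removed."""
    to_remove = [
        rid
        for rid, rec in data.items()
        if all(rec.get(key) == value for key, value in filter_criteria.items())
    ]
    for rid in to_remove:
        del data[rid]
    return len(to_remove)


data = {
    "record1": {"text": "Sample text 1"},
    "record2": {"text": "Sample text 2"},
}
removed = remove_matching(data, {"text": "Sample text 1"})
print(removed, list(data))  # 1 ['record2']
```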
4. Testing the HNSW Search Algorithm
mss = MockerSimilaritySearch(
    # optional
    search_results_n = 3,
    similarity_params = {'space':'cosine'},
    similarity_search_type ='linear'
)
ste = SentenceTransformerEmbedder(
    # optional / adaptor parameters
    processing_type = '',
    tbatch_size = 500,
    max_workers = 2,
    # sentence transformer parameters
    model_name_or_path = 'paraphrase-multilingual-mpnet-base-v2',
    SentenceTransformer = SentenceTransformer
)
# Create embeddings
embeddings = [ste.embed("example1"), ste.embed("example2")]
# Assuming embeddings are pre-calculated and stored in 'embeddings'
data_with_embeddings = {"record1": {"embedding": embeddings[0]}, "record2": {"embedding": embeddings[1]}}
handler.data = data_with_embeddings
# HNSW Search
query_embedding = embeddings[0] # Example query embedding
labels, distances = mss.hnsw_search(query_embedding, np.array(embeddings), k=1)
print(labels, distances)
[0] [1.1920929e-07]
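The printed distance of about 1.19e-07 for a vector compared with itself is roughly one single-precision rounding step, which is consistent with float32 arithmetic rather than a true mismatch. For comparison, here is a sketch of an exhaustive (linear) cosine-distance search in plain Python. It is explicitly not HNSW and not MockerDB's implementation; `cosine_distance` and `linear_search` are hypothetical names, and the returned `(labels, distances)` pair only mirrors the shape of the output above.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def linear_search(query, vectors, k=1):
    """Compare the query with every vector and return the k nearest (labels, distances)."""
    distances = [cosine_distance(query, vector) for vector in vectors]
    order = sorted(range(len(vectors)), key=distances.__getitem__)[:k]
    return order, [distances[i] for i in order]


vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
labels, dists = linear_search([1.0, 0.0], vectors, k=2)
print(labels, dists)
```

A linear scan costs one distance computation per stored vector, which is acceptable for the small datasets a mock database is meant for; approximate indexes such as HNSW trade exactness for speed on much larger collections.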
File details
Details for the file mocker_db-0.2.3.tar.gz.
File metadata
- Download URL: mocker_db-0.2.3.tar.gz
- Upload date:
- Size: 15.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | d20937b4de1a08ba6467896f605c81b341a29754f3e2ec5d14ac857edf155d26
MD5 | 6fb7206ef9a124ae09dfd260b4660e5c
BLAKE2b-256 | 44f7b5d0e9a61d6ca386f60b605f4d5c474ab747cffc9a36fdde163c046caf7a
File details
Details for the file mocker_db-0.2.3-py3-none-any.whl.
File metadata
- Download URL: mocker_db-0.2.3-py3-none-any.whl
- Upload date:
- Size: 13.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | b6a61b2c65621d877cda25838a4720442479f7dc827415113a38de9bb92db4a7
MD5 | 0083d5a25ab186221b85662d3c34f487
BLAKE2b-256 | 68ba1a850dffa13c38cb11d1ac16bee7b12c140dc12976a136c8f653365f858e