
A mock handler for simulating a vector database.


Mocker db

MockerDB is a Python module that provides a mock, vector-database-like solution built around the Python dictionary data type. It contains the methods needed to interact with this 'database': embedding, searching, and persisting.

Mocker DB

This class is a mock handler for simulating a vector database, designed primarily for testing and development scenarios. It offers functionalities such as text embedding, hierarchical navigable small world (HNSW) search, and basic data management within a simulated environment resembling a vector database.
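To make the idea of a dictionary-backed vector store concrete, here is a minimal sketch in plain Python. It is not MockerDB's implementation: `TinyMockStore`, `toy_embed`, and `cosine_distance` are hypothetical names, and the toy embedder (hashed character-pair counts) only stands in for a real sentence-embedding model.

```python
import math


def toy_embed(text, dim=16):
    """Hypothetical stand-in for a real embedder: hashed character-pair counts."""
    vec = [0.0] * dim
    for i in range(len(text) - 1):
        bucket = (ord(text[i]) + ord(text[i + 1])) % dim
        vec[bucket] += 1.0
    return vec


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity); 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


class TinyMockStore:
    """Hypothetical dict-backed store: record id -> record with an 'embedding' key."""

    def __init__(self):
        self.data = {}

    def insert(self, records, embed_key):
        # Copy each record and attach the embedding of the chosen field.
        for record in records:
            stored = dict(record)
            stored["embedding"] = toy_embed(record[embed_key])
            self.data[str(len(self.data))] = stored

    def search(self, query, n=3):
        # Linear scan: rank every record by cosine distance to the query.
        q = toy_embed(query)
        ranked = sorted(self.data.values(),
                        key=lambda r: cosine_distance(q, r["embedding"]))
        return [{k: v for k, v in r.items() if k != "embedding"}
                for r in ranked[:n]]


store = TinyMockStore()
store.insert([{"text": "Sample text 1"}, {"text": "Sample text 2"}], "text")
top = store.search("Sample text 1", n=1)
```

The linear scan is enough for test-sized data; an approximate index such as HNSW only pays off when the number of records grows large.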

# import sys
# sys.path.append('../')
import numpy as np
from sentence_transformers import SentenceTransformer
from mocker_db import MockerDB, SentenceTransformerEmbedder, MockerSimilaritySearch

Usage examples

The examples contain:

  1. Inserting values into the database
  2. Searching and retrieving values from the database
  3. Removing values from the database
  4. Testing the HNSW Search Algorithm

1. Inserting values into the database

# Initialization
handler = MockerDB(
    # optional
    embedder_params = {'model_name_or_path' : 'paraphrase-multilingual-mpnet-base-v2',
                        'processing_type' : 'batch',
                        'tbatch_size' : 500,
                        'SentenceTransformer' : SentenceTransformer},
    use_embedder = True,
    embedder = SentenceTransformerEmbedder,
    ## optional/ for similarity search
    similarity_search = MockerSimilaritySearch,
    return_keys_list = None,
    search_results_n = 3,
    similarity_search_type = 'linear',
    similarity_params = {'space':'cosine'},
    ## optional/ inputs with defaults
    file_path = "./mock_persist",
    persist = True,
    embedder_error_tolerance = 0.0
)
# Initialize empty database
handler.establish_connection()
# Insert Data
values_list = [
    {"text": "Sample text 1",
     "text2": "Sample text 1"},
    {"text": "Sample text 2",
     "text2": "Sample text 2"}
]
handler.insert_values(values_list, "text")
print(f"Items in the database {len(handler.data)}")
Items in the database 2
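The initialization above passes `persist = True` and a `file_path`. The on-disk format MockerDB uses is not described here, so the following is only a sketch of how a dictionary-backed store could be persisted and reloaded. The helper names `save_store` and `load_store` and the choice of JSON are assumptions made for illustration.

```python
import json
import os
import tempfile


def save_store(data, file_path):
    """Write the record dictionary to disk as JSON (the format is an assumption)."""
    with open(file_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)


def load_store(file_path):
    """Return the persisted dictionary, or an empty one if no file exists yet."""
    if not os.path.exists(file_path):
        return {}
    with open(file_path, "r", encoding="utf-8") as fh:
        return json.load(fh)


data = {
    "0": {"text": "Sample text 1", "embedding": [0.1, 0.2]},
    "1": {"text": "Sample text 2", "embedding": [0.3, 0.4]},
}

# Round-trip through a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mock_persist.json")
    save_store(data, path)
    restored = load_store(path)
    missing = load_store(os.path.join(tmp, "absent.json"))
```

Returning an empty dictionary for a missing file mirrors the "initialize empty database" step: the first run starts from nothing and later runs pick up what was saved.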

2. Searching and retrieving values from the database

  • get all keys
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1",
    }
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'text': 'Sample text 1...', 'text2': 'Sample text 1...'}]
  • get all keys with keyword search
results = handler.search_database(
    query = "text",
    # when keyword key is provided filter is used to pass keywords
    filter_criteria = {
        "text" : ["1"],
    },
    keyword_check_keys = ['text'],
    # percentage of filter keyword allowed to be different
    keyword_check_cutoff = 1,
    return_keys_list=['text']
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'text': 'Sample text 1...'}]
  • get all keys except text2
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1",
    },
    return_keys_list=["-text2"])
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'text': 'Sample text 1...'}]
  • get all keys + distance
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["+&distance"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'text': 'Sample text 1...', 'text2': 'Sample text 1...', '&distance': '0.6744726...'}]
  • get distance
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["&distance"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'&distance': '0.6744726...'}]
  • get all keys + embeddings
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["+embedding"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'text': 'Sample text 1...', 'text2': 'Sample text 1...', 'embedding': '[-4.94665056e-02 -2.38676026e-...'}]
  • get embeddings
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["embedding"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'embedding': '[-4.94665056e-02 -2.38676026e-...'}]
  • get embeddings and embedded field
results = handler.search_database(
    query = "text",
    filter_criteria = {
        "text" : "Sample text 1"
    },
    return_keys_list=["embedding", "+&embedded_field"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])
[{'&embedded_field': 'text...', 'embedding': '[-4.94665056e-02 -2.38676026e-...'}]
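The outputs above suggest a simple rule for `return_keys_list`: unprefixed keys select exactly those keys, a `+` prefix adds a key to the selection, a `-` prefix drops one, and with no unprefixed keys the selection starts from the record's ordinary fields. The sketch below encodes that rule as inferred from the printed results only. `select_keys` and its `hidden` parameter are hypothetical and are not part of the MockerDB API.

```python
def select_keys(record, return_keys_list=None,
                hidden=("embedding", "&distance", "&embedded_field")):
    """Pick keys from a record using the +/- convention inferred from the examples."""
    keys = return_keys_list or []
    plain = [k for k in keys if not k.startswith(("+", "-"))]
    added = [k[1:] for k in keys if k.startswith("+")]
    dropped = {k[1:] for k in keys if k.startswith("-")}
    # With no plain keys, start from every field that is not hidden by default.
    base = plain if plain else [k for k in record if k not in hidden]
    wanted = [k for k in base + added if k not in dropped]
    return {k: record[k] for k in wanted if k in record}


record = {
    "text": "Sample text 1",
    "text2": "Sample text 1",
    "embedding": [-0.049, -0.023],
    "&distance": 0.674,
    "&embedded_field": "text",
}

all_keys = select_keys(record)
without_text2 = select_keys(record, ["-text2"])
with_distance = select_keys(record, ["+&distance"])
only_distance = select_keys(record, ["&distance"])
embedding_and_field = select_keys(record, ["embedding", "+&embedded_field"])
```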

3. Removing values from the database

print(f"Items in the database {len(handler.data)}")
handler.remove_from_database(filter_criteria = {"text": "Sample text 1"})
print(f"Items left in the database {len(handler.data)}")
Items in the database 2
Items left in the database 1
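Removal by `filter_criteria` can be pictured as exact-match filtering over the record dictionary. The following sketch assumes that a record matches when every criterion key is present with an equal value. `matches` and `remove_matching` are hypothetical helpers, not the library's own functions.

```python
def matches(record, filter_criteria):
    """True when every criterion key exists in the record with an equal value."""
    return all(key in record and record[key] == value
               for key, value in filter_criteria.items())


def remove_matching(data, filter_criteria):
    """Return a new dictionary without the records that match all criteria."""
    # Note: an empty filter matches every record, so it would remove everything.
    return {rid: rec for rid, rec in data.items()
            if not matches(rec, filter_criteria)}


data = {
    "0": {"text": "Sample text 1", "text2": "Sample text 1"},
    "1": {"text": "Sample text 2", "text2": "Sample text 2"},
}
remaining = remove_matching(data, {"text": "Sample text 1"})
```

Building a new dictionary rather than deleting in place avoids mutating the store while iterating over it.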

4. Testing the HNSW Search Algorithm

mss = MockerSimilaritySearch(
    # optional
    search_results_n = 3,
    similarity_params = {'space':'cosine'},
    similarity_search_type ='linear'
)

ste = SentenceTransformerEmbedder(# optional / adaptor parameters
                                  processing_type = '',
                                  tbatch_size = 500,
                                  max_workers = 2,
                                  # sentence transformer parameters
                                  model_name_or_path = 'paraphrase-multilingual-mpnet-base-v2',
                                  SentenceTransformer = SentenceTransformer)
# Create embeddings
embeddings = [ste.embed("example1"), ste.embed("example2")]


# Assuming embeddings are pre-calculated and stored in 'embeddings'
data_with_embeddings = {"record1": {"embedding": embeddings[0]}, "record2": {"embedding": embeddings[1]}}
handler.data = data_with_embeddings

# HNSW Search
query_embedding = embeddings[0]  # Example query embedding
labels, distances = mss.hnsw_search(query_embedding, np.array(embeddings), k=1)
print(labels, distances)
[0] [1.1920929e-07]
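The reported distance of 1.1920929e-07 is 2**-23, the machine epsilon of 32-bit floats, so it is best read as rounding noise around zero: the query is identical to record 0. For small datasets, an exact brute-force search is a convenient reference to check approximate results against. The sketch below is a plain-Python version under that assumption. `brute_force_search` is a hypothetical helper and the three vectors are made up for illustration.

```python
import math

FLOAT32_EPS = 2.0 ** -23  # about 1.1920929e-07


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity); 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def brute_force_search(query, vectors, k=1):
    """Exact k-nearest-neighbour search by scanning every vector."""
    distances = [cosine_distance(query, v) for v in vectors]
    order = sorted(range(len(vectors)), key=distances.__getitem__)[:k]
    return order, [distances[i] for i in order]


vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
labels, distances = brute_force_search(vectors[0], vectors, k=2)
```

When comparing against an approximate index, it is safer to compare distances with a tolerance on the order of `FLOAT32_EPS` than to test for exact equality.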
