A mock handler for simulating a vector database.


Mocker db

mocker-db is a Python module that provides a mock vector-database-like solution built around the Python dictionary data type. It includes the methods needed to interact with this 'database', as well as to embed, search, and persist data.
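To make the idea of a dictionary-backed vector store concrete, here is a minimal sketch. It is not MockerDB's implementation: the `TinyVectorStore` class, its methods, and the two-dimensional toy embeddings are all hypothetical and only illustrate storing records in a dict and ranking them by cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical dict-backed store (an illustration, not MockerDB itself)."""

    def __init__(self):
        # record id -> {"record": dict, "embedding": list of floats}
        self.data = {}

    def insert(self, record_id, record, embedding):
        self.data[record_id] = {"record": record, "embedding": embedding}

    def search(self, query_embedding, top_k=3):
        # Score every stored item, then keep the top_k most similar records.
        scored = [
            (cosine_similarity(query_embedding, item["embedding"]), item["record"])
            for item in self.data.values()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [record for _, record in scored[:top_k]]


store = TinyVectorStore()
store.insert("a", {"text": "The cat slept."}, [1.0, 0.0])
store.insert("b", {"text": "It rained today."}, [0.0, 1.0])
nearest = store.search([0.9, 0.1], top_k=1)
print(nearest)
```

A linear scan like this is fine for a mock or a small dataset; real vector databases use approximate indexes instead.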

from mocker_db import MockerDB, MockerConnector, SentenceTransformerEmbedder

1. Inserting values into the database

MockerDB can be used as an ephemeral database where everything is kept in memory, or it can be persisted to disk using one file for the database and another for embeddings storage.
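The two-file layout can be sketched as follows. This is only an illustration under assumptions: the on-disk format MockerDB actually uses is not described here, so JSON is picked arbitrarily, and `save_store` and `load_store` are hypothetical helpers.

```python
import json
import os
import tempfile


def save_store(records, embeddings, file_path, embs_file_path):
    """Write records and embeddings to two separate files (JSON is an assumption)."""
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    with open(embs_file_path, "w", encoding="utf-8") as f:
        json.dump(embeddings, f)


def load_store(file_path, embs_file_path):
    """Read both files back into plain dictionaries."""
    with open(file_path, encoding="utf-8") as f:
        records = json.load(f)
    with open(embs_file_path, encoding="utf-8") as f:
        embeddings = json.load(f)
    return records, embeddings


records = {"r1": {"text": "The cat slept.", "n_words": 3}}
embeddings = {"The cat slept.": [0.1, 0.2, 0.3]}

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    db_file = os.path.join(tmp, "mock_persist")
    embs_file = os.path.join(tmp, "mock_embs_persist")
    save_store(records, embeddings, db_file, embs_file)
    loaded_records, loaded_embeddings = load_store(db_file, embs_file)

print(loaded_records == records, loaded_embeddings == embeddings)
```

Keeping embeddings in their own file means they can be reloaded and reused without recomputing them when the records file is rebuilt.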

The embedder is set to sentence_transformer by default and runs locally. Custom embedders that connect to an API or use other open-source models can be used as long as they expose the same interface.

# Initialization
handler = MockerDB(
    # optional
    embedder_params = {'model_name_or_path' : 'paraphrase-multilingual-mpnet-base-v2',
                        'processing_type' : 'batch',
                        'tbatch_size' : 500},
    similarity_search_type = 'linear_torch',
    use_embedder = True,
    embedder = SentenceTransformerEmbedder,
    persist = True
)
# Initialize empty database
handler.establish_connection(
    # optional for persist
    file_path = "./mock_persist",
    embs_file_path = "./mock_embs_persist",
)

sentences = [
    "The cat slept.",
    "It rained today.",
    "She smiled gently.",
    "Books hold knowledge.",
    "The sun set behind the mountains, casting a golden glow over the valley.",
    "He quickly realized that time was slipping away, and he needed to act fast.",
    "The concert was an unforgettable experience, filled with laughter and joy.",
    "Despite the challenges, they managed to build a beautiful home together.",
    "As the wind howled through the ancient trees, scattering leaves and whispering secrets of the forest, she stood there, gazing up at the endless expanse of stars, feeling both infinitely small and profoundly connected to the universe.",
    "While the project seemed daunting at first, requiring countless hours of research, planning, and execution, the team worked tirelessly, motivated by their shared goal of creating something truly remarkable and innovative in their field.",
    "In the bustling city streets, amidst the constant hum of traffic and chatter, he found himself contemplating life's mysteries, pondering the choices that had brought him to this very moment and wondering where the path ahead would lead.",
    "The conference was a gathering of minds from around the globe, each participant bringing their unique perspectives and insights to the table, fostering a vibrant exchange of ideas that would shape the future of their respective fields for years to come."
]

# Insert Data
values_list = [
    {'text' : t, 'n_words' : len(t.split())} for t in sentences
]
handler.insert_values(values_list, "text")
print(f"Items in the database {len(handler.data)}")

Items in the database 12

2. Searching and retrieving values from the database

There are multiple search options, which can be used together or separately:

simple filter
filter with keywords
llm filter
search based on similarity

get all keys

results = handler.search_database(
    query = "cat",
    filter_criteria = {
        "n_words" : 3,
    }
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

[{'text': 'The cat slept....', 'n_words': '3...'}, {'text': 'She smiled gently....', 'n_words': '3...'}, {'text': 'It rained today....', 'n_words': '3...'}, {'text': 'Books hold knowledge....', 'n_words': '3...'}]
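As a rough picture of how a filter and a query can work together, the sketch below first keeps only records that match the filter exactly and then orders them by a relevance score. The `search` function and the `word_overlap` score are hypothetical; word overlap merely stands in for the embedding similarity that MockerDB computes.

```python
def word_overlap(query, text):
    """Jaccard overlap of lowercase words; a stand-in for embedding similarity."""
    q = set(query.lower().split())
    t = {w.strip(".,").lower() for w in text.split()}
    return len(q & t) / len(q | t)


def search(records, query, filter_criteria, score):
    """Apply an exact-match filter, then rank the survivors by score."""
    candidates = [
        r for r in records
        if all(r.get(k) == v for k, v in filter_criteria.items())
    ]
    return sorted(candidates, key=lambda r: score(query, r["text"]), reverse=True)


sample_texts = [
    "The cat slept.",
    "It rained today.",
    "The sun set behind the mountains, casting a golden glow over the valley.",
]
sample_records = [{"text": t, "n_words": len(t.split())} for t in sample_texts]

ranked = search(sample_records, "cat", {"n_words": 3}, word_overlap)
print([r["text"] for r in ranked])
```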

get all keys with keywords search

results = handler.search_database(
    # when keyword_check_keys is provided, filter_criteria is used to pass keywords
    filter_criteria = {
        "text" : ["sun"],
    },
    keyword_check_keys = ['text'],
    # percentage of filter keyword allowed to be different
    keyword_check_cutoff = 1,
    return_keys_list=['text']
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

[{'text': 'The sun set behind the mountai...'}]
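One way to picture a keyword check with a cutoff is fuzzy word matching. The sketch below assumes the cutoff is a similarity ratio between 0 and 1, where 1 requires an exact word match; how MockerDB actually compares keywords is not specified here, and `keyword_in_text` is a hypothetical helper built on the standard library's `difflib`.

```python
import difflib
import string


def keyword_in_text(keyword, text, cutoff=1.0):
    """Return True if any word in text is at least `cutoff` similar to keyword."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    return any(
        difflib.SequenceMatcher(None, keyword.lower(), w).ratio() >= cutoff
        for w in words
    )


exact_hit = keyword_in_text("sun", "The sun set behind the mountains.")
exact_miss = keyword_in_text("sun", "Two suns rose.")
fuzzy_hit = keyword_in_text("sun", "Two suns rose.", cutoff=0.8)
print(exact_hit, exact_miss, fuzzy_hit)
```

Lowering the cutoff trades precision for recall: near-matches such as plurals start to count as hits.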

get all keys - n_words

results = handler.search_database(
    query = "cat",
    filter_criteria = {
        "n_words" : 3,
    },
    return_keys_list=["-n_words"])
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

[{'text': 'The cat slept....'}, {'text': 'She smiled gently....'}, {'text': 'It rained today....'}, {'text': 'Books hold knowledge....'}]

get all keys + distance

results = handler.search_database(
    query = "cat slept",
    filter_criteria = {
        "n_words" : 3,
    },
    return_keys_list=["+&distance"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

[{'text': 'The cat slept....', 'n_words': '3...', '&distance': '0.9757655893784214...'}, {'text': 'She smiled gently....', 'n_words': '3...', '&distance': '0.25537100167603033...'}, {'text': 'It rained today....', 'n_words': '3...', '&distance': '0.049663180663929454...'}, {'text': 'Books hold knowledge....', 'n_words': '3...', '&distance': '0.011214834039176086...'}]

get distance

results = handler.search_database(
    query = "cat slept",
    filter_criteria = {
        "n_words" : 3,
    },
    return_keys_list=["&distance"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

[{'&distance': '0.9757655893784214...'}, {'&distance': '0.25537100167603033...'}, {'&distance': '0.049663180663929454...'}, {'&distance': '0.011214834039176086...'}]

get all keys + embeddings

results = handler.search_database(
    query = "cat slept",
    filter_criteria = {
        "n_words" : 3,
    },
    return_keys_list=["+embedding"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

[{'text': 'The cat slept....', 'n_words': '3...', 'embedding': '[-3.86438444e-02  1.23167984e-...'}, {'text': 'She smiled gently....', 'n_words': '3...', 'embedding': '[-2.46711876e-02  2.37020180e-...'}, {'text': 'It rained today....', 'n_words': '3...', 'embedding': '[-1.35887727e-01 -2.52719879e-...'}, {'text': 'Books hold knowledge....', 'n_words': '3...', 'embedding': '[ 6.20863438e-02  1.13785945e-...'}]

get embeddings

results = handler.search_database(
    query = "cat slept",
    filter_criteria = {
        "n_words" : 3,
    },
    return_keys_list=["embedding"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

[{'embedding': '[-3.86438444e-02  1.23167984e-...'}, {'embedding': '[-2.46711876e-02  2.37020180e-...'}, {'embedding': '[-1.35887727e-01 -2.52719879e-...'}, {'embedding': '[ 6.20863438e-02  1.13785945e-...'}]

get embeddings and embedded field

results = handler.search_database(
    query = "cat slept",
    filter_criteria = {
        "n_words" : 3,
    },
    return_keys_list=["embedding", "+&embedded_field"]
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

[{'embedding': '[-3.86438444e-02  1.23167984e-...', '&embedded_field': 'text...'}, {'embedding': '[-2.46711876e-02  2.37020180e-...', '&embedded_field': 'text...'}, {'embedding': '[-1.35887727e-01 -2.52719879e-...', '&embedded_field': 'text...'}, {'embedding': '[ 6.20863438e-02  1.13785945e-...', '&embedded_field': 'text...'}]
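Judging only from the outputs shown above, `return_keys_list` appears to follow three rules: a plain key returns only the named keys, a `+` prefix adds a key to the selection, and a `-` prefix removes one. The sketch below reproduces that inferred behaviour; `select_keys` is a hypothetical helper, not part of mocker-db, and the `extras` values are made up.

```python
def select_keys(record, extras, return_keys_list):
    """Pick output keys using the +/- prefix rules inferred from the examples."""
    plain = [k for k in return_keys_list if not k.startswith(("+", "-"))]
    added = [k[1:] for k in return_keys_list if k.startswith("+")]
    removed = {k[1:] for k in return_keys_list if k.startswith("-")}
    available = {**record, **extras}
    # With no plain keys, start from every stored key of the record.
    base = plain if plain else list(record)
    keys = [k for k in base + added if k not in removed]
    return {k: available[k] for k in keys if k in available}


record = {"text": "The cat slept.", "n_words": 3}
extras = {"&distance": 0.97, "embedding": [0.1, 0.2], "&embedded_field": "text"}

print(select_keys(record, extras, ["-n_words"]))
print(select_keys(record, extras, ["+&distance"]))
print(select_keys(record, extras, ["&distance"]))
print(select_keys(record, extras, ["embedding", "+&embedded_field"]))
```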

get all keys with llm search

Ollama

import logging
logging.disable(logging.INFO)

# Initialization
handler = MockerDB(
    # optional
    persist = True,
    llm_conn_params = {

        'llm_h_type' : 'OllamaConn',
        'llm_h_params' : {
            'connection_string' : 'http://127.0.0.1:11434',
            'model_name' : 'llama3.1:latest'
        }

    }
)
# Initialize empty database
handler.establish_connection(
    # optional for persist
    file_path = "./mock_persist",
    embs_file_path = "./mock_embs_persist",
)

# top-level await works in a notebook or inside an async function
results = await handler.search_database_async(
    llm_search_keys=['text'],
    filter_criteria = {
        "text" : ["cat", "nature"],
    },
    return_keys_list=["+&cats"],
    ignore_cats_cache=False
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

[{'text': 'The cat slept....', 'n_words': '3...', '&cats': "{'text': ['cat']}..."}, {'text': 'The sun set behind the mountai...', 'n_words': '13...', '&cats': "{'text': ['nature']}..."}, {'text': 'As the wind howled through the...', 'n_words': '37...', '&cats': "{'text': ['nature']}..."}]

handler.cats

{'The cat slept.': {1: ['cat'], 0: ['nature']},
 'It rained today.': {1: [], 0: ['cat', 'nature']},
 'She smiled gently.': {1: [], 0: ['cat', 'nature']},
 'Books hold knowledge.': {1: [], 0: ['cat', 'nature']},
 'The sun set behind the mountains, casting a golden glow over the valley.': {1: ['nature'],
  0: ['cat']},
 'He quickly realized that time was slipping away, and he needed to act fast.': {1: [],
  0: ['cat', 'nature']},
 'The concert was an unforgettable experience, filled with laughter and joy.': {1: [],
  0: ['cat', 'nature']},
 'Despite the challenges, they managed to build a beautiful home together.': {1: [],
  0: ['cat', 'nature']},
 'As the wind howled through the ancient trees, scattering leaves and whispering secrets of the forest, she stood there, gazing up at the endless expanse of stars, feeling both infinitely small and profoundly connected to the universe.': {1: ['nature'],
  0: ['cat']},
 'While the project seemed daunting at first, requiring countless hours of research, planning, and execution, the team worked tirelessly, motivated by their shared goal of creating something truly remarkable and innovative in their field.': {1: [],
  0: ['cat', 'nature']},
 "In the bustling city streets, amidst the constant hum of traffic and chatter, he found himself contemplating life's mysteries, pondering the choices that had brought him to this very moment and wondering where the path ahead would lead.": {1: [],
  0: ['cat', 'nature']},
 'The conference was a gathering of minds from around the globe, each participant bringing their unique perspectives and insights to the table, fostering a vibrant exchange of ideas that would shape the future of their respective fields for years to come.': {1: [],
  0: ['cat', 'nature']}}
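The `handler.cats` structure above maps each text to the categories it matched (key `1`) and did not match (key `0`). A cache of that shape lets repeated searches skip classification calls that were already made. The sketch below illustrates the idea; `classify_with_cache` and `fake_llm` are hypothetical, and a substring check stands in for the LLM call.

```python
def classify_with_cache(texts, categories, classifier, cache):
    """Classify texts, calling the classifier only for undecided pairs."""
    for text in texts:
        entry = cache.setdefault(text, {1: [], 0: []})
        for category in categories:
            if category in entry[1] or category in entry[0]:
                continue  # already decided earlier, skip the expensive call
            entry[1 if classifier(text, category) else 0].append(category)
    # Return only the texts that matched at least one category.
    return {t: cache[t][1] for t in texts if cache[t][1]}


calls = []


def fake_llm(text, category):
    """Stand-in for an LLM call: a substring check that records each call."""
    calls.append((text, category))
    return category in text.lower()


cache = {}
texts = ["The cat slept.", "It rained today."]
first = classify_with_cache(texts, ["cat", "rain"], fake_llm, cache)
calls_after_first = len(calls)
second = classify_with_cache(texts, ["cat", "rain"], fake_llm, cache)
print(first, calls_after_first, len(calls))
```

The second call returns the same result without invoking the classifier again, which is the saving a cache of this kind is meant to provide.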

OpenAI

import logging
logging.disable(logging.INFO)

from dotenv import load_dotenv
import os

load_dotenv("../../credentials")

# Initialization
handler = MockerDB(
    # optional
    persist = True,
    llm_conn_params = {

        'llm_h_type' : 'OpenAIConn',
        'llm_h_params' : {
            'model_name' : 'gpt-4o-mini',
            'env_mapping' : {
                'api_key' : "OPENAI_KEY"
            }
        }

    }
)
# Initialize empty database
handler.establish_connection(
    # optional for persist
    file_path = "./mock_persist",
    embs_file_path = "./mock_embs_persist",
)

# top-level await works in a notebook or inside an async function
results = await handler.search_database_async(
    llm_search_keys=['text'],
    filter_criteria = {
        "text" : ["cat", "nature"],
    }
)
print([{k: str(v)[:30] + "..." for k, v in result.items()} for result in results])

[{'text': 'The cat slept....', 'n_words': '3...'}, {'text': 'The sun set behind the mountai...', 'n_words': '13...'}, {'text': 'As the wind howled through the...', 'n_words': '37...'}]

3. Removing values from the database

print(f"Items in the database {len(handler.data)}")
handler.remove_from_database(filter_criteria = {"n_words" : 11})
print(f"Items left in the database {len(handler.data)}")

Items in the database 14
Items left in the database 12
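Removal by filter can be pictured as deleting every dictionary entry whose fields match the criteria. The sketch below is an illustration on a plain dict, with a hypothetical `remove_matching` helper; it uses two of the 11-word sentences from the sample data.

```python
def remove_matching(data, filter_criteria):
    """Delete entries whose fields equal every filter value; return the count."""
    to_remove = [
        key for key, record in data.items()
        if all(record.get(k) == v for k, v in filter_criteria.items())
    ]
    # Collect keys first so the dict is not modified while iterating over it.
    for key in to_remove:
        del data[key]
    return len(to_remove)


texts = [
    "The cat slept.",
    "The concert was an unforgettable experience, filled with laughter and joy.",
    "Despite the challenges, they managed to build a beautiful home together.",
]
data = {i: {"text": t, "n_words": len(t.split())} for i, t in enumerate(texts)}

removed_count = remove_matching(data, {"n_words": 11})
print(removed_count, len(data))
```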

4. Embedding text

results = handler.embed_texts(
    texts = [
    "Short. Variation 1: Short.",
    "Another medium-length example, aiming to test the variability in processing different lengths of text inputs. Variation 2: processing lengths medium-length example, in inputs. to variability aiming test of text different the Another"
  ]
)

print(str(results)[0:300] + "...")

{'embeddings': [[0.04973424971103668, -0.43570247292518616, -0.014545125886797905, -0.03648979589343071, -0.04165348783135414, -0.04544278606772423, -0.07025150209665298, 0.10043243318796158, -0.20846229791641235, 0.15596869587898254, 0.11489829421043396, -0.13442179560661316, -0.02425091527402401, ...

5. Using MockerDB API

The remote MockerDB API can be used through methods very similar to those of the local handler.

# Initialization
handler = MockerDB(
    skip_post_init=True
)
# Initialize empty database
handler.establish_connection(
     # optional for connecting to api
    connection_details = {
        'base_url' : "http://localhost:8000/mocker-db"
    }
)

sentences = [
    "The cat slept.",
    "It rained today.",
    "She smiled gently.",
    "Books hold knowledge.",
    "The sun set behind the mountains, casting a golden glow over the valley.",
    "He quickly realized that time was slipping away, and he needed to act fast.",
    "The concert was an unforgettable experience, filled with laughter and joy.",
    "Despite the challenges, they managed to build a beautiful home together.",
    "As the wind howled through the ancient trees, scattering leaves and whispering secrets of the forest, she stood there, gazing up at the endless expanse of stars, feeling both infinitely small and profoundly connected to the universe.",
    "While the project seemed daunting at first, requiring countless hours of research, planning, and execution, the team worked tirelessly, motivated by their shared goal of creating something truly remarkable and innovative in their field.",
    "In the bustling city streets, amidst the constant hum of traffic and chatter, he found himself contemplating life's mysteries, pondering the choices that had brought him to this very moment and wondering where the path ahead would lead.",
    "The conference was a gathering of minds from around the globe, each participant bringing their unique perspectives and insights to the table, fostering a vibrant exchange of ideas that would shape the future of their respective fields for years to come."
]

# Insert Data
values_list = [
    {'text' : t, 'n_words' : len(t.split())} for t in sentences
]
handler.insert_values(values_list, "text")

{'status': 'success', 'message': ''}

MockerAPI keeps multiple handlers in memory at a time; they can be listed along with their item counts and memory estimates.

handler.show_handlers()

{'results': [{'handler': 'default',
   'items': 12,
   'memory_usage': 1.4748001098632812}],
 'status': 'success',
 'message': '',
 'handlers': ['default'],
 'items': [12],
 'memory_usage': [1.4748001098632812]}

results = handler.search_database(
    query = "cat",
    filter_criteria = {
        "n_words" : 3,
    }
)

results

{'status': 'success',
 'message': '',
 'handler': 'default',
 'results': [{'text': 'The cat slept.', 'n_words': 3},
  {'text': 'Books hold knowledge.', 'n_words': 3},
  {'text': 'It rained today.', 'n_words': 3},
  {'text': 'She smiled gently.', 'n_words': 3}]}
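Unlike the local handler, the API wraps results in a response dictionary with `status`, `message`, `handler`, and `results` keys. A small helper can check the status before using the results. `unwrap_results` below is hypothetical, and the shape of a failed response is an assumption, since only successful responses are shown here.

```python
def unwrap_results(response):
    """Return the 'results' list, raising if the status is not 'success'."""
    if response.get("status") != "success":
        # Assumption: failures carry a non-'success' status and a message.
        raise RuntimeError(response.get("message") or "request failed")
    return response["results"]


api_response = {
    "status": "success",
    "message": "",
    "handler": "default",
    "results": [
        {"text": "The cat slept.", "n_words": 3},
        {"text": "Books hold knowledge.", "n_words": 3},
        {"text": "It rained today.", "n_words": 3},
        {"text": "She smiled gently.", "n_words": 3},
    ],
}

found_texts = [r["text"] for r in unwrap_results(api_response)]
print(found_texts)
```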

results = handler.embed_texts(
    texts = [
    "Short. Variation 1: Short.",
    "Another medium-length example, aiming to test the variability in processing different lengths of text inputs. Variation 2: processing lengths medium-length example, in inputs. to variability aiming test of text different the Another"
  ],
    # optional
    embedding_model = "intfloat/multilingual-e5-base"
)

print(str(results)[0:500] + "...")

{'status': 'success', 'message': '', 'handler': 'cache_mocker_intfloat_multilingual-e5-base', 'embedding_model': 'intfloat/multilingual-e5-base', 'embeddings': [[-0.021023569628596306, 0.03461984172463417, -0.013103404082357883, 0.030711326748132706, 0.023395603522658348, -0.040545400232076645, -0.01580517739057541, -0.026828577741980553, 0.015833470970392227, 0.017637528479099274, 0.0008703444618731737, -0.011133708991110325, 0.11296682059764862, 0.015158110298216343, -0.04669041559100151, -0.0...

handler.show_handlers()

{'results': [{'handler': 'default',
   'items': 12,
   'memory_usage': 1.4762191772460938},
  {'handler': 'cache_mocker_intfloat_multilingual-e5-base',
   'items': 2,
   'memory_usage': 1.4075469970703125}],
 'status': 'success',
 'message': '',
 'handlers': ['default', 'cache_mocker_intfloat_multilingual-e5-base'],
 'items': [12, 2],
 'memory_usage': [1.4762191772460938, 1.4075469970703125]}
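The `show_handlers()` response can be summarized with ordinary dictionary operations. The sketch below copies the response shown above into a literal and builds a per-handler item count; the variable names are illustrative only.

```python
handlers_response = {
    "results": [
        {"handler": "default", "items": 12, "memory_usage": 1.4762191772460938},
        {
            "handler": "cache_mocker_intfloat_multilingual-e5-base",
            "items": 2,
            "memory_usage": 1.4075469970703125,
        },
    ],
    "status": "success",
    "message": "",
    "handlers": ["default", "cache_mocker_intfloat_multilingual-e5-base"],
    "items": [12, 2],
    "memory_usage": [1.4762191772460938, 1.4075469970703125],
}

# Map each handler name to its item count, then total the items.
items_by_handler = {h["handler"]: h["items"] for h in handlers_response["results"]}
total_items = sum(items_by_handler.values())
largest_handler = max(items_by_handler, key=items_by_handler.get)

print(items_by_handler)
print(total_items, largest_handler)
```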


Release history

This version

0.3.0

Apr 25, 2025

0.2.7

Nov 24, 2024

0.2.6

Oct 1, 2024

0.2.5

Sep 21, 2024

0.2.4

Sep 15, 2024

0.2.3

Sep 2, 2024

0.2.2

Aug 26, 2024

0.2.1

Aug 15, 2024

0.2.0

Jul 26, 2024

0.1.3

Jul 22, 2024

0.1.2

Jun 21, 2024

0.1.1

Apr 20, 2024

0.0.12

Apr 16, 2024

0.0.11

Apr 15, 2024

0.0.10

Apr 6, 2024

0.0.9

Apr 6, 2024

0.0.8

Apr 6, 2024

0.0.7

Apr 6, 2024

0.0.6

Mar 19, 2024

0.0.5

Mar 11, 2024

0.0.4

Feb 23, 2024

0.0.3

Feb 19, 2024

0.0.2

Feb 14, 2024

0.0.1

Jan 23, 2024

Download files


Source Distribution

mocker_db-0.3.0.tar.gz (1.1 MB view details)

Uploaded Apr 25, 2025 Source

Built Distribution


mocker_db-0.3.0-py3-none-any.whl (1.2 MB view details)

Uploaded Apr 25, 2025 Python 3

File details

Details for the file mocker_db-0.3.0.tar.gz.

File metadata

Download URL: mocker_db-0.3.0.tar.gz
Upload date: Apr 25, 2025
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.17

File hashes

Hashes for mocker_db-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`ee6e37bfeaa3f16e66ec32873a9fcfdca78efdaa324325b74a91b40e3bbcfc49`
MD5	`658ad28f4c9053610537433230c338d4`
BLAKE2b-256	`51362ef944de246de76d52ec5d68d7e972ab7c43b390820c13d005bb19bf1c30`


File details

Details for the file mocker_db-0.3.0-py3-none-any.whl.

File metadata

Download URL: mocker_db-0.3.0-py3-none-any.whl
Upload date: Apr 25, 2025
Size: 1.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.17

File hashes

Hashes for mocker_db-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`298689c6410832af33589e95a032ec09c483981c3358aae1823d24b0c7dc8ad6`
MD5	`d907746e907f3dac494f7b8e4dce5e6b`
BLAKE2b-256	`3b48cbb79911415d9c11f4fe7899034fb7eb38dbbd25a44f7e21045d24c7585c`

