A pure python vector store with hybrid search and metadata filtering.

These details have not been verified by PyPI

Project links

Project description

SimpleVectorStore

A lightweight, in-memory Python library for managing and searching items containing vector embeddings, associated text, and metadata. Designed for simplicity and ease of use in scenarios where a full-featured vector database is overkill.

Overview

SimpleVectorStore provides a straightforward way to store, retrieve, update, and search data points that combine semantic meaning (via vectors) with textual content and structured metadata. It supports vector similarity search (cosine), basic lexical search, hybrid search combining both, and flexible metadata filtering. The entire store can be easily saved to and loaded from a JSON file.

Features

In-Memory: Fast access as all data resides in RAM.
Vector Similarity Search: Find items with similar vector embeddings using cosine similarity.
Lexical Search: Perform simple case-insensitive substring searches on item text.
Hybrid Search: Combine vector and lexical scores with adjustable weighting.
Metadata Filtering: Pre-filter search candidates based on metadata criteria (equality, ranges (gt, lt, gte, lte), list membership (in), containment (contains)).
CRUD Operations: Add, get, update (vector, text, metadata), and delete items easily.
Dimension Enforcement: Optionally enforce a consistent vector dimension across all items.
Persistence: Save the entire store state to a JSON file and load it back.
Simple API: Designed with ease of use in mind.

Installation

Currently, SimpleVectorStore is provided as a single Python class. To use it:

Copy the simple_vector_store.py file (containing the SimpleVectorStore class definition) into your project directory.
Import the class into your script:

# Assuming simple_vector_store.py is in the same directory or your PYTHONPATH
from simple_vector_store import SimpleVectorStore
import numpy as np

Dependencies:

numpy: For numerical operations, especially vector handling. (pip install numpy)

Usage

Initialization

# Initialize with a specific vector dimension (recommended)
store = SimpleVectorStore(vector_dim=3)

# Or initialize without a dimension (it will be inferred from the first item)
# store = SimpleVectorStore()

Adding Items

# Add items with vectors, text, and metadata
id1 = store.add_item(
vector=np.array([0.1, 0.9, 0.0]),
text="Information about apples.",
metadata={"category": "fruit", "color": "red", "year": 2023, "tags": ["juicy", "sweet"]}
)

id2 = store.add_item(
vector=np.array([0.8, 0.1, 0.1]),
text="All about oranges.",
metadata={"category": "fruit", "color": "orange", "year": 2022},
item_id="citrus-001" # You can provide your own IDs
)

id3 = store.add_item(
vector=np.array([0.1, 0.1, 0.8]),
text="Introduction to Python programming.",
metadata={"category": "tech", "language": "python", "year": 2023, "tags": ["code", "beginner"]}
)

print(f"Store now contains {len(store)} items.")

Vector Search

Find items semantically similar to a query vector.

query_vec = np.array([0.2, 0.7, 0.1]) # Query vector somewhat similar to 'apples'

# Find the top 2 most similar items
results = store.search_vector(query_vec, k=2)
print("\nVector Search Results:")
for item_id, score in results:
print(f" ID: {item_id}, Score: {score:.4f}, Text: {store.get_item(item_id)['text']}")

Lexical Search

Find items containing specific keywords in their text.

query_text = "about"

# Find up to 3 items containing the word "about"
results = store.search_lexical(query_text, k=3)
print("\nLexical Search Results:")
for item_id, score in results: # Score is 1.0 for match, 0.0 otherwise
print(f" ID: {item_id}, Score: {score:.1f}, Text: {store.get_item(item_id)['text']}")

Hybrid Search

Combine vector and lexical relevance.

query_vec_fruit = np.array([0.5, 0.5, 0.0]) # Generic fruit vector
query_text_specific = "oranges"

# Find items, weighting vector similarity 60% and text match 40%
results = store.search_hybrid(
query_vector=query_vec_fruit,
query_text=query_text_specific,
k=2,
vector_weight=0.6
)
print("\nHybrid Search Results:")
for item_id, score in results:
print(f" ID: {item_id}, Combined Score: {score:.4f}, Text: {store.get_item(item_id)['text']}")

Filtering

Apply metadata filters before searching. Filters are passed as a dictionary to search methods.

query_vec_tech = np.array([0.1, 0.2, 0.7]) # Query related to tech/python

# Vector search for tech items from 2023 or later containing the 'code' tag
filters = {
"category": "tech",
"year__gte": 2023,
"tags__contains": "code"
}

results = store.search_vector(query_vec_tech, k=5, filters=filters)
print("\nFiltered Vector Search Results (tech, >=2023, 'code' tag):")
if results:
for item_id, score in results:
print(f" ID: {item_id}, Score: {score:.4f}, Text: {store.get_item(item_id)['text']}")
else:
print(" No items matched the filters.")

# Lexical search for fruit items where color is 'red' or 'orange'
filters_fruit_color = {
"category": "fruit",
"color__in": ["red", "orange"]
}
lex_results = store.search_lexical("about", k=5, filters=filters_fruit_color)
print("\nFiltered Lexical Search Results (fruit, red/orange):")
if lex_results:
for item_id, score in lex_results:
print(f" ID: {item_id}, Text: {store.get_item(item_id)['text']}")
else:
print(" No items matched the filters.")

Updates & Deletion

# Update text
store.update_text(id1, "Detailed information about crisp red apples.")

# Update metadata (merge by default)
store.update_metadata(id3, {"difficulty": "easy"})

# Update vector
store.update_vector(id3, np.array([0.05, 0.05, 0.9]))

# Delete an item
deleted = store.delete_item(id2)
print(f"\nDeleted item {id2}: {deleted}")
print(f"Store now contains {len(store)} items.")

Persistence

Save the store's state to a JSON file and load it back.

SAVE_FILE = "my_store_backup"

# Save the store
try:
store.save(SAVE_FILE)
print(f"\nStore saved to {SAVE_FILE}.json")
except Exception as e:
print(f"Error saving store: {e}")

# --- Later, or in another script ---

# Load the store (this is a class method, returns a new instance)
try:
loaded_store = SimpleVectorStore.load(SAVE_FILE)
print(f"\nStore loaded from {SAVE_FILE}.json")
print(f"Loaded store has {len(loaded_store)} items.")
print(f"Loaded store vector dimension: {loaded_store.vector_dim}")

# Verify loaded data
item = loaded_store.get_item(id1)
if item:
print(f"Retrieved item {id1} from loaded store: {item['text']}")

except FileNotFoundError:
print(f"Save file {SAVE_FILE}.json not found.")
except Exception as e:
print(f"Error loading store: {e}")

# Optional: Clean up the file
# import os
# os.remove(SAVE_FILE + ".json")

API Reference (Key Methods)

__init__(self, vector_dim=None): Initialize the store.
add_item(self, vector, text, metadata, item_id=None): Add/overwrite an item.
get_item(self, item_id): Retrieve an item's data.
delete_item(self, item_id): Remove an item.
update_vector(self, item_id, vector): Update an item's vector.
update_text(self, item_id, text): Update an item's text.
update_metadata(self, item_id, metadata_update, replace=False): Update/replace an item's metadata.
search_vector(self, query_vector, k=5, filters=None): Perform cosine similarity search.
search_lexical(self, query_text, k=5, filters=None): Perform substring text search.
search_hybrid(self, query_vector, query_text, k=5, filters=None, vector_weight=0.7, lexical_scorer=None): Perform weighted hybrid search.
save(self, filename_base): Save store state to filename_base.json.
load(cls, filename_base): Class method to load store state from filename_base.json.
__len__(self): Get the number of items.
list_ids(self): Get a list of all item IDs.

Limitations

In-Memory Only: The entire dataset must fit into available RAM. Not suitable for very large datasets.
Basic Search Performance: Vector search involves a linear scan. It will become slow with a very large number of items. No approximate nearest neighbor (ANN) indexing is implemented. Lexical search is also basic.
Scalability: Primarily designed for single-process use. Concurrent writes are not inherently thread-safe without external locking mechanisms.
JSON Serialization: Metadata must contain JSON-serializable types for persistence to work correctly.

Contributing

Contributions are welcome! If you have suggestions for improvements or find bugs, please feel free to open an issue or submit a pull request.

Fork the repository.
Create a new branch (git checkout -b feature/your-feature-name).
Make your changes.
Commit your changes (git commit -m 'Add some feature').
Push to the branch (git push origin feature/your-feature-name).
Open a pull request.

License

This project is licensed under the MIT License - see the LICENSE file (if one exists) or the MIT License text for details.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.0.3

Mar 27, 2025

This version

0.0.2

Mar 27, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

simple_vector_store-0.0.2.tar.gz (20.0 kB view details)

Uploaded Mar 27, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

simple_vector_store-0.0.2-py3-none-any.whl (17.3 kB view details)

Uploaded Mar 27, 2025 Python 3

File details

Details for the file simple_vector_store-0.0.2.tar.gz.

File metadata

Download URL: simple_vector_store-0.0.2.tar.gz
Upload date: Mar 27, 2025
Size: 20.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for simple_vector_store-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`14c6f93c5f9ed2e971b919341f1b88c48b697087ffe753d8e1f9db63e994ea29`
MD5	`11d4a829f8911dde1b1733adbb712f49`
BLAKE2b-256	`1e519441ef4f5d1c75cbafee9ac500f2ad9039aa584ca4b29862c3c681ab2989`

See more details on using hashes here.

File details

Details for the file simple_vector_store-0.0.2-py3-none-any.whl.

File metadata

Download URL: simple_vector_store-0.0.2-py3-none-any.whl
Upload date: Mar 27, 2025
Size: 17.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for simple_vector_store-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9c564ee632ef9761ee67dbcf6d2cea805ec81ec3b1612a6ac38ea5854fc79fb6`
MD5	`accbceabaea3790b09300d0a10bcc720`
BLAKE2b-256	`7701d18bd05c8e164040e82bd88ebb918b296b215c4f4fc1b00dfa8fbbadb559`

See more details on using hashes here.

simple-vector-store 0.0.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

SimpleVectorStore

Overview

Features

Installation

Usage

Initialization

Adding Items

Vector Search

Lexical Search

Hybrid Search

Filtering

Updates & Deletion

Persistence

API Reference (Key Methods)

Limitations

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes