In-memory vector store with multi-metric similarity search.

philiprehberger-embedding-store

Installation

pip install philiprehberger-embedding-store

Usage

from philiprehberger_embedding_store import VectorStore

store = VectorStore(dimensions=1536)

# Add vectors with metadata
store.add("doc1", embedding=[0.1, 0.2, ...], metadata={"title": "First doc"})
store.add("doc2", embedding=[0.3, 0.1, ...], metadata={"title": "Second doc"})

# Search by similarity
results = store.search(query_embedding=[0.15, 0.18, ...], top_k=5)
for result in results:
    print(f"{result.id}: score={result.score:.3f}, {result.metadata}")

Distance metrics

Choose a metric per store or override per search call:

from philiprehberger_embedding_store import VectorStore

# Set default metric at store level
store = VectorStore(dimensions=128, metric="euclidean")
query = [0.1] * 128  # placeholder query vector matching the store's dimensionality
results = store.search(query, top_k=5)

# Override metric for a single search
results = store.search(query, top_k=5, metric="manhattan")

Supported metrics: "cosine" (default), "dot", "euclidean", "manhattan".
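
For reference, the four metric names correspond to standard vector formulas. The sketch below computes them in plain Python. The function names are illustrative and not part of the package. How the library turns the two distances (euclidean, manhattan) into a score is not stated here, so the sketch returns raw distances for those, and its handling of zero vectors is an assumption.

```python
import math


def dot(a, b):
    """Dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as having no similarity
    return dot(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


print(cosine([1.0, 0.0], [1.0, 0.0]))              # 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))              # 0.0
print(dot([1.0, 2.0], [3.0, 4.0]))                 # 11.0
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(manhattan_distance([0.0, 0.0], [3.0, 4.0]))  # 7.0
```

Cosine ignores vector length while dot does not, so the two rank results differently unless the embeddings are normalized.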

Metadata filtering

from philiprehberger_embedding_store import VectorStore

store = VectorStore()
store.add("d1", [1.0, 0.0], {"category": "docs", "lang": "en"})
store.add("d2", [0.9, 0.1], {"category": "code", "lang": "en"})

query = [1.0, 0.0]  # example query vector

# Filter by single field
results = store.search(query, filter=lambda m: m["category"] == "docs")

# Filter by multiple conditions
results = store.search(
    query,
    filter=lambda m: m["category"] == "docs" and m["lang"] == "en",
)
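
A lambda that indexes the metadata directly, such as m["category"], raises KeyError when an entry lacks that field. One way around this is a small predicate builder based on dict.get. The where helper below is a hypothetical sketch and not part of the package. It relies only on what the examples show, namely that filter receives the entry's metadata dictionary. Treating missing metadata (None) as non-matching is an added assumption.

```python
def where(**conditions):
    """Build a predicate that is True when every given field equals its expected value."""
    def predicate(metadata):
        metadata = metadata or {}  # assumption: entries without metadata never match
        # Note: an expected value of None would also match a missing field.
        return all(metadata.get(key) == value for key, value in conditions.items())
    return predicate


is_english_docs = where(category="docs", lang="en")
print(is_english_docs({"category": "docs", "lang": "en"}))  # True
print(is_english_docs({"category": "code", "lang": "en"}))  # False
print(is_english_docs({"lang": "en"}))                      # False
```

Such a predicate could then be passed as filter=where(category="docs", lang="en") in a search call.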

Batch operations

from philiprehberger_embedding_store import VectorStore

store = VectorStore()

# Add many vectors at once
store.add_many([
    ("id1", [0.1, 0.2], {"label": "first"}),
    ("id2", [0.3, 0.4], {"label": "second"}),
])

# Search with multiple queries at once
query_embedding_1 = [0.1, 0.2]  # example query vectors
query_embedding_2 = [0.3, 0.4]
all_results = store.search_many(
    [query_embedding_1, query_embedding_2],
    top_k=5,
)

Score a single entry

Use score() to compute the similarity between a stored entry and an arbitrary query vector without running a full top-k search. This is handy for re-ranking or one-off comparisons.

from philiprehberger_embedding_store import VectorStore

store = VectorStore(metric="cosine")
store.add("doc1", [1.0, 0.0, 0.0])

store.score("doc1", [1.0, 0.0, 0.0])              # 1.0
store.score("doc1", [0.0, 1.0, 0.0])              # ~0.0
store.score("doc1", [1.0, 1.0, 1.0], metric="dot")  # 1.0
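
As a sketch of the re-ranking use case, the helper below orders a list of candidate IDs with a scoring callable. The rerank function is hypothetical and not part of the package. With a real store, the callable could be lambda entry_id: store.score(entry_id, query). The sketch assumes that higher scores mean more similar, which holds for cosine and dot but is not confirmed here for euclidean and manhattan. The demonstration uses a stand-in score table so that it runs without the package installed.

```python
def rerank(candidate_ids, score_fn, top_k=None):
    """Return (id, score) pairs sorted by descending score."""
    scored = [(entry_id, score_fn(entry_id)) for entry_id in candidate_ids]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Stand-in scores for demonstration; a real caller would wrap store.score instead.
fake_scores = {"doc1": 0.42, "doc2": 0.91, "doc3": 0.17}
print(rerank(["doc1", "doc2", "doc3"], fake_scores.__getitem__, top_k=2))
# [('doc2', 0.91), ('doc1', 0.42)]
```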

Persistence

from philiprehberger_embedding_store import VectorStore

store = VectorStore()
store.add("doc1", [0.1, 0.2], {"title": "Example"})

# Save to disk
store.save("vectors.json")

# Load from disk
loaded = VectorStore.load("vectors.json")

Store management

from philiprehberger_embedding_store import VectorStore

store = VectorStore()
store.add("a", [1.0, 0.0])

store.remove("a")      # Remove by ID
store.clear()           # Remove all entries

Updating and clearing

from philiprehberger_embedding_store import VectorStore

store = VectorStore(dimensions=3)
store.add("a", [1.0, 0.0, 0.0], {"version": 1})

# Replace the vector in place
store.update("a", vector=[0.0, 1.0, 0.0])

# Replace the metadata (wholesale)
store.update("a", metadata={"version": 2})

# Update both at once
store.update("a", vector=[0.0, 0.0, 1.0], metadata={"version": 3})

# Remove everything but keep the dimensionality (3) and metric configuration
store.clear()
assert len(store) == 0
store.add("b", [0.1, 0.2, 0.3])  # still constrained to 3 dimensions

API

| Function / Class | Description |
| --- | --- |
| VectorStore(dimensions?, metric?) | Create a store with optional dimensionality and metric |
| add(id, embedding, metadata?) | Add a vector with optional metadata |
| add_many(items) | Batch add multiple vectors |
| search(query, top_k?, metric?, filter?, min_score?) | Similarity search |
| search_many(queries, top_k?, metric?, filter?, min_score?) | Batch similarity search |
| score(id, query, metric?) | Compute similarity between a stored entry and a query vector |
| get(id) | Get entry by ID |
| delete(id) | Delete entry by ID |
| remove(id) | Remove entry by ID (alias for delete) |
| update_metadata(id, metadata) | Update metadata for an entry |
| update(id, vector=None, metadata=None) | Replace an entry's vector and/or metadata in place |
| save(path) | Save store to JSON file |
| VectorStore.load(path) | Load store from JSON file |
| clear() | Remove all entries (preserves dimensionality and metric) |
| ids() | List all stored IDs |
| len(store) | Number of entries |
| id in store | Check if ID exists |
| store.size | Number of entries (property) |
| store.metric | Current distance metric (property) |
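
The search parameters top_k, filter, and min_score are easiest to picture as stages of a brute-force scan. The sketch below is a generic illustration in plain Python, not the package's implementation. The function name brute_force_search, the entry layout, the order in which filtering and thresholding are applied, and the reading of min_score as an inclusive lower bound are all assumptions.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 for zero vectors (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(entries, query, top_k=5, filter=None, min_score=None):
    """Scan every entry; entries maps id -> (vector, metadata).

    Returns up to top_k (id, score) pairs, best first.
    The parameter name `filter` mirrors the README and shadows the builtin here.
    """
    scored = []
    for entry_id, (vector, metadata) in entries.items():
        if filter is not None and not filter(metadata):
            continue  # metadata predicate rejected this entry
        score = cosine(vector, query)
        if min_score is not None and score < min_score:
            continue  # below the similarity threshold
        scored.append((entry_id, score))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


entries = {
    "d1": ([1.0, 0.0], {"category": "docs"}),
    "d2": ([0.0, 1.0], {"category": "code"}),
    "d3": ([-1.0, 0.0], {"category": "docs"}),
}
hits = brute_force_search(
    entries,
    [1.0, 0.0],
    top_k=2,
    filter=lambda m: m["category"] == "docs",
    min_score=0.0,
)
print(hits)  # [('d1', 1.0)]
```

Here d2 is dropped by the filter and d3 by the threshold, so only one result comes back even though top_k is 2.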

Development

pip install -e .
python -m pytest tests/ -v

Support

If you find this project useful:

Star the repo

🐛 Report issues

💡 Suggest features

❤️ Sponsor development

🌐 All Open Source Projects

💻 GitHub Profile

🔗 LinkedIn Profile

License

MIT
