Vector embedding generation and similarity search
Reason this release was yanked: Not ready for use
Project description
our-embeddings
Vector embedding generation and similarity search for the ourochronos ecosystem.
Overview
our-embeddings provides a unified interface for generating and searching vector embeddings. It supports both local (sentence-transformers) and OpenAI providers, with a federation standard for cross-node embedding compatibility.
Default model: BAAI/bge-small-en-v1.5 (384 dimensions, L2-normalized).
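L2-normalized means each vector is scaled to unit length, so the dot product of two vectors equals their cosine similarity. A minimal illustration in plain Python (not part of the library):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

v = l2_normalize([3.0, 4.0])
# 3-4-5 triangle scaled to unit length → [0.6, 0.8]
```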
Install
pip install our-embeddings
For local embeddings (default, no API key needed):
pip install 'our-embeddings[local]'  # includes sentence-transformers (quotes keep some shells from expanding the brackets)
Usage
Generate Embeddings
from our_embeddings.service import generate_embedding, vector_to_pgvector
# Generate a 384-dim embedding vector
vector = generate_embedding("PostgreSQL is excellent for JSONB queries")
# Convert to pgvector format for storage
pg_str = vector_to_pgvector(vector)
# → "[0.0231,0.0891,...]"
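The pgvector text format is simply a bracketed, comma-separated list of floats. A plausible sketch of the conversion (illustrative only; the library's exact precision and rounding may differ):

```python
def to_pgvector(vec):
    """Render a float list as a pgvector literal, e.g. "[0.0231,0.0891]"."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

print(to_pgvector([0.0231, 0.0891]))  # → "[0.0231,0.0891]"
```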
Search Similar Content
from our_embeddings import search_similar
results = search_similar(
    query="database performance",
    content_type="belief",
    limit=10,
    min_similarity=0.5,
)
# Returns list of dicts with id, content, similarity score
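Because vectors are L2-normalized, similarity search reduces to a dot product, a threshold, and a sort. A hypothetical in-memory version of that filter-and-rank logic (not the library's implementation, which queries pgvector in PostgreSQL):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_similar_in_memory(query_vec, corpus, limit=10, min_similarity=0.5):
    """corpus: list of (id, content, vector); vectors assumed L2-normalized."""
    scored = [
        {"id": i, "content": c, "similarity": dot(query_vec, v)}
        for i, c, v in corpus
    ]
    scored = [r for r in scored if r["similarity"] >= min_similarity]
    scored.sort(key=lambda r: r["similarity"], reverse=True)
    return scored[:limit]
```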
Embed and Store
from our_embeddings import embed_content
result = embed_content(
    content_type="belief",
    content_id="uuid-here",
    text="Valence uses dimensional confidence",
)
Batch Operations
from our_embeddings.local import generate_embeddings_batch
vectors = generate_embeddings_batch(
    ["text one", "text two", "text three"],
    batch_size=32,
)
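Batching amortizes per-call model overhead by encoding several texts per forward pass. The chunking itself is straightforward; a sketch, assuming the provider's encode function accepts a list of texts:

```python
def chunks(items, batch_size):
    """Yield successive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def embed_in_batches(texts, encode, batch_size=32):
    """encode: callable taking a list of texts, returning a list of vectors."""
    vectors = []
    for batch in chunks(texts, batch_size):
        vectors.extend(encode(batch))
    return vectors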
Backfill Missing Embeddings
from our_embeddings import backfill_embeddings
count = backfill_embeddings(content_type="belief", batch_size=100)
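A backfill like this typically pages through rows whose embedding column is NULL, embedding and storing them batch by batch until none remain. A schematic version, where `fetch_missing`, `embed`, and `store_embedding` are hypothetical stand-ins for the library's database layer:

```python
def backfill(fetch_missing, embed, store_embedding, batch_size=100):
    """Embed rows lacking vectors until none remain; return rows processed."""
    count = 0
    while True:
        rows = fetch_missing(batch_size)  # list of (id, text) still unembedded
        if not rows:
            return count
        for row_id, text in rows:
            store_embedding(row_id, embed(text))
        count += len(rows)
```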
Configuration
EmbeddingConfig
from our_embeddings.config import EmbeddingConfig
config = EmbeddingConfig.from_env()
# Fields:
# embedding_provider: str = "local"
# embedding_model_path: str = "BAAI/bge-small-en-v1.5"
# embedding_device: str = "cpu"
# openai_api_key: str = ""
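A `from_env` constructor of this shape can be sketched with a dataclass that reads the environment variables documented under Environment Variables (illustrative; validation and field handling in the real class may differ):

```python
import os
from dataclasses import dataclass

@dataclass
class EmbeddingConfig:
    embedding_provider: str = "local"
    embedding_model_path: str = "BAAI/bge-small-en-v1.5"
    embedding_device: str = "cpu"
    openai_api_key: str = ""

    @classmethod
    def from_env(cls):
        """Build a config from environment variables, falling back to defaults."""
        return cls(
            embedding_provider=os.environ.get("VALENCE_EMBEDDING_PROVIDER", "local"),
            embedding_model_path=os.environ.get(
                "VALENCE_EMBEDDING_MODEL_PATH", "BAAI/bge-small-en-v1.5"
            ),
            embedding_device=os.environ.get("VALENCE_EMBEDDING_DEVICE", "cpu"),
            openai_api_key=os.environ.get("OPENAI_API_KEY", ""),
        )
```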
Environment Variables
| Variable | Default | Description |
|---|---|---|
| VALENCE_EMBEDDING_PROVIDER | local | "local" or "openai" |
| VALENCE_EMBEDDING_MODEL_PATH | BAAI/bge-small-en-v1.5 | Model name or path |
| VALENCE_EMBEDDING_DEVICE | cpu | "cpu" or "cuda" |
| OPENAI_API_KEY | — | Required if provider is "openai" |
Providers
Local (default)
Uses sentence-transformers with BAAI/bge-small-en-v1.5:
- 384 dimensions, L2-normalized
- No API key required
- Model loaded lazily and cached as singleton
- Thread-safe initialization
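Lazy, thread-safe singleton loading is commonly done with double-checked locking; a generic sketch of that pattern (not the library's actual code), where `loader` stands in for constructing the sentence-transformers model:

```python
import threading

_model = None
_lock = threading.Lock()

def get_model(loader):
    """Invoke loader at most once, even under concurrent first access."""
    global _model
    if _model is None:          # fast path: no lock after first load
        with _lock:
            if _model is None:  # re-check inside the lock
                _model = loader()
    return _model
```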
OpenAI
Uses OpenAI text-embedding-3-small:
- 1536 dimensions
- Requires OPENAI_API_KEY
- Text truncated to 8000 chars
Embedding Type Registry
Register and manage multiple embedding types:
from our_embeddings import register_embedding_type, list_embedding_types
register_embedding_type(
type_id="local_bge_small",
provider="local",
model="BAAI/bge-small-en-v1.5",
dimensions=384,
is_default=True,
)
types = list_embedding_types(status="active")
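Conceptually, the registry maps a type_id to provider/model/dimensions metadata with at most one default. An in-memory analogue of that behavior (the real registry persists to the embedding_types table):

```python
_types = {}

def register_embedding_type(type_id, provider, model, dimensions, is_default=False):
    """Record an embedding type; a new default demotes any previous one."""
    if is_default:
        for t in _types.values():
            t["is_default"] = False
    _types[type_id] = {
        "provider": provider,
        "model": model,
        "dimensions": dimensions,
        "is_default": is_default,
        "status": "active",
    }

def list_embedding_types(status="active"):
    """Return registered types matching the given status."""
    return [dict(t, type_id=k) for k, t in _types.items() if t["status"] == status]
```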
Federation Standard
Cross-node embedding compatibility for federated knowledge sharing:
from our_embeddings import get_federation_standard, validate_federation_embedding
standard = get_federation_standard()
# → {"model": "BAAI/bge-small-en-v1.5", "dimensions": 384,
# "type": "bge_small_en_v15", "normalization": "L2", "version": "1.0"}
valid, error = validate_federation_embedding([0.1, 0.2, ...])
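Validating against the standard amounts to checking dimensionality and approximate unit L2 norm. A sketch consistent with the standard shown above (the library's checks and error messages may differ):

```python
import math

FEDERATION_DIMENSIONS = 384

def validate_embedding(vec, dimensions=FEDERATION_DIMENSIONS, tol=1e-3):
    """Return a (valid, error) pair, mirroring the documented return shape."""
    if len(vec) != dimensions:
        return False, f"expected {dimensions} dimensions, got {len(vec)}"
    norm = math.sqrt(sum(x * x for x in vec))
    if abs(norm - 1.0) > tol:
        return False, f"vector is not L2-normalized (norm={norm:.4f})"
    return True, None
```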
Federation functions for belief exchange:
- prepare_belief_for_federation(belief_id) — Package belief with embedding
- validate_incoming_belief_embedding(data) — Validate received embeddings
- regenerate_embedding_if_needed(data) — Re-embed if format differs
State Ownership
Owns the embedding_types and embedding_coverage tables in the valence schema. Reads/writes the embedding column on beliefs, vkb_exchanges, and vkb_patterns tables.
Development
make dev # Install with dev dependencies
make lint # Run linters
make test # Run tests
make test-cov # Tests with coverage
make format # Auto-format
Part of Valence
This brick is part of the Valence knowledge substrate. See our-infra for ourochronos conventions.
License
MIT
File details
Details for the file our_embeddings-0.1.1.tar.gz.
File metadata
- Download URL: our_embeddings-0.1.1.tar.gz
- Upload date:
- Size: 22.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 92afd75783969180763160f69c36ca14200dc1e218841bcad6266b4f8e81c829 |
| MD5 | de103cb483727a8dab399adda49e87d8 |
| BLAKE2b-256 | f1f6f28fb5c9318dd72014dea659aca55ea77cd600b240e34e7452b571b61a31 |
File details
Details for the file our_embeddings-0.1.1-py3-none-any.whl.
File metadata
- Download URL: our_embeddings-0.1.1-py3-none-any.whl
- Upload date:
- Size: 16.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 25a0c78506934c3944e6804d831e4d67ed0b76a3ce1e25161abe698e84c9630f |
| MD5 | 32e22e97b0ac7971e63663b9d0a6b58a |
| BLAKE2b-256 | bcc57ccb08ca671794f54c3fa28b35752f1ff8ceaa1a9a8d6912e75f2c7c7c56 |