
Vector embedding generation and similarity search

Reason this release was yanked:

Not ready for use

Project description

our-embeddings

Vector embedding generation and similarity search for the ourochronos ecosystem.

Overview

our-embeddings provides a unified interface for generating and searching vector embeddings. It supports both local (sentence-transformers) and OpenAI providers, with a federation standard for cross-node embedding compatibility.

Default model: BAAI/bge-small-en-v1.5 (384 dimensions, L2-normalized).

Install

pip install our-embeddings

For local embeddings (default, no API key needed):

pip install our-embeddings[local]  # includes sentence-transformers

Usage

Generate Embeddings

from our_embeddings.service import generate_embedding, vector_to_pgvector

# Generate a 384-dim embedding vector
vector = generate_embedding("PostgreSQL is excellent for JSONB queries")

# Convert to pgvector format for storage
pg_str = vector_to_pgvector(vector)
# → "[0.0231,0.0891,...]"

Search Similar Content

from our_embeddings import search_similar

results = search_similar(
    query="database performance",
    content_type="belief",
    limit=10,
    min_similarity=0.5,
)
# Returns list of dicts with id, content, similarity score
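`min_similarity` filters out weak matches server-side; the result shape (dicts with `id`, `content`, and a similarity score) comes from the comment above. A hypothetical post-processing sketch with made-up data:

```python
# Hypothetical results in the shape search_similar returns
results = [
    {"id": "a1", "content": "tuning shared_buffers", "similarity": 0.82},
    {"id": "b2", "content": "indexing JSONB columns", "similarity": 0.64},
    {"id": "c3", "content": "unrelated note", "similarity": 0.31},
]

# Re-apply a stricter threshold client-side and pick the best match
strong = [r for r in results if r["similarity"] >= 0.5]
best = max(strong, key=lambda r: r["similarity"])
print(best["id"])  # → a1
```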

Embed and Store

from our_embeddings import embed_content

result = embed_content(
    content_type="belief",
    content_id="uuid-here",
    text="Valence uses dimensional confidence",
)

Batch Operations

from our_embeddings.local import generate_embeddings_batch

vectors = generate_embeddings_batch(
    ["text one", "text two", "text three"],
    batch_size=32,
)
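`batch_size` bounds how many texts are encoded per forward pass. A sketch of how a batcher might split the input (the chunking helper is illustrative, not the library's internals):

```python
def chunks(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

texts = [f"text {i}" for i in range(70)]
batches = chunks(texts, 32)
print([len(b) for b in batches])  # → [32, 32, 6]
```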

Backfill Missing Embeddings

from our_embeddings import backfill_embeddings

count = backfill_embeddings(content_type="belief", batch_size=100)
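A backfill typically pages through rows that lack an embedding until none remain. A sketch of that loop under assumed helper names (`fetch_missing` and `store_embedding` are hypothetical, not part of the package):

```python
def backfill(fetch_missing, store_embedding, batch_size: int = 100) -> int:
    """Embed rows in batches until fetch_missing returns an empty page."""
    total = 0
    while True:
        rows = fetch_missing(batch_size)  # hypothetical: rows still lacking embeddings
        if not rows:
            break
        for row_id, text in rows:
            store_embedding(row_id, text)  # hypothetical: embed and persist
        total += len(rows)
    return total

# Simulate with an in-memory queue of 250 pending rows
pending = [(str(i), f"belief {i}") for i in range(250)]
stored: dict[str, str] = {}

def fetch_missing(n):
    return pending[:n]

def store_embedding(row_id, text):
    stored[row_id] = text
    pending.remove((row_id, text))

count = backfill(fetch_missing, store_embedding)
print(count)  # → 250
```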

Configuration

EmbeddingConfig

from our_embeddings.config import EmbeddingConfig

config = EmbeddingConfig.from_env()
# Fields:
#   embedding_provider: str = "local"
#   embedding_model_path: str = "BAAI/bge-small-en-v1.5"
#   embedding_device: str = "cpu"
#   openai_api_key: str = ""

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| VALENCE_EMBEDDING_PROVIDER | local | "local" or "openai" |
| VALENCE_EMBEDDING_MODEL_PATH | BAAI/bge-small-en-v1.5 | Model name or path |
| VALENCE_EMBEDDING_DEVICE | cpu | "cpu" or "cuda" |
| OPENAI_API_KEY | "" | Required if provider is "openai" |
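`from_env` presumably reads each documented variable, falling back to the field defaults above. A minimal dataclass sketch of that behavior (it mirrors the documented fields and variable names; it is not the package's actual implementation):

```python
import os
from dataclasses import dataclass

@dataclass
class EmbeddingConfigSketch:
    embedding_provider: str = "local"
    embedding_model_path: str = "BAAI/bge-small-en-v1.5"
    embedding_device: str = "cpu"
    openai_api_key: str = ""

    @classmethod
    def from_env(cls) -> "EmbeddingConfigSketch":
        """Read each field from its environment variable, falling back to the default."""
        return cls(
            embedding_provider=os.environ.get("VALENCE_EMBEDDING_PROVIDER", "local"),
            embedding_model_path=os.environ.get("VALENCE_EMBEDDING_MODEL_PATH",
                                                "BAAI/bge-small-en-v1.5"),
            embedding_device=os.environ.get("VALENCE_EMBEDDING_DEVICE", "cpu"),
            openai_api_key=os.environ.get("OPENAI_API_KEY", ""),
        )

os.environ["VALENCE_EMBEDDING_DEVICE"] = "cuda"
cfg = EmbeddingConfigSketch.from_env()
print(cfg.embedding_device)  # → cuda
```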

Providers

Local (default)

Uses sentence-transformers with BAAI/bge-small-en-v1.5:

  • 384 dimensions, L2-normalized
  • No API key required
  • Model loaded lazily and cached as singleton
  • Thread-safe initialization

OpenAI

Uses OpenAI text-embedding-3-small:

  • 1536 dimensions
  • Requires OPENAI_API_KEY
  • Text truncated to 8000 chars

Embedding Type Registry

Register and manage multiple embedding types:

from our_embeddings import register_embedding_type, list_embedding_types

register_embedding_type(
    type_id="local_bge_small",
    provider="local",
    model="BAAI/bge-small-en-v1.5",
    dimensions=384,
    is_default=True,
)

types = list_embedding_types(status="active")

Federation Standard

Cross-node embedding compatibility for federated knowledge sharing:

from our_embeddings import get_federation_standard, validate_federation_embedding

standard = get_federation_standard()
# → {"model": "BAAI/bge-small-en-v1.5", "dimensions": 384,
#    "type": "bge_small_en_v15", "normalization": "L2", "version": "1.0"}

valid, error = validate_federation_embedding([0.1, 0.2, ...])

Federation functions for belief exchange:

  • prepare_belief_for_federation(belief_id) — Package belief with embedding
  • validate_incoming_belief_embedding(data) — Validate received embeddings
  • regenerate_embedding_if_needed(data) — Re-embed if format differs
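Given the standard above, validating an incoming embedding amounts to checking the dimension count and the L2 normalization. A sketch of those two checks (the `(valid, error)` return shape follows the example above; the function name and tolerance are assumptions):

```python
import math

STANDARD_DIMENSIONS = 384  # from the federation standard

def validate_embedding(vector: list[float], tol: float = 1e-3):
    """Return (True, None) if the vector matches the federation standard."""
    if len(vector) != STANDARD_DIMENSIONS:
        return False, f"expected {STANDARD_DIMENSIONS} dimensions, got {len(vector)}"
    norm = math.sqrt(sum(x * x for x in vector))
    if abs(norm - 1.0) > tol:
        return False, f"vector is not L2-normalized (norm={norm:.4f})"
    return True, None

# A unit vector of the right length passes; a short one does not
unit = [0.0] * STANDARD_DIMENSIONS
unit[0] = 1.0
print(validate_embedding(unit))           # → (True, None)
print(validate_embedding([0.1, 0.2])[0])  # → False
```

A node receiving an embedding that fails these checks would fall back to `regenerate_embedding_if_needed` and re-embed the text locally.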

State Ownership

Owns the embedding_types and embedding_coverage tables in the valence schema. Reads/writes the embedding column on beliefs, vkb_exchanges, and vkb_patterns tables.

Development

make dev       # Install with dev dependencies
make lint      # Run linters
make test      # Run tests
make test-cov  # Tests with coverage
make format    # Auto-format

Part of Valence

This brick is part of the Valence knowledge substrate. See our-infra for ourochronos conventions.

License

MIT

Project details


Download files


Source Distribution

our_embeddings-0.1.1.tar.gz (22.0 kB)

Uploaded Source

Built Distribution


our_embeddings-0.1.1-py3-none-any.whl (16.8 kB)

Uploaded Python 3

File details

Details for the file our_embeddings-0.1.1.tar.gz.

File metadata

  • Download URL: our_embeddings-0.1.1.tar.gz
  • Upload date:
  • Size: 22.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for our_embeddings-0.1.1.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 92afd75783969180763160f69c36ca14200dc1e218841bcad6266b4f8e81c829 |
| MD5 | de103cb483727a8dab399adda49e87d8 |
| BLAKE2b-256 | f1f6f28fb5c9318dd72014dea659aca55ea77cd600b240e34e7452b571b61a31 |


File details

Details for the file our_embeddings-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: our_embeddings-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 16.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for our_embeddings-0.1.1-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 25a0c78506934c3944e6804d831e4d67ed0b76a3ce1e25161abe698e84c9630f |
| MD5 | 32e22e97b0ac7971e63663b9d0a6b58a |
| BLAKE2b-256 | bcc57ccb08ca671794f54c3fa28b35752f1ff8ceaa1a9a8d6912e75f2c7c7c56 |

