
Deterministic Knowledge Graph & Vector Engine with Bit-Exact Audit Trails


Valoricore

The Official Python SDK for Valori-Kernel

AI Memory That Is Cryptographically Auditable — By Design




valoricore is the official Python SDK for Valori-Kernel — a no_std Rust engine that makes AI memory reproducible and provable.

Standard vector databases compute distances with floating-point arithmetic, and floating-point results can differ across CPUs: SIMD width, fused multiply-add, and summation order all change how rounding errors accumulate. When a regulator or auditor asks you to replay an AI decision, you therefore cannot guarantee that the replay returns the same result on different hardware, or even on the same hardware after a library upgrade.

Valori fixes this by unifying Vector Memory and Knowledge Graphs using Q16.16 fixed-point arithmetic, producing bit-identical results across x86, ARM, and RISC-V. Every insert is recorded in a BLAKE3-chained event log. The entire state is summarised in a single Merkle root hash you can store, compare, and prove — without touching the database.

The core use case: any system where the AI's memory must be reproducible, tamper-evident, and independently verifiable. Finance, legal tech, autonomous systems, or any regulated environment where "trust, but verify" is a legal requirement — not an aspiration.


Why Replace Your Current Vector DB?

| Feature | Valoricore | Chroma / FAISS / Pinecone | Business Value |
|---|---|---|---|
| Results across hardware | Bit-identical (Q16.16) | Float drift | Pass cross-platform audits; replay any AI decision with guaranteed identical output |
| Cryptographic state proof | BLAKE3 Merkle root per insert | None | Prove exactly what data the AI saw at any point in time |
| Hybrid Vector + Graph | Native, same memory space | Separate systems | Build GraphRAG pipelines without managing a second database |
| Offline proof verification | No DB connection required | N/A | Auditors can verify AI decisions without accessing production |
| Snapshot / replay | Byte-exact restore | Partial / format-specific | Disaster recovery that is provably correct, not just "probably fine" |
| no_std embeddable core | Runs on ARM Cortex-M4 | Heap-heavy | Deploy AI memory to edge devices, browsers, and air-gapped systems |
| Multi-tenant collections | Up to 1,024 isolated namespaces | Tag filtering only | True tenant isolation with zero cross-contamination risk |

For Compliance & Audit Teams

Valori is not a black box. Every state change is written to a BLAKE3-chained append-only event log. An auditor or compliance officer can independently verify what data an AI system saw — without accessing the production database, without trusting the server, and without re-running the model.

from valoricore import ingest_embedding, generate_proof, verify_embedding

# Step 1 — AI system ingests a vector in production and stores the proof
vector      = [0.142, 0.897, 0.334, 0.561]   # e.g. an embedding of a document
fixed_vals  = ingest_embedding(vector)         # convert to deterministic Q16.16
proof_hex   = generate_proof(fixed_vals)       # BLAKE3 Merkle node — store this

print(f"Proof: {proof_hex}")
# → "a3f2c1d9..." (64-char hex)

# Step 2 — Auditor verifies it independently, months later, on any machine
is_valid = verify_embedding(floats=vector, claimed_hash=proof_hex)
print(f"Verified: {is_valid}")   # True — math doesn't lie

The proof is computed entirely in Rust (via the embedded FFI) with no network calls. It is deterministic because the underlying arithmetic is fixed-point — no floating-point rounding, no hardware-dependent results.

What the state hash proves: the value returned by get_state_hash() commits to exactly these records, in exactly this order, at the moment it was recorded. Any tampering (an insert, a delete, or a reorder) produces a different hash, because finding a BLAKE3 collision is computationally infeasible.
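The tamper-evidence property comes from chaining: each new digest covers the previous one, so the final value depends on every record and on their order. The sketch below illustrates the idea only. It is not Valoricore's log format; it uses the standard library's hashlib.blake2b as a stand-in for BLAKE3, and the all-zero genesis value and the name chain_hash are assumptions.

```python
import hashlib
from typing import List


def chain_hash(records: List[bytes]) -> str:
    """Fold records into one digest: h_i = H(h_{i-1} || H(record_i))."""
    h = bytes(32)  # assumed genesis value: 32 zero bytes
    for rec in records:
        leaf = hashlib.blake2b(rec, digest_size=32).digest()    # hash of the record itself
        h = hashlib.blake2b(h + leaf, digest_size=32).digest()  # bind it to everything before
    return h.hex()  # 64 hex characters


original = chain_hash([b"rec-1", b"rec-2", b"rec-3"])
reordered = chain_hash([b"rec-2", b"rec-1", b"rec-3"])
deleted = chain_hash([b"rec-1", b"rec-3"])
print(original != reordered, original != deleted)
```

Replaying the same records in the same order always reproduces the same digest, which is what lets an auditor recompute it independently.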


Installation

Valoricore ships with pre-compiled Rust binaries for Linux (x86-64, arm64), macOS (x86-64, Apple Silicon), and Windows. A Rust compiler is only required when building from source.

Core (vector DB + knowledge graph)

pip install valoricore

With local / offline embeddings

pip install "valoricore[local]"

With cloud embedding providers

pip install "valoricore[openai]"
pip install "valoricore[cohere]"

Full installation (all providers + LangChain + LlamaIndex)

pip install "valoricore[all]"

Optional integrations

pip install "valoricore[langchain]"
pip install "valoricore[llamaindex]"
pip install "valoricore[pdf]"

Quick Start

pip install "valoricore[local]"

The Audit Proof in 10 Lines

from valoricore import MemoryClient
from valoricore.embeddings import SentenceTransformerEmbedder

embedder = SentenceTransformerEmbedder("all-MiniLM-L6-v2")
client   = MemoryClient(path="./my_valori_db")

# Insert a document — chunked, embedded, and stored in the Knowledge Graph
client.add_document(
    text  = "Loan approved for application #A-20241107 at 14:32 UTC.",
    embed = embedder,
    title = "Decision Log",
)

# Semantic search
hits = client.semantic_search("loan approval decisions", embed=embedder, k=3)

# Every insert changes this hash deterministically.
# Run this on Apple Silicon, Intel, or ARM: the output is identical.
print(f"State hash: {client.get_state_hash()}")
# → a 64-char BLAKE3 hex digest; identical on every machine for the same inserts

This hash is your cryptographic receipt. Store it in a database, a blockchain, or an audit log. Anyone holding this hash and the original data can verify the state independently by replaying the same inserts, in the same order, into a fresh engine and comparing hashes. No network connection and no trust in the server are required.

Interactive Colab Notebooks

Test Valoricore in your browser with zero local setup:

1 · Embedded Local Engine (full example)

from valoricore import MemoryClient
from valoricore.embeddings import SentenceTransformerEmbedder

embedder = SentenceTransformerEmbedder("all-MiniLM-L6-v2")
client   = MemoryClient(path="./my_valori_db")

result = client.add_document(
    text  = "Valoricore is a deterministic Rust kernel that unifies "
            "vector memory and knowledge graphs.",
    embed = embedder,
    title = "Introduction",
)
print(f"Document Node ID : {result['document_node_id']}")
print(f"Chunk count      : {result['chunk_count']}")

hits = client.semantic_search("What does Valoricore unify?", embed=embedder, k=3)
for h in hits:
    print(f"  id={h['id']}  score={h['score']}")

print(f"State hash: {client.get_state_hash()}")

2 · Remote / Cluster Mode

Point remote at any node in the cluster. Writes are transparently redirected to the current leader (HTTP 307); the resolved leader is cached so subsequent writes skip the extra hop. During a leader election the client retries with exponential backoff before raising NotLeaderError.

from valoricore import MemoryClient, SyncRemoteClient, NotLeaderError
from valoricore.embeddings import OpenAIEmbedder

embedder = OpenAIEmbedder()

# MemoryClient: high-level, same API as local embedded mode
client = MemoryClient(remote="http://my-valori-node:3000")
result = client.add_document(text="Remote deployment with full audit trail.", embed=embedder)

# SyncRemoteClient: lower-level, direct access to all endpoints
node = SyncRemoteClient("http://my-valori-node:3000", max_retries=5, retry_backoff=0.3)

# Check cluster health before writing
if not node.cluster_health():
    raise RuntimeError("no leader elected yet")

status = node.cluster_status()
print(f"Leader: node {status['leader']}  Term: {status['term']}")

# Insert: redirected to the leader automatically
try:
    record_id = node.insert([0.1, 0.2, 0.3, 0.4])
except NotLeaderError:
    # Retries were exhausted while a leader election was still in progress
    print("No leader available after retries; try again later")
    raise

# Linearizable read: reflects every write committed before this read (default in cluster mode)
hits = node.search([0.1, 0.2, 0.3, 0.4], k=5, consistency="linearizable")

# Eventually-consistent read: answered immediately from the local node (no leader round-trip)
hits_local = node.search([0.1, 0.2, 0.3, 0.4], k=5, consistency="local")
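The retry behaviour described above can be pictured with a generic sketch. The doubling schedule (retry_backoff × 2^n) and the helper names are assumptions for illustration, not the SDK's documented internals; SyncRemoteClient performs its own retries, so you do not need this helper to use it.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(retry_backoff: float, max_retries: int) -> List[float]:
    """Delay before each retry, doubling every time (assumed schedule)."""
    return [retry_backoff * (2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 5,
    retry_backoff: float = 0.3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a retryable error, wait and try again. After max_retries
    retries, make one last attempt and let any error reach the caller."""
    for delay in backoff_delays(retry_backoff, max_retries):
        try:
            return fn()
        except retry_on:
            sleep(delay)
    return fn()
```

With max_retries=5 and retry_backoff=0.3 this waits 0.3, 0.6, 1.2, 2.4 and 4.8 seconds between attempts, so a short election is usually absorbed before the caller sees an error.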

3 · Async API (FastAPI / asyncio)

import asyncio
from valoricore import AsyncMemoryClient
from valoricore.embeddings import SentenceTransformerEmbedder

embedder = SentenceTransformerEmbedder("all-MiniLM-L6-v2")

async def main():
    async with AsyncMemoryClient(path="./async_db") as client:
        result = await client.add_document(
            text  = "Non-blocking deterministic vector storage.",
            embed = embedder,
        )
        hits  = await client.semantic_search("Non-blocking search", embed=embedder, k=5)
        state = await client.get_state_hash()
        print(f"State: {state}")

asyncio.run(main())

Collections (Multi-tenancy)

Valori supports up to 1,024 named collections (namespaces). Every data operation accepts an optional collection parameter. The "default" collection always exists and cannot be dropped.

Records in different collections are fully isolated — a search scoped to "tenant-acme" never returns records from "tenant-beta" or the default collection, and vice versa.

from valoricore import SyncRemoteClient

client = SyncRemoteClient("http://localhost:3000")

# ── Create ────────────────────────────────────────────────────────────────────
result = client.create_collection("tenant-acme")
# {"name": "tenant-acme", "id": 1, "created": True}

# Idempotent — same name returns the existing ID
result2 = client.create_collection("tenant-acme")
# {"name": "tenant-acme", "id": 1, "created": False}

# ── List ──────────────────────────────────────────────────────────────────────
collections = client.list_collections()
# [{"name": "default", "id": 0}, {"name": "tenant-acme", "id": 1}]

# ── Scoped insert ─────────────────────────────────────────────────────────────
rid_a = client.insert([0.1, 0.2, 0.3, 0.4], collection="tenant-acme")
rid_b = client.insert([0.5, 0.6, 0.7, 0.8])   # lands in "default"

batch_ids = client.insert_batch(
    [[0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.7, 0.6]],
    collection="tenant-acme",
)

# ── Scoped search ─────────────────────────────────────────────────────────────
# Only "tenant-acme" records are considered.
hits = client.search([0.1, 0.2, 0.3, 0.4], k=5, collection="tenant-acme")

# Default search — never includes "tenant-acme" records.
default_hits = client.search([0.1, 0.2, 0.3, 0.4], k=5)

# ── Drop ──────────────────────────────────────────────────────────────────────
client.drop_collection("tenant-acme")   # server replies 204 No Content; every record in the collection is removed
# client.drop_collection("default")    # raises ValueError — default is protected
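To make the collection rules concrete (a protected "default" with ID 0, idempotent creation, and a fixed cap), here is a toy in-memory model. It is not the server's implementation: the class name is hypothetical, and treating the 1,024 cap as including "default" and never reusing dropped IDs are assumptions.

```python
from typing import Dict, List


class CollectionRegistry:
    """Toy model of the documented collection rules; not the server's code."""

    MAX_COLLECTIONS = 1024  # assumption: the cap counts "default" too

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {"default": 0}  # "default" always exists with ID 0
        self._next_id = 1                           # assumption: IDs are never reused

    def create_collection(self, name: str) -> dict:
        if name in self._ids:  # idempotent: same name returns the existing ID
            return {"name": name, "id": self._ids[name], "created": False}
        if len(self._ids) >= self.MAX_COLLECTIONS:
            raise RuntimeError("collection limit reached")
        self._ids[name] = self._next_id
        self._next_id += 1
        return {"name": name, "id": self._ids[name], "created": True}

    def list_collections(self) -> List[dict]:
        ordered = sorted(self._ids.items(), key=lambda kv: kv[1])
        return [{"name": n, "id": i} for n, i in ordered]

    def drop_collection(self, name: str) -> None:
        if name == "default":
            raise ValueError("the default collection cannot be dropped")
        del self._ids[name]  # raises KeyError for an unknown name
```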

Collections work identically in async mode:

import asyncio
from valoricore import AsyncRemoteClient

async def main():
    client = AsyncRemoteClient("http://localhost:3000")
    try:
        await client.create_collection("tenant-async")
        ids = await client.insert_batch([[0.1] * 4, [0.2] * 4], collection="tenant-async")
        hits = await client.search([0.1] * 4, k=5, collection="tenant-async")
        await client.drop_collection("tenant-async")
    finally:
        await client.close()   # release the HTTP connection pool even if a call fails

asyncio.run(main())

Collections in a cluster

Collections are managed through the leader exactly like writes. Point the client at any node; the SDK follows the redirect automatically.

from valoricore import SyncRemoteClient

# Any node — redirects to leader for writes
node = SyncRemoteClient("http://cluster-node-1:3000")

# Create the collection (leader-only, 307-redirect handled automatically)
node.create_collection("tenant-acme")

# Insert through any node in the cluster
for url in ["http://cluster-node-1:3000", "http://cluster-node-2:3000"]:
    c = SyncRemoteClient(url)
    c.insert([0.1, 0.2, 0.3, 0.4], collection="tenant-acme")

# Search through any node: a linearizable read reflects every write committed before it
hits = node.search([0.1, 0.2, 0.3, 0.4], k=5,
                   collection="tenant-acme",
                   consistency="linearizable")

Embedding Providers

| Provider | Class | Offline? | Install |
|---|---|---|---|
| SentenceTransformers | SentenceTransformerEmbedder | Yes | pip install "valoricore[local]" |
| OpenAI | OpenAIEmbedder | No | pip install "valoricore[openai]" |
| Cohere | CohereEmbedder | No | pip install "valoricore[cohere]" |
| HuggingFace Inference | HuggingFaceEmbedder | No | Built-in (uses requests) |
| Ollama | OllamaEmbedder | Yes (local server) | ollama pull nomic-embed-text |
| Dummy / CI | DummyEmbedder | Yes | Built-in |
| Hash / CI | HashEmbedder | Yes | Built-in |

Convenience Factory

from valoricore.embeddings import get_embedder

embedder = get_embedder("local",       model_name="all-MiniLM-L6-v2")
embedder = get_embedder("openai",      api_key="sk-...")
embedder = get_embedder("ollama",      model="nomic-embed-text")
embedder = get_embedder("cohere",      api_key="...")
embedder = get_embedder("huggingface", api_key="hf_...", model="sentence-transformers/all-MiniLM-L6-v2")
embedder = get_embedder("dummy",       dim=384)   # CI / tests

LRU Caching

from valoricore.embeddings import SentenceTransformerEmbedder, CachedEmbedder

embedder = CachedEmbedder(SentenceTransformerEmbedder("BAAI/bge-small-en-v1.5"), max_size=5000)
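CachedEmbedder's internals are not documented here. The following is a minimal sketch of the LRU technique it is named after, written against an assumed embedder interface with a blocking embed(text) method; LRUEmbeddingCache is an illustrative name, not the SDK class.

```python
from collections import OrderedDict
from typing import List, Protocol


class Embedder(Protocol):
    """Assumed interface: anything with a blocking embed(text) -> vector method."""

    def embed(self, text: str) -> List[float]: ...


class LRUEmbeddingCache:
    """Memoize embed() results, evicting the least recently used text when full."""

    def __init__(self, inner: Embedder, max_size: int = 5000) -> None:
        self.inner = inner
        self.max_size = max_size
        self._cache: "OrderedDict[str, List[float]]" = OrderedDict()

    def embed(self, text: str) -> List[float]:
        if text in self._cache:
            self._cache.move_to_end(text)  # cache hit: mark as most recently used
            return self._cache[text]
        vec = self.inner.embed(text)       # cache miss: call the real embedder
        self._cache[text] = vec
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return vec
```

An OrderedDict keeps both lookup and eviction at O(1), and returning the stored vector for a repeated text also guarantees the exact same floats reach the Q16.16 conversion.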

Async Embedder

import asyncio
from valoricore.embeddings import SentenceTransformerEmbedder, AsyncEmbedder

async_embedder = AsyncEmbedder(SentenceTransformerEmbedder("all-MiniLM-L6-v2"))

async def pipeline():
    vec  = await async_embedder.embed("Hello")
    vecs = await async_embedder.embed_batch(["Hello", "World"])
    print(len(vec), len(vecs))   # embedding dimension, number of texts

asyncio.run(pipeline())
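How AsyncEmbedder works internally is not documented here. One common way to build such a wrapper, sketched below under the assumption that the wrapped embedder has a blocking embed(text) method, is to offload calls to a worker thread with asyncio.to_thread (Python 3.9+). ThreadedAsyncEmbedder is a hypothetical name.

```python
import asyncio
from typing import List


class ThreadedAsyncEmbedder:
    """Hypothetical wrapper: runs a blocking embedder in a worker thread."""

    def __init__(self, inner) -> None:
        self.inner = inner  # assumed to expose a blocking embed(text) method

    async def embed(self, text: str) -> List[float]:
        # asyncio.to_thread keeps the event loop free while the model runs
        return await asyncio.to_thread(self.inner.embed, text)

    async def embed_batch(self, texts: List[str]) -> List[List[float]]:
        # One thread for the whole batch: preserves order and avoids calling
        # the model from several threads at once
        return await asyncio.to_thread(lambda: [self.inner.embed(t) for t in texts])
```

Running the whole batch in one thread is a deliberate choice: many model objects are not documented as safe to call concurrently.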

Core Concepts

Records

A Record is a dense Q16.16 fixed-point vector stored in the kernel's RecordPool. Every insert returns an integer record_id; insert_batch_with_proof additionally returns a BLAKE3 Merkle proof for each record.

Nodes & Edges (Knowledge Graph)

A Node is a named entity that optionally points to a Record. An Edge is a directed relationship between two Nodes. Both live in the same memory space as the vector pool — no separate database.

Node Kinds

from valoricore import (
    NODE_RECORD,    # 0 – raw vector record
    NODE_CONCEPT,   # 1 – abstract concept
    NODE_AGENT,     # 2 – AI agent / process
    NODE_USER,      # 3 – human user
    NODE_TOOL,      # 4 – tool or function
    NODE_DOCUMENT,  # 5 – top-level document
    NODE_CHUNK,     # 6 – text chunk (child of document)
)

Edge Kinds

from valoricore import (
    EDGE_RELATION,   # 0 – generic relation
    EDGE_FOLLOWS,    # 1 – sequential ordering
    EDGE_MENTIONS,   # 4 – entity mention
    EDGE_REFERS_TO,  # 5 – cross-reference
    EDGE_PARENT_OF,  # 6 – hierarchical parent→child
)

Step-by-Step Usage Guide

Step 1 — Initialize

from valoricore import MemoryClient

# Local embedded (no server needed)
client = MemoryClient(
    path       = "./my_db",
    index_kind = "hnsw",        # "bruteforce" (default), "hnsw", or "ivf"
)

# Remote cluster
# client = MemoryClient(remote="http://my-node:3000")

Step 2 — Ingest Documents

# From a string (here, the contents of a text file)
from pathlib import Path

result = client.add_document(
    text       = Path("report.txt").read_text(encoding="utf-8"),
    embed      = embedder,
    title      = "Q4 Report",
    chunk_size = 512,
)

# From a PDF (requires: pip install "valoricore[pdf]")
from valoricore import load_text_from_file
result = client.add_document(text=load_text_from_file("report.pdf"), embed=embedder)

# Insert a raw pre-computed vector
result = client.upsert_vector(vector=[0.1, 0.2, ...])

Step 3 — Batch Insert

# Batch insert (high-throughput)
vectors = [[0.1] * 384, [0.2] * 384, [0.3] * 384]
ids = client.insert_batch(vectors)

# Batch insert with cryptographic proofs
results = client.insert_batch_with_proof(vectors, tags=[1, 2, 3])
for record_id, proof_bytes in results:
    print(f"id={record_id}  proof={proof_bytes.hex()[:16]}...")

Step 4 — Semantic Search

hits = client.semantic_search(
    query = "What is deterministic AI memory?",
    embed = embedder,
    k     = 10,
)

for hit in hits:
    print(f"Record ID : {hit['id']}")
    print(f"L2 Score  : {hit['score']}")   # lower = closer
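Scores are L2 distances, so lower means closer. For intuition, the sketch below shows an exact ("bruteforce") k-nearest-neighbour search over vectors that are already Q16.16 integers, using squared L2 distance. Integer arithmetic has no rounding, so the same inputs always rank the same way. Using the squared distance and breaking ties by the smaller record ID are assumptions made here for a fully deterministic order; this is not the kernel's code.

```python
import heapq
from typing import List, Sequence, Tuple


def l2_sq(a: Sequence[int], b: Sequence[int]) -> int:
    """Squared L2 distance on integer (fixed-point) components: exact, no rounding."""
    return sum((x - y) * (x - y) for x, y in zip(a, b))


def knn_bruteforce(query: Sequence[int], records: List[Sequence[int]], k: int) -> List[Tuple[int, int]]:
    """Return (record_id, distance) pairs, closest first; ties go to the smaller id."""
    scored = ((l2_sq(query, vec), rid) for rid, vec in enumerate(records))
    return [(rid, dist) for dist, rid in heapq.nsmallest(k, scored)]
```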

Step 5 — Tag-Filtered Search

# Insert with tags to segment by tenant, user, or document type
client._db.insert([0.1] * 384, tag=42)

# Search within a specific tag only — O(1) overhead, 100% accuracy
hits = client._db.search([0.1] * 384, k=5, filter_tag=42)

Step 6 — Knowledge Graph (Fluent API)

Valoricore ships a high-level fluent API so you never have to manage raw integer IDs: client.node(), node.link_to(), and client.build_document() each handle a common graph task in one or two lines.

One-liner node creation

from valoricore import MemoryClient, Node
from valoricore.kinds import NODE_DOCUMENT, NODE_CHUNK, EDGE_PARENT_OF, EDGE_REFERS_TO

client = MemoryClient(path="./my_db", dim=384)

# Insert the embedding AND create the node in a single call — no manual ID juggling
doc   = client.node(NODE_DOCUMENT)
chunk = client.node(NODE_CHUNK, vector=my_embedding)  # inserts + creates, returns Node

print(doc)    # Node(id=0, kind=5, record_id=None)
print(chunk)  # Node(id=1, kind=6, record_id=0)

Method-chaining with link_to

# Create a directed edge from doc → chunk
doc.link_to(chunk, EDGE_PARENT_OF)

# Chain multiple edges in one line
c2 = client.node(NODE_CHUNK, vector=embedding_2)
c3 = client.node(NODE_CHUNK, vector=embedding_3)
doc.link_to([c2, c3], EDGE_PARENT_OF)   # link to a list at once

# Traverse back as Node objects
children = doc.children(EDGE_PARENT_OF)
# → [Node(id=1, ...), Node(id=2, ...), Node(id=3, ...)]

build_document context manager — the RAG pattern in 3 lines

embeddings = [embed(text) for text in text_chunks]   # embed() and text_chunks are your own

with client.build_document(title="Q4 Report") as builder:
    for emb in embeddings:
        builder.add_chunk(emb)   # inserts vector, creates NODE_CHUNK, wires EDGE_PARENT_OF

# After the block:
doc_node   = builder.document    # root Node object
chunk_rids = builder.record_ids  # [0, 1, 2, …]  — pass to search for RAG retrieval

Before vs After

# ── BEFORE (low-level — works, but tedious) ──────────────────────────────────
rid1   = client._db.insert(emb1)
rid2   = client._db.insert(emb2)
doc_id = client.create_node(kind=NODE_DOCUMENT)
ch1    = client.create_node(kind=NODE_CHUNK, record_id=rid1)
ch2    = client.create_node(kind=NODE_CHUNK, record_id=rid2)
client.create_edge(from_id=doc_id, to_id=ch1, kind=EDGE_PARENT_OF)
client.create_edge(from_id=doc_id, to_id=ch2, kind=EDGE_PARENT_OF)

# ── AFTER (fluent — identical performance, far less code) ────────────────────
doc = client.node(NODE_DOCUMENT)
doc.link_to([
    client.node(NODE_CHUNK, vector=emb1),
    client.node(NODE_CHUNK, vector=emb2),
], EDGE_PARENT_OF)

Full agent memory example

from valoricore.kinds import NODE_AGENT, NODE_DOCUMENT, EDGE_BY_AGENT

# Agent node (no vector — it's a logical entity)
agent = client.node(NODE_AGENT)

# Document node linked to an embedding
doc = client.node(NODE_DOCUMENT, vector=my_embedding)

# Wire the relationship
agent.link_to(doc, EDGE_BY_AGENT)

# Traversal — everything returned as Node objects
visited = agent.walk(max_depth=2)         # [Node, Node, …]
rids    = agent.record_ids(max_depth=2)   # [0, 1, …]  for vector lookup

# Cascade delete: removes the node and all of its incident edges
doc.delete()
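walk() and record_ids() (and the low-level walk() and expand()) are described as breadth-first traversals with a depth limit. A minimal sketch over a plain adjacency dict (node id → outgoing neighbour ids) follows. It is illustrative only: including the start node in the result and visiting neighbours in insertion order are assumptions, and the helper names mirror, but are not, the SDK methods.

```python
from collections import deque
from typing import Dict, List, Optional


def walk(adj: Dict[int, List[int]], start: int, max_depth: int = 2) -> List[int]:
    """Breadth-first traversal along outgoing edges, at most max_depth hops from start.
    Returns node ids in visit order, start included (an assumption)."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in adj.get(node, []):
            if nxt not in seen:  # also protects against cycles
                seen.add(nxt)
                visited.append(nxt)
                queue.append((nxt, depth + 1))
    return visited


def expand(adj: Dict[int, List[int]], record_of: Dict[int, Optional[int]],
           start: int, max_depth: int = 2) -> List[int]:
    """Like walk(), but returns the vector record ids attached to visited nodes."""
    rids = (record_of.get(n) for n in walk(adj, start, max_depth))
    return [rid for rid in rids if rid is not None]
```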

Low-level API (still fully supported)

# Raw integer IDs still work — the two styles mix freely
raw_nid = client.create_node(kind=NODE_DOCUMENT)
raw_eid = client.create_edge(from_id=raw_nid, to_id=chunk.id, kind=EDGE_PARENT_OF)

# client.edge() accepts Node objects or raw ints
client.edge(doc, chunk, EDGE_REFERS_TO)    # Node objects
client.edge(3, 7, EDGE_REFERS_TO)          # raw ints

Step 7 — Metadata

import json

# Attach arbitrary metadata to a record (max 64 KB)
client.set_metadata(record_id=0, metadata=json.dumps({"source": "report.pdf", "page": 3}).encode())

# Retrieve it
raw = client.get_metadata(record_id=0)
meta = json.loads(raw)
print(meta["source"])   # "report.pdf"
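Because the size limit applies to the encoded bytes rather than to the Python dict, it can help to check the size before calling set_metadata. The helper below is hypothetical and assumes "64 KB" means 65,536 bytes. Sorted keys and compact separators make the encoded bytes reproducible for the same dict, which is useful if you later compare or hash the metadata yourself.

```python
import json

MAX_METADATA_BYTES = 64 * 1024  # assumption: "64 KB" read as 65,536 bytes


def encode_metadata(meta: dict) -> bytes:
    """Serialize metadata to compact, key-sorted UTF-8 JSON; fail early if too large."""
    raw = json.dumps(meta, separators=(",", ":"), sort_keys=True).encode("utf-8")
    if len(raw) > MAX_METADATA_BYTES:
        raise ValueError(f"metadata is {len(raw)} bytes; limit is {MAX_METADATA_BYTES}")
    return raw
```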

Step 8 — Lifecycle

# Permanently remove record from pool and search index
client.delete(record_id=0)

# Soft delete: deactivates the record but preserves the pool slot for reuse.
# The record will no longer appear in search results.
# The state hash changes to reflect the deletion.
client.soft_delete(record_id=1)

print(f"Active records: {client.record_count()}")

Step 9 — Snapshot, Restore, and Audit

# Snapshot full kernel state to bytes
snap = client.snapshot()
with open("state.snap", "wb") as f:
    f.write(snap)

# Restore to a fresh engine — bit-exact
fresh = MemoryClient(path="./restored_db")
fresh.restore(snap)

assert fresh.get_state_hash() == client.get_state_hash()
print("Bit-exact restore verified")

# Full event timeline (append-only, human-readable)
for event in client.get_timeline():
    print(event)

Step 10 — Cryptographic Proof Verification (Offline)

from valoricore import ingest_embedding, generate_proof, verify_embedding

my_vector = [0.1] * 384

# Generate a standalone proof — no DB connection required
fixed_values = ingest_embedding(my_vector)   # float → Q16.16
proof_hex    = generate_proof(fixed_values)  # BLAKE3 Merkle node

# Verify on any machine, any time
is_valid = verify_embedding(floats=my_vector, claimed_hash=proof_hex)
print(f"Proof valid: {is_valid}")   # True

Framework Integrations

Both adapters live in valoricore.integrations. Each is a single import with no adapter boilerplate, and switching between local embedded and remote HTTP mode only changes the constructor argument (path= versus remote=).

LangChain

pip install "valoricore[langchain]"

Local embedded (no server needed):

from valoricore.integrations import ValoricoreLangChain
from langchain_openai import OpenAIEmbeddings

store = ValoricoreLangChain(
    path       = "./my_db",
    embedding  = OpenAIEmbeddings(),
    index_kind = "hnsw",          # "bruteforce" | "hnsw" | "ivf"
)

# Add texts — batch embedded + batch inserted in one call
store.add_texts(
    texts     = ["Valoricore is deterministic.", "Fixed-point arithmetic rocks."],
    metadatas = [{"source": "intro"}, {"source": "math"}],
)

# Similarity search
docs = store.similarity_search("What is deterministic AI?", k=3)
for doc in docs:
    print(doc.page_content, doc.metadata)

# With distance scores (lower = closer)
pairs = store.similarity_search_with_score("fixed-point", k=3)

# Pre-computed vector search
docs = store.similarity_search_by_vector(my_embedding, k=3)

# Cryptographic audit hash
print(store.get_state_hash())   # 64-char BLAKE3 hex, survives crash recovery

Remote HTTP node:

store = ValoricoreLangChain(
    remote    = "http://my-valori-node:3000",
    embedding = OpenAIEmbeddings(),
)

From documents (standard LangChain factory pattern):

from langchain_community.document_loaders import PyPDFLoader   # pip install langchain-community pypdf

docs  = PyPDFLoader("report.pdf").load()
store = ValoricoreLangChain.from_documents(docs, OpenAIEmbeddings(), path="./db")

As a retriever in a RAG chain:

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# k and filter_tag are optional; tenant_id is the integer tag your records were inserted with
retriever = store.as_retriever(k=5, filter_tag=tenant_id)

chain = RetrievalQA.from_chain_type(
    llm       = ChatOpenAI(),
    retriever = retriever,
)
answer = chain.invoke({"query": "What is deterministic AI memory?"})["result"]

Tag-filtered search (tenant isolation):

# Insert records (metadata is stored with each record)
store.add_texts(["tenant A doc"], metadatas=[{"tenant": "A"}])

# Search only among records carrying integer tag 42: O(1) overhead, 100% accuracy
docs = store.similarity_search("query", k=5, filter_tag=42)

LlamaIndex

pip install "valoricore[llamaindex]"

Local embedded:

from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from valoricore.integrations import ValoricoreLlamaIndex

embed_model  = OpenAIEmbedding()
documents    = SimpleDirectoryReader("./data").load_data()   # "./data" is a placeholder folder of your files
vector_store = ValoricoreLlamaIndex(
    path       = "./my_db",
    index_kind = "hnsw",    # "bruteforce" | "hnsw" | "ivf"
)

storage_ctx = StorageContext.from_defaults(vector_store=vector_store)
index       = VectorStoreIndex.from_documents(
    documents,
    storage_context = storage_ctx,
    embed_model     = embed_model,
    transformations = [SentenceSplitter(chunk_size=512)],
)

# Query
engine   = index.as_query_engine()
response = engine.query("What is deterministic AI memory?")
print(response)

Remote HTTP node:

vector_store = ValoricoreLlamaIndex(remote="http://my-valori-node:3000")

Similarity score semantics:

LlamaIndex expects similarity in (0, 1] where 1 = identical. Valoricore converts its raw Q16.16² L2 distance automatically: similarity = 1 / (1 + distance).
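The conversion is easy to check by hand: a distance of 0 maps to 1.0, a distance of 1 to 0.5, and larger distances approach 0. A minimal sketch of the formula (the function name is illustrative, not the adapter's API):

```python
def distance_to_similarity(distance: float) -> float:
    """Map a non-negative distance into (0, 1]: 0 gives 1.0, larger distances approach 0."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)
```

Because the mapping is strictly decreasing, it never changes the ranking of results. If distances are expressed in raw fixed-point units they can be large numbers, in which case the resulting similarities will be small even for close matches.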

Audit hash:

print(vector_store.get_state_hash())   # 64-char BLAKE3 hex
snap = vector_store.snapshot()         # full kernel state as bytes
vector_store.restore(snap)             # bit-exact restore

Error Handling

from valoricore import (
    MemoryClient,
    ValoricoreError,   # base: catch all SDK errors
    ValidationError,   # bad vector dimension / FXP out-of-range
    ConnectionError,   # remote node unreachable (shadows Python's built-in ConnectionError)
    IntegrityError,    # BLAKE3 proof mismatch
    NotFoundError,     # record / node / edge doesn't exist
    KernelError,       # unrecoverable Rust kernel error
)

try:
    client.delete(record_id=9999)
except NotFoundError:
    print("Record does not exist")

try:
    client.upsert_vector([0.1] * 128)   # wrong dimension
except ValidationError as e:
    print(f"Bad embedding: {e}")

try:
    MemoryClient(remote="http://offline-node:3000").snapshot()
except ConnectionError as e:
    print(f"Node unreachable: {e}")

Performance

| Operation | Local FFI | Remote HTTP |
|---|---|---|
| Single insert | ~20 µs | ~0.5 ms |
| Batch insert (1k vectors) | ~15 ms | ~50 ms |
| L2 search (10k × 384) | ~8 ms | ~10 ms |
| L2 search (100k × 384) | ~80 ms | ~90 ms |
| Graph BFS (depth 2, 50 nodes) | ~0.5 ms | ~2 ms |
| State hash (BLAKE3) | < 1 µs | ~1 ms |
| Snapshot (10k records) | ~5 ms | ~20 ms |

Benchmarked on Apple M2. The local FFI path calls Rust directly with zero serialization overhead.

Note: the safe input range for embedding values is [-32767.0, 32767.0]. Unit-normalized embeddings (such as OpenAI's, or SentenceTransformers models that normalize their output) have every component in [-1.0, 1.0] and are always safe.
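The safe range follows from the Q16.16 format: each value is stored as a signed 32-bit integer equal to the float multiplied by 2^16, so the representable range is about [-32768, 32768) with a step of 2^-16 (about 0.0000153). The conversion sketch below assumes round-to-nearest; the kernel's actual rounding mode is not documented here.

```python
SCALE = 1 << 16                          # Q16.16: 16 integer bits, 16 fractional bits
I32_MIN, I32_MAX = -(1 << 31), (1 << 31) - 1


def to_q16_16(x: float) -> int:
    """Float -> raw Q16.16 integer, rounded to nearest (Python's round: ties to even)."""
    raw = round(x * SCALE)
    if not I32_MIN <= raw <= I32_MAX:
        raise OverflowError(f"{x} does not fit in Q16.16")
    return raw


def from_q16_16(raw: int) -> float:
    """Raw Q16.16 integer -> float."""
    return raw / SCALE


for x in (1.0, -0.5, 0.142, 32767.0):
    print(x, "->", to_q16_16(x))
```

Once values are integers, every later step (distances, hashes) is exact integer arithmetic, which is why results do not depend on the CPU.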


Configuration Reference

MemoryClient / AsyncMemoryClient

| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | "./valori_db" | Local database directory |
| remote | str or None | None | Remote node URL. When set, path is ignored |
| index_kind | str | "bruteforce" | Vector index: "bruteforce", "hnsw", or "ivf" |
| quantization | str | "none" | Quantization: "none", "scalar", or "product" |

Valoricore / AsyncValoricore factory

| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | "./valori_db" | Local database directory |
| remote | str or None | None | Remote node URL |
| index_kind | str | "bruteforce" | Vector index backend |

Environment variables (server mode)

| Variable | Default | Description |
|---|---|---|
| VALORI_MAX_RECORDS | 1024 | Soft record limit |
| VALORI_DIM | 16 | Embedding dimension |
| VALORI_INDEX | bruteforce | bruteforce, hnsw, or ivf |
| VALORI_QUANT | (none) | scalar or product |
| VALORI_SNAPSHOT_PATH | (none) | Path to write snapshots |
| VALORI_WAL_PATH | (none) | Path to write the WAL (write-ahead log) |
| VALORI_EVENT_LOG_PATH | (none) | Path to write the event log |
| VALORI_AUTH_TOKEN | (none) | Bearer token for the HTTP API |
| VALORI_FOLLOWER_OF | (none) | Leader URL (enables follower mode) |

Environment variables (cluster mode)

Set these to boot a node as a Raft cluster member instead of standalone.

| Variable | Description |
|---|---|
| VALORI_CLUSTER_MEMBERS | id=raft_addr/api_addr,… (its presence activates cluster mode). Example: 1=10.0.0.1:3100/10.0.0.1:3000,2=10.0.0.2:3100/10.0.0.2:3000 |
| VALORI_NODE_ID | This node's integer ID (must appear in VALORI_CLUSTER_MEMBERS). |
| VALORI_RAFT_BIND | gRPC consensus listener address (default 0.0.0.0:3100). |
| VALORI_CLUSTER_INIT | Set to 1 on exactly one node of a brand-new cluster to bootstrap it. |
| VALORI_RAFT_LOG_PATH | Path to the redb file for the persistent Raft log. When set, the state machine also persists last_applied and the latest snapshot so audit events are never replayed after a restart. |
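Before exporting VALORI_CLUSTER_MEMBERS, a small parser can sanity-check the id=raft_addr/api_addr format documented above. This is a hypothetical helper, not the server's parser, which may differ (for example, in how it treats whitespace).

```python
from typing import Dict, NamedTuple


class Member(NamedTuple):
    raft_addr: str  # gRPC consensus address
    api_addr: str   # HTTP API address


def parse_cluster_members(spec: str) -> Dict[int, Member]:
    """Parse 'id=raft_addr/api_addr,...' into {node_id: Member}."""
    members: Dict[int, Member] = {}
    for entry in filter(None, (e.strip() for e in spec.split(","))):
        node_id, _, addrs = entry.partition("=")
        raft_addr, sep, api_addr = addrs.partition("/")
        if not sep or not raft_addr or not api_addr:
            raise ValueError(f"malformed member entry: {entry!r}")
        nid = int(node_id)  # raises ValueError for a non-integer id
        if nid in members:
            raise ValueError(f"duplicate node id {nid}")
        members[nid] = Member(raft_addr, api_addr)
    return members
```

A natural extra check is that VALORI_NODE_ID appears among the parsed keys, since the table requires it.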

API Reference

MemoryClient

Ingestion

| Method | Description |
|---|---|
| add_document(text, embed, title, doc_id, chunk_size) | Chunk, embed, and store a document with Knowledge Graph links |
| add_chunks(chunks, embed, parent_document_node, title) | Lower-level chunked ingestion |
| upsert_vector(vector, attach_to_document_node) | Insert a raw pre-computed vector |
| insert_batch(vectors) | Batch insert multiple raw vectors |
| insert_batch_with_proof(vectors, tags) | Batch insert with per-record BLAKE3 proofs |

Search

| Method | Description |
|---|---|
| semantic_search(query, embed, k) | Embed the query string and return its nearest neighbours |

Lifecycle

| Method | Description |
|---|---|
| delete(record_id) | Permanently remove a record from the pool and index |
| soft_delete(record_id) | Deactivate a record; slot preserved for reuse; state hash updated |
| record_count() | Total active records |

Metadata

| Method | Description |
|---|---|
| get_metadata(record_id) | Retrieve raw binary metadata for a record |
| set_metadata(record_id, metadata) | Attach up to 64 KB of binary metadata to a record |

Persistence & Audit

| Method | Description |
|---|---|
| snapshot() | Serialize full kernel state to bytes |
| restore(data) | Replace current state with a snapshot |
| get_state_hash() | 64-char BLAKE3 hex digest of the entire kernel state |
| get_timeline() | Chronological list of all state transitions from the event log |

Knowledge Graph — Fluent API (recommended)

| Method | Returns | Description |
|---|---|---|
| node(kind, vector=None, tag=0) | Node | Create a node; optionally insert a vector and link it in one call |
| edge(from_node, to_node, kind) | int | Create an edge; accepts Node objects or raw integer IDs |
| build_document(title=None) | DocumentGraph | Context manager: builds doc → chunk graph with no ID bookkeeping |

Node object methods

| Method | Returns | Description |
|---|---|---|
| node.link_to(other, edge_kind) | self | Create edge(s) from this node; other may be a Node, int, or list of either |
| node.link_from(other, edge_kind) | self | Create an edge from other to this node |
| node.children(edge_kind=None) | List[Node] | Outgoing neighbours, optionally filtered by edge kind |
| node.walk(max_depth=2) | List[Node] | BFS traversal; returns visited Node objects |
| node.record_ids(max_depth=2) | List[int] | All reachable vector record IDs (for RAG retrieval) |
| node.delete() | None | Cascade-delete node and all incident edges |
| int(node) | int | Escape hatch to the raw integer ID |

DocumentGraph context manager

| Attribute / Method | Description |
|---|---|
| builder.add_chunk(vector, tag=0, metadata=None) | Insert vector, create NODE_CHUNK, wire EDGE_PARENT_OF; returns Node |
| builder.document | The root NODE_DOCUMENT Node |
| builder.chunks | Ordered list of chunk Node objects |
| builder.record_ids | List of vector record IDs in insertion order |

Knowledge Graph — Low-Level API (still fully supported)

| Method | Description |
|---|---|
| create_node(kind, record_id) | Create a graph node; returns integer node ID |
| create_edge(from_id, to_id, kind) | Create a directed edge; returns integer edge ID |
| delete_node(node_id) | Cascade-delete a node and all its incident edges |
| delete_edge(edge_id) | Delete a single edge |
| get_node(node_id) | Fetch node kind and attached record_id |
| get_edges(node_id) | Fetch all outgoing edges |
| walk(start_node, max_depth) | BFS traversal; returns visited node IDs |
| expand(start_node, max_depth) | BFS traversal; returns reachable record IDs |

SyncRemoteClient / AsyncRemoteClient

These clients expose the full API surface when talking to a running node over HTTP. SyncRemoteClient uses requests; AsyncRemoteClient uses httpx and must be awaited. Both are cluster-aware (automatic leader redirect, retry with backoff).

Collections

| Method | Returns | Description |
|---|---|---|
| create_collection(name) | {"name", "id", "created"} | Create a namespace. Idempotent. |
| list_collections() | [{"name", "id"}, …] | List all namespaces. |
| drop_collection(name) | None | Drop a namespace and all its records. Raises ValueError for "default". |

All data methods accept collection: str = "default":

| Method | Notes |
|---|---|
| insert(vector, tag, collection) | |
| insert_batch(batch, collection) | |
| search(query, k, filter_tag, consistency, collection) | Also accepts consistency="linearizable" or consistency="local" |

Cluster

| Method | Returns | Description |
|---|---|---|
| cluster_status() | dict | Leader node ID, term, log indices, membership table. |
| cluster_health() | bool | True when a leader is visible; False during an election. |
| get_state_hash() | str | 64-char BLAKE3 hex digest of the current kernel state. |

Module-Level Cryptographic Helpers

from valoricore import ingest_embedding, generate_proof, verify_embedding

fixed = ingest_embedding([0.1, 0.2, 0.3])   # List[float] → List[int] (Q16.16)
proof = generate_proof(fixed)               # → hex string (BLAKE3 Merkle root)
ok    = verify_embedding([0.1, 0.2, 0.3], proof)  # → bool

These functions are implemented in Rust (via PyO3) and work offline — no running engine required.


License

MIT OR Apache-2.0 — see LICENSE-MIT.

You may use Valoricore in proprietary, commercial, and on-premise deployments without any copyleft obligations. For enterprise support, SLA agreements, or custom deployment assistance, contact: varshith.gudur17@gmail.com

Download files

Source Distribution

  • valoricore-0.2.2.tar.gz (380.0 kB)

Built Distributions

  • valoricore-0.2.2-cp39-abi3-win_amd64.whl (2.6 MB): CPython 3.9+, Windows x86-64
  • valoricore-0.2.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB): CPython 3.9+, manylinux (glibc 2.17+), x86-64
  • valoricore-0.2.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.5 MB): CPython 3.9+, manylinux (glibc 2.17+), ARM64
  • valoricore-0.2.2-cp39-abi3-macosx_11_0_arm64.whl (3.0 MB): CPython 3.9+, macOS 11.0+, ARM64
  • valoricore-0.2.2-cp39-abi3-macosx_10_12_x86_64.whl (3.1 MB): CPython 3.9+, macOS 10.12+, x86-64

File details

Details for the file valoricore-0.2.2.tar.gz.

File metadata

  • Download URL: valoricore-0.2.2.tar.gz
  • Size: 380.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for valoricore-0.2.2.tar.gz
Algorithm Hash digest
SHA256 c786f3d3e74ef5f6eb1f9ea5adae736bc2736b54d0196db4a7f98b8f1e78e4da
MD5 2d2b7f662ad4bcfd09c3d0676dd143bd
BLAKE2b-256 8d312baa81e737b3a561679defc9aa1ead77470b28f9fb763a951bdca4a495d8
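The SHA256 digests listed for each file can be checked locally before installing. Below is a minimal standard-library sketch; sha256_of and matches are illustrative helper names, and the path is wherever you saved the download. pip can also enforce digests itself in hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large wheels are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_sha256: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_sha256.strip().lower()


# The SHA256 published for the source distribution.
EXPECTED = "c786f3d3e74ef5f6eb1f9ea5adae736bc2736b54d0196db4a7f98b8f1e78e4da"
sdist = Path("valoricore-0.2.2.tar.gz")
if sdist.exists():
    print("OK" if matches(sdist, EXPECTED) else "MISMATCH: do not install")
```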

Provenance

The following attestation bundles were made for valoricore-0.2.2.tar.gz:

Publisher: publish-pypi.yml on varshith-Git/Valori-Kernel

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file valoricore-0.2.2-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: valoricore-0.2.2-cp39-abi3-win_amd64.whl
  • Size: 2.6 MB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for valoricore-0.2.2-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 44ef77b806a6aa87a3495ecfc6e9f18da18282522f250c0de00b6478257530c7
MD5 2d45ce61d1ef1c69d14a12c0a8183175
BLAKE2b-256 6a3fc0a2855800e7e956d064b91aa01c6e049c0b2da7dfe3f7930a92ef16fa4e

Provenance

The following attestation bundles were made for valoricore-0.2.2-cp39-abi3-win_amd64.whl:

Publisher: publish-pypi.yml on varshith-Git/Valori-Kernel

File details

Details for the file valoricore-0.2.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for valoricore-0.2.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 cf3f5c297f7390cb2b6f17ad9f8bb1dc4e8f6f91f2188f64ad7d3fc63ef08aa7
MD5 5c6fa2142397c075465e5570316d91b2
BLAKE2b-256 5c14b407404cc8c5692bbd1f3f00ff75d99390a1d54017c953c77f2577e92161

Provenance

The following attestation bundles were made for valoricore-0.2.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish-pypi.yml on varshith-Git/Valori-Kernel

File details

Details for the file valoricore-0.2.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for valoricore-0.2.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 2c9f4d93e63dc7baea6331b0f965412282d195e0349b0492f56e751bc3ae048d
MD5 4fc7326c51466e0554f60c619cadee26
BLAKE2b-256 8a19998e79a55ae7b2fbc4eb1eb8595dd1baa2e0da921e3a9e29f315e9fda2c5

Provenance

The following attestation bundles were made for valoricore-0.2.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish-pypi.yml on varshith-Git/Valori-Kernel

File details

Details for the file valoricore-0.2.2-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for valoricore-0.2.2-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 64e5c3ea535258d58e4c973ded31ef0dd322d9c24b90c464e1b7b28add493c6f
MD5 75874b3f687fab29922db18c5e5a566c
BLAKE2b-256 dc359519ce685c791064d2a6c3ed9a8456fad1e207e61062f879bc83e04a75e6

Provenance

The following attestation bundles were made for valoricore-0.2.2-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: publish-pypi.yml on varshith-Git/Valori-Kernel

File details

Details for the file valoricore-0.2.2-cp39-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for valoricore-0.2.2-cp39-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 a7757f4391a420fbcc85b7ee3b9bd5b971b0beebf51af797c3f6c7fd1d69e239
MD5 e0088f17548ef9fa6567b2b07f37af43
BLAKE2b-256 522ba65cf291ba9dbe34bea251c99db220ce7791c0bc4477411df2849dd4107b

Provenance

The following attestation bundles were made for valoricore-0.2.2-cp39-abi3-macosx_10_12_x86_64.whl:

Publisher: publish-pypi.yml on varshith-Git/Valori-Kernel
