Deterministic Knowledge Graph & Vector Engine with Bit-Exact Audit Trails
Valoricore
The Official Python SDK for Valori-Kernel
Deterministic Vector Memory · Cryptographic Audit Trails · Hybrid Knowledge Graphs
valoricore is the official Python SDK for Valori-Kernel, a no_std Rust engine that unifies Vector Memory and Knowledge Graphs into a single, cryptographically auditable memory space.
Every insert, search, and graph edge is backed by fixed-point Q16.16 arithmetic, producing bit-identical results across x86, ARM, and RISC-V. The global state is always summarised in a single BLAKE3 Merkle root you can store, compare, and prove.
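To make the fixed-point claim concrete, here is a minimal pure-Python sketch of Q16.16 quantization. The helper names (`to_q16_16`, `from_q16_16`, `dot_fixed`) are illustrative and not part of the SDK, and round-to-nearest is an assumption; the kernel's own conversion may differ.

```python
SCALE = 1 << 16  # Q16.16: 16 integer bits, 16 fractional bits

def to_q16_16(x: float) -> int:
    """Quantize a float to a Q16.16 integer (rounding mode is an assumption)."""
    return round(x * SCALE)

def from_q16_16(q: int) -> float:
    """Convert a Q16.16 integer back to a float for display."""
    return q / SCALE

def dot_fixed(a: list[int], b: list[int]) -> int:
    """Dot product in pure integer arithmetic; the result carries 32 fractional bits."""
    return sum(x * y for x, y in zip(a, b))

quantized = [to_q16_16(v) for v in (0.5, -0.25, 1.0)]
print(quantized)                        # [32768, -16384, 65536]
print(from_q16_16(quantized[1]))        # -0.25
print(dot_fixed(quantized, quantized))  # 5637144576
```

Once values are integers, sums and products involve no rounding, which is why the same inputs can give the same bits regardless of CPU architecture.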
What Makes Valoricore Different?
| Feature | Valoricore | Chroma / FAISS / Pinecone |
|---|---|---|
| Results across hardware | ✅ Bit-identical (Q16.16 fixed-point) | ❌ Float drift |
| Cryptographic state proof | ✅ BLAKE3 Merkle root per operation | ❌ None |
| Hybrid Vector + Graph | ✅ Native, same memory space | ⚠️ Graph is separate system |
| Offline proof verification | ✅ No DB connection required | ❌ N/A |
| Snapshot / replay | ✅ Byte-exact restore | ⚠️ Partial / format-specific |
| no_std embeddable core | ✅ Zero heap allocation in kernel | ❌ Heap-heavy |
| Air-gapped deployment | ✅ Local FFI, no cloud required | ⚠️ Varies |
Installation
Valoricore ships with pre-compiled Rust binaries for Linux (x86-64, arm64), macOS (x86-64, Apple Silicon), and Windows. A Rust compiler is only required when building from source.
Core (vector DB + knowledge graph)
pip install valoricore
With local / offline embeddings (no API key needed)
pip install "valoricore[local]"
# Uses sentence-transformers + PyTorch
With cloud embedding providers
pip install "valoricore[openai]" # OpenAI text-embedding-3-*
pip install "valoricore[cohere]" # Cohere embed-english-v3.0
Full installation (all providers + LangChain + LlamaIndex)
pip install "valoricore[all]"
Optional integrations
pip install "valoricore[langchain]" # LangChain VectorStore + Retriever
pip install "valoricore[llamaindex]" # LlamaIndex VectorStore
pip install "valoricore[pdf]" # PDF document ingestion (pypdf)
Quick Start
1 · Embedded Local Engine (no server required)
from valoricore import MemoryClient
from valoricore.embeddings import SentenceTransformerEmbedder
# 1. Load a local model (downloads once, runs fully offline after that)
embedder = SentenceTransformerEmbedder("all-MiniLM-L6-v2")  # dim=384
# 2. Initialize the embedded Rust engine
client = MemoryClient(path="./my_valori_db")
# 3. Add a document: automatically chunks, embeds, and links in the Knowledge Graph
result = client.add_document(
    text  = "Valoricore is a deterministic, no_std Rust kernel "
            "that unifies vector memory and knowledge graphs.",
    embed = embedder,
    title = "Introduction",
)
print(f"Document Node ID : {result['document_node_id']}")
print(f"Chunk count : {result['chunk_count']}")
print(f"Proof hashes : {result['proof_hashes']}")
# 4. Semantic search
hits = client.semantic_search("What does Valoricore unify?", embed=embedder, k=3)
for h in hits:
    print(f" id={h['id']} l2_score={h['score']}")
# 5. Cryptographic state proof
print(f"\nDatabase state : {client.get_state_hash()}")
2 · Remote / Cluster Mode
Connect to a standalone valori-node HTTP server and use the exact same API:
from valoricore import MemoryClient
from valoricore.embeddings import OpenAIEmbedder
embedder = OpenAIEmbedder() # reads OPENAI_API_KEY from env
# Simply pass a remote URL; everything else is identical
client = MemoryClient(remote="http://my-valori-node:3000")
result = client.add_document(
    text  = "Remote deployment with full audit trail.",
    embed = embedder,
)
print(result["document_node_id"])
# Snapshot the remote node state to local bytes
snap = client.snapshot()
with open("backup.snap", "wb") as f:
    f.write(snap)
3 · Async API (FastAPI / asyncio)
import asyncio
from valoricore import AsyncMemoryClient
from valoricore.embeddings import SentenceTransformerEmbedder
embedder = SentenceTransformerEmbedder("all-MiniLM-L6-v2")
async def main():
    # Async context manager: auto-closes on exit
    async with AsyncMemoryClient(path="./async_db") as client:
        result = await client.add_document(
            text  = "Non-blocking deterministic vector storage.",
            embed = embedder,
        )
        print(f"node_id={result['document_node_id']}")
        hits = await client.semantic_search(
            "Non-blocking search", embed=embedder, k=5
        )
        print(f"Found {len(hits)} results")
        # Snapshot + audit from async context
        snap  = await client.snapshot()
        state = await client.get_state_hash()
        print(f"State: {state}")
asyncio.run(main())
Embedding Providers
The valoricore.embeddings module provides adapters for the embedding providers listed below. Every adapter implements __call__, so it can be passed directly wherever an EmbedFn is accepted.
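As an illustration of the callable contract, here is a hedged sketch of a custom embedder. It assumes an EmbedFn is simply a callable from `str` to a list of floats; the `EmbedFn` alias and the `ToyHashEmbedder` class below are hypothetical and defined locally, not imported from the SDK.

```python
import hashlib
from typing import Callable, List

# Assumed shape of an embed function: text in, list of floats out.
EmbedFn = Callable[[str], List[float]]

class ToyHashEmbedder:
    """Deterministic toy embedder: maps text to `dim` floats in [-1.0, 1.0]."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def __call__(self, text: str) -> List[float]:
        out: List[float] = []
        counter = 0
        # Hash the text with a running counter until enough bytes are available.
        while len(out) < self.dim:
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            out.extend(byte / 127.5 - 1.0 for byte in digest)
            counter += 1
        return out[: self.dim]

embed: EmbedFn = ToyHashEmbedder(dim=8)
vec = embed("hello")
print(len(vec), vec == embed("hello"))  # 8 True
```

Vectors like these carry no semantic meaning; the point is only that any object with a matching `__call__` can stand in for a provider, which is handy in tests.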
Provider Overview
| Provider | Class | Offline? | Extra install |
|---|---|---|---|
| SentenceTransformers | SentenceTransformerEmbedder | Yes | pip install "valoricore[local]" |
| OpenAI | OpenAIEmbedder | Cloud | pip install "valoricore[openai]" |
| Cohere | CohereEmbedder | Cloud | pip install "valoricore[cohere]" |
| HuggingFace Inference | HuggingFaceEmbedder | Cloud | (requests, built-in) |
| Ollama | OllamaEmbedder | Local server | ollama pull nomic-embed-text |
| Dummy / CI | DummyEmbedder | Yes | (built-in) |
| Hash / CI | HashEmbedder | Yes | (built-in) |
Local / Offline Production (Recommended for Air-Gapped Environments)
from valoricore.embeddings import SentenceTransformerEmbedder, CachedEmbedder
# High-quality model, fully offline after first download
raw_embedder = SentenceTransformerEmbedder(
    model_name = "BAAI/bge-small-en-v1.5",  # dim=384, state-of-the-art
    device     = "cpu",                     # or "cuda", "mps"
    normalize  = True,                      # cosine similarity friendly
)
# Optional: wrap with LRU cache to avoid re-embedding identical texts
embedder = CachedEmbedder(raw_embedder, max_size=5000)
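For readers curious how such a cache can work, the following is a minimal sketch of an LRU wrapper around any embed callable. `LruCachedEmbedder` and `slow_embed` are hypothetical names written for this example; the SDK's CachedEmbedder may be implemented differently.

```python
from collections import OrderedDict
from typing import Callable, List

class LruCachedEmbedder:
    """Wraps an embed callable and keeps the `max_size` most recently used results."""

    def __init__(self, inner: Callable[[str], List[float]], max_size: int = 5000) -> None:
        self.inner = inner
        self.max_size = max_size
        self._cache: "OrderedDict[str, List[float]]" = OrderedDict()

    def __call__(self, text: str) -> List[float]:
        if text in self._cache:
            self._cache.move_to_end(text)    # mark as most recently used
            return self._cache[text]
        vec = self.inner(text)
        self._cache[text] = vec
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return vec

calls: List[str] = []

def slow_embed(text: str) -> List[float]:
    calls.append(text)                       # record every real embedding call
    return [float(len(text))]

cached = LruCachedEmbedder(slow_embed, max_size=2)
for text in ["a", "a", "bb", "ccc", "a"]:
    cached(text)
print(calls)  # ['a', 'bb', 'ccc', 'a']
```

The second "a" is served from the cache; the last "a" is recomputed because it was evicted when "ccc" arrived.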
OpenAI (Cloud)
import os
from valoricore.embeddings import OpenAIEmbedder
embedder = OpenAIEmbedder(
    api_key    = os.environ["OPENAI_API_KEY"],  # or pass directly
    model      = "text-embedding-3-small",      # dim=1536
    dimensions = 384,                           # optional truncation (3-* models only)
)
Ollama (Local Server, Zero Cloud Dependency)
# One-time setup
brew install ollama && ollama serve # the server keeps running; use a second terminal for the next command
ollama pull nomic-embed-text # dim=768
from valoricore.embeddings import OllamaEmbedder
embedder = OllamaEmbedder(
    model    = "nomic-embed-text",
    base_url = "http://localhost:11434",
)
Convenience Factory
from valoricore.embeddings import get_embedder
# Swap providers with one line change
embedder = get_embedder("local", model_name="all-MiniLM-L6-v2")
embedder = get_embedder("openai", api_key="sk-...")
embedder = get_embedder("ollama", model="nomic-embed-text")
embedder = get_embedder("cohere", api_key="...")
embedder = get_embedder("huggingface", api_key="hf_...", model="sentence-transformers/all-MiniLM-L6-v2")
embedder = get_embedder("dummy", dim=384) # CI / tests
Async Embedder (for asyncio pipelines)
from valoricore.embeddings import SentenceTransformerEmbedder, AsyncEmbedder
sync_embedder = SentenceTransformerEmbedder("all-MiniLM-L6-v2")
async_embedder = AsyncEmbedder(sync_embedder) # runs in thread-pool
async def pipeline():
    vec  = await async_embedder.embed("Hello")
    vecs = await async_embedder.embed_batch(["Hello", "World"])
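The thread-pool idea can be sketched with the standard library alone. The class below is a hypothetical stand-in that offloads a blocking embed function with `asyncio.to_thread` (Python 3.9+); it is not the SDK's AsyncEmbedder, whose internals are not shown here.

```python
import asyncio
from typing import Callable, List

class ThreadPoolEmbedder:
    """Runs a blocking embed callable in worker threads so the event loop stays free."""

    def __init__(self, inner: Callable[[str], List[float]]) -> None:
        self.inner = inner

    async def embed(self, text: str) -> List[float]:
        return await asyncio.to_thread(self.inner, text)

    async def embed_batch(self, texts: List[str]) -> List[List[float]]:
        # gather preserves input order even if threads finish out of order.
        return list(await asyncio.gather(*(self.embed(t) for t in texts)))

def toy_embed(text: str) -> List[float]:
    return [float(len(text))]  # placeholder for a real, slow model call

async def demo() -> List[List[float]]:
    embedder = ThreadPoolEmbedder(toy_embed)
    return await embedder.embed_batch(["Hello", "World!"])

result = asyncio.run(demo())
print(result)  # [[5.0], [6.0]]
```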
Core Concepts
Records
A Record is a dense fixed-point vector stored in the kernel's RecordPool. Every insert returns an integer record_id and a BLAKE3 Merkle proof.
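The sketch below shows how a Merkle root can summarise a list of records. It uses BLAKE2b-256 from Python's standard library as a stand-in, because BLAKE3 is not in the standard library; the leaf/node prefixes and the duplicate-last-leaf rule are assumptions for illustration and do not describe the kernel's actual tree layout.

```python
import hashlib

def leaf_hash(data: bytes) -> bytes:
    # Stand-in hash: BLAKE2b-256, NOT BLAKE3. The 0x00 prefix marks a leaf.
    return hashlib.blake2b(b"\x00" + data, digest_size=32).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # The 0x01 prefix marks an interior node.
    return hashlib.blake2b(b"\x01" + left + right, digest_size=32).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        return leaf_hash(b"")
    level = [leaf_hash(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:          # odd count: pair the last hash with itself
            level.append(level[-1])
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

records = [b"record-0", b"record-1", b"record-2"]
root = merkle_root(records)
tampered_root = merkle_root([b"record-0", b"record-1", b"tampered"])
print(root.hex())
print(root != tampered_root)  # True
```

Changing any single record changes the root, which is what lets one 32-byte value stand for the whole state.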
Nodes & Edges (Knowledge Graph)
A Node is a named entity that optionally points to a Record. An Edge is a directed relationship between two Nodes. The graph is stored in the same memory space as the vector pool, so there is no separate database.
Node Kinds (built-in constants)
from valoricore import (
    NODE_RECORD,    # 0: raw vector record
    NODE_CONCEPT,   # 1: abstract concept
    NODE_AGENT,     # 2: AI agent / process
    NODE_USER,      # 3: human user
    NODE_TOOL,      # 4: tool or function
    NODE_DOCUMENT,  # 5: top-level document
    NODE_CHUNK,     # 6: text chunk (child of document)
)
Edge Kinds (built-in constants)
from valoricore import (
    EDGE_RELATION,    # 0: generic relation
    EDGE_FOLLOWS,     # 1: sequential ordering
    EDGE_IN_EPISODE,  # 2: membership in episode
    EDGE_BY_AGENT,    # 3: created/sent by agent
    EDGE_MENTIONS,    # 4: entity mention
    EDGE_REFERS_TO,   # 5: cross-reference
    EDGE_PARENT_OF,   # 6: hierarchical parent→child
)
Step-by-Step Usage Guide
Step 1: Install & Verify
pip install "valoricore[local]"
python -c "import valoricore; print(valoricore.__version__)"
Step 2: Choose Your Embedding Provider
from valoricore.embeddings import get_embedder
# Local (no API key, no internet after first download)
embedder = get_embedder("local", model_name="all-MiniLM-L6-v2")
# OpenAI
# embedder = get_embedder("openai") # reads OPENAI_API_KEY env var
# CI / testing (zero-cost, deterministic)
# embedder = get_embedder("dummy", dim=384)
Step 3: Initialize the Client
from valoricore import MemoryClient
# Local embedded engine (no server needed)
client = MemoryClient(path="./my_db")
# OR connect to a remote cluster
# client = MemoryClient(remote="http://my-node:3000")
Step 4: Ingest Documents
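# From a string (here read from a text file; read_text closes the file for you)
from pathlib import Path
result = client.add_document(
    text       = Path("my_paper.txt").read_text(encoding="utf-8"),
    embed      = embedder,
    title      = "My Paper",
    chunk_size = 512,  # chars per chunk
)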
# From a PDF file (requires: pip install "valoricore[pdf]")
from valoricore import load_text_from_file
text = load_text_from_file("report.pdf")
result = client.add_document(text=text, embed=embedder)
# Insert a raw pre-computed vector
result = client.upsert_vector(vector=[0.1, 0.2, ...]) # len must match kernel dim
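To show what "chars per chunk" can mean in practice, here is a minimal fixed-width chunker. `chunk_text` is a hypothetical helper; the SDK's own chunkers may split on sentence or token boundaries or add overlap, so treat this only as a sketch of the simplest deterministic strategy.

```python
def chunk_text(text: str, chunk_size: int = 512) -> list[str]:
    """Split text into consecutive slices of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = chunk_text("abcdefghij", chunk_size=4)
print(chunks)  # ['abcd', 'efgh', 'ij']
```

Because the split depends only on the input string and `chunk_size`, re-ingesting the same text yields the same chunks in the same order.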
Step 5: Semantic Search
hits = client.semantic_search(
    query = "What is deterministic AI memory?",
    embed = embedder,
    k     = 10,
)
for hit in hits:
    print(f"Record ID : {hit['id']}")
    print(f"L2 Score : {hit['score']}")  # lower = closer (L2 squared)
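The "lower = closer" scoring can be illustrated with a small brute-force scan over integer vectors. This is a sketch under stated assumptions: `brute_force_search` is a hypothetical function, the records are toy data, and the tie-break by record ID is a choice made here to keep the ordering deterministic, not a documented kernel rule.

```python
def l2_squared(a: list[int], b: list[int]) -> int:
    """Squared Euclidean distance in pure integer arithmetic."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_search(query: list[int], records: dict[int, list[int]], k: int) -> list[dict]:
    """Score every record against the query and return the k closest."""
    scored = [(l2_squared(query, vec), rid) for rid, vec in records.items()]
    scored.sort()  # sorts by score, then by record id on ties
    return [{"id": rid, "score": score} for score, rid in scored[:k]]

records = {0: [0, 0], 1: [3, 4], 2: [1, 1]}
hits = brute_force_search([0, 0], records, k=2)
print(hits)  # [{'id': 0, 'score': 0}, {'id': 2, 'score': 2}]
```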
Step 6: Knowledge Graph Operations
from valoricore import NODE_AGENT, NODE_DOCUMENT, EDGE_BY_AGENT
# Manual graph construction
record_id = client._db.insert([0.5] * 384)
agent_node = client.create_node(kind=NODE_AGENT)
doc_node = client.create_node(kind=NODE_DOCUMENT, record_id=record_id)
# Link agent → document
client.create_edge(from_id=agent_node, to_id=doc_node, kind=EDGE_BY_AGENT)
# Inspect
print(client.get_node(doc_node)) # {"kind": 5, "record_id": 0}
print(client.get_edges(agent_node)) # [{"edge_id": 0, "to_node": 1, "kind": 3}]
# Traversal: BFS up to depth 2
visited_nodes = client.walk(agent_node, max_depth=2)
# Collect all record_ids reachable from a starting node
record_ids = client.expand(agent_node, max_depth=2)
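The difference between walking and expanding can be sketched on a toy in-memory graph. Everything below (the `nodes` and `edges` dictionaries and the two functions) is hypothetical illustration code; it mirrors the described behaviour of a depth-limited BFS but is not the kernel's implementation.

```python
from collections import deque
from typing import Optional

# Toy data: node_id -> optional record_id, and node_id -> outgoing neighbours.
nodes: dict[int, Optional[int]] = {0: None, 1: 10, 2: 11, 3: 12}
edges: dict[int, list[int]] = {0: [1, 2], 1: [3], 2: [], 3: []}

def walk(start: int, max_depth: int) -> list[int]:
    """Breadth-first traversal returning node ids within max_depth hops of start."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow edges beyond the depth limit
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                visited.append(nxt)
                queue.append((nxt, depth + 1))
    return visited

def expand(start: int, max_depth: int) -> list[int]:
    """Collect the record ids attached to every node reached by walk()."""
    return [nodes[n] for n in walk(start, max_depth) if nodes[n] is not None]

print(walk(0, max_depth=1))    # [0, 1, 2]
print(walk(0, max_depth=2))    # [0, 1, 2, 3]
print(expand(0, max_depth=2))  # [10, 11, 12]
```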
Step 7: Lifecycle (Delete, Soft Delete)
# Permanently remove record from pool and search index
client.delete(record_id=0)
# Soft delete: deactivates record but preserves pool slot
client.soft_delete(record_id=1)
# Count active records
n = client.record_count()
print(f"Active records: {n}")
Step 8: Snapshot, Restore, and Audit
# Snapshot the full kernel state to bytes
snap = client.snapshot()
with open("state.snap", "wb") as f:
    f.write(snap)
# Restore to a fresh engine (bit-exact)
fresh = MemoryClient(path="./restored_db")
fresh.restore(snap)
# The state hashes must be identical
assert fresh.get_state_hash() == client.get_state_hash()
print("Bit-exact restore verified")
# View full event timeline
for event in client.get_timeline():
    print(event)
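Why a restored engine can reproduce the same state hash is easiest to see with a canonical byte encoding. The sketch below serialises integer vectors with `struct` and hashes them with BLAKE2b as a stand-in for BLAKE3; the wire format and function names are invented for this example and say nothing about the real snapshot format.

```python
import hashlib
import struct

def snapshot(records: list[list[int]]) -> bytes:
    """Serialise records as little-endian i32 values with u32 length prefixes."""
    out = bytearray(struct.pack("<I", len(records)))
    for vec in records:
        out += struct.pack("<I", len(vec))
        out += struct.pack(f"<{len(vec)}i", *vec)
    return bytes(out)

def restore(blob: bytes) -> list[list[int]]:
    """Inverse of snapshot(): rebuild the record list from bytes."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    records = []
    for _ in range(count):
        (dim,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        records.append(list(struct.unpack_from(f"<{dim}i", blob, offset)))
        offset += 4 * dim
    return records

def state_hash(records: list[list[int]]) -> str:
    # Stand-in hash: BLAKE2b-256, NOT BLAKE3.
    return hashlib.blake2b(snapshot(records), digest_size=32).hexdigest()

original = [[65536, -32768], [6554, 0]]
restored = restore(snapshot(original))
print(state_hash(restored) == state_hash(original))  # True
```

Since the encoding has exactly one byte representation per state, equal hashes after a restore indicate that every stored value survived the round trip.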
Step 9: Cryptographic Proof Verification (Offline)
from valoricore import ingest_embedding, generate_proof, verify_embedding
my_vector = [0.1] * 384
# Generate a standalone proof for this vector (no DB connection required)
fixed_values = ingest_embedding(my_vector) # float → Q16.16
proof_hex = generate_proof(fixed_values) # BLAKE3 Merkle node
# Verify offline; this proves the vector has not been tampered with
is_valid = verify_embedding(floats=my_vector, claimed_hash=proof_hex)
print(f"Proof valid: {is_valid}")
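Conceptually, offline verification amounts to recomputing the hash from the vector and comparing it with the claimed value. The sketch below follows that idea with hypothetical helpers (`quantize`, `proof_for`, `verify`), round-to-nearest quantization, and BLAKE2b in place of BLAKE3; it will not produce hashes compatible with the SDK.

```python
import hashlib
import hmac
import struct

def quantize(floats: list[float]) -> list[int]:
    """Float to Q16.16 integers (rounding mode is an assumption)."""
    return [round(x * 65536) for x in floats]

def proof_for(floats: list[float]) -> str:
    """Hash the little-endian i32 encoding of the quantized vector."""
    fixed = quantize(floats)
    payload = struct.pack(f"<{len(fixed)}i", *fixed)
    return hashlib.blake2b(payload, digest_size=32).hexdigest()  # stand-in for BLAKE3

def verify(floats: list[float], claimed_hash: str) -> bool:
    """Recompute the proof and compare in constant time."""
    return hmac.compare_digest(proof_for(floats), claimed_hash)

vector = [0.1, 0.1, 0.1, 0.1]
proof = proof_for(vector)
print(verify(vector, proof))                # True
print(verify([0.1, 0.1, 0.1, 0.2], proof))  # False
```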
Framework Integrations
LangChain
pip install "valoricore[langchain]"
from langchain_openai import OpenAIEmbeddings
from valoricore.adapters import ValoricoreAdapter, LangChainVectorStore
adapter = ValoricoreAdapter(base_url="http://localhost:3000")
embeddings = OpenAIEmbeddings()
vectorstore = LangChainVectorStore(adapter=adapter, embedding=embeddings)
# Add documents
vectorstore.add_texts(
    texts     = ["Valoricore is deterministic.", "Fixed-point arithmetic rocks."],
    metadatas = [{"source": "intro"}, {"source": "math"}],
)
# Search
docs = vectorstore.similarity_search("What is deterministic AI?", k=3)
for doc in docs:
    print(doc.page_content)
# With scores
docs_scores = vectorstore.similarity_search_with_score("deterministic", k=3)
for doc, score in docs_scores:
    print(f"{doc.page_content[:60]}… score={score:.4f}")
LangChain Retriever:
from valoricore.adapters import ValoricoreAdapter, LangChainRetriever
adapter = ValoricoreAdapter(base_url="http://localhost:3000")
# Reuses the `embeddings` object (OpenAIEmbeddings) created in the VectorStore example
retriever = LangChainRetriever(
    adapter  = adapter,
    embed_fn = lambda t: embeddings.embed_query(t),
    k        = 5,
)
docs = retriever.get_relevant_documents("deterministic vector search")
LlamaIndex
pip install "valoricore[llamaindex]"
from llama_index.core import VectorStoreIndex, StorageContext
from valoricore.adapters import ValoricoreAdapter, LlamaIndexVectorStore
adapter = ValoricoreAdapter(base_url="http://localhost:3000")
vector_store = LlamaIndexVectorStore(adapter=adapter)
storage_ctx = StorageContext.from_defaults(vector_store=vector_store)
# `documents` is assumed to be a list of LlamaIndex Document objects loaded beforehand
index = VectorStoreIndex.from_documents(documents, storage_context=storage_ctx)
query_engine = index.as_query_engine()
response = query_engine.query("What is Valoricore?")
print(response)
Error Handling
from valoricore import (
    MemoryClient,
    ValoricoreError,  # base class: catch all SDK errors
    ValidationError,  # bad vector dimension / FXP out-of-range
    ConnectionError,  # remote node unreachable (shadows Python's built-in ConnectionError in this module)
    IntegrityError,   # BLAKE3 proof mismatch
    NotFoundError,    # record / node / edge doesn't exist
    KernelError,      # unrecoverable Rust kernel error
)
client = MemoryClient(path="./db")
try:
    client.delete(record_id=9999)
except NotFoundError:
    print("Record does not exist, safe to ignore")
try:
    client.upsert_vector([0.1] * 128)  # wrong dimension
except ValidationError as e:
    print(f"Bad embedding: {e}")
try:
    remote = MemoryClient(remote="http://offline-node:3000")
    remote.snapshot()
except ConnectionError as e:
    print(f"Node unreachable: {e}")
Performance Characteristics
Valoricore enforces deterministic L2 brute-force scanning to guarantee auditability.
| Operation | Local FFI | Remote HTTP |
|---|---|---|
| Single insert | ~20 µs | ~0.5 ms |
| Batch insert (1k vectors) | ~15 ms | ~50 ms |
| L2 search (10k×384) | ~8 ms | ~10 ms |
| L2 search (100k×384) | ~80 ms | ~90 ms |
| Graph BFS (depth 2, 50 nodes) | ~0.5 ms | ~2 ms |
| State hash (BLAKE3) | <1 µs | ~1 ms |
| Snapshot (10k records) | ~5 ms | ~20 ms |
Benchmarked on Apple M2. The local FFI path calls Rust directly with zero serialization overhead.
[!NOTE] Valoricore uses Q16.16 fixed-point arithmetic. Safe input range for embedding values is [-32767.0, 32767.0]. Standard normalized embeddings (OpenAI, SentenceTransformers) are always in [-1.0, 1.0] and are therefore safe.
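A pre-flight check against the range quoted in the note can be sketched as follows. `check_q16_16_range` is a hypothetical helper, not an SDK function; the SDK may perform its own validation and report errors differently.

```python
import math

SAFE_MIN, SAFE_MAX = -32767.0, 32767.0  # range quoted in the note above

def check_q16_16_range(vector: list[float]) -> None:
    """Raise ValueError if any component is non-finite or outside the safe range."""
    for i, v in enumerate(vector):
        if not math.isfinite(v) or not SAFE_MIN <= v <= SAFE_MAX:
            raise ValueError(f"component {i} = {v!r} is outside the Q16.16 safe range")

check_q16_16_range([0.1, -1.0, 0.99])  # normalized embedding: passes silently

try:
    check_q16_16_range([0.1, 40000.0])
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)  # component 1 = 40000.0 is outside the Q16.16 safe range
```

Running a check like this before insertion is mainly useful for unnormalized or hand-built vectors, where large magnitudes are plausible.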
Configuration Reference
MemoryClient / AsyncMemoryClient
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | "./valori_db" | Local database directory |
| remote | str \| None | None | Remote node URL. When set, path is ignored |
| index_kind | str | "bruteforce" | Future: "hnsw" / "ivf" |
| quantization | str | "none" | Future: "int8" / "binary" |
Valoricore / AsyncValoricore (factories)
from valoricore import Valoricore, AsyncValoricore
db = Valoricore(path="./db") # local
db = Valoricore(remote="http://node:3000") # remote
async_db = AsyncValoricore(path="./db") # local async
async_db = AsyncValoricore(remote="http://node:3000") # remote async
Forensic CLI
The valori CLI lets you inspect the append-only event log and reproduce the exact state of any historical snapshot.
# Install CLI (included with the package)
pip install valoricore
# Deep forensic inspection
valori inspect --dir ./my_valori_db --snapshot-path ./my_valori_db/state.snap
# View chronological event timeline
valori timeline ./my_valori_db/events.log
# Verify a snapshot's state hash
valori verify --snapshot ./my_valori_db/state.snap --expected-hash <64-char-hex>
Project Structure
valoricore/
├── __init__.py          # Public API surface
├── embeddings.py        # Embedding provider adapters
├── factory.py           # Valoricore() / AsyncValoricore() factories
├── local.py             # LocalClient (FFI)
├── remote.py            # SyncRemoteClient / AsyncRemoteClient
├── memory.py            # MemoryClient (high-level)
├── async_memory.py      # AsyncMemoryClient (full async mirror)
├── protocol.py          # ProtocolClient (unified local+remote)
├── adapter.py           # ValoricoreAdapter (proof overlay)
├── chunking.py          # Deterministic text chunkers
├── ingest.py            # File loaders (.txt, .md, .pdf)
├── kinds.py             # Node / Edge kind constants
├── types.py             # Type aliases (Vector, Proof, etc.)
├── exceptions.py        # Exception hierarchy
├── utils.py             # Internal helpers
└── adapters/            # Framework adapters (optional)
    ├── base.py                    # ValoricoreAdapter (retry + validation)
    ├── langchain.py               # LangChain Retriever
    ├── langchain_vectorstore.py   # LangChain VectorStore
    ├── llamaindex.py              # LlamaIndex VectorStore
    └── sentence_transformers_adapter.py
Documentation
| Resource | Description |
|---|---|
| Getting Started Guide | First 5 minutes walkthrough |
| API Reference | Complete method signatures and return types |
| Architecture | Rust kernel internals and design decisions |
Contributing
# Clone and install for development
git clone https://github.com/varshith-Git/Valori-Kernel
cd Valori-Kernel/python
pip install -e ".[dev]"
# Build the Rust FFI extension
cd ..
maturin develop
# Run tests
pytest python/tests/ -v
License
Licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
See LICENSE for details.
Integrity-First AI Infrastructure