VSAR: VSA-grounded reasoning with approximate joins

VSAR: VSA-grounded Reasoning

VSAR (VSAX Reasoner) is a VSA-grounded reasoning system that provides fast approximate querying over large knowledge bases using hypervector algebra. It is built on the VSAX library, which provides GPU-accelerated VSA operations.

Key Features

  • Fast approximate querying: Query 10^6+ facts with subsymbolic retrieval
  • VSARL language: Declarative syntax for facts, queries, and rules
  • CLI interface: Simple commands for ingestion, querying, and export
  • Multiple formats: Load facts from CSV, JSONL, or VSAR files
  • Trace layer: Explanation DAG for debugging and transparency
  • Deterministic results: Reproducible outputs with fixed seeds
  • HDF5 persistence: Save and load knowledge bases
  • Comprehensive testing: 281 tests with 98.5% coverage

Quick Start

Installation

# Install uv (recommended)
pip install uv

# Clone and install
git clone https://github.com/your-org/vsar.git
cd vsar
uv sync

# Verify installation
uv run vsar --help

Hello World - CLI

Create a simple VSAR program named family.vsar:

@model FHRR(dim=8192, seed=42);
@threshold(value=0.22);

fact parent(alice, bob).
fact parent(alice, carol).
fact parent(bob, dave).
fact parent(carol, eve).

query parent(alice, X)?
query parent(X, dave)?

Run it:

uv run vsar run family.vsar

Output:

Inserted 4 facts

┌─────────────────────────┐
│ Query: parent(alice, X) │
├────────┬────────────────┤
│ Entity │ Score          │
├────────┼────────────────┤
│ bob    │ 0.9234         │
│ carol  │ 0.9156         │
└────────┴────────────────┘

CLI Commands

Ingest Facts

# From CSV (predicate in first column)
uv run vsar ingest facts.csv --kb family.h5

# From CSV (all rows same predicate)
uv run vsar ingest parents.csv --predicate parent --kb family.h5

# From JSONL
uv run vsar ingest facts.jsonl --kb family.h5

Export and Inspect

# Export KB to JSON
uv run vsar export family.h5 --format json --output facts.json

# Export to JSONL
uv run vsar export family.h5 --format jsonl --output facts.jsonl

# Inspect KB statistics
uv run vsar inspect family.h5

Advanced Options

# JSON output for scripting
uv run vsar run program.vsar --json

# Show trace DAG
uv run vsar run program.vsar --trace

# Limit results per query
uv run vsar run program.vsar --k 10

VSARL Language

Directives

Configure the reasoning engine:

// Model configuration
@model FHRR(dim=8192, seed=42);    // FHRR backend, 8192 dimensions
@model MAP(dim=4096, seed=100);     // MAP backend (alternative)

// Retrieval parameters
@threshold(value=0.22);             // Similarity threshold
@beam(width=50);                    // Beam width (Phase 2)
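
How the similarity threshold and the per-query result limit (`--k`) interact can be sketched in plain Python. This is only an illustration: it assumes that candidates scoring below the threshold are dropped before the top-k cut, which the text above does not spell out. The function name `filter_results` and the sample scores are hypothetical and not part of VSAR.

```python
def filter_results(scored, threshold=0.22, k=10):
    """Keep candidates with score >= threshold, best first, at most k of them."""
    kept = [(name, score) for name, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Hypothetical similarity scores for a single query (not real VSAR output)
candidates = [("bob", 0.92), ("dave", 0.05), ("carol", 0.91), ("eve", 0.18)]
top = filter_results(candidates, threshold=0.22, k=10)
print(top)
```

Raising the threshold trades recall for precision; lowering `k` only truncates the list that survives the threshold.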

Facts

Ground atoms (all arguments are constants):

fact parent(alice, bob).
fact parent(bob, carol).
fact lives_in(alice, boston).
fact transfer(alice, bob, money).   // Ternary fact
fact person(alice).                  // Unary fact

Queries

Single-atom queries with one variable (Phase 1):

query parent(alice, X)?         // Find children of alice
query parent(X, carol)?         // Find parents of carol
query lives_in(X, boston)?      // Who lives in boston?
query transfer(alice, X, money)? // Alice transferred money to X

Phase 1 limitation: only single-atom queries with exactly one variable are supported. Conjunctive queries are planned for Phase 2.
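
The single-atom, single-variable restriction can be expressed as a small check. The sketch below is not VSAR's parser (which uses Lark); it is a regex-based stand-in that assumes variables are the arguments starting with an uppercase letter, as in the examples above. The name `parse_query` is hypothetical, and the returned variable index is 0-based.

```python
import re

# One predicate, one parenthesized argument list, terminated by '?'
QUERY_RE = re.compile(r"^query\s+(\w+)\(([^()]*)\)\?$")


def parse_query(line):
    """Parse 'query pred(a, X)?' into (predicate, args, variable_index)."""
    text = line.split("//", 1)[0].strip()  # drop a trailing // comment
    match = QUERY_RE.match(text)
    if match is None:
        raise ValueError(f"not a single-atom query: {line!r}")
    predicate = match.group(1)
    args = [arg.strip() for arg in match.group(2).split(",")]
    # Assumption: an argument is a variable if it starts with an uppercase letter
    var_positions = [i for i, arg in enumerate(args) if arg[:1].isupper()]
    if len(var_positions) != 1:
        raise ValueError(f"expected exactly one variable, got {len(var_positions)}")
    return predicate, args, var_positions[0]


print(parse_query("query transfer(alice, X, money)? // Alice transferred money to X"))
```

A query such as `query parent(X, Y)?` or a conjunction of two atoms raises `ValueError` under this check.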

Comments

// Single-line comment

/* Multi-line
   comment */

File Formats

CSV Format

With predicate column (first column = predicate):

parent,alice,bob
parent,bob,carol
lives_in,alice,boston

Without predicate (use --predicate flag):

alice,bob
bob,carol
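
The two CSV layouts differ only in where the predicate comes from. A minimal sketch of that distinction, using only the standard library, is shown here; `parse_csv_facts` is a hypothetical helper for illustration and is not VSAR's `load_facts`.

```python
import csv
import io


def parse_csv_facts(text, predicate=None):
    """Return (predicate, args) tuples from CSV text.

    If predicate is None, the first column of each row is the predicate;
    otherwise every row holds only the arguments of that predicate.
    """
    facts = []
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines
        if predicate is None:
            facts.append((cells[0], tuple(cells[1:])))
        else:
            facts.append((predicate, tuple(cells)))
    return facts


with_pred = parse_csv_facts("parent,alice,bob\nlives_in,alice,boston\n")
same_pred = parse_csv_facts("alice,bob\nbob,carol\n", predicate="parent")
print(with_pred)
print(same_pred)
```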

JSONL Format

One fact per line:

{"predicate": "parent", "args": ["alice", "bob"]}
{"predicate": "parent", "args": ["bob", "carol"]}
{"predicate": "lives_in", "args": ["alice", "boston"]}
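
Reading and writing this JSONL layout needs nothing beyond the standard `json` module. The helpers below (`parse_jsonl_facts`, `to_jsonl`) are hypothetical names for illustration, not VSAR functions; they assume every record carries exactly the `predicate` and `args` keys shown above.

```python
import json


def parse_jsonl_facts(text):
    """Parse JSONL text into a list of (predicate, args) tuples."""
    facts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if "predicate" not in record or "args" not in record:
            raise ValueError(f"line {lineno}: missing 'predicate' or 'args'")
        facts.append((record["predicate"], tuple(record["args"])))
    return facts


def to_jsonl(facts):
    """Serialize (predicate, args) tuples, one JSON object per line."""
    return "\n".join(
        json.dumps({"predicate": pred, "args": list(args)}) for pred, args in facts
    )


facts = [("parent", ("alice", "bob")), ("lives_in", ("alice", "boston"))]
text = to_jsonl(facts)
print(text)
assert parse_jsonl_facts(text) == facts  # round trip
```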

VSAR Format

Native .vsar program files (see VSARL Language above).

Python API

High-Level API (Recommended)

from vsar.language.ast import Directive, Fact, Query
from vsar.language.loader import load_facts
from vsar.semantics.engine import VSAREngine

# Create engine from directives
directives = [
    Directive(name="model", params={"type": "FHRR", "dim": 512, "seed": 42})
]
engine = VSAREngine(directives)

# Load and insert facts
facts = load_facts("facts.csv")
for fact in facts:
    engine.insert_fact(fact)

# Execute query
query = Query(predicate="parent", args=["alice", None])
result = engine.query(query, k=5)

for entity, score in result.results:
    print(f"{entity}: {score:.4f}")

# Inspect trace
trace = engine.trace.get_dag()
for event in trace:
    print(f"{event.type}: {event.payload}")

# Save KB
engine.save_kb("family.h5")

Low-Level API (Phase 0 Foundation)

from vsar.kernel.vsa_backend import FHRRBackend
from vsar.symbols.registry import SymbolRegistry
from vsar.encoding.vsa_encoder import VSAEncoder
from vsar.encoding.roles import RoleVectorManager
from vsar.kb.store import KnowledgeBase
from vsar.retrieval.query import Retriever

# Create VSA system
backend = FHRRBackend(dim=512, seed=42)
registry = SymbolRegistry(backend, seed=42)
encoder = VSAEncoder(backend, registry, seed=42)
kb = KnowledgeBase(backend)
role_manager = RoleVectorManager(backend, seed=42)
retriever = Retriever(backend, registry, kb, encoder, role_manager)

# Insert facts
atom_vec = encoder.encode_atom("parent", ["alice", "bob"])
kb.insert("parent", atom_vec, ("alice", "bob"))

# Query: parent(alice, X)
results = retriever.retrieve("parent", 2, {"1": "alice"}, k=5)
print(results)  # [('bob', 0.85), ...]

Architecture

VSAR uses a layered architecture:

Phase 0 Layers (Foundation)

  • Kernel (vsar.kernel): VSA operations (FHRR/MAP backends via VSAX)
  • Symbols (vsar.symbols): Typed symbol spaces (E, R, A, C, T, S) with basis management
  • Encoding (vsar.encoding): Role-filler binding for atoms (predicate + arguments)
  • KB (vsar.kb): Predicate-partitioned storage with HDF5 persistence
  • Retrieval (vsar.retrieval): Unbinding, cleanup, top-k similarity search
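
The encode, bundle, unbind, and cleanup pipeline described by these layers can be illustrated with a few lines of NumPy. This is a sketch of generic FHRR operations, not VSAR's or VSAX's code: the encoding of an atom as the product of its two role-filler bindings, the dimension of 2048, and all function names are assumptions made for the example.

```python
import numpy as np

DIM = 2048  # smaller than the 8192 used elsewhere, to keep the sketch fast
rng = np.random.default_rng(42)


def random_hv():
    """Random FHRR hypervector: one unit-magnitude complex phasor per dimension."""
    return np.exp(1j * rng.uniform(-np.pi, np.pi, DIM))


def bind(a, b):
    """FHRR binding: element-wise complex multiplication."""
    return a * b


def unbind(a, b):
    """Undo a binding with b by multiplying with its complex conjugate."""
    return a * np.conj(b)


def similarity(a, b):
    """Real part of the inner product, scaled by the dimension."""
    return float(np.real(np.vdot(a, b)) / DIM)


entities = {name: random_hv() for name in ["alice", "bob", "carol", "dave", "eve"]}
role1, role2 = random_hv(), random_hv()


def encode(arg1, arg2):
    """Assumed atom encoding: bind the two role-filler pairs together."""
    return bind(bind(role1, entities[arg1]), bind(role2, entities[arg2]))


facts = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("carol", "eve")]
bundle = sum(encode(a, b) for a, b in facts)  # one bundle for predicate 'parent'

# Query parent(alice, X): unbind the known pair and the open role, then clean up
noisy = unbind(unbind(bundle, bind(role1, entities["alice"])), role2)
scores = sorted(
    ((similarity(vec, noisy), name) for name, vec in entities.items()), reverse=True
)
for score, name in scores:
    print(f"{name}: {score:.3f}")
```

With this encoding, the fillers that co-occur with `alice` score close to 1 while unrelated entities score near 0, which is why a modest similarity threshold is enough to separate answers from noise in small examples.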

Phase 1 Layers (Language & CLI)

  • Language (vsar.language): VSARL parser (Lark), AST, loaders (CSV/JSONL/VSAR)
  • Semantics (vsar.semantics): VSAREngine orchestrating all layers
  • Trace (vsar.trace): Explanation DAG for transparency
  • CLI (vsar.cli): Typer-based commands with Rich formatting

See docs/architecture.md for complete details.

Project Status

✅ Phase 0 (Foundation) - COMPLETE

  • ✅ Kernel backend (FHRR VSA via VSAX)
  • ✅ Symbol space management (6 typed spaces)
  • ✅ Atom encoding (role-filler binding)
  • ✅ KB storage (predicate-partitioned bundles)
  • ✅ Retrieval primitive (unbind → cleanup)
  • ✅ HDF5 persistence (KB + basis)
  • ✅ Published to PyPI (v0.1.0)

✅ Phase 1 (Language & CLI) - COMPLETE

  • ✅ VSARL parser (facts, queries, directives)
  • ✅ Facts ingestion (CSV/JSONL/VSAR)
  • ✅ Program execution engine
  • ✅ Trace layer (explanation DAG)
  • ✅ CLI interface (run, ingest, export, inspect)
  • ✅ 281 tests, 98.5% coverage

🔜 Phase 2 (Rules & Chaining)

  • Rule definitions (rule grandparent(X,Z) :- parent(X,Y), parent(Y,Z).)
  • Bounded forward chaining
  • Conjunctive queries
  • Stratified negation
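
To make the intended meaning of the `grandparent` rule concrete, here is an exact, purely symbolic evaluation of one forward-chaining step in Python. It is not a preview of the Phase 2 implementation, which is described as approximate and VSA-based; the function name `grandparents` is hypothetical.

```python
def grandparents(parent_facts):
    """Apply grandparent(X, Z) :- parent(X, Y), parent(Y, Z) once."""
    children = {}
    for x, y in parent_facts:
        children.setdefault(x, set()).add(y)
    derived = set()
    for x, ys in children.items():
        for y in ys:
            # Join on the shared variable Y
            for z in children.get(y, ()):
                derived.add((x, z))
    return derived


parent = {("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("carol", "eve")}
print(sorted(grandparents(parent)))
```

The rule body is a conjunctive query over `parent`, joined on `Y`, which is why rules and conjunctive queries appear together in this phase.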

🔜 Phase 3 (Optimizations)

  • Indexing strategies
  • Query planning
  • Parallel execution
  • Web interface

Examples

Example 1: Family Tree

@model FHRR(dim=8192, seed=42);

fact parent(alice, bob).
fact parent(bob, carol).
fact parent(carol, dave).

query parent(alice, X)?     // Returns: bob (0.92)
query parent(X, carol)?     // Returns: bob (0.88)

Example 2: Knowledge Graph

@model FHRR(dim=8192, seed=42);
@threshold(value=0.25);

fact lives_in(alice, boston).
fact lives_in(bob, cambridge).
fact works_at(alice, mit).
fact works_at(bob, harvard).

query lives_in(X, boston)?    // Returns: alice
query works_at(alice, X)?     // Returns: mit

Example 3: Large-Scale Ingestion

# Ingest 1M facts from CSV
uv run vsar ingest large_dataset.csv \
  --kb large.h5 \
  --dim 8192 \
  --seed 42

# Query the KB
uv run vsar run queries.vsar --kb large.h5 --k 10

Performance

Approximate query performance (Phase 1):

  • 10^3 facts: <100ms per query
  • 10^4 facts: <200ms per query
  • 10^5 facts: <500ms per query
  • 10^6 facts: <1s per query

Measured on an AMD EPYC 7742 CPU with dim=8192.
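
For a rough feel of how a similarity scan scales on your own hardware, a brute-force top-k search over random phasor vectors can be timed as follows. This sketch does not reproduce the figures above and is not VSAR's retrieval code: it assumes cleanup is a dense scan over a codebook, and the sizes (`dim=512`, `n=1000`) and the function name `top_k` are chosen only for illustration.

```python
import time

import numpy as np


def top_k(codebook, query, k):
    """Return indices of the k rows most similar to query (best first) and all scores."""
    scores = np.real(codebook.conj() @ query) / query.shape[0]
    k = min(k, scores.shape[0])
    idx = np.argpartition(-scores, k - 1)[:k]  # unordered top k
    return idx[np.argsort(-scores[idx])], scores  # order the k winners


rng = np.random.default_rng(0)
dim, n = 512, 1000
codebook = np.exp(1j * rng.uniform(-np.pi, np.pi, (n, dim)))
query = codebook[123]  # querying with a stored vector should return its own row first

start = time.perf_counter()
best, scores = top_k(codebook, query, k=5)
elapsed = time.perf_counter() - start
print(f"top hit: {best[0]}, scan took {elapsed * 1e3:.2f} ms")
```

Increasing `n` or `dim` and rerunning gives a baseline against which the per-query latencies listed above can be compared.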

Testing

VSAR has comprehensive test coverage:

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=vsar --cov-report=html

# Run specific suites
uv run pytest tests/unit/           # Unit tests
uv run pytest tests/integration/    # Integration tests

Test statistics:

  • 281 tests (all passing)
  • 98.5% coverage
  • Unit tests: 261
  • Integration tests: 20

Development

# Install development dependencies
uv sync --all-groups

# Run formatters
uv run black .
uv run ruff check . --fix

# Type checking
uv run mypy src/vsar

# Pre-commit hooks
uv run pre-commit install
uv run pre-commit run --all-files

# Build documentation
cd docs && uv run mkdocs serve

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

Citation

If you use VSAR in your research, please cite:

@software{vsar2025,
  title = {VSAR: VSA-grounded Reasoning},
  author = {VSAR Contributors},
  year = {2025},
  url = {https://github.com/your-org/vsar}
}

License

MIT License - see LICENSE for details.

Acknowledgments

  • Built on VSAX for VSA operations
  • Inspired by Datalog and logic programming systems
  • Uses Lark for parsing
  • CLI powered by Typer and Rich
