# vstash

Local document memory with instant semantic search. Drop any file. Ask anything. Get an answer in under a second.
Local document memory with hybrid retrieval. Single SQLite file. Zero cloud dependencies for search. Beats ColBERTv2 on 5/5 BEIR datasets with the tuned bge-small-rrf-v3 model. Under 60 ms p50 at 50K chunks on an Apple Silicon laptop.
```sh
pip install vstash
vstash add paper.pdf notes.md https://example.com/article
vstash search "what's the main argument?"
```
## Retrieval Quality
| Dataset | Docs | vstash (v3) | ColBERTv2 | BM25 | Δ vs ColBERTv2 |
|---|---|---|---|---|---|
| SciFact | 5.2K | 0.9361 | 0.693 | 0.665 | +0.243 |
| NFCorpus | 3.6K | 0.3927 | 0.344 | 0.325 | +0.049 |
| SciDocs | 25.7K | 0.3693 | 0.154 | 0.158 | +0.215 |
| FiQA | 57.6K | 0.7506 | 0.356 | 0.236 | +0.395 |
| ArguAna | 8.7K | 0.7540 | 0.463 | 0.315 | +0.291 |
Absolute NDCG@10 on BEIR via the full production retrieval pipeline (RRF hybrid + adaptive weights + MMR dedup + IDF, 2026-04-19). Tuned model: Stffens/bge-small-rrf-v3 (33M params, 384d). v3 beats ColBERTv2 on 5/5 BEIR datasets and improves macro NDCG@10 by +0.016 absolute over bge-small-rrf-v2 (0.6405 vs 0.6246). Training-time eval uses a batched path that skips MMR/IDF for speed; absolute NDCG@10 differs by a few percent vs the production numbers above, but baseline-vs-final deltas are preserved. See `experiments/results/v2_v3_head_to_head.json` for the full table (reproduce via `python -m experiments.v2_v3_head_to_head`) and the methodological note in `experiments/hypotheses.md` for the pipeline-shift caveat.
## How It Works

```
Query --> Embed --+--> Vector ANN (sqlite-vec) --+
                  |                              +--> Adaptive RRF --> MMR Dedup --> Results
                  +--> FTS5 BM25 ----------------+
```
- Hybrid search: vector + keyword, fused via Reciprocal Rank Fusion.
- Adaptive RRF: IDF-based per-query weights. Rare terms boost keywords, common terms boost vectors (see the sketch after this list).
- MMR dedup: diverse sections from long documents, not redundant chunks from one.
- Self-tuned, gated: `vstash retrain` fine-tunes embeddings from your own disagreement signal; the eval gate refuses regressions.
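To make the fusion step concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion with an IDF-derived keyword weight. It is illustrative only: the constant `k = 60`, the squashing function, and all names are assumptions for this sketch, not vstash's internal code.

```python
import math
from collections import defaultdict

def adaptive_rrf(vector_ranking, keyword_ranking, query_terms,
                 doc_freq, n_docs, k=60):
    """Fuse two rankings (lists of doc ids, best first) into one.

    Rare query terms (high IDF) shift weight toward the keyword list;
    common terms shift it toward the vector list.
    """
    # Mean IDF of the query terms.
    idfs = [math.log(n_docs / (1 + doc_freq.get(t, 0))) for t in query_terms]
    mean_idf = sum(idfs) / max(len(idfs), 1)

    # Squash mean IDF into a keyword weight in (0, 1); the midpoint and
    # slope here are arbitrary choices for the sketch.
    w_keyword = 1.0 / (1.0 + math.exp(-(mean_idf - 0.5 * math.log(n_docs))))
    w_vector = 1.0 - w_keyword

    # Standard RRF: each list contributes weight / (k + rank).
    scores = defaultdict(float)
    for rank, doc in enumerate(vector_ranking, start=1):
        scores[doc] += w_vector / (k + rank)
    for rank, doc in enumerate(keyword_ranking, start=1):
        scores[doc] += w_keyword / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```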
## Install

```sh
pip install vstash            # SDK + search
pip install 'vstash[ingest]'  # + PDF, DOCX, PPTX parsing
pip install 'vstash[serve]'   # + web UI (vstash serve)
pip install 'vstash[all]'     # everything
```
## Usage

```sh
# Ingest: files, folders, URLs
vstash add report.pdf ~/notes/ https://arxiv.org/abs/2310.06825

# Search: local, no API key
vstash search "what is the proposed method?"

# Ask: needs a local LLM, auto-detects Ollama / LM Studio
vstash ask "summarize the key findings"
vstash chat          # interactive

# Fine-tune on your own corpus (eval-gated, refuses regressions)
vstash retrain
vstash reindex --model ~/.vstash/models/retrained
```
## Python SDK

```python
from vstash import Memory

mem = Memory(project="my_agent")
mem.add("docs/spec.pdf")
mem.remember("OAuth uses PKCE for public clients", title="auth-notes")

results = mem.search("deployment strategy", top_k=5)
for r in results:
    print(r.text, r.score, r.collection, r.tags, r.added_at)

answer = mem.ask("What are the system requirements?")
```
## Commands

```
vstash add <file/dir/url>    Add documents to memory
vstash remember "<text>"     Ingest text directly
vstash search "<query>"      Semantic search (free, local)
vstash ask "<question>"      Answer from your documents (needs LLM)
vstash chat                  Interactive Q&A
vstash list                  Show all documents
vstash stats                 Memory statistics
vstash forget <file>         Remove a document
vstash retrain               Fine-tune embeddings on your data
vstash reindex               Re-embed with a new model
vstash watch <dir>           Auto-ingest on file changes
vstash serve                 Web UI on localhost
vstash check [--repair]      Integrity check and repair
vstash config                Show configuration
vstash profile <cmd>         Manage named profiles
vstash journal <cmd>         Cross-session agent memory
```
## MCP Server

16 tools for Claude Desktop, Claude Code, Cursor, or any MCP client:

```sh
vstash-mcp   # start MCP server
```

```json
{
  "mcpServers": {
    "vstash": {
      "command": "vstash-mcp"
    }
  }
}
```
## Self-Supervised Embedding Refinement

vstash can tune its own embedding model to your corpus, without any human labels.

```sh
vstash retrain    # generate training pairs + fine-tune
vstash reindex --model ~/.vstash/models/retrained
```
How it works, in one paragraph. When you search your corpus, the vector and keyword halves of the pipeline sometimes rank different documents at the top. Those disagreements are a free signal: the document each half picked is probably relevant, the one only one half picked might not be. vstash turns this into training pairs and fine-tunes the embedding model on them. The run is eval-gated: it evaluates the candidate against the base model on a held-out slice of your corpus and refuses to save a model that performs worse.
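For intuition, a toy version of the disagreement miner might look like the sketch below. The function name, the agreement rule, and the `depth` cutoff are all assumptions for illustration, not vstash's actual implementation; the real recipe lives in `docs/retrain.md`.

```python
def mine_triples(query, vector_top, keyword_top, depth=10):
    """Turn vector/keyword ranking disagreement into training triples.

    Docs that both halves rank in their top-`depth` are treated as likely
    positives; docs that only one half surfaced become candidate negatives.
    """
    vec, kw = set(vector_top[:depth]), set(keyword_top[:depth])
    positives = vec & kw                 # both retrievers agree: likely relevant
    negatives = (vec | kw) - positives   # only one half picked it: maybe not
    return [(query, p, n) for p in positives for n in negatives]
```

Each (query, positive, negative) triple then feeds a standard contrastive fine-tuning objective on the embedding model.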
The feature is maturing fast. Each release tightens the recipe, lifts the measured numbers, and adds infrastructure that keeps the next iteration honest:
| Release | Training recipe | 5-dataset BEIR macro NDCG@10 | What landed alongside |
|---|---|---|---|
| base bge-small | no fine-tune | 0.6118 | reference |
| rrf-v2 | 76k triples, ad-hoc scripts | 0.6246 | first paper-grade result; still the NFCorpus specialist |
| rrf-v3 | 60k triples via `retrain-multi` CLI, temperature=0.5, eval gate | 0.6405 | H-R9 ablation picked the config empirically; H-R7 seeded RNGs make it reproducible; H-R5 reports NDCG@3 + Recall@100 so regressions are visible before they ship |
Both v2 and v3 beat ColBERTv2 on 5/5 BEIR datasets under the current pipeline. v3 improves macro NDCG@10 by +0.016 over v2 (+2.6% relative), with the largest per-dataset gain on FiQA (+0.097 absolute). It is a trade, not a strict upgrade: v3 gives up ~0.040 NDCG@10 on NFCorpus vs v2 in exchange for the FiQA and SciFact wins. v2 remains the better pick for keyword-heavy or biomedical corpora where NFCorpus-style retrieval dominates; v3 is the recommended default for everything else. The eval gate also catches losers: hypothesis H-R3 (hard-negative margin filter) regressed macro NDCG@10 by 2.49pp, so the candidate was refused and the branch was closed without merging. The pipeline's job is to refuse bad models, and it does.
See the Retrieval Quality table, `docs/retrain.md` for the full recipe and per-version breakdown, and `experiments/results/v2_v3_head_to_head.json` for reproducible numbers.
Requires sentence-transformers, torch, and accelerate:

```sh
pip install 'sentence-transformers>=3' torch 'accelerate>=1.1.0'
```
## Privacy
| Component | Data leaves machine? |
|---|---|
| Embeddings (FastEmbed) | Never |
| Search (sqlite-vec + FTS5) | Never |
| Inference (Ollama/LM Studio) | Never |
| Inference (Cerebras/OpenAI) | Yes (query + context sent to API) |
Search is always private. Use a local LLM for fully private answers.
## Paper

vstash: Local-First Hybrid Retrieval with Adaptive Fusion for LLM Agents

Adaptive RRF, self-supervised embedding refinement, a negative result on post-RRF scoring, and the production substrate, all in one place. PDF build at `paper/arxiv/vstash.pdf`.
## Documentation

| Guide | Description |
|---|---|
| How It Works | Search pipeline, chunking, RRF |
| Configuration | Full TOML reference |
| Embedding Models | Model comparison, `vstash retrain` |
| MCP Server | 16 tools for LLM agents |
| Experiments | BEIR benchmarks, ablations |
## Experiments

| Experiment | Key Result | Command |
|---|---|---|
| BEIR Benchmark | With bge-small-rrf-v3 (current default): 5/5 BEIR datasets beat ColBERTv2. With -rrf-v2 (previous): 4/5 under this script's historical pipeline. See Retrieval Quality for the v3 numbers. | `python -m experiments.beir_benchmark --no-chroma` |
| Retrain (eval-gated) | Fine-tunes your embedding model on your own corpus; refuses regressions | `vstash retrain --help` |
| Pipeline latency | Under 60 ms p50 @ 50K chunks; 0.80x with snapvec-ivfpq @ 100K (Apple Silicon laptop) | `python -m experiments.vstash_pipeline_ivfpq_bench --n 100000` |
| Relevance Signal | F1 = 0.996 cross-domain | `python -m experiments.relevance_signal_beir` |
## What's New in v0.32

- Persistent embedder daemon (v0.32): `vstash serve` pre-loads the embedding model and exposes `/api/embed` on `localhost:8585`. CLI and SDK clients auto-detect and delegate; cold start drops from ~2 s to ~5 ms (see the client sketch after this list).
- Query LRU cache (v0.31): opt-in repeated-query cache via `[cache] query_cache_size`. Roughly 700x on cache hits, automatically invalidated on writes.
- Batched directory ingest (v0.31): single-transaction writes with deferred FTS. 5x faster at 500 docs versus per-file ingest.
- `snapvec-ivfpq` vector backend (v0.30): IVFPQ with fp16 rerank. Pareto-dominant over sqlite-vec at N >= 50K: 0.80x mean latency at 100K, NDCG within noise.
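If you want to call the daemon directly instead of letting the CLI/SDK delegate, a client call might look like the sketch below. The endpoint and port come from the release note above, but the request and response JSON shapes shown here are assumptions; check the served API before relying on them.

```python
import requests

# Hypothetical payload shape; only the endpoint and port are documented above.
resp = requests.post(
    "http://localhost:8585/api/embed",
    json={"texts": ["deployment strategy"]},
    timeout=5,
)
resp.raise_for_status()
embeddings = resp.json()  # response shape is an assumption
print(type(embeddings))
```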
See CHANGELOG for full version history.