
vstash

Local document memory with instant semantic search.

vstash demo

Drop any file. Ask anything. Get an answer in under a second.

pip install vstash
vstash add paper.pdf notes.md https://example.com/article
vstash search "what's the main argument about X?"

Why vstash?

Most RAG tools are slow, cloud-dependent, or require a running server. vstash is none of those things.

| Layer | Technology | Why |
| --- | --- | --- |
| Embeddings | FastEmbed (ONNX Runtime) | ~700 chunks/s, fully in-process, no server |
| Vector store | sqlite-vec | Single .db file, cosine similarity, zero deps |
| Keyword search | FTS5 (SQLite) | Exact matches, porter stemming, built into SQLite |
| Hybrid ranking | Reciprocal Rank Fusion | Best of both: semantic + keyword, no training needed |
| Memory scoring | Frequency + temporal decay | Surfaces frequently-accessed, recent chunks |
| Chunking | Semantic-first | Markdown headers & paragraphs, with token-bounded fallback |
| Inference | Cerebras / Ollama / OpenAI | ~2,000 tok/s via Cerebras, or 100% local via Ollama |
| Parsing | markitdown | PDF, DOCX, PPTX, XLSX, HTML, Markdown, URLs |

Philosophy: extreme speed at every layer. Zero cloud required for search.


Install

pip install vstash

Or from source:

git clone https://github.com/stffns/vstash
cd vstash
pip install -e .

Quick start

Search (free, no API key needed)

Semantic search works 100% locally — no inference backend required:

vstash add report.pdf
vstash add ~/docs/notes.md
vstash add https://arxiv.org/abs/2310.06825
vstash search "what is the proposed method?"

Ask (requires an LLM backend)

To get natural language answers, configure an inference backend:

# Option A: Fully local with Ollama (free, private)
# Install Ollama: https://ollama.com
ollama pull llama3.2

# Option B: Fast with Cerebras (free tier available)
export CEREBRAS_API_KEY=your_key_here

# Option C: OpenAI or any compatible API
export OPENAI_API_KEY=your_key_here

Then:

vstash ask "summarize the key findings"
vstash chat   # interactive Q&A session

Python SDK

New in v0.3.0. Use vstash as a building block in your own agents and pipelines:

from vstash import Memory

mem = Memory(project="my_agent")
mem.add("docs/spec.pdf")

# Semantic search — free, no LLM
chunks = mem.search("deployment strategy", top_k=5)
for c in chunks:
    print(c.text, c.score)

# Search + LLM answer
answer = mem.ask("What are the system requirements?")

# Management
mem.list()                # → list[DocumentInfo]
mem.stats()               # → StoreStats
mem.remove("docs/old.pdf")

The Memory class supports project/collection scoping, context managers, and works with any configured inference backend. See the full API in vstash/memory.py.
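
A minimal sketch of project scoping and context-manager use. The exact keyword arguments (e.g. collection=) and the close-on-exit behavior are assumptions here; see vstash/memory.py for the real signature:

from vstash import Memory

# Hypothetical usage sketch: assumes Memory() accepts project/collection
# scoping and closes its SQLite store when the context manager exits.
with Memory(project="ml-survey", collection="research") as mem:
    mem.add("papers/attention.pdf")
    for chunk in mem.search("multi-head attention", top_k=3):
        print(f"{chunk.score:.3f}  {chunk.text[:80]}")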


LangChain Integration

New in v0.4.0. Use vstash as a retriever in any LangChain chain or agent:

pip install vstash[langchain]

from vstash import Memory
from vstash.langchain import VstashRetriever
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

mem = Memory(project="my_docs")
mem.add("report.pdf")

retriever = VstashRetriever(memory=mem, top_k=5)
chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=retriever,
)
answer = chain.invoke("What are the key findings?")

The retriever uses vstash's hybrid search (vector + keyword RRF) and returns standard LangChain Document objects with metadata (source, title, score). LangSmith tracing works automatically.

Supports filtering: VstashRetriever(memory=mem, project="alpha", collection="research", layer="summaries").
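
A sketch of standalone retriever use, assuming VstashRetriever follows the standard LangChain retriever interface (invoke returns a list of Document objects) as described above:

from vstash import Memory
from vstash.langchain import VstashRetriever

mem = Memory(project="alpha")
retriever = VstashRetriever(memory=mem, collection="research", top_k=3)

# Each result is a LangChain Document carrying vstash metadata.
for doc in retriever.invoke("deployment strategy"):
    print(doc.metadata.get("source"), doc.metadata.get("score"))
    print(doc.page_content[:120])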


Commands

vstash add <file/dir/url>   Add documents to memory
vstash ask "<question>"     Answer a question from your documents
vstash search "<query>"     Semantic search without LLM (free, local)
vstash chat                 Interactive Q&A session
vstash list                 Show all documents in memory
vstash stats                Memory statistics (docs, chunks, DB size)
vstash forget <file>        Remove a document from memory
vstash watch <dir>          Auto-ingest on file changes
vstash export               Export chunks as JSONL for training data curation
vstash config               Show current configuration
vstash-mcp                  Start MCP server (for Claude Desktop integration)

Filtering with metadata

vstash supports hierarchical metadata via frontmatter or CLI flags:

vstash add notes.md --collection research --project ml-survey --tags "attention,transformers"
vstash list --project ml-survey
vstash ask "what architectures were compared?" --project ml-survey
vstash export --project ml-survey --format jsonl

Documents with YAML frontmatter are parsed automatically:

---
project: ml-survey
layer: literature-review
tags: [attention, transformers]
---

# My Research Notes
...

MCP Server — Claude Desktop Integration

vstash includes a built-in MCP server that gives Claude Desktop persistent document memory across sessions.

Setup

1. Install vstash:

pip install vstash

2. Add to Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "vstash": {
      "command": "vstash-mcp"
    }
  }
}

If using pyenv, use the full path: "command": "/path/to/.pyenv/versions/3.x.x/bin/vstash-mcp"

3. Restart Claude Desktop.

Available MCP Tools

| Tool | Description |
| --- | --- |
| vstash_add(path) | Ingest a file, directory, or URL into memory |
| vstash_ask(query, top_k) | Semantic search + LLM-generated answer with sources |
| vstash_search(query, top_k) | Raw retrieval without LLM — returns chunks with scores |
| vstash_list() | List all ingested documents |
| vstash_stats() | Database statistics (doc count, chunks, size) |
| vstash_forget(source) | Remove a document from memory |

Make sure your ~/.vstash/vstash.toml includes the API key under [cerebras] (or your chosen backend), since MCP servers don't inherit shell environment variables.
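
For example (same keys as the Configuration section below; the key value is a placeholder):

# ~/.vstash/vstash.toml
[inference]
backend = "cerebras"

[cerebras]
api_key = "your_key_here"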


Configuration

vstash looks for vstash.toml in your current directory, then ~/.vstash/vstash.toml, then falls back to defaults.

[inference]
backend = "cerebras"       # "cerebras" | "ollama" | "openai"
model   = "llama3.1-8b"

[cerebras]
api_key = ""               # or set CEREBRAS_API_KEY env var

[ollama]
host  = "http://localhost:11434"
model = "llama3.2"

[embeddings]
model = "BAAI/bge-small-en-v1.5"   # 384 dims, ~700 chunks/s

[chunking]
size    = 1024    # max tokens per chunk
overlap = 128     # token overlap (used in fixed-window fallback)
top_k   = 5       # chunks retrieved per query

[scoring]
enabled = true        # frequency + decay re-ranking (default: on)
alpha = 0.8           # RRF weight
beta = 0.2            # access history weight
decay_lambda = 0.05   # temporal decay rate
over_fetch = 50       # candidates to re-rank

[storage]
db_path = "~/.vstash/memory.db"

Embedding models

| Model | Dims | Speed | Quality |
| --- | --- | --- | --- |
| BAAI/bge-small-en-v1.5 | 384 | ~700 chunks/s | Great |
| BAAI/bge-base-en-v1.5 | 768 | ~300 chunks/s | Excellent |
| nomic-ai/nomic-embed-text-v1.5 | 768 | ~300 chunks/s | Excellent |

Changing the embedding model requires re-ingesting all documents (dimensions must match).


How it works

Ingestion pipeline

file/URL
  → markitdown         (parse to plain text)
  → _split_by_headers  (Markdown sections)
  → _split_by_paragraphs (paragraph boundaries)
  → _fixed_window      (fallback for oversized paragraphs)
  → _merge_small       (merge tiny chunks < 80 tokens)
  → FastEmbed ONNX     (embed each chunk, ~700 chunks/s)
  → sqlite-vec         (store vectors)
  → FTS5               (index text for keyword search)

Semantic chunking preserves document structure: Markdown headers stay with their body content, paragraphs aren't torn mid-sentence, and tiny fragments are merged to avoid low-quality embeddings.
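
An illustrative sketch of that strategy (the helper names above are internal; this version approximates token counts with whitespace words, so it is not the actual implementation):

import re

MAX_TOKENS = 1024   # [chunking] size
OVERLAP    = 128    # [chunking] overlap (fixed-window fallback only)
MIN_TOKENS = 80     # merge threshold for tiny fragments

def _tokens(text: str) -> int:
    # Simplified: the real pipeline counts tokenizer tokens, not words.
    return len(text.split())

def chunk(markdown: str) -> list[str]:
    pieces: list[str] = []
    # 1) Split on Markdown headers so each section keeps its heading.
    for section in re.split(r"(?m)^(?=#{1,6} )", markdown):
        if _tokens(section) <= MAX_TOKENS:
            pieces.append(section)
            continue
        # 2) Oversized sections fall back to paragraph boundaries.
        for para in section.split("\n\n"):
            if _tokens(para) <= MAX_TOKENS:
                pieces.append(para)
                continue
            # 3) Last resort: overlapping fixed windows.
            words = para.split()
            for i in range(0, len(words), MAX_TOKENS - OVERLAP):
                pieces.append(" ".join(words[i:i + MAX_TOKENS]))
    # 4) Merge tiny fragments into the previous chunk to avoid weak embeddings.
    merged: list[str] = []
    for piece in pieces:
        if merged and _tokens(merged[-1]) < MIN_TOKENS:
            merged[-1] += "\n\n" + piece
        else:
            merged.append(piece)
    return [p.strip() for p in merged if p.strip()]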

Search pipeline

query
  → FastEmbed ONNX     (embed query)
  → sqlite-vec         (top-k×10 vector candidates by cosine similarity)
  → FTS5               (top-k×10 keyword candidates by BM25)
  → RRF                (merge rankings: score = Σ 1/(60+rank))
  → rerank_with_decay  (frequency + temporal decay re-ranking)
  → top-k results      (default: 5 chunks)
  → LLM                (optional: Cerebras, Ollama, or OpenAI)

Reciprocal Rank Fusion (k=60, vec_weight=0.6, fts_weight=0.4) ensures that semantic queries find conceptually related chunks while exact keyword queries are never missed.
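
A minimal sketch of that weighted fusion step, using the constants quoted above (illustrative, not the actual implementation):

def rrf_merge(vec_ids: list[str], fts_ids: list[str],
              k: int = 60, vec_weight: float = 0.6, fts_weight: float = 0.4) -> list[str]:
    # Each candidate earns weight / (k + rank) from every list it appears in,
    # so a chunk ranked highly by either vector or keyword search floats up.
    scores: dict[str, float] = {}
    for weight, ranked in ((vec_weight, vec_ids), (fts_weight, fts_ids)):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)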

Memory scoring

New in v0.5.0. vstash learns which chunks matter to you. Every search tracks access frequency and recency, then re-ranks results using:

final_score = α · normalized_rrf + β · log(1 + access_count · e^(−λ · days_ago))

Chunks you access often and recently get a boost. Chunks you haven't touched in months decay naturally. The scoring adds ~0.12ms to a ~0.7ms pipeline — negligible overhead.
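
In code, the re-ranking formula looks roughly like this (illustrative; assumes the RRF score has already been normalized to [0, 1]):

import math

def memory_score(normalized_rrf: float, access_count: int, days_ago: float,
                 alpha: float = 0.8, beta: float = 0.2,
                 decay_lambda: float = 0.05) -> float:
    # final_score = α · normalized_rrf + β · log(1 + access_count · e^(−λ · days_ago))
    recency_boost = math.log(1 + access_count * math.exp(-decay_lambda * days_ago))
    return alpha * normalized_rrf + beta * recency_boost

# A chunk accessed 10 times, last touched yesterday, can beat an untouched
# chunk with a slightly higher RRF score:
print(memory_score(0.70, access_count=10, days_ago=1))   # ≈ 1.03
print(memory_score(0.75, access_count=0, days_ago=0))    # = 0.60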

| Parameter | Default | Description |
| --- | --- | --- |
| alpha | 0.8 | Weight for semantic similarity (RRF) |
| beta | 0.2 | Weight for access history |
| decay_lambda | 0.05 | Temporal decay rate (higher = faster forgetting) |
| over_fetch | 50 | Candidates to re-rank before truncating to top_k |

Configure in vstash.toml:

[scoring]
enabled = true
alpha = 0.8
beta = 0.2
decay_lambda = 0.05
over_fetch = 50
track_access = true

Disable scoring entirely with enabled = false — search reverts to pure RRF.


Privacy

| Component | Data leaves machine? |
| --- | --- |
| Embeddings (FastEmbed) | Never — fully local ONNX |
| Vector store (sqlite-vec) | Never — local .db file |
| Semantic search | Never — local embeddings + SQLite |
| Inference (Cerebras/OpenAI) | Yes — query + retrieved chunks sent to API |
| Inference (Ollama) | Never — fully local |

For full privacy, use backend = "ollama" or skip inference entirely and use vstash search instead of vstash ask.


Supported file types

PDF, DOCX, PPTX, XLSX, Markdown, TXT, HTML, CSV, Python, JavaScript, TypeScript, Go, Rust, Java — and any URL.


Roadmap

  • Phase 1 ✅: Core — ingest, embed, hybrid search, answer
  • Phase 2 ✅: Usability — MCP server, collections/namespaces, watch mode, frontmatter metadata, export, semantic chunking
  • Phase 3 ✅: Python SDK — from vstash import Memory
  • Phase 4 ✅: LangChain integration — VstashRetriever for chains and agents
  • Phase 5 ✅: Memory scoring — frequency + temporal decay re-ranking
  • Phase 6: Sync — cr-sqlite CRDT peer-to-peer sync, multiple profiles, REST API

Easter Egg

In a 2018 Cornell paper "Local Homology of Word Embeddings", researchers used the variable v_stash (p. 11) to refer to the "vector of the word stash" — making this the first documented use of the exact term in the context of AI/embeddings.


License

MIT

