memvid-sdk
Single-file AI memory system for Python. Store documents, search with BM25 ranking, and run retrieval-augmented generation (RAG) queries from a portable .mv2 file.
Built on Rust with PyO3 bindings. No database setup, no network dependencies, no configuration files.
Install
pip install memvid-sdk
Optional extras for framework integrations:
pip install "memvid-sdk[langchain]" # LangChain tools
pip install "memvid-sdk[llamaindex]" # LlamaIndex query engine
pip install "memvid-sdk[openai]" # OpenAI function schemas
pip install "memvid-sdk[crewai]" # CrewAI tools
pip install "memvid-sdk[full]" # All integrations
Quick start
from memvid_sdk import use
# Open notes.mv2, creating it if it doesn't already exist
mv = use("basic", "notes.mv2", mode="auto")
# Store a document
mv.put(
title="Project kickoff",
label="meeting",
metadata={"date": "2024-01-15"},
text="Discussed timeline, assigned tasks to team members.",
)
# Search by keyword
results = mv.find("timeline")
print(results["hits"])
# Ask a question (retrieves relevant context)
answer = mv.ask("What was discussed in the kickoff?")
print(answer["context"])
# Commit changes
mv.seal()
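Once sealed, the file can be reopened read-only (see the use() options under Core API below), for example to serve queries without taking a write lock:

# Reopen the sealed file for querying only
with use("basic", "notes.mv2", read_only=True) as mv:
    print(mv.find("timeline")["hits"])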
Embeddings & semantic search
Semantic search (mode="sem") and hybrid search (mode="auto") require a vector index:
from memvid_sdk import create
mv = create("notes.mv2", enable_vec=True)
Generate embeddings during ingestion (locally via fastembed, or via the OpenAI API):
mv.put(
title="Doc",
label="note",
metadata={},
text="alpha fastembed test",
enable_embedding=True,
embedding_model="bge-small", # local (fastembed)
)
Batch mode:
mv.put_many(
[{"title": "Doc", "label": "note", "text": "alpha"}],
opts={"enable_embedding": True, "embedding_model": "bge-small"},
)
Supported embedding_model values:
bge-small, bge-base, nomic, gte-large, openai-small, openai-large, openai-ada.
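The openai-* values call the OpenAI embeddings API and require OPENAI_API_KEY (see Environment variables below); bge-*, nomic, and gte-large run locally via fastembed. A sketch using a remote model:

# Remote embeddings: requires OPENAI_API_KEY in the environment
mv.put(
    title="Doc",
    label="note",
    metadata={},
    text="alpha openai test",
    enable_embedding=True,
    embedding_model="openai-small",
)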
Bring your own embeddings (precomputed)
Store embedding identity metadata so semantic queries can auto-detect the right model later:
mv.put_many(
[{"title": "Doc", "label": "note", "text": "alpha"}],
embeddings=[[0.1, 0.2, 0.3, 0.4]],
embedding_identity={"provider": "custom", "model": "my-embedder-v1"},
)
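For example, a minimal sketch generating the precomputed vectors with sentence-transformers (an external library, not bundled with memvid-sdk):

from sentence_transformers import SentenceTransformer  # assumed installed separately

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = [{"title": "Doc", "label": "note", "text": "alpha"}]
vectors = model.encode([d["text"] for d in docs]).tolist()
mv.put_many(
    docs,
    embeddings=vectors,
    embedding_identity={"provider": "custom", "model": "all-MiniLM-L6-v2"},
)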
Use an embedder (SDK-managed query embeddings)
from memvid_sdk.embeddings import HashEmbeddings
embedder = HashEmbeddings(dimension=32) # deterministic offline embedder for tests
mv.put_many(
[{"title": "Doc", "label": "note", "text": "alpha"}],
embedder=embedder,
)
mv.find("alpha", mode="sem", embedder=embedder)
mv.ask("alpha", mode="sem", context_only=True, embedder=embedder)
Built-in embedders:
OpenAIEmbeddings (OPENAI_API_KEY), CohereEmbeddings (COHERE_API_KEY),
VoyageEmbeddings (VOYAGE_API_KEY), NvidiaEmbeddings (NVIDIA_API_KEY),
HuggingFaceEmbeddings (local), HashEmbeddings (offline deterministic).
from memvid_sdk.embeddings import NvidiaEmbeddings
embedder = NvidiaEmbeddings(model="nvidia/nv-embed-v1") # uses NVIDIA_API_KEY
mv.put_many([{"title": "Doc", "label": "note", "text": "alpha"}], embedder=embedder)
mv.find("alpha", mode="sem", embedder=embedder)
Auto query embeddings (no query_embedding required)
If the memory contains memvid.embedding.* identity metadata (recommended) or a known vector
dimension (fallback), semantic queries can embed the query automatically:
mv.find("alpha", mode="sem")
mv.find("alpha", mode="sem", query_embedding_model="bge-small") # override/force
Core API
Opening a memory
mv = use(kind, path, apikey=None, **options)
| Parameter | Type | Description |
|---|---|---|
| kind | str | Adapter type: "basic", "langchain", "llamaindex", "openai", "crewai", "vercel-ai", "autogen" |
| path | str | Path to the .mv2 file |
| mode | str | "open" (default), "create", or "auto" |
| enable_lex | bool | Enable lexical index (default: True) |
| enable_vec | bool | Enable vector index (default: False) |
| read_only | bool | Open in read-only mode (default: False) |
| lock_timeout_ms | int | Lock acquisition timeout in milliseconds (default: 250) |
| force | str \| None | Set to "stale_only" to force-release stale locks |
| force_writable | bool | Open read-only first, then re-open writable (best-effort) |
Context manager support:
with use("basic", "notes.mv2") as mv:
mv.put(title="Note", label="general", metadata={}, text="Content")
# File handle automatically closed
Storing documents
frame_id = mv.put(
title="Document title",
label="category",
metadata={"key": "value"},
text="Document content...",
uri="mv2://custom/path",
tags=["tag1", "tag2"],
)
Batch ingestion
For bulk imports, put_many processes documents in parallel, yielding a 100x+ speedup over individual put() calls:
docs = [
{"title": "Doc 1", "label": "news", "text": "First document..."},
{"title": "Doc 2", "label": "news", "text": "Second document..."},
# ... thousands more
]
frame_ids = mv.put_many(docs, opts={"compression_level": 3})
print(f"Ingested {len(frame_ids)} documents")
Searching
results = mv.find(query, k=5, snippet_chars=240, scope=None, mode=None)
| Parameter | Type | Description |
|---|---|---|
| query | str | Search query |
| k | int | Number of results (default: 5) |
| snippet_chars | int | Snippet length in characters (default: 240) |
| scope | str | Filter results by URI prefix |
| mode | str | "auto", "lex", or "sem" |
Semantic/hybrid options (when mode != "lex"):
| Parameter | Type | Description |
|---|---|---|
| query_embedding | list[float] \| None | Precomputed query embedding |
| query_embedding_model | str \| None | Force embedding model for auto query embeddings |
| adaptive | bool \| None | Enable adaptive retrieval cutoff |
| min_relevancy | float \| None | Minimum relevancy (default: 0.5 when adaptive) |
| max_k | int \| None | Maximum results (default: 100 when adaptive) |
| adaptive_strategy | str \| None | One of: relative, absolute, cliff, elbow, combined |
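As a sketch, passing a precomputed query embedding directly; my_embed is a hypothetical helper standing in for whatever model produced the stored vectors, and its output dimension must match the index (see MV014 under Error handling):

query_vec = my_embed("alpha")  # hypothetical: must use the same model as ingestion
hits = mv.find("alpha", mode="sem", query_embedding=query_vec, k=10)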
Retrieval-augmented generation
response = mv.ask(
question,
k=6,
mode="auto",
model="openai:gpt-4o-mini",
api_key=os.environ.get("OPENAI_API_KEY"),
context_only=False,
mask_pii=False,
)
| Parameter | Type | Description |
|---|---|---|
| question | str | Question to answer |
| k | int | Documents to retrieve (default: 6) |
| mode | str | "auto", "lex", or "sem" |
| model | str | LLM for synthesis (e.g., "openai:gpt-4o-mini", "nvidia:meta/llama3-8b-instruct") |
| api_key | str | API key for the LLM provider |
| context_only | bool | Skip synthesis and return the retrieved context only |
| mask_pii | bool | Redact PII from the response |
Semantic/hybrid options (when mode != "lex"):
| Parameter | Type | Description |
|---|---|---|
| query_embedding | list[float] \| None | Precomputed query embedding |
| query_embedding_model | str \| None | Force embedding model for auto query embeddings |
| adaptive | bool \| None | Enable adaptive retrieval cutoff |
| min_relevancy | float \| None | Minimum relevancy (default: 0.5 when adaptive) |
| max_k | int \| None | Maximum results (default: 100 when adaptive) |
| adaptive_strategy | str \| None | One of: relative, absolute, cliff, elbow, combined |
Best practices: Adaptive retrieval
For best search quality, enable adaptive retrieval with the combined strategy. This dynamically adjusts result counts based on relevance scores rather than returning a fixed k:
# Recommended for find()
results = mv.find(
"your query",
mode="sem", # or "auto"
adaptive=True,
adaptive_strategy="combined",
)
# Recommended for ask()
answer = mv.ask(
"your question",
mode="auto",
adaptive=True,
adaptive_strategy="combined",
)
The combined strategy uses both relative thresholds and score cliff detection to filter out low-relevance results, providing higher quality context for RAG applications.
Timeline queries
entries = mv.timeline(limit=100, since=1704067200, reverse=True)
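since is a Unix timestamp in seconds; a quick sketch computing one with the standard library:

from datetime import datetime, timezone

since = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp())  # 1704067200
entries = mv.timeline(limit=100, since=since, reverse=True)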
Statistics
stats = mv.stats()
# {'frame_count': 42, 'size_bytes': 1048576, 'has_lex_index': True, ...}
File operations
# Commit pending changes
mv.seal()
# Explicit commit without sealing
mv.commit()
# Verify file integrity
report = mv.verify(deep=True)
report = Memvid.verify("notes.mv2", deep=True)  # static form (from memvid_sdk import Memvid)
# Repair indexes
result = mv.doctor(dry_run=True)
result = Memvid.doctor("notes.mv2", rebuild_time_index=True)
Diagnostics
from memvid_sdk import info, lock_who, lock_nudge, verify_single_file
print(info()) # versions + enabled features
print(lock_who("notes.mv2")) # lock status + owner (if locked)
print(lock_nudge("notes.mv2")) # request stale lock release
verify_single_file("notes.mv2") # ensure no sidecar files exist
Framework adapters
Adapters expose framework-native tools when the corresponding dependency is installed.
LangChain
mv = use("langchain", "notes.mv2")
tools = mv.tools # List of StructuredTool instances
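A quick sketch listing the exposed tools (StructuredTool carries name and description attributes usable by any LangChain agent):

for tool in mv.tools:
    print(tool.name, "-", tool.description)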
LlamaIndex
mv = use("llamaindex", "notes.mv2")
engine = mv.as_query_engine()
response = engine.query("What is the timeline?")
OpenAI function calling
mv = use("openai", "notes.mv2")
functions = mv.functions # JSON schemas for tool_calls
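A hedged sketch passing the schemas to the OpenAI client; it assumes the entries in mv.functions are already complete tool definitions — adapt the wrapping if your client expects {"type": "function", ...} envelopes:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Search my notes for the kickoff timeline"}],
    tools=mv.functions,  # assumption: entries are complete tool definitions
)
print(resp.choices[0].message.tool_calls)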
CrewAI
mv = use("crewai", "notes.mv2")
tools = mv.tools # CrewAI-compatible tools
Table extraction
Extract structured tables from PDFs:
result = mv.put_pdf_tables("report.pdf", embed_rows=True)
print(f"Extracted {result['tables_count']} tables")
tables = mv.list_tables()
data = mv.get_table(tables[0]["table_id"], format="dict")
Error handling
Typed exceptions with stable codes:
| Code | Exception | Description |
|---|---|---|
| MV001 | CapacityExceededError | Storage capacity exceeded |
| MV002 | TicketInvalidError | Invalid ticket signature |
| MV003 | TicketReplayError | Ticket replay detected |
| MV004 | LexIndexDisabledError | Lexical index not enabled |
| MV005 | TimeIndexMissingError | Time index missing |
| MV006 | VerifyFailedError | Verification failed |
| MV007 | LockedError | File locked by another process |
| MV008 | ApiKeyRequiredError | API key required |
| MV009 | MemoryAlreadyBoundError | File bound to another memory |
| MV010 | FrameNotFoundError | Frame not found |
| MV011 | VecIndexDisabledError | Vector index not enabled |
| MV012 | CorruptFileError | Corrupt file / invalid TOC |
| MV013 | FileNotFoundError | File not found |
| MV014 | VecDimensionMismatchError | Vector dimension mismatch |
| MV015 | EmbeddingFailedError | Embedding failed |
| MV016 | ClipIndexDisabledError | CLIP index not enabled |
| MV017 | NerModelNotAvailableError | NER model not available |
from memvid_sdk import CapacityExceededError
try:
mv.put(...)
except CapacityExceededError:
print("Out of storage capacity")
Environment variables
| Variable | Description |
|---|---|
| MEMVID_API_KEY | API key for capacity beyond the free tier |
| MEMVID_API_URL | Control plane URL (for enterprise deployments) |
| MEMVID_OFFLINE | Set to 1 to disable network providers and model downloads |
| MEMVID_MODELS_DIR | Path to the local embedding model cache (fastembed) |
| MEMVID_CLIP_MODEL | Local CLIP model name (default: mobileclip-s2) |
| OPENAI_API_KEY | API key for OpenAI embeddings (also usable in ask() examples) |
| OPENAI_BASE_URL | Optional OpenAI-compatible base URL for embeddings |
| NVIDIA_API_KEY | API key for NVIDIA Integrate embeddings |
| NVIDIA_BASE_URL | Optional NVIDIA Integrate base URL (default: https://integrate.api.nvidia.com) |
| NVIDIA_EMBEDDING_MODEL | Optional NVIDIA embedding model override |
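For fully offline use, a sketch combining MEMVID_OFFLINE with a local model cache (the paths are illustrative):

import os

os.environ["MEMVID_OFFLINE"] = "1"               # no network providers or downloads
os.environ["MEMVID_MODELS_DIR"] = "/opt/models"  # illustrative path to fastembed cache

from memvid_sdk import use

with use("basic", "notes.mv2", mode="auto") as mv:
    mv.find("alpha", mode="lex")  # lexical search needs no embedding model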
Development
Build from source:
cd proprietary/memvid-bindings/python
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install dev dependencies
pip install maturin
# Build native extension
maturin develop --release --skip-install
# Run tests
./.venv/bin/python -m pytest -q
# Smoke-run offline-safe examples
./.venv/bin/python scripts/test_examples.py
# Optional: run gated embedding runtime tests
MEMVID_TEST_FASTEMBED=1 ./.venv/bin/python -m pytest -q tests/test_runtime_embeddings_gated.py
MEMVID_TEST_OPENAI=1 OPENAI_API_KEY=... ./.venv/bin/python -m pytest -q tests/test_runtime_embeddings_gated.py
MEMVID_TEST_NVIDIA=1 NVIDIA_API_KEY=... ./.venv/bin/python -m pytest -q tests/test_runtime_embeddings_gated.py
# Optional: run gated local model tests (requires `memvid models install ...`)
MEMVID_TEST_CLIP=1 ./.venv/bin/python -m pytest -q tests/test_phase2_models_tables.py
MEMVID_TEST_NER=1 ./.venv/bin/python -m pytest -q tests/test_phase2_models_tables.py
Requirements
- Python 3.8+
- macOS (Apple Silicon or Intel), Linux (x64), or Windows (x64)
License
Apache-2.0. See LICENSE.