ChromaDB virtual filesystem backend for deepagents — instant session creation, zero marginal compute cost.
deepagents-chromafs
A read-only `BackendProtocol` backend for DeepAgents that treats a ChromaDB collection as a virtual filesystem.
Inspired by the ChromaFs algorithm from Mintlify: replace expensive sandbox boot (~46 s) with an in-memory virtual filesystem bootstrapped from a single Chroma document (~100 ms).
How it works
Path tree
The entire directory tree is stored as a single JSON document in Chroma under the key `__path_tree__`:

```json
{
  "auth/oauth.md": { "isPublic": true, "groups": [] },
  "auth/api-keys.mdx": { "isPublic": true, "groups": [] },
  "internal/billing.md": { "isPublic": false, "groups": ["admin", "billing"] }
}
```
Slug format contract: every key must exactly match the `page_slug` metadata on each chunk in the same collection. Slugs may or may not carry a file extension: `auth/oauth.md`, `Makefile`, and `Dockerfile` are all valid. Extension-based glob patterns (`**/*.md`, `**/*.py`) only match slugs that include the corresponding extension; slugs without an extension simply won't match those patterns, which is the expected behavior.
The document may optionally be gzip-compressed and base64-encoded. On bootstrap, the backend fetches this document, applies RBAC filtering (hiding paths the user cannot access), and builds an in-memory directory index; no further network calls are needed for `ls`, `glob`, or path-scoping.
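For reference, a minimal sketch of building and storing that document with the plain `chromadb` client API (the `tree` contents are illustrative, and the gzip + base64 step is optional):

```python
import base64
import gzip
import json

tree = {
    "auth/oauth.md": {"isPublic": True, "groups": []},
    "internal/billing.md": {"isPublic": False, "groups": ["admin", "billing"]},
}
tree_json = json.dumps(tree)

# Optional: compress large trees before storing them.
payload = base64.b64encode(gzip.compress(tree_json.encode("utf-8"))).decode("ascii")

# See "Preventing __path_tree__ from polluting search" below before adding
# this to a production collection (zero-vector embedding, metadata marker).
collection.add(ids=["__path_tree__"], documents=[payload])
```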
Content (cat)
Page content is stored as chunks in Chroma, each with page_slug and chunk_index metadata fields. On first read, all chunks are fetched, sorted, joined, and cached for the session lifetime.
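A rough sketch of that read path against the raw `chromadb` API (illustrative, not the backend's internal code; the join separator in particular is an assumption):

```python
# Fetch every chunk belonging to one page; documents and metadatas
# are included by default in collection.get().
res = collection.get(where={"page_slug": "auth/oauth.md"})

# Sort chunks by their position within the page, then reassemble.
chunks = sorted(
    zip(res["metadatas"], res["documents"]),
    key=lambda pair: pair[0]["chunk_index"],
)
content = "".join(doc for _, doc in chunks)  # cached for the session lifetime
```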
Grep (4-step pipeline)
1. **Scope**: derive candidate slugs from the in-memory tree (limited to the requested path/glob).
2. **Coarse filter**: Chroma `$contains`/`$regex` on `where_document` to find matching chunks (sketched below).
3. **Bulk prefetch**: fetch all matched page slugs concurrently into the in-memory cache.
4. **Fine filter**: in-memory regex on cached content to produce line-level `GrepMatch` results.
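For illustration, the coarse-filter step could be expressed against the raw `chromadb` API like this (the backend's actual query construction may differ):

```python
# Step 2: ask Chroma for chunks whose text contains the literal query.
hits = collection.get(
    where_document={"$contains": "OAuth2"},
    include=["metadatas"],
)

# Collapse chunk hits to candidate page slugs for the bulk prefetch (step 3);
# __path_tree__ carries no page_slug, so it never appears here.
candidate_slugs = {m["page_slug"] for m in hits["metadatas"] if "page_slug" in m}
```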
Write operations
All write operations (`write`, `edit`, `upload_files`) return an `EROFS` error. The filesystem is stateless by design.
Installation
```bash
pip install deepagents-chromafs
```
Or with uv:
```bash
uv add deepagents-chromafs
```
With Redis cache support:
```bash
pip install deepagents-chromafs[redis]
# or
uv add deepagents-chromafs[redis]
```
Quick start
```python
import chromadb

from deepagents_chromafs import ChromaFsBackend

client = chromadb.Client()
collection = client.get_collection("my_docs")

backend = ChromaFsBackend(collection)

# List root directory
result = backend.ls("/")
for entry in result.entries:
    print(entry["path"], "dir" if entry.get("is_dir") else "file")

# Read a page
result = backend.read("/auth/oauth.md")
print(result.file_data["content"])

# Grep across all pages
result = backend.grep("OAuth2")
for match in result.matches:
    print(f"{match['path']}:{match['line']}: {match['text']}")

# Glob for files
result = backend.glob("**/*.md")
for entry in result.matches:
    print(entry["path"])
```
RBAC (group-based access control)
```python
backend = ChromaFsBackend(
    collection,
    user_groups=frozenset({"admin", "billing"}),
)
```
Paths whose `isPublic` is `False` and whose `groups` list does not intersect with `user_groups` are hidden from the tree entirely; they do not appear in `ls`, `glob`, or `grep` results.
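With the example tree from the top of this page, the effect looks like this (comments describe the expected visibility):

```python
# No groups: internal/billing.md ("isPublic": false) is hidden.
public = ChromaFsBackend(collection)
# public.ls("/") lists auth/ only

# Any overlap with the page's "groups" list unlocks it.
billing = ChromaFsBackend(collection, user_groups=frozenset({"billing"}))
# billing.ls("/") lists auth/ and internal/
```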
Custom metadata field names
```python
backend = ChromaFsBackend(
    collection,
    slug_field="doc_slug",        # default: "page_slug"
    chunk_index_field="seq",      # default: "chunk_index"
)
```
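Chunks ingested into such a collection then carry the custom field names, e.g.:

```python
# Chunk IDs are arbitrary; only the metadata field names matter here.
collection.add(
    ids=["auth/oauth.md#0"],
    documents=["# OAuth2\n\n..."],
    metadatas=[{"doc_slug": "auth/oauth.md", "seq": 0}],
)
```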
Redis cache (multi-session / multi-worker)
By default, page content is cached in memory for the lifetime of the `ChromaFsBackend` instance. For multi-session or multi-worker deployments, plug in `RedisContentCache` to share the cache across processes:
```python
import redis

from deepagents_chromafs import ChromaFsBackend
from deepagents_chromafs.redis_cache import RedisContentCache

cache = RedisContentCache(
    redis.Redis(host="localhost", port=6379, db=0),
    prefix="myapp",  # namespace; avoids key collisions between collections
    ttl=3600,        # seconds; 0 = no expiry
)

backend = ChromaFsBackend(collection, cache=cache)
```
Any `ContentCache` subclass is accepted, so you can wire in other backends (Memcached, DynamoDB, etc.) by subclassing `ContentCache` and overriding `get`, `put`, `has`, and `clear`.
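As a minimal sketch, assuming `ContentCache` is importable from the package root and that the four methods have simple key/value signatures (check the base class before copying):

```python
from deepagents_chromafs import ContentCache  # import path is an assumption


class DictContentCache(ContentCache):
    """Toy process-local cache illustrating the four overridable methods."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self._store.get(key)

    def put(self, key: str, value: str) -> None:
        self._store[key] = value

    def has(self, key: str) -> bool:
        return key in self._store

    def clear(self) -> None:
        self._store.clear()
```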
ChromaDB schema
Each page chunk document must have these metadata fields:
| Field | Type | Description |
|---|---|---|
| `page_slug` | `str` | Page identifier including extension (e.g. `auth/oauth.md`) |
| `chunk_index` | `int` | Chunk ordering within the page |
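For example, ingesting one page as two ordered chunks (chunk IDs are arbitrary):

```python
collection.add(
    ids=["auth/oauth.md#0", "auth/oauth.md#1"],
    documents=["# OAuth2\n\nFirst half of the page...", "Second half..."],
    metadatas=[
        {"page_slug": "auth/oauth.md", "chunk_index": 0},
        {"page_slug": "auth/oauth.md", "chunk_index": 1},
    ],
)
```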
The path tree is stored as a single document with ID `__path_tree__`.
Preventing `__path_tree__` from polluting search
By default ChromaDB auto-generates an embedding for every document added via `collection.add()`, including `__path_tree__`. This wastes embedding compute and lets the document surface in semantic similarity searches (`collection.query()`).
Two mitigations are recommended when inserting the path tree:
1. Zero-vector embedding (semantic search)
Pass an explicit zero vector so the document never wins a cosine similarity match:
```python
EMBEDDING_DIM = 1536  # match your collection's embedding dimension

collection.add(
    ids=["__path_tree__"],
    documents=[tree_json],
    embeddings=[[0.0] * EMBEDDING_DIM],
)
```
2. Metadata marker (full-text / `where_document` queries)
Add a metadata field that lets you exclude the document from your own queries:
```python
collection.add(
    ids=["__path_tree__"],
    documents=[tree_json],
    embeddings=[[0.0] * EMBEDDING_DIM],
    metadatas=[{"_system": True}],
)
```
Then filter it out in any custom `where_document` scan:

```python
collection.get(
    where={"_system": {"$ne": True}},
    where_document={"$contains": "access_token"},
)
```
Note: `ChromaFsBackend` itself is not affected; its grep pipeline always scopes queries to `page_slug` metadata, so `__path_tree__` (which has no `page_slug`) is naturally excluded from all results.
Development
```bash
# Install dev dependencies
make install

# Run tests
make test

# Lint
make lint

# Format
make format
```
Algorithm reference
See the ChromaFs algorithm post on Mintlify for the original description.