
ChromaDB virtual filesystem backend for deepagents — instant session creation, zero marginal compute cost.


deepagents-chromafs

A read-only BackendProtocol backend for DeepAgents that treats a ChromaDB collection as a virtual filesystem.

Inspired by the ChromaFs algorithm from Mintlify: it replaces an expensive sandbox boot (~46 s) with an in-memory virtual filesystem bootstrapped from a single Chroma document (~100 ms).


How it works

Path tree

The entire directory tree is stored as a single JSON document in Chroma under the key __path_tree__:

{
    "auth/oauth.md": { "isPublic": true, "groups": [] },
    "auth/api-keys.mdx": { "isPublic": true, "groups": [] },
    "internal/billing.md": { "isPublic": false, "groups": ["admin", "billing"] }
}

Slug format contract: every key must exactly match the page_slug metadata on each chunk in the same collection. Slugs may or may not carry a file extension — auth/oauth.md, Makefile, and Dockerfile are all valid. Extension-based glob patterns (**/*.md, **/*.py) only match slugs that include the corresponding extension; slugs without an extension simply won't match those patterns, which is the expected behavior.
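
Concretely, with the example tree above (backend is a ChromaFsBackend, as set up in the quick start below):

result = backend.glob("**/*.md")
# Matches auth/oauth.md and internal/billing.md (RBAC permitting);
# auth/api-keys.mdx and extensionless slugs like Makefile do not match.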

The path tree document may optionally be gzip-compressed and base64-encoded. On bootstrap, the backend fetches this document, applies RBAC filtering (hiding paths the user cannot access), and builds an in-memory directory index — no further network calls are needed for ls, glob, or path-scoping.
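
A minimal sketch of that bootstrap, assuming the stored form is base64(gzip(json)); the helper below is illustrative, not the library's internals:

import base64
import gzip
import json

def load_path_tree(collection, user_groups: frozenset) -> dict:
    """Fetch __path_tree__, decode it, and drop paths the user cannot access."""
    raw = collection.get(ids=["__path_tree__"])["documents"][0]
    try:
        # Stored form may be base64-encoded gzip; fall back to raw JSON.
        raw = gzip.decompress(base64.b64decode(raw)).decode("utf-8")
    except (ValueError, OSError):
        pass
    tree = json.loads(raw)
    return {
        slug: meta
        for slug, meta in tree.items()
        if meta["isPublic"] or user_groups & set(meta["groups"])
    }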

Content (cat)

Page content is stored as chunks in Chroma, each with page_slug and chunk_index metadata fields. On the first read of a page, all of its chunks are fetched, sorted by chunk_index, joined, and cached for the session lifetime.
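
A sketch of what that first read amounts to, using the raw Chroma API (the real backend adds caching; the join separator here is a guess):

def fetch_page(collection, slug: str) -> str:
    """Reassemble one page from its chunks, ordered by chunk_index."""
    res = collection.get(where={"page_slug": slug})
    chunks = sorted(
        zip(res["metadatas"], res["documents"]),
        key=lambda pair: pair[0]["chunk_index"],
    )
    return "".join(doc for _, doc in chunks)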

Grep (4-step pipeline)

  1. Scope — derive candidate slugs from the in-memory tree (limited to the requested path / glob).
  2. Coarse filter — Chroma $contains / $regex on where_document to find matching chunks.
  3. Bulk prefetch — fetch all matched page slugs concurrently into the in-memory cache.
  4. Fine filter — in-memory regex on cached content to produce line-level GrepMatch results.
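
A simplified sketch of the pipeline (the real backend batches its Chroma calls, prefetches concurrently, and returns structured GrepMatch results; fetch_page is the helper sketched in the cat section above):

import re
from fnmatch import fnmatch

def grep_pages(collection, tree: dict, pattern: str, glob: str = "*"):
    # 1. Scope: candidate slugs from the in-memory tree.
    #    fnmatch stands in for the backend's real glob semantics.
    slugs = [slug for slug in tree if fnmatch(slug, glob)]

    # 2. Coarse filter: let Chroma discard chunks that lack the literal
    #    text ($regex is the where_document alternative for patterns).
    hits = collection.get(
        where={"page_slug": {"$in": slugs}},
        where_document={"$contains": pattern},
        include=["metadatas"],
    )
    matched = {meta["page_slug"] for meta in hits["metadatas"]}

    # 3. Bulk prefetch (sequential here, concurrent in the backend).
    pages = {slug: fetch_page(collection, slug) for slug in matched}

    # 4. Fine filter: line-level regex over the cached content.
    rx = re.compile(pattern)
    for slug, text in pages.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                yield slug, lineno, line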

Write operations

All write operations (write, edit, upload_files) return an EROFS error. The filesystem is stateless by design.


Installation

pip install deepagents-chromafs

Or with uv:

uv add deepagents-chromafs

With Redis cache support:

pip install deepagents-chromafs[redis]
# or
uv add deepagents-chromafs[redis]

Quick start

import chromadb
from deepagents_chromafs import ChromaFsBackend

client = chromadb.Client()
collection = client.get_collection("my_docs")

backend = ChromaFsBackend(collection)

# List root directory
result = backend.ls("/")
for entry in result.entries:
    print(entry["path"], "dir" if entry.get("is_dir") else "file")

# Read a page
result = backend.read("/auth/oauth.md")
print(result.file_data["content"])

# Grep across all pages
result = backend.grep("OAuth2")
for match in result.matches:
    print(f"{match['path']}:{match['line']}: {match['text']}")

# Glob for files
result = backend.glob("**/*.md")
for entry in result.matches:
    print(entry["path"])

RBAC (group-based access control)

backend = ChromaFsBackend(
    collection,
    user_groups=frozenset({"admin", "billing"}),
)

Paths whose isPublic is False and whose groups list does not intersect with user_groups are hidden from the tree entirely — they do not appear in ls, glob, or grep results.
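
Applied to the example tree above, internal/billing.md (isPublic false, groups ["admin", "billing"]) is visible only when the user shares one of those groups:

# internal/billing.md is visible: the user is in "admin"
backend = ChromaFsBackend(collection, user_groups=frozenset({"admin"}))

# internal/billing.md is hidden: only the two public auth/ pages remain
backend = ChromaFsBackend(collection, user_groups=frozenset())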

Custom metadata field names

backend = ChromaFsBackend(
    collection,
    slug_field="doc_slug",        # default: "page_slug"
    chunk_index_field="seq",      # default: "chunk_index"
)

Redis cache (multi-session / multi-worker)

By default, page content is cached in-memory for the lifetime of the ChromaFsBackend instance. For multi-session or multi-worker deployments, plug in RedisContentCache to share the cache across processes:

import redis
from deepagents_chromafs import ChromaFsBackend
from deepagents_chromafs.redis_cache import RedisContentCache

cache = RedisContentCache(
    redis.Redis(host="localhost", port=6379, db=0),
    prefix="myapp",   # namespace — avoids key collisions between collections
    ttl=3600,         # seconds; 0 = no expiry
)

backend = ChromaFsBackend(collection, cache=cache)

Any ContentCache subclass is accepted, so you can wire in other backends (Memcached, DynamoDB, etc.) by overriding get, put, has, and clear.
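
As a skeletal illustration (the import path and method signatures here are assumptions; check the ContentCache base class for the exact interface):

from deepagents_chromafs import ContentCache  # assumed import path

class DictCache(ContentCache):
    """Trivial in-process cache; swap the dict for any key-value store."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self._store.get(key)

    def put(self, key: str, content: str) -> None:
        self._store[key] = content

    def has(self, key: str) -> bool:
        return key in self._store

    def clear(self) -> None:
        self._store.clear()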


ChromaDB schema

Each page chunk document must have these metadata fields:

Field        Type  Description
page_slug    str   Page identifier including extension (e.g. auth/oauth.md)
chunk_index  int   Chunk ordering within the page

The path tree is stored as a single document with ID __path_tree__.
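
Putting the schema together, ingesting one page might look like this (the ID scheme and chunking are up to you; only the two metadata fields are required):

slug = "auth/oauth.md"
chunks = ["# OAuth\n\nIntro...", "## Token refresh\n..."]  # your own splitter

collection.add(
    ids=[f"{slug}#{i}" for i in range(len(chunks))],
    documents=chunks,
    metadatas=[{"page_slug": slug, "chunk_index": i} for i in range(len(chunks))],
)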

Preventing __path_tree__ from polluting search

By default, ChromaDB auto-generates an embedding for every document added via collection.add(), including __path_tree__. This wastes embedding compute and lets the tree document surface in semantic similarity searches (collection.query()). Two mitigations are recommended when inserting the path tree:

1. Zero-vector embedding (semantic search)

Pass an explicit zero vector so the document never wins a cosine similarity match:

EMBEDDING_DIM = 1536  # match your collection's embedding dimension

collection.add(
    ids=["__path_tree__"],
    documents=[tree_json],
    embeddings=[[0.0] * EMBEDDING_DIM],
)

2. Metadata marker (full-text / where_document queries)

Add a metadata field that lets you exclude the document from your own queries:

collection.add(
    ids=["__path_tree__"],
    documents=[tree_json],
    embeddings=[[0.0] * EMBEDDING_DIM],
    metadatas=[{"_system": True}],
)

Then filter it out in any custom where_document scan:

collection.get(
    where={"_system": {"$ne": True}},
    where_document={"$contains": "access_token"},
)

Note: ChromaFsBackend itself is not affected — its grep pipeline always scopes queries to page_slug metadata, so __path_tree__ (which has no page_slug) is naturally excluded from all results.


Development

# Install dev dependencies
make install

# Run tests
make test

# Lint
make lint

# Format
make format

Algorithm reference

See the ChromaFs algorithm post on Mintlify for the original description.
