D-MemFS
An in-process virtual filesystem with hard quota enforcement for Python.
Why MFS?
MemoryFileSystem gives you a fully isolated filesystem-like workspace inside a Python process.
- Hard quota (`MFSQuotaExceededError`) to reject oversized writes before OOM
- Hierarchical directories and multi-file operations (`import_tree`, `copy_tree`, `move`)
- File-level RW locking + global structure lock for thread-safe operations
- Free-threaded Python compatible (`PYTHON_GIL=0`); stress-tested under 50-thread contention
- Async wrapper (`AsyncMemoryFileSystem`) powered by `asyncio.to_thread`
- Zero runtime dependencies (standard library only)
This is useful when `io.BytesIO` is too primitive (a single flat buffer), and OS-level RAM disks/tmpfs are impractical (permissions, container policy, Windows driver friction).
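To see why a hard quota check must happen before bytes are stored, here is a stdlib-only toy sketch of the idea. The `QuotaBuffer` and `QuotaExceeded` names are illustrative inventions, not D-MemFS internals:

```python
import io


class QuotaExceeded(Exception):
    """Raised when a write would push stored bytes over the quota."""


class QuotaBuffer:
    """Toy in-memory buffer that rejects oversized writes up front,
    before any data is copied into the buffer."""

    def __init__(self, max_quota: int) -> None:
        self.max_quota = max_quota
        self._buf = io.BytesIO()

    def write(self, data: bytes) -> int:
        # Size the buffer would have after this write
        end = self._buf.tell() + len(data)
        new_size = max(end, len(self._buf.getvalue()))
        if new_size > self.max_quota:
            raise QuotaExceeded(
                f"write of {len(data)} bytes exceeds quota of {self.max_quota}"
            )
        return self._buf.write(data)


buf = QuotaBuffer(max_quota=8)
buf.write(b"12345")
try:
    buf.write(b"456789")  # would reach 11 bytes, over the 8-byte quota
except QuotaExceeded as e:
    print("rejected:", e)
```

The key property, which D-MemFS's quota shares, is that the rejection happens before allocation, so an oversized write cannot drive the process toward OOM.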
Installation
pip install D-MemFS
Requirements: Python 3.11+
Quick Start
```python
from dmemfs import MemoryFileSystem, MFSQuotaExceededError

mfs = MemoryFileSystem(max_quota=64 * 1024 * 1024)
mfs.mkdir("/data")

with mfs.open("/data/hello.bin", "wb") as f:
    f.write(b"hello")

with mfs.open("/data/hello.bin", "rb") as f:
    print(f.read())  # b"hello"

print(mfs.listdir("/data"))
print(mfs.is_file("/data/hello.bin"))  # True

try:
    with mfs.open("/huge.bin", "wb") as f:
        f.write(bytes(512 * 1024 * 1024))
except MFSQuotaExceededError as e:
    print(e)
```
API Highlights
MemoryFileSystem
- `open(path, mode, *, preallocate=0, lock_timeout=None)`
- `mkdir`, `remove`, `rmtree`, `rename`, `move`, `copy`, `copy_tree`
- `listdir`, `exists`, `is_dir`, `is_file`, `walk`, `glob`
- `stat`, `stats`, `get_size`
- `export_as_bytesio`, `export_tree`, `iter_export_tree`, `import_tree`
Constructor parameters:
- `max_quota` (default 256 MiB): byte quota for file data
- `max_nodes` (default `None`): optional cap on total node count (files + directories). Raises `MFSNodeLimitExceededError` when exceeded.
- `default_storage` (default `"auto"`): storage backend for new files, one of `"auto"`/`"sequential"`/`"random_access"`
- `promotion_hard_limit` (default `None`): byte threshold above which Sequential→RandomAccess auto-promotion is suppressed (`None` uses the built-in 512 MiB limit)
- `chunk_overhead_override` (default `None`): override the per-chunk overhead estimate used for quota accounting
Note: The `BytesIO` returned by `export_as_bytesio()` is outside quota management. Exporting large files may consume significant process memory beyond the configured quota limit.
Supported binary modes: `rb`, `wb`, `ab`, `r+b`, `xb`
MemoryFileHandle
- `read`, `write`, `seek`, `tell`, `truncate`, `flush`, `close`
- File-like capability checks: `readable`, `writable`, `seekable`
`flush()` is intentionally a no-op (compatibility API for file-like integrations).
stat() return value (MFSStatResult)
Fields: `size`, `created_at`, `modified_at`, `generation`, `is_dir`
- Supports both files and directories
- For directories: `size=0`, `generation=0`, `is_dir=True`
Text Mode
D-MemFS natively operates in binary mode. For text I/O, use MFSTextHandle:
```python
from dmemfs import MemoryFileSystem, MFSTextHandle

mfs = MemoryFileSystem()
mfs.mkdir("/data")

# Write text
with mfs.open("/data/hello.bin", "wb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    th.write("こんにちは世界\n")
    th.write("Hello, World!\n")

# Read text line by line
with mfs.open("/data/hello.bin", "rb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    for line in th:
        print(line, end="")
```
MFSTextHandle is a thin, bufferless wrapper. It encodes on write() and decodes on read() / readline(). Unlike io.TextIOWrapper, it introduces no buffering issues when used with MemoryFileHandle.
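The bufferless encode-on-write / decode-on-read pattern works over any binary file-like object. Here is a simplified stdlib-only illustration of the idea (the `TextShim` class is a toy, not the actual `MFSTextHandle` source):

```python
import io


class TextShim:
    """Minimal bufferless text wrapper: encodes each write() and
    decodes each read() immediately, holding no internal buffer."""

    def __init__(self, binary_file, encoding: str = "utf-8") -> None:
        self._f = binary_file
        self._encoding = encoding

    def write(self, text: str) -> int:
        return self._f.write(text.encode(self._encoding))

    def read(self) -> str:
        return self._f.read().decode(self._encoding)


raw = io.BytesIO()
TextShim(raw).write("こんにちは\n")
raw.seek(0)
print(TextShim(raw).read())
```

Because nothing is buffered, there is no flush-ordering hazard when the same underlying handle is also written to directly, which is the gap `io.TextIOWrapper` can fall into.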
Use Case Tutorials
ETL Staging
Stage data through raw → processed → output directories:
```python
from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem(max_quota=16 * 1024 * 1024)
mfs.mkdir("/raw")
mfs.mkdir("/processed")

raw_data = b"id,name,value\n1,foo,100\n2,bar,200\n"
with mfs.open("/raw/data.csv", "wb") as f:
    f.write(raw_data)

with mfs.open("/raw/data.csv", "rb") as f:
    data = f.read()
with mfs.open("/processed/data.csv", "wb") as f:
    f.write(data.upper())

mfs.rmtree("/raw")  # cleanup staging
```
Archive-like Operations
Store, list, and export multiple files as a tree:
```python
from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem()
mfs.import_tree({
    "/archive/doc1.bin": b"Document 1",
    "/archive/doc2.bin": b"Document 2",
    "/archive/sub/doc3.bin": b"Document 3",
})

print(mfs.listdir("/archive"))  # ['doc1.bin', 'doc2.bin', 'sub']
snapshot = mfs.export_tree(prefix="/archive")  # dict of {path: bytes}
```
SQLite Snapshot
Serialize an in-memory SQLite DB into MFS and restore it later:
```python
import sqlite3

from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'hello')")
conn.commit()

with mfs.open("/snapshot.db", "wb") as f:
    f.write(conn.serialize())
conn.close()

with mfs.open("/snapshot.db", "rb") as f:
    raw = f.read()
restored = sqlite3.connect(":memory:")
restored.deserialize(raw)
rows = restored.execute("SELECT * FROM t").fetchall()  # [(1, 'hello')]
```
Concurrency and Locking Notes
- Path/tree operations are guarded by `_global_lock`.
- File access is guarded by a per-file `ReadWriteLock`.
- `lock_timeout` behavior:
  - `None`: block indefinitely
  - `0.0`: try-lock (fail immediately with `BlockingIOError`)
  - `> 0`: timeout in seconds, then `BlockingIOError`
- The current `ReadWriteLock` is non-fair: under sustained read load, writers can starve.
Operational guidance:
- Keep lock hold durations short
- Set an explicit `lock_timeout` in latency-sensitive code paths
- `walk()` and `glob()` provide weak consistency: each directory level is snapshotted under `_global_lock`, but the overall traversal is NOT atomic. Concurrent structural changes may produce inconsistent results.
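The three-valued `lock_timeout` semantics above map directly onto `threading.Lock.acquire`. A stdlib-only sketch of that mapping (the helper name `acquire_with_policy` is illustrative, not a D-MemFS API):

```python
import threading


def acquire_with_policy(lock: threading.Lock, lock_timeout) -> None:
    """Acquire `lock` using the documented lock_timeout semantics,
    raising BlockingIOError on failure (illustrative sketch)."""
    if lock_timeout is None:
        lock.acquire()  # None: block indefinitely
        return
    # timeout=0.0 is an immediate try-lock; timeout > 0 waits that long
    if not lock.acquire(timeout=lock_timeout):
        raise BlockingIOError(
            f"could not acquire lock within {lock_timeout}s"
        )


lock = threading.Lock()
lock.acquire()  # simulate the lock being held elsewhere
try:
    acquire_with_policy(lock, 0.0)  # try-lock fails immediately
except BlockingIOError as e:
    print("blocked:", e)
```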
Async Usage
```python
import asyncio

from dmemfs import AsyncMemoryFileSystem

async def run() -> None:
    mfs = AsyncMemoryFileSystem(max_quota=64 * 1024 * 1024)
    await mfs.mkdir("/a")
    async with await mfs.open("/a/f.bin", "wb") as f:
        await f.write(b"data")
    async with await mfs.open("/a/f.bin", "rb") as f:
        print(await f.read())

asyncio.run(run())
```
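The async facade pattern, delegating each blocking call to a worker thread via `asyncio.to_thread`, can be sketched in a few lines. This is a simplified stdlib-only illustration of the approach, not the `AsyncMemoryFileSystem` source; `AsyncFacade` and `SlowStore` are invented names:

```python
import asyncio


class AsyncFacade:
    """Wrap a synchronous object so each method call runs in a worker
    thread, keeping the event loop free while the call blocks."""

    def __init__(self, sync_obj) -> None:
        self._obj = sync_obj

    async def call(self, method: str, *args, **kwargs):
        return await asyncio.to_thread(getattr(self._obj, method), *args, **kwargs)


class SlowStore:
    """Stand-in for any blocking synchronous API."""

    def __init__(self) -> None:
        self.data = {}

    def put(self, key, value) -> None:
        self.data[key] = value

    def get(self, key):
        return self.data[key]


async def main() -> None:
    store = AsyncFacade(SlowStore())
    await store.call("put", "k", b"v")
    print(await store.call("get", "k"))  # b"v"


asyncio.run(main())
```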
Benchmarks
Minimal benchmark tooling is included:
- MFS vs `io.BytesIO` vs PyFilesystem2 (`MemoryFS`) vs `tempfile`
- Cases: many-small-files and stream write/read
- Optional report output to `benchmarks/results/`
Note: As of setuptools 82 (February 2026), `pyfilesystem2` fails to import due to a known upstream issue (#597). Benchmark results including PyFilesystem2 were measured with setuptools ≤ 81 and remain valid as historical comparison data.
Run:
uvx --with-requirements requirements.txt --with-editable . python benchmarks/compare_backends.py --save-md auto --save-json auto
See BENCHMARK.md for details.
Latest benchmark snapshot:
Testing and Coverage
Test execution and dev flow are documented in TESTING.md.
Typical local run:
uv pip compile requirements.in -o requirements.txt
uvx --with-requirements requirements.txt --with-editable . pytest tests/ -v --timeout=30 --cov=dmemfs --cov-report=xml --cov-report=term-missing
CI (.github/workflows/test.yml) runs tests with coverage XML generation.
API Docs Generation
API docs can be generated as Markdown (viewable on GitHub) using pydoc-markdown:
uvx --with pydoc-markdown --with-editable . pydoc-markdown '{
loaders: [{type: python, search_path: [.]}],
processors: [{type: filter, expression: "default()"}],
renderer: {type: markdown, filename: docs/api_md/index.md}
}'
Or as HTML using pdoc (local browsing only):
uvx --with-requirements requirements.txt pdoc dmemfs -o docs/api
Compatibility and Non-Goals
- Core `open()` is binary-only (`rb`, `wb`, `ab`, `r+b`, `xb`). Text I/O is available via the `MFSTextHandle` wrapper.
- No symlink/hardlink support: intentionally omitted to eliminate path-traversal loops and structural complexity (same rationale as `pathlib.PurePath`).
- No direct `pathlib.Path`/`os.PathLike` API: MFS paths are virtual and must not be confused with host filesystem paths. Accepting `os.PathLike` would allow third-party libraries or a plain `open()` call to silently treat an MFS virtual path as a real OS path, potentially issuing unintended syscalls against the host filesystem. All paths must be plain `str` with POSIX-style absolute notation (e.g. `"/data/file.txt"`).
- No kernel filesystem integration (intentionally in-process only)
Auto-promotion behavior:
- By default (`default_storage="auto"`), new files start as `SequentialMemoryFile` and auto-promote to `RandomAccessMemoryFile` when random writes are detected.
- Promotion is one-way (no downgrade back to sequential).
- Use `default_storage="sequential"` or `"random_access"` to fix the backend at construction; use `promotion_hard_limit` to suppress auto-promotion above a byte threshold.
- Storage promotion temporarily doubles memory usage for the promoted file. The quota system accounts for this, but process-level memory may spike briefly.
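One way to detect "a random write" is to check whether a write lands anywhere other than the current end of data. The toy below sketches that trigger under this assumption; `PromotingFile` is an invented class, not D-MemFS internals:

```python
class PromotingFile:
    """Toy file that starts 'sequential' and flips to 'random_access'
    the first time a write lands anywhere but the end of the data.
    Promotion is one-way, mirroring the documented behavior."""

    def __init__(self) -> None:
        self.backend = "sequential"
        self._data = bytearray()
        self._pos = 0

    def seek(self, pos: int) -> None:
        self._pos = pos

    def write(self, chunk: bytes) -> None:
        if self._pos != len(self._data):
            self.backend = "random_access"  # one-way promotion
        end = self._pos + len(chunk)
        if end > len(self._data):
            # Zero-fill any gap, then overwrite in place
            self._data.extend(b"\x00" * (end - len(self._data)))
        self._data[self._pos:end] = chunk
        self._pos = end


f = PromotingFile()
f.write(b"abcdef")   # append: stays "sequential"
f.seek(2)
f.write(b"XY")       # in-place overwrite: promotes
print(f.backend)     # random_access
```

A real sequential backend would store append-only chunks and have to copy them into a random-access structure at promotion time, which is why the document notes a temporary doubling of memory for the promoted file.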
Security note: In-memory data may be written to physical disk via OS swap or core dumps. MFS does not provide memory-locking (e.g., mlock) or secure erasure. Do not rely on MFS alone for sensitive data isolation.
Exception Reference
| Exception | Typical cause |
|---|---|
| `MFSQuotaExceededError` | write/import/copy would exceed quota |
| `MFSNodeLimitExceededError` | node count would exceed `max_nodes` (subclass of `MFSQuotaExceededError`) |
| `FileNotFoundError` | path missing |
| `FileExistsError` | creation target already exists |
| `IsADirectoryError` | file operation on a directory |
| `NotADirectoryError` | directory operation on a file |
| `BlockingIOError` | lock timeout or open-file conflict |
| `io.UnsupportedOperation` | mode mismatch / unsupported operation |
| `ValueError` | invalid mode/path/seek/truncate arguments |
Testing with pytest
D-MemFS ships a pytest plugin that provides an mfs fixture:
```python
# conftest.py — register the plugin explicitly
pytest_plugins = ["dmemfs._pytest_plugin"]
```

Note: The plugin is not auto-discovered. Users must declare it in `conftest.py` to opt in.

```python
# test_example.py
def test_write_read(mfs):
    mfs.mkdir("/tmp")
    with mfs.open("/tmp/hello.txt", "wb") as f:
        f.write(b"hello")
    with mfs.open("/tmp/hello.txt", "rb") as f:
        assert f.read() == b"hello"
```
Development Notes
Design documents (Japanese):
- Architecture Spec v13 — API design, internal structure, CI matrix
- Detailed Design Spec — component-level design and rationale
- Test Design Spec — test case table and pseudocode
These documents are written in Japanese and serve as internal design references.
Performance Summary
Key results from the included benchmark (300 small files × 4 KiB, 16 MiB stream, 2 GiB large stream):
| Case | MFS (ms) | BytesIO (ms) | tempfile (ms) |
|---|---|---|---|
| small_files_rw | 34 | 5 | 164 |
| stream_write_read | 64 | 51 | 17 |
| random_access_rw | 24 | 53 | 27 |
| large_stream_write_read | 1 438 | 7 594 | 1 931 |
| many_files_random_read | 777 | 163 | 4 745 |
MFS incurs a modest overhead on tiny-file workloads but delivers significantly better performance on large streams and random-access patterns compared with `BytesIO`. See BENCHMARK.md and benchmark_current_result.md for full data.
Note: `tempfile` results above were measured with the system temp directory on a RAM disk. On a physical SSD/HDD, `tempfile` performance will be substantially slower.
License
MIT License