3FS USRBIO file reader for fastsafetensors

fastsafetensor-3fs-reader

This package provides a high-performance reader for 3FS USRBIO files with two backend implementations (C++ and pure-Python) and a mock for testing.

Backends

| Backend | Module | Requirements | Performance |
|---|---|---|---|
| C++ | reader_cpp.py | libhf3fs_api_shared.so + libtorch + CUDA | Best (GIL-free, native USRBIO async I/O) |
| Python | reader_py.py | hf3fs_py_usrbio (+ optional PyTorch for GPU) | Good (USRBIO via Client API or OS pread) |
| Mock | mock.py | None | For testing only |

The package auto-selects the best available backend at import time: C++ → Python → Mock. Use get_backend() to check which one is active.
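
The fallback order above can be pictured with a minimal sketch. It is not the package's actual selection code: the dotted module paths in CANDIDATES and the helper names are assumptions made for illustration, and the only behaviour taken from this README is the order C++ → Python → Mock.

```python
import importlib

# Hypothetical dotted module paths, used only for illustration; the
# package's real internal layout may differ.
CANDIDATES = [
    ("cpp", "fastsafetensor_3fs_reader.reader_cpp"),
    ("python", "fastsafetensor_3fs_reader.reader_py"),
]


def _importable(module_name):
    """Return True if the module and its native dependencies can be imported."""
    try:
        importlib.import_module(module_name)
        return True
    except (ImportError, OSError):
        # ImportError: module or Python dependency missing.
        # OSError: a shared library failed to load.
        return False


def select_backend(candidates=CANDIDATES, probe=_importable):
    """Return the name of the first usable backend, falling back to "mock"."""
    for name, module_name in candidates:
        if probe(module_name):
            return name
    return "mock"
```

Passing a custom probe makes the order easy to exercise without any native library installed, for example `select_backend(probe=lambda m: m.endswith("reader_py"))` yields "python".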

Note: The C++ backend supports a pipelined mode, in which double-buffered asynchronous host-to-device (H2D) copies via cudaMemcpyAsync overlap network I/O with the GPU memory transfer and so improve throughput (see Performance Results). Pass pipelined=True to read_chunked() to enable it. The Python backend does not support pipelining and silently falls back to non-pipelined mode.
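
To illustrate the overlap idea only, here is a pure-Python simulation of double buffering. It is not the C++ backend's implementation: a worker thread stands in for asynchronous I/O, a plain callback stands in for the cudaMemcpyAsync copy, and the names chunk_ranges and pipelined_copy are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(total_length, chunk_size):
    """Yield (offset, length) pairs covering [0, total_length)."""
    for offset in range(0, total_length, chunk_size):
        yield offset, min(chunk_size, total_length - offset)


def pipelined_copy(read_chunk, copy_chunk, total_length, chunk_size):
    """Read chunk i+1 in the background while chunk i is being copied.

    read_chunk(offset, length) -> bytes   (stand-in for the storage read)
    copy_chunk(data)                      (stand-in for the H2D copy)
    Returns the number of bytes copied.
    """
    copied = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None  # future for the read that is currently in flight
        for offset, length in chunk_ranges(total_length, chunk_size):
            # Start the next read before consuming the previous one.
            next_read = pool.submit(read_chunk, offset, length)
            if pending is not None:
                data = pending.result()
                copy_chunk(data)
                copied += len(data)
            pending = next_read
        if pending is not None:  # drain the last chunk
            data = pending.result()
            copy_chunk(data)
            copied += len(data)
    return copied
```

At most two chunks are alive at any time (one being read, one being copied), which is the property that gives double buffering its name.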

Installation

Pure-Python mode (no C++ compilation)

FST3FS_NO_EXT=1 pip install .

With C++ extension

Requires libhf3fs_api_shared.so (from a 3FS build) and CUDA Runtime. The hf3fs_usrbio.h header is bundled in the package, so no external header dependency is needed:

export HF3FS_LIB_DIR=/path/to/3FS/build/lib         # directory with libhf3fs_api_shared.so
pip install .

Automatic libhf3fs_api_shared.so discovery

At import time, the package searches for libhf3fs_api_shared.so in the following order of priority:

  1. HF3FS_LIB_DIR environment variable (user-explicit, highest priority).
  2. LD_LIBRARY_PATH directories (user already configured).
  3. hf3fs_py_usrbio pip install path — if hf3fs_py_usrbio is installed via pip, the library is typically located in a sibling .libs/ directory (e.g. site-packages/hf3fs_py_usrbio.libs/). This is discovered automatically so you don't need to set LD_LIBRARY_PATH manually.

The library is pre-loaded with RTLD_GLOBAL so that both the C++ and Python backends can resolve its symbols. Use get_hf3fs_lib_path() to check which path was loaded:

from fastsafetensor_3fs_reader import get_hf3fs_lib_path
print(get_hf3fs_lib_path())  # e.g. "/path/to/site-packages/hf3fs_py_usrbio.libs/libhf3fs_api_shared.so"
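
The three-step search order described above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: find_hf3fs_lib is a hypothetical helper, the environment and site-packages directories are passed in explicitly so the function is easy to test, and the library is assumed to keep its exact file name inside the .libs directory.

```python
import os
from pathlib import Path

LIB_NAME = "libhf3fs_api_shared.so"


def find_hf3fs_lib(env, site_dirs):
    """Return the first existing path to LIB_NAME, or None if not found.

    env       -- mapping of environment variables (e.g. os.environ)
    site_dirs -- iterable of site-packages directories to inspect
    """
    search_dirs = []
    # 1. Explicit user override.
    if env.get("HF3FS_LIB_DIR"):
        search_dirs.append(env["HF3FS_LIB_DIR"])
    # 2. Directories already on the dynamic loader path.
    search_dirs += [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    # 3. The ".libs" directory next to a pip-installed hf3fs_py_usrbio.
    search_dirs += [os.path.join(d, "hf3fs_py_usrbio.libs") for d in site_dirs]

    for directory in search_dirs:
        candidate = Path(directory) / LIB_NAME
        if candidate.is_file():
            return str(candidate)
    return None
```

A path found this way could then be pre-loaded with the standard library call `ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)`, which is the usual way to make a shared library's symbols visible to extensions loaded later.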

Installing hf3fs_py_usrbio (for the Python backend)

hf3fs_py_usrbio is not available on PyPI. It must be built from the DeepSeek 3FS source tree:

git clone https://github.com/deepseek-ai/3FS
cd 3FS
git submodule update --init --recursive
# Follow 3FS build instructions (cmake, etc.)
# After build, install the Python package:
cd build && pip install ..

Important: The default pip-installed hf3fs_py_usrbio package is suitable for testing and validation but is not recommended for production use. For production deployments, build 3FS from source with optimized compiler flags tailored to your hardware. Refer to projects like SGLang for examples of production-grade 3FS compilation workflows.

Usage

from fastsafetensor_3fs_reader import (
    ThreeFSFileReader,
    MockFileReader,
    is_available,
    get_backend,
)

# Check which backend is active
print(f"Backend: {get_backend()}")  # "cpp", "python", or "mock"

# Use mock reader for testing (always available)
reader = MockFileReader()
headers = reader.read_headers_batch(["/path/to/file.safetensors"])
reader.close()

# Use 3FS reader when available
if is_available():
    reader = ThreeFSFileReader(mount_point="/mnt/3fs")
    headers = reader.read_headers_batch([
        "/mnt/3fs/model-00001.safetensors",
        "/mnt/3fs/model-00002.safetensors",
    ])

    # Read tensor data into GPU memory
    import torch
    buf = torch.empty(1024 * 1024, dtype=torch.uint8, device="cuda")
    bytes_read = reader.read_chunked(
        path="/mnt/3fs/model-00001.safetensors",
        dev_ptr=buf.data_ptr(),
        file_offset=0,
        total_length=1024 * 1024,
    )
    reader.close()
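
The example above reads a fixed 1 MiB from offset 0. To read one tensor instead, file_offset and total_length can be derived from the safetensors header. The sketch below parses the raw leading bytes of a file using only the public safetensors layout (an 8-byte little-endian header length, a JSON header, then the data section). It does not depend on this package, and since this README does not describe what read_headers_batch() returns, the helper names here are hypothetical.

```python
import json
import struct


def parse_safetensors_header(blob):
    """Return (header_dict, data_start) from the leading bytes of a file."""
    (header_len,) = struct.unpack("<Q", blob[:8])  # little-endian uint64
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    return header, 8 + header_len


def tensor_byte_range(header, data_start, name):
    """Return (file_offset, length) of one tensor's raw bytes."""
    # data_offsets are relative to the start of the data section.
    begin, end = header[name]["data_offsets"]
    return data_start + begin, end - begin
```

The resulting pair maps directly onto the file_offset and total_length arguments of read_chunked() shown above.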

Benchmark

The hack/benchmark/ directory contains a comprehensive benchmarking suite. Use benchmark_runner.py to measure read throughput across different backends, buffer sizes, chunk sizes, and process counts.

Full benchmark (read + GPU copy)

python hack/benchmark/benchmark_runner.py \
    --mount-point /mnt/3fs \
    --backends cpp,python \
    --buffer-sizes 8,16,32,64,128,256,512 \
    --chunk-sizes 8,16,32,64,128,256,512 \
    --num-processes 1,2,4,8 \
    --iterations 3

Download-only benchmark (host memory only, no GPU copy)

python hack/benchmark/benchmark_runner.py \
    --mount-point /mnt/3fs \
    --backends cpp,python \
    --buffer-sizes 8,16,32,64,128,256,512 \
    --chunk-sizes 8,16,32,64,128,256,512 \
    --num-processes 1,2,4,8 \
    --download-only \
    --iterations 3

Key parameters

| Parameter | Description | Default |
|---|---|---|
| --mount-point | 3FS FUSE mount-point path | (required) |
| --backends | Comma-separated backend names | mock,python,cpp |
| --buffer-sizes | Buffer sizes in MB | 8,16,32,64,128,256,512,1024 |
| --chunk-sizes | Chunk sizes in MB | 8,16,32,64,128,256,512,1024 |
| --num-processes | Process counts | 1,2,4,8 |
| --download-only | Read into host memory only (skip GPU copy) | false |
| --iterations | Iterations per combination | 3 |
| --mode | grid (sweep all combos) or single | grid |
| --output-dir | Directory for CSV and chart output | ./benchmark_results |
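
To get a feel for how large a grid sweep is, the following sketch enumerates the combinations for the example commands above. It assumes that grid mode runs the full Cartesian product of the four lists and repeats each combination --iterations times; the real runner may skip or merge some combinations, so treat the counts as an upper-bound estimate.

```python
from itertools import product


def grid(backends, buffer_sizes_mb, chunk_sizes_mb, num_processes):
    """Return every (backend, buffer, chunk, processes) combination."""
    return list(product(backends, buffer_sizes_mb, chunk_sizes_mb, num_processes))


sizes = [8, 16, 32, 64, 128, 256, 512]
combos = grid(["cpp", "python"], sizes, sizes, [1, 2, 4, 8])
iterations = 3
print(len(combos), len(combos) * iterations)  # 392 1176
```

Under that assumption the example sweep amounts to 392 configurations and 1176 timed runs, which is worth knowing before starting it on a shared cluster.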

Performance Results

Test environment: Single 400 Gbps RDMA NIC. These numbers represent a loading baseline under specific storage and network hardware conditions — they do not represent the performance ceiling of the system.

Model: DeepSeek-V3 (total ~640 GB safetensors)

| Configuration | Avg Throughput (GB/s) | Peak Throughput with fastsafetensors (GB/s) | Load Time (s) | Backend |
|---|---|---|---|---|
| 8 processes, buffer=8 MB | 35.0 | 32.0 | 30.34 | C++ (non-pipelined) |
| 8 processes, buffer=16 MB | 37.6 | 36.6 | 25.73 | C++ (pipelined) |
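
For context, the average throughput figures can be compared with the raw line rate of the 400 Gbps NIC. The sketch below is plain arithmetic, assuming 8 bits per byte and ignoring protocol overhead, so the line rate it computes is an upper bound rather than an achievable target.

```python
def line_rate_gb_per_s(link_gbps):
    """Convert a link speed in gigabits/s to gigabytes/s (8 bits per byte)."""
    return link_gbps / 8.0


def utilization(throughput_gb_s, link_gbps):
    """Fraction of the raw line rate reached by a measured throughput."""
    return throughput_gb_s / line_rate_gb_per_s(link_gbps)


for label, avg in [("non-pipelined", 35.0), ("pipelined", 37.6)]:
    print(f"{label}: {utilization(avg, 400):.1%} of line rate")
# non-pipelined: 70.0% of line rate
# pipelined: 75.2% of line rate
```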

Benchmark: RDMA throughput across buffer sizes (8 MB / 16 MB / 32 MB)

[Figure: RDMA throughput across buffer sizes]

Production: model weight loading with fastsafetensors (pipelined, peak 36.6 GB/s)

[Figure: model weight loading throughput]

License

Apache-2.0
