
Distributed PyTorch file transfer for Baseten - Environment-aware, lock-free file transfer management


B10 Transfer

Accelerate cold starts by loading previous PyTorch compilation artifacts. This library enables caching of torch.compile() results across Baseten deployments, reducing compilation latencies by up to 5x.

Quick Start

For Standard Models (model.py)

import torch

from b10_transfer import load_compile_cache, save_compile_cache, OperationStatus

class Model:
    def load(self):
        # Load your model first
        self.model = YourModel().to("cuda")
        
        # Try to load existing compile cache
        cache_loaded = load_compile_cache()
        
        if cache_loaded == OperationStatus.ERROR:
            print("Cache load failed; running in eager mode and skipping torch.compile")
        else:
            # Compile your model
            self.model = torch.compile(self.model, mode="max-autotune-no-cudagraphs")
            
            # Warm up with representative inputs to trigger compilation
            self.model("dummy input")
            self.model("another dummy input")
            
            # Save cache if it was newly created
            if cache_loaded != OperationStatus.SUCCESS:
                save_compile_cache()

For vLLM Custom Servers

Add to your config.yaml:

requirements:
  - b10-transfer

start_command: "b10-compile-cache & vllm serve ..."

The b10-compile-cache CLI tool automatically handles cache loading and saving for vLLM deployments.

Requirements

Add to your config.yaml:

requirements:
  - b10-transfer

Note: Requires b10cache enabled in your Baseten environment.

API Reference

Core Functions

load_compile_cache() -> OperationStatus

Load previously saved compilation cache for the current model environment.

Returns:

  • OperationStatus.SUCCESS → Cache successfully loaded
  • OperationStatus.SKIPPED → Cache already exists locally
  • OperationStatus.ERROR → General errors (b10fs unavailable, validation failed)
  • OperationStatus.DOES_NOT_EXIST → No cache file found for this environment

save_compile_cache() -> OperationStatus

Save the current model's torch compilation cache for future deployments.

Returns:

  • OperationStatus.SUCCESS → Cache successfully saved
  • OperationStatus.SKIPPED → Cache already exists in shared directory
  • OperationStatus.ERROR → General errors (insufficient space, validation failed)

save_vllm_compile_cache() -> None

Specialized function for vLLM deployments that:

  1. Attempts to load existing cache first
  2. Waits for vLLM server readiness
  3. Automatically saves cache after compilation

Utility Functions

clear_local_cache() -> bool

Clear the local PyTorch compilation cache directory.

Returns:

  • True → Cache cleared successfully or didn't exist
  • False → Failed to clear cache
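For intuition, clearing a local cache directory with these semantics can be sketched using only the standard library. This is a hypothetical approximation, not the library's actual implementation, and the helper name is illustrative:

```python
import shutil
from pathlib import Path

def clear_cache_dir(cache_dir: str) -> bool:
    """Remove a cache directory tree; True if it is gone (or never existed)."""
    path = Path(cache_dir)
    if not path.exists():
        return True  # nothing to clear counts as success
    try:
        shutil.rmtree(path)
        return True
    except OSError:
        return False
```

Note the asymmetry matches the documented contract: a missing directory is not an error, only a failed removal is.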

get_cache_info() -> Dict[str, Any]

Get comprehensive information about current cache state.

Returns:

{
    "environment_key": str,           # Unique environment identifier
    "local_cache_exists": bool,       # Local torch cache status
    "b10fs_enabled": bool,           # B10FS availability
    "b10fs_cache_exists": bool,      # Remote cache status
    "local_cache_size_mb": float,    # Local cache size (if exists)
    "b10fs_cache_size_mb": float     # Remote cache size (if exists)
}

list_available_caches() -> Dict[str, Any]

List all available cache files with metadata.

Returns:

{
    "caches": [                      # List of cache files
        {
            "filename": str,         # Cache file name
            "environment_key": str,  # Environment identifier  
            "size_mb": float,        # File size in MB
            "is_current_environment": bool,  # Matches current env
            "created_time": float    # Creation timestamp
        }
    ],
    "current_environment": str,      # Current environment key
    "total_caches": int,            # Number of cache files
    "current_cache_exists": bool,   # Current env has cache
    "b10fs_enabled": bool          # B10FS availability
}

Constants

OperationStatus Enum

Status codes returned by cache operations:

  • OperationStatus.SUCCESS → Operation completed successfully
  • OperationStatus.ERROR → Operation failed due to error
  • OperationStatus.DOES_NOT_EXIST → Cache file not found (load operations only)
  • OperationStatus.SKIPPED → Operation not needed (cache already exists)
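For reference, the enum presumably resembles the following sketch (member names are taken from this page; the exact definition inside the package may differ). A common pattern is to treat SUCCESS and SKIPPED uniformly, since both mean a usable cache is present:

```python
from enum import Enum, auto

class OperationStatus(Enum):
    SUCCESS = auto()         # operation completed successfully
    ERROR = auto()           # operation failed due to error
    DOES_NOT_EXIST = auto()  # cache file not found (load operations only)
    SKIPPED = auto()         # operation not needed (cache already exists)

def cache_usable(status: OperationStatus) -> bool:
    """True when a compile cache is available, however it got there."""
    return status in (OperationStatus.SUCCESS, OperationStatus.SKIPPED)
```

The `cache_usable` helper is illustrative, not part of the library's API.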

Exceptions

CacheError

Base exception for cache operations.

CacheValidationError

Raised when path validation or security checks fail.

CacheOperationInterrupted

Raised when operations are stopped due to insufficient disk space.
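Since CacheError is the base class, callers can catch the specific cases they care about and fall back to the base for everything else. The hierarchy below is a sketch inferred from this page, and the `handle` wrapper is purely illustrative:

```python
class CacheError(Exception):
    """Base exception for cache operations."""

class CacheValidationError(CacheError):
    """Path validation or security checks failed."""

class CacheOperationInterrupted(CacheError):
    """Operation stopped due to insufficient disk space."""

def handle(op):
    """Run a cache operation, mapping exceptions to a status string."""
    try:
        return op()
    except CacheOperationInterrupted:
        return "retry later: low disk space"
    except CacheError as exc:
        return f"cache failure: {exc}"
```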

Configuration

The library automatically configures itself, but you can override defaults:

# Cache directories
export TORCHINDUCTOR_CACHE_DIR="/tmp/torchinductor_$(whoami)"
export B10FS_CACHE_DIR="/cache/model/compile_cache"
export LOCAL_WORK_DIR="/app"

# Cache limits
export MAX_CACHE_SIZE_MB="1024"        # 1GB max archive size
export MAX_CONCURRENT_SAVES="50"       # Concurrent save operations

# Required for functionality
export BASETEN_FS_ENABLED="1"
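A sketch of how such environment overrides are typically resolved in Python, with the defaults shown above. The variable names come from this page; the `read_config` helper itself is hypothetical, not the library's internal loader:

```python
import os

def read_config() -> dict:
    """Resolve cache settings from the environment, falling back to defaults."""
    user = os.getenv("USER", "app")
    return {
        "torch_cache_dir": os.getenv(
            "TORCHINDUCTOR_CACHE_DIR", f"/tmp/torchinductor_{user}"
        ),
        "b10fs_cache_dir": os.getenv("B10FS_CACHE_DIR", "/cache/model/compile_cache"),
        "work_dir": os.getenv("LOCAL_WORK_DIR", "/app"),
        "max_cache_size_mb": int(os.getenv("MAX_CACHE_SIZE_MB", "1024")),
        "max_concurrent_saves": int(os.getenv("MAX_CONCURRENT_SAVES", "50")),
        "b10fs_enabled": os.getenv("BASETEN_FS_ENABLED", "0") == "1",
    }
```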

How It Works

Environment-Specific Caching

Caches are automatically keyed by GPU environment to ensure compatibility:

  • GPU Device Name: e.g., "NVIDIA GeForce RTX 4090"
  • CUDA Version: e.g., "12.1"

This keying ensures cached artifacts are only reused on hardware configurations where they are valid.
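One plausible way to derive such a key from the two properties above (the real keying scheme is internal to the library; the function and format here are illustrative):

```python
import hashlib

def environment_key(gpu_name: str, cuda_version: str) -> str:
    """Derive a stable, filesystem-safe key from GPU model and CUDA version."""
    digest = hashlib.sha256(f"{gpu_name}|{cuda_version}".encode()).hexdigest()[:12]
    slug = gpu_name.lower().replace(" ", "_")
    return f"{slug}_{cuda_version}_{digest}"
```

The same (GPU, CUDA) pair always yields the same key, so replicas on identical hardware share one cache file, while any difference in either property produces a distinct key.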

Atomic Operations

  1. Load: B10FS → local temp → extract to torch cache directory
  2. Save: Compress torch cache → B10FS temp → atomic rename
  3. Space Monitoring: Operations are interrupted if disk space becomes insufficient
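The compress-then-rename pattern in the save path can be sketched with the standard library. Readers never observe a half-written archive, because the file only appears at its final path once complete. This is an illustrative approximation; the function name and archive format are assumptions:

```python
import os
import shutil
import tempfile

def save_archive_atomically(torch_cache_dir: str, dest_path: str) -> None:
    """Compress a cache directory and publish it via an atomic rename."""
    # Create the temp file in the destination directory so the final
    # rename stays within one filesystem (a requirement for atomicity).
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest_path), suffix=".tmp")
    os.close(fd)
    try:
        # make_archive appends ".tar.gz" to the base name it is given.
        archive = shutil.make_archive(tmp, "gztar", root_dir=torch_cache_dir)
        os.replace(archive, dest_path)  # atomic on POSIX within one filesystem
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)  # clean up the mkstemp placeholder
```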

Debugging

# Enable debug logging
import logging
import b10_transfer

logging.getLogger('b10_transfer').setLevel(logging.DEBUG)

# Check cache status
info = b10_transfer.get_cache_info()
print(f"Environment: {info['environment_key']}")
print(f"Local cache: {info['local_cache_exists']}")
print(f"Remote cache: {info['b10fs_cache_exists']}")

# List all available caches
caches = b10_transfer.list_available_caches()
print(f"Total caches: {caches['total_caches']}")
for cache in caches['caches']:
    print(f"  {cache['filename']} ({cache['size_mb']:.1f} MB)")

Baseten Documentation
