
Optimized inference package for Meta's SAM-Audio model with VRAM-efficient lite mode


sam-audio-infer


PyPI: https://pypi.org/project/sam-audio-infer/

Inference-only package for SAM-Audio (Segment Anything for Audio) by Meta AI.

This is a lightweight, dependency-minimal repackaging focused solely on inference, with a VRAM-efficient lite mode. For training and the full research codebase, please visit the original SAM-Audio repository.


Features

  • Inference-Only: Optimized for inference with torch.inference_mode() (no grad overhead)
  • Lite Mode: Reduce VRAM usage by 62-78% by removing unused components
  • Mixed Precision: Support for bfloat16/float16 inference
  • 48kHz Audio: Native high-quality audio processing at 48kHz sample rate
  • Auto-Chunking: Process long audio files without OOM errors
  • Model Caching: Configurable cache directory with environment variable support
  • Warmup Support: Pre-compile CUDA kernels for faster first inference
  • Simple API: Easy-to-use Python API and CLI
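The Auto-Chunking feature is not described in detail here. As a rough illustration only, splitting a long 48 kHz signal into fixed-length windows could look like the sketch below. The function name `chunk_bounds`, the 30-second default, and the lack of overlap between chunks are assumptions made for this example, not the package's actual implementation.

```python
SAMPLE_RATE = 48_000  # the package processes audio at 48 kHz


def chunk_bounds(num_samples: int, chunk_seconds: float = 30.0,
                 sample_rate: int = SAMPLE_RATE) -> list[tuple[int, int]]:
    """Return consecutive (start, end) sample indices covering the signal.

    Hypothetical helper: the chunk length and the no-overlap policy are
    illustrative choices, not taken from sam-audio-infer.
    """
    chunk_len = int(chunk_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds must be positive")
    return [(start, min(start + chunk_len, num_samples))
            for start in range(0, num_samples, chunk_len)]


# A 70-second clip yields two full 30-second chunks and a 10-second remainder.
bounds = chunk_bounds(70 * SAMPLE_RATE)
```

Processing each window separately keeps peak memory bounded by the chunk length rather than by the file length, which is the general idea behind avoiding OOM errors on long inputs.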

Installation

# Using uv (recommended)
uv add sam-audio-infer
uv pip install git+https://github.com/facebookresearch/sam-audio.git

# Or using pip
pip install sam-audio-infer
pip install git+https://github.com/facebookresearch/sam-audio.git

For development:

git clone https://github.com/openmirlab/sam-audio-infer.git
cd sam-audio-infer
uv sync

Prerequisites

HuggingFace Access Required: SAM-Audio models are gated.

  1. Request access to the gated model checkpoints on HuggingFace.
  2. Once accepted, authenticate with HuggingFace:
    # Generate token at https://huggingface.co/settings/tokens
    huggingface-cli login
    # Or set environment variable
    export HF_TOKEN=hf_your_token_here
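Because the checkpoints are gated, it can help to confirm that a token is visible before attempting a download. The sketch below only inspects the `HF_TOKEN` environment variable mentioned above; the helper name `hf_token_from_env` is hypothetical, and a missing variable does not necessarily mean you are unauthenticated, since `huggingface-cli login` stores its token separately.

```python
import os


def hf_token_from_env(env=None):
    """Return the HF_TOKEN value if it is set and non-empty, else None."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if hf_token_from_env() is None:
    # Not an error by itself: a prior `huggingface-cli login` also works.
    print("HF_TOKEN is not set; make sure you have run `huggingface-cli login`.")
```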
    

Quick Start

Python API

from sam_audio_infer import SamAudioInfer

# Load model (recommended settings, ~3 GB VRAM)
model = SamAudioInfer.from_pretrained(
    "base",                      # Model size: "small", "base", or "large"
    dtype="bfloat16",            # Mixed precision (~50% VRAM savings)
    enable_text_ranker=False,    # +3 GB VRAM if enabled
    enable_span_predictor=False, # +3 GB VRAM if enabled
)

# Separate audio
result = model.separate("song.wav", description="vocals")
result.save("vocals.wav", "accompaniment.wav")
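To apply the same description to several files, the two calls shown above can be wrapped in a small loop. This is a minimal sketch: `separate_many` and its output naming scheme are hypothetical, and it assumes that the second argument to `result.save` is the residual track, as the Quick Start filenames suggest.

```python
from pathlib import Path


def separate_many(model, paths, description, out_dir):
    """Separate each input file and save target/residual pairs in out_dir.

    Relies only on the calls shown in the Quick Start:
    model.separate(path, description=...) and result.save(target, residual).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    label = description.replace(" ", "_")  # keep file names free of spaces
    written = []
    for path in map(Path, paths):
        target = out_dir / f"{path.stem}_{label}.wav"
        residual = out_dir / f"{path.stem}_residual.wav"
        result = model.separate(str(path), description=description)
        result.save(str(target), str(residual))
        written.append((target, residual))
    return written
```

Loading the model once and reusing it across files avoids paying the load cost for every input.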

Command Line

# Basic separation
sam-audio-infer separate song.wav -d "vocals" -o vocals.wav

# With residual output
sam-audio-infer separate song.wav -d "drums" -o drums.wav --residual other.wav

# Download model with warmup
sam-audio-infer download --model base --warmup
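If you prefer to drive the CLI from a script, the command can be assembled as an argument list. The sketch below uses only the `separate` flags shown above (`-d`, `-o`, `--residual`); the helper name `build_separate_cmd` is hypothetical.

```python
import shlex


def build_separate_cmd(input_path, description, output_path, residual_path=None):
    """Build the argv list for `sam-audio-infer separate`."""
    cmd = ["sam-audio-infer", "separate", input_path,
           "-d", description, "-o", output_path]
    if residual_path is not None:
        cmd += ["--residual", residual_path]
    return cmd


cmd = build_separate_cmd("song.wav", "drums", "drums.wav", residual_path="other.wav")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when a description contains spaces.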

VRAM Usage

Model   Full Mode   Lite Mode   Reduction
Base    12.73 GB    2.84 GB     78%
Large   16.18 GB    6.15 GB     62%

Tested on an RTX 4090 with bfloat16 precision.
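The reduction figures follow directly from the full-mode and lite-mode measurements. As a quick check, the arithmetic can be reproduced as follows (the helper name `reduction_percent` is just for this example):

```python
def reduction_percent(full_gb: float, lite_gb: float) -> float:
    """Percentage of VRAM saved when going from full mode to lite mode."""
    return 100.0 * (full_gb - lite_gb) / full_gb


base = reduction_percent(12.73, 2.84)   # Base model figures
large = reduction_percent(16.18, 6.15)  # Large model figures
print(f"Base: {base:.1f}%  Large: {large:.1f}%")
```

Rounded to whole percentages, these give the 78% and 62% reported for the Base and Large models.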


Requirements

  • Python >= 3.11
  • PyTorch >= 2.0.0
  • torchaudio >= 2.0.0
  • CUDA-capable GPU with at least 4 GB VRAM (lite mode with bfloat16)
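A quick environment check before installing can save a failed setup. The sketch below verifies the Python version and whether `torch` and `torchaudio` are importable; it does not check their versions or the available VRAM, and the helper names are hypothetical.

```python
import importlib.util
import sys


def python_ok(version_info=sys.version_info) -> bool:
    """True if the interpreter satisfies the Python >= 3.11 requirement."""
    return tuple(version_info[:2]) >= (3, 11)


def missing_packages(names=("torch", "torchaudio")) -> list[str]:
    """Return the listed packages that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python >= 3.11:", python_ok())
print("Missing packages:", missing_packages())
```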

Acknowledgments

This package stands on the shoulders of two important projects.

Original Research by Meta AI / Facebook Research

SAM-Audio (Segment Anything for Audio) is developed by Meta AI Research.

Lite Mode Optimization

The Lite Mode VRAM optimization technique used in this package is inspired by AudioGhost AI.

License

MIT License

Note: The underlying SAM-Audio model has its own license terms. Please refer to the official SAM-Audio repository for model usage terms.

Citation

If you use SAM Audio in your research, please cite the original paper:

@article{shi2025samaudio,
    title={SAM Audio: Segment Anything in Audio},
    author={Bowen Shi and Andros Tjandra and John Hoffman and Helin Wang and Yi-Chiao Wu and Luya Gao and Julius Richter and Matt Le and Apoorv Vyas and Sanyuan Chen and Christoph Feichtenhofer and Piotr Doll{\'a}r and Wei-Ning Hsu and Ann Lee},
    year={2025},
    url={https://arxiv.org/abs/2512.18099}
}
