Optimized inference package for Meta's SAM-Audio model with VRAM-efficient lite mode
sam-audio-infer
PyPI: https://pypi.org/project/sam-audio-infer/
Inference-only package for SAM-Audio (Segment Anything for Audio) by Meta AI.
This is a lightweight, dependency-minimal repackaging focused solely on inference, with a VRAM-efficient lite mode. For training and the full research codebase, please visit the original SAM-Audio repository.
Features
- Inference-Only: Optimized for inference with torch.inference_mode() (no grad overhead)
- Lite Mode: Reduce VRAM usage by 62-78% by removing unused components
- Mixed Precision: Support for bfloat16/float16 inference
- 48kHz Audio: Native high-quality audio processing at 48kHz sample rate
- Auto-Chunking: Process long audio files without OOM errors
- Model Caching: Configurable cache directory with environment variable support
- Warmup Support: Pre-compile CUDA kernels for faster first inference
- Simple API: Easy-to-use Python API and CLI
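To make the auto-chunking idea concrete, here is a minimal sketch of how a long recording could be split into overlapping windows at 48 kHz. It is an illustration only and not the package's actual implementation: the function name `chunk_bounds`, the 30-second window, and the 1-second overlap are assumptions chosen for the example.

```python
from typing import List, Tuple

SAMPLE_RATE = 48_000  # the 48 kHz rate named in the feature list


def chunk_bounds(
    num_samples: int,
    chunk_seconds: float = 30.0,    # hypothetical window length
    overlap_seconds: float = 1.0,   # hypothetical overlap between windows
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices that together cover the signal."""
    chunk = int(chunk_seconds * sample_rate)
    overlap = int(overlap_seconds * sample_rate)
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")

    step = chunk - overlap  # how far each new window advances
    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:  # last window reached the end of the audio
            break
        start += step
    return bounds
```

Each window can then be processed on its own, so peak memory depends on the window length rather than on the length of the file. How overlapping regions are blended back together is left out of this sketch.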
Installation
# Using uv (recommended)
uv add sam-audio-infer
uv pip install git+https://github.com/facebookresearch/sam-audio.git
# Or using pip
pip install sam-audio-infer
pip install git+https://github.com/facebookresearch/sam-audio.git
For development:
git clone https://github.com/openmirlab/sam-audio-infer.git
cd sam-audio-infer
uv sync
Prerequisites
HuggingFace Access Required: SAM-Audio models are gated.
- Request access to the model checkpoints:
- Once accepted, authenticate with HuggingFace:
# Generate token at https://huggingface.co/settings/tokens
huggingface-cli login
# Or set environment variable
export HF_TOKEN=hf_your_token_here
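If you rely on the environment variable, a quick check before loading a model can give a clearer message than a failed download. The helper below is a hypothetical sketch (`find_hf_token` is not part of this package); it only inspects `HF_TOKEN`, so a missing variable does not necessarily mean you are unauthenticated, since `huggingface-cli login` stores its token separately.

```python
import os
from typing import Mapping, Optional


def find_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the HF_TOKEN value, or None if it is unset or blank.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    if find_hf_token() is None:
        print("HF_TOKEN is not set; make sure you have run `huggingface-cli login`.")
    else:
        print("HF_TOKEN found in the environment.")
```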
Quick Start
Python API
from sam_audio_infer import SamAudioInfer
# Load model (recommended settings, ~3 GB VRAM)
model = SamAudioInfer.from_pretrained(
    "base",                       # Model size: "small", "base", or "large"
    dtype="bfloat16",             # Mixed precision (~50% VRAM savings)
    enable_text_ranker=False,     # +3 GB VRAM if enabled
    enable_span_predictor=False,  # +3 GB VRAM if enabled
)
# Separate audio
result = model.separate("song.wav", description="vocals")
result.save("vocals.wav", "accompaniment.wav")
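The "~50% VRAM savings" comment on `dtype="bfloat16"` follows from element size: bfloat16 and float16 use 2 bytes per value, float32 uses 4. The sketch below works through that arithmetic for weights only. The parameter count is a made-up placeholder, not a figure for any SAM-Audio checkpoint, and activations and other buffers are not counted.

```python
# Bytes per element for the precisions mentioned in this README.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory needed to hold `num_params` weights in `dtype`, in GiB."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1024**3


if __name__ == "__main__":
    hypothetical_params = 1_000_000_000  # placeholder, not a real model size
    for dtype in ("float32", "bfloat16"):
        gib = weight_memory_gib(hypothetical_params, dtype)
        print(f"{dtype}: {gib:.2f} GiB")
```

Halving the bytes per weight halves the weight memory, which is where the rough 50% figure comes from; measured savings depend on everything else that lives on the GPU.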
Command Line
# Basic separation
sam-audio-infer separate song.wav -d "vocals" -o vocals.wav
# With residual output
sam-audio-infer separate song.wav -d "drums" -o drums.wav --residual other.wav
# Download model with warmup
sam-audio-infer download --model base --warmup
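When driving the CLI from a script, building the argument list in Python avoids shell quoting problems with descriptions that contain spaces. This is a small sketch using only the `separate` flags shown above (`-d`, `-o`, `--residual`); the helper name `build_separate_command` is hypothetical and not part of the package.

```python
import shlex
import subprocess
from typing import List, Optional


def build_separate_command(
    input_path: str,
    description: str,
    output_path: str,
    residual_path: Optional[str] = None,
) -> List[str]:
    """Build the argv list for `sam-audio-infer separate`."""
    cmd = ["sam-audio-infer", "separate", input_path,
           "-d", description, "-o", output_path]
    if residual_path is not None:
        cmd += ["--residual", residual_path]
    return cmd


if __name__ == "__main__":
    cmd = build_separate_command("song.wav", "drums", "drums.wav", "other.wav")
    print(shlex.join(cmd))           # show the equivalent shell command
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Passing a list to `subprocess.run` bypasses the shell, so the description is handed to the CLI as a single argument regardless of its content.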
VRAM Usage
| Model | Full Mode | Lite Mode | Reduction |
|---|---|---|---|
| Base | 12.73 GB | 2.84 GB | 78% |
| Large | 16.18 GB | 6.15 GB | 62% |
Tested on RTX 4090 with bfloat16 precision
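The Reduction column can be reproduced from the other two columns as (full - lite) / full. The sketch below recomputes it and adds a small, hypothetical helper (`models_that_fit`) that lists which table rows fit a given VRAM budget. It uses only the figures in the table and leaves no headroom for other processes, so treat its answers as a lower bound on what you need.

```python
from typing import Dict, List, Tuple

# (full mode GB, lite mode GB), copied from the table above.
VRAM_GB: Dict[str, Tuple[float, float]] = {
    "base": (12.73, 2.84),
    "large": (16.18, 6.15),
}


def reduction_percent(full_gb: float, lite_gb: float) -> float:
    """Percentage of VRAM saved by lite mode relative to full mode."""
    return (full_gb - lite_gb) / full_gb * 100.0


def models_that_fit(available_gb: float, mode: str = "lite") -> List[str]:
    """Names of the table rows whose measured usage fits in `available_gb`."""
    index = {"full": 0, "lite": 1}[mode]
    return [name for name, usage in VRAM_GB.items() if usage[index] <= available_gb]


if __name__ == "__main__":
    for name, (full_gb, lite_gb) in VRAM_GB.items():
        print(f"{name}: {reduction_percent(full_gb, lite_gb):.0f}% reduction")
```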
Documentation
- CLI Reference - Command line interface
- Python API - Python API reference
- Configuration - Models, precision, lite mode settings
- Architecture - How it works and optimization techniques
- Benchmarks - VRAM and performance benchmarks
- Troubleshooting - Common issues and solutions
Requirements
- Python >= 3.11
- PyTorch >= 2.0.0
- torchaudio >= 2.0.0
- CUDA-capable GPU with at least 4 GB VRAM (base model in lite mode with bfloat16; the large model needs about 6.2 GB in lite mode)
Acknowledgments
This package stands on the shoulders of two important projects.
Original Research by Meta AI / Facebook Research
SAM-Audio (Segment Anything for Audio) is developed by Meta AI Research.
- Repository: github.com/facebookresearch/sam-audio
- Paper: SAM Audio: Segment Anything in Audio (arXiv:2512.18099)
- HuggingFace: facebook/sam-audio-base
Lite Mode Optimization
The Lite Mode VRAM optimization technique used in this package is inspired by AudioGhost AI.
License
MIT License
Note: The underlying SAM-Audio model has its own license terms. Please refer to the official SAM-Audio repository for model usage terms.
Citation
If you use SAM Audio in your research, please cite the original paper:
@article{shi2025samaudio,
  title={SAM Audio: Segment Anything in Audio},
  author={Bowen Shi and Andros Tjandra and John Hoffman and Helin Wang and Yi-Chiao Wu and Luya Gao and Julius Richter and Matt Le and Apoorv Vyas and Sanyuan Chen and Christoph Feichtenhofer and Piotr Doll{\'a}r and Wei-Ning Hsu and Ann Lee},
  year={2025},
  url={https://arxiv.org/abs/2512.18099}
}