Optimized inference package for Meta's SAM-Audio model with VRAM-efficient lite mode

sam-audio-infer

PyPI: https://pypi.org/project/sam-audio-infer/

Inference-only package for SAM-Audio (Segment Anything for Audio) by Meta AI.

This is a lightweight, dependency-minimal repackaging focused solely on inference, with a VRAM-efficient lite mode. For training and the full research codebase, please visit the original SAM-Audio repository.


Features

  • Inference-Only: Optimized for inference with torch.inference_mode() (no grad overhead)
  • Lite Mode: Reduce VRAM usage by 62-78% by removing unused components
  • Mixed Precision: Support for bfloat16/float16 inference
  • 48kHz Audio: Native high-quality audio processing at 48kHz sample rate
  • Auto-Chunking: Process long audio files without OOM errors
  • Model Caching: Configurable cache directory with environment variable support
  • Warmup Support: Pre-compile CUDA kernels for faster first inference
  • Simple API: Easy-to-use Python API and CLI
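
To make the auto-chunking idea concrete, here is a minimal sketch of how a long recording could be split into overlapping windows at 48 kHz. It is not the package's actual implementation: the helper name `chunk_bounds` and the chunk and overlap lengths are assumptions chosen for illustration.

```python
SAMPLE_RATE = 48_000  # sample rate stated in the feature list


def chunk_bounds(num_samples, chunk_seconds=30.0, overlap_seconds=1.0,
                 sample_rate=SAMPLE_RATE):
    """Return (start, end) sample indices of overlapping chunks.

    Hypothetical helper: the chunk and overlap lengths are illustrative
    defaults, not values taken from sam-audio-infer.
    """
    chunk = int(chunk_seconds * sample_rate)
    overlap = int(overlap_seconds * sample_rate)
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")

    step = chunk - overlap  # how far each new chunk advances
    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break
        start += step
    return bounds


# A 70-second file is covered by three chunks; neighbours overlap by 1 s.
for start, end in chunk_bounds(70 * SAMPLE_RATE):
    print(f"{start / SAMPLE_RATE:.0f}s to {end / SAMPLE_RATE:.0f}s")
```

Processing each window separately keeps peak memory bounded by the chunk length rather than the file length; the overlap leaves room to crossfade adjacent outputs.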

Installation

# Using uv (recommended)
uv add sam-audio-infer
uv pip install git+https://github.com/facebookresearch/sam-audio.git

# Or using pip
pip install sam-audio-infer
pip install git+https://github.com/facebookresearch/sam-audio.git

For development:

git clone https://github.com/openmirlab/sam-audio-infer.git
cd sam-audio-infer
uv sync

Prerequisites

HuggingFace Access Required: SAM-Audio models are gated.

  1. Request access to the model checkpoints:
  2. Once accepted, authenticate with HuggingFace:
    # Generate token at https://huggingface.co/settings/tokens
    huggingface-cli login
    # Or set environment variable
    export HF_TOKEN=hf_your_token_here
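
Before loading a gated checkpoint, it can help to confirm that a token is visible to the Python process. The sketch below only inspects the `HF_TOKEN` variable named above; `resolve_hf_token` is a hypothetical helper, not part of sam-audio-infer. A token stored by `huggingface-cli login` lives on disk rather than in the environment, so a missing variable does not necessarily mean you are unauthenticated.

```python
import os


def resolve_hf_token(env=None):
    """Return the HF_TOKEN value from the environment, or None if unset.

    Hypothetical helper for illustration; it is not part of sam-audio-infer.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if resolve_hf_token() is None:
    print("HF_TOKEN is not set; run `huggingface-cli login` or export the variable.")
else:
    print("HF_TOKEN found in the environment.")
```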
    

Quick Start

Python API

from sam_audio_infer import SamAudioInfer

# Load model with lite mode (recommended, ~3 GB VRAM)
model = SamAudioInfer.from_pretrained(
    "base",              # Model size: "small", "base", or "large"
    lite_mode=True,      # Remove unused components (~40% VRAM savings)
    dtype="bfloat16",    # Mixed precision (~50% additional savings)
)

# Separate audio
result = model.separate("song.wav", description="vocals")
result.save("vocals.wav", "accompaniment.wav")
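
To extract several sources from the same file, the two calls above can be wrapped in a loop. The sketch below assumes only what the Quick Start shows: `separate(path, description=...)` and a two-argument `save(target, residual)`. The helper `separate_stems` and its output file naming are hypothetical.

```python
from pathlib import Path


def separate_stems(model, audio_path, descriptions, out_dir="stems"):
    """Run one separation per description and save target plus residual.

    `model` is expected to expose `separate(path, description=...)`, and the
    result a two-argument `save(target, residual)`, as in the Quick Start.
    The helper itself and the output file naming are assumptions.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for description in descriptions:
        stem = description.replace(" ", "_")
        target = out / f"{stem}.wav"
        residual = out / f"{stem}_residual.wav"
        result = model.separate(audio_path, description=description)
        result.save(str(target), str(residual))
        written.append((target, residual))
    return written
```

With a model loaded as shown earlier, `separate_stems(model, "song.wav", ["vocals", "drums"])` would write `stems/vocals.wav`, `stems/vocals_residual.wav`, and the matching pair for drums.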

Command Line

# Basic separation
sam-audio-infer separate song.wav -d "vocals" -o vocals.wav

# With residual output
sam-audio-infer separate song.wav -d "drums" -o drums.wav --residual other.wav

# Download model with warmup
sam-audio-infer download --model base --warmup
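
When the CLI is driven from a script, building the argument list programmatically avoids shell-quoting mistakes. This sketch uses only the flags shown above (`-d`, `-o`, `--residual`); the helper `build_separate_command` is hypothetical.

```python
import shlex


def build_separate_command(input_path, description, output_path, residual_path=None):
    """Build the argv list for `sam-audio-infer separate`.

    Uses only the flags shown above (-d, -o, --residual); the helper
    itself is hypothetical.
    """
    cmd = ["sam-audio-infer", "separate", input_path, "-d", description, "-o", output_path]
    if residual_path is not None:
        cmd += ["--residual", residual_path]
    return cmd


cmd = build_separate_command("song.wav", "drums", "drums.wav", residual_path="other.wav")
print(shlex.join(cmd))  # shell-safe rendering of the command
# To execute it: subprocess.run(cmd, check=True)
```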

VRAM Usage

Model    Full Mode    Lite Mode    Reduction
Base     12.73 GB     2.84 GB      78%
Large    16.18 GB     6.15 GB      62%

Tested on RTX 4090 with bfloat16 precision
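
The Reduction column follows directly from the two VRAM columns. As a quick check, this sketch recomputes it from the reported figures; the function name `reduction_percent` is illustrative.

```python
# VRAM figures in GB, copied from the table above.
measurements = {
    "Base": (12.73, 2.84),
    "Large": (16.18, 6.15),
}


def reduction_percent(full_gb, lite_gb):
    """Relative VRAM saving of lite mode versus full mode, in percent."""
    return (1.0 - lite_gb / full_gb) * 100.0


for name, (full_gb, lite_gb) in measurements.items():
    print(f"{name}: {reduction_percent(full_gb, lite_gb):.0f}% reduction")
```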

Requirements

  • Python >= 3.11
  • PyTorch >= 2.0.0
  • torchaudio >= 2.0.0
  • CUDA-capable GPU with at least 4 GB of VRAM (lite mode with bfloat16)
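
A rough pre-flight check of these requirements might look like the sketch below. The helper names are hypothetical, the 4 GB floor is read as decimal gigabytes (an assumption), and the GPU query relies on PyTorch's `torch.cuda.get_device_properties`, returning `None` when PyTorch or a CUDA device is unavailable.

```python
import sys

MIN_PYTHON = (3, 11)
MIN_VRAM_GB = 4  # lite mode + bfloat16, per the list above


def python_ok(version_info=sys.version_info):
    """True if the interpreter meets the Python >= 3.11 requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def vram_ok(total_bytes, required_gb=MIN_VRAM_GB):
    """True if `total_bytes` meets the VRAM floor (decimal GB assumed)."""
    return total_bytes >= required_gb * 1000**3


def detect_vram_bytes():
    """Return total memory of GPU 0 in bytes, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


vram = detect_vram_bytes()
print("Python OK:", python_ok())
print("VRAM OK:", "unknown (no CUDA device found)" if vram is None else vram_ok(vram))
```

Total device memory is only an upper bound; other processes on the GPU reduce what is actually available at load time.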

Acknowledgments

This package stands on the shoulders of two important projects.

Original Research by Meta AI / Facebook Research

SAM-Audio (Segment Anything for Audio) is developed by Meta AI Research.

Lite Mode Optimization

The Lite Mode VRAM optimization technique used in this package is inspired by AudioGhost AI.

License

MIT License

Note: The underlying SAM-Audio model has its own license terms. Please refer to the official SAM-Audio repository for model usage terms.

Citation

If you use SAM Audio in your research, please cite the original paper:

@article{shi2025samaudio,
    title={SAM Audio: Segment Anything in Audio},
    author={Bowen Shi and Andros Tjandra and John Hoffman and Helin Wang and Yi-Chiao Wu and Luya Gao and Julius Richter and Matt Le and Apoorv Vyas and Sanyuan Chen and Christoph Feichtenhofer and Piotr Doll{\'a}r and Wei-Ning Hsu and Ann Lee},
    year={2025},
    url={https://arxiv.org/abs/2512.18099}
}
