
A Python library for streaming audio/video files using FFmpeg with automatic resampling and channel mixing


ffmpeg-audio

A Python library for processing audio/video files using FFmpeg with automatic resampling and channel mixing.

Features

  • Streaming audio reading: Stream large audio/video files in chunks without loading everything into memory
  • Segment reading: Read specific time segments from audio files in one operation
  • Automatic resampling: Resamples all audio to a fixed 16 kHz sample rate
  • Channel mixing: Downmixes all input channels to a single mono channel
  • Format support: Supports all audio/video formats that FFmpeg supports (MP3, WAV, FLAC, Opus, MP4, etc.)
  • Time range support: Both stream() and read() accept start_ms and duration_ms parameters

Installation

pip install ffmpeg-audio

Note: This package requires FFmpeg to be installed on your system. Make sure FFmpeg is available in your PATH.
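You can check whether FFmpeg resolves from PATH before calling the library. The standard library's shutil.which is enough for this. This is a minimal sketch, and find_ffmpeg and ffmpeg_available are hypothetical helpers that are not part of ffmpeg-audio. The library itself raises FFmpegNotFoundError when the executable is missing.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable on PATH, or None if absent."""
    return shutil.which("ffmpeg")


def ffmpeg_available() -> bool:
    """True when an `ffmpeg` executable can be resolved from PATH."""
    return find_ffmpeg() is not None


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("FFmpeg not found on PATH; install it before using ffmpeg-audio.")
    else:
        print(f"Using FFmpeg at {path}")
```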

Quick Start

Streaming Audio

from ffmpeg_audio import FFmpegAudio

# Stream entire audio file in chunks
for chunk in FFmpegAudio.stream("audio.mp3"):
    # chunk is a numpy array (float32, values in [-1.0, 1.0])
    # Process chunk here
    print(f"Chunk shape: {chunk.shape}, dtype: {chunk.dtype}")

# Stream specific time range (from 10s, duration 5s)
for chunk in FFmpegAudio.stream("audio.mp3", start_ms=10000, duration_ms=5000):
    # Process chunk
    pass

# Stream with custom chunk size (1 minute chunks)
for chunk in FFmpegAudio.stream("audio.mp3", chunk_duration_sec=60):
    # Process chunk
    pass

Reading Audio Segments

from ffmpeg_audio import FFmpegAudio

# Read a specific time segment (from 10s to 15s)
audio_data = FFmpegAudio.read(
    file_path="audio.mp3",
    start_ms=10000,  # 10 seconds
    duration_ms=5000,  # 5 seconds
)

# Read from beginning (start_ms defaults to 0)
audio_data = FFmpegAudio.read(
    file_path="audio.mp3",
    duration_ms=5000,  # 5 seconds from start
)

# audio_data is a numpy array (float32, values in [-1.0, 1.0], 16 kHz mono)
print(f"Audio shape: {audio_data.shape}, sample rate: {FFmpegAudio.SAMPLE_RATE} Hz")

API Reference

FFmpegAudio

Main class for processing audio/video files. All methods are static.

Constants:

  • FFmpegAudio.SAMPLE_RATE = 16000: Output sample rate (Hz)
  • FFmpegAudio.AUDIO_CHANNELS = 1: Output channel count (mono)
  • FFmpegAudio.STREAM_CHUNK_DURATION_SEC = 1200: Default chunk duration for streaming (seconds)
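The default chunk length has a direct memory cost. The sketch below gives a rough estimate. It assumes 4 bytes per float32 sample and ignores NumPy's small per-array overhead. chunk_samples and chunk_bytes are illustrative helpers, not library functions. The figures apply to one full chunk, and the last chunk of a file can be shorter.

```python
SAMPLE_RATE = 16000       # Hz, mirrors FFmpegAudio.SAMPLE_RATE
BYTES_PER_SAMPLE = 4      # float32
DEFAULT_CHUNK_SEC = 1200  # mirrors FFmpegAudio.STREAM_CHUNK_DURATION_SEC


def chunk_samples(chunk_duration_sec: int = DEFAULT_CHUNK_SEC) -> int:
    """Number of mono samples in one full streaming chunk."""
    return chunk_duration_sec * SAMPLE_RATE


def chunk_bytes(chunk_duration_sec: int = DEFAULT_CHUNK_SEC) -> int:
    """Approximate memory held by one full chunk's float32 samples."""
    return chunk_samples(chunk_duration_sec) * BYTES_PER_SAMPLE


print(chunk_samples())        # 19200000 samples in a 20-minute chunk
print(chunk_bytes() / 1e6)    # 76.8 (MB)
print(chunk_bytes(60) / 1e6)  # 3.84 (MB) for 1-minute chunks
```

A smaller chunk_duration_sec lowers peak memory and makes each chunk available sooner. The cost is more iterations over a long file.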

FFmpegAudio.stream(file_path, chunk_duration_sec=None, start_ms=None, duration_ms=None)

Stream audio file in chunks, yielding numpy arrays.

This method reads audio in chunks to minimize memory usage for large files. Each chunk is a numpy array of float32 samples in the range [-1.0, 1.0]. The generator continues until the file ends or the specified duration is reached.
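The library's internals are not documented here. One common way to stream decoded audio with bounded memory is to have FFmpeg write raw little-endian float32 PCM to stdout (`-f f32le -ac 1 -ar 16000 pipe:1`) and then read it in fixed-size blocks. The sketch below assumes that design. iter_f32le_chunks is a hypothetical name, and chunk_samples plays the role of chunk_duration_sec * 16000. It uses array.array to stay dependency-free, and an io.BytesIO stands in for the FFmpeg pipe. With NumPy, np.frombuffer(data, dtype="<f4") would decode each block instead.

```python
import io
import struct
import sys
from array import array
from typing import BinaryIO, Iterator

BYTES_PER_SAMPLE = 4  # one little-endian float32 sample (FFmpeg's f32le format)


def iter_f32le_chunks(pipe: BinaryIO, chunk_samples: int) -> Iterator[array]:
    """Yield mono float32 chunks of up to chunk_samples samples from a raw f32le stream.

    `pipe` stands in for FFmpeg's stdout when it runs with
    `-f f32le -ac 1 -ar 16000 pipe:1` (an assumed setup, not the library's code).
    """
    if chunk_samples <= 0:
        raise ValueError("chunk_samples must be > 0")
    block_size = chunk_samples * BYTES_PER_SAMPLE
    while True:
        data = pipe.read(block_size)  # a buffered read returns fewer bytes only at EOF
        if not data:
            break  # end of stream: the decoder has finished
        usable = len(data) - len(data) % BYTES_PER_SAMPLE  # ignore a torn final sample
        if usable == 0:
            break  # only a partial sample remained
        samples = array("f")
        samples.frombytes(data[:usable])
        if sys.byteorder == "big":
            samples.byteswap()  # f32le data is little-endian
        yield samples


if __name__ == "__main__":
    # Five fake samples encoded as f32le; 2-sample chunks give sizes 2, 2 and 1.
    fake_stdout = io.BytesIO(struct.pack("<5f", 0.0, 0.5, -0.5, 1.0, -1.0))
    for chunk in iter_f32le_chunks(fake_stdout, chunk_samples=2):
        print(list(chunk))
```

Each read requests a whole number of samples. As a result, only the final chunk can be shorter than chunk_samples, and the generator stops once the stream is exhausted.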

Parameters:

  • file_path (str): Path to the audio/video file (supports all FFmpeg formats)
  • chunk_duration_sec (int, optional): Duration of each chunk in seconds. Defaults to STREAM_CHUNK_DURATION_SEC (1200s = 20 minutes). Must be > 0 if provided.
  • start_ms (int, optional): Start position in milliseconds. None means from file beginning. If None but duration_ms is provided, defaults to 0.
  • duration_ms (int, optional): Total duration to read in milliseconds. None means read until end. If specified, reading stops when this duration is reached.
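The time-range parameters map naturally onto FFmpeg's -ss (start) and -t (duration) options. The helper below is a hypothetical sketch of that mapping. It assumes input seeking (-ss placed before -i) and raw f32le output on stdout. The actual command ffmpeg-audio runs is not documented here. The sketch does apply the documented rule that start_ms defaults to 0 when only duration_ms is given.

```python
from typing import List, Optional

SAMPLE_RATE = 16000  # Hz, matches FFmpegAudio.SAMPLE_RATE


def build_ffmpeg_args(
    file_path: str,
    start_ms: Optional[int] = None,
    duration_ms: Optional[int] = None,
) -> List[str]:
    """Hypothetical: translate the time-range parameters into an FFmpeg argument list."""
    if not file_path:
        raise ValueError("file_path must not be empty")
    if start_ms is None and duration_ms is not None:
        start_ms = 0  # documented default when only duration_ms is given
    args = ["ffmpeg", "-nostdin", "-loglevel", "error"]
    if start_ms is not None:
        # Placing -ss before -i asks FFmpeg to seek in the input (fast seeking).
        args += ["-ss", f"{start_ms / 1000:.3f}"]
    args += ["-i", file_path]
    if duration_ms is not None:
        args += ["-t", f"{duration_ms / 1000:.3f}"]  # output duration in seconds
    # No video, mono downmix, 16 kHz resampling, raw little-endian float32 on stdout.
    args += ["-vn", "-ac", "1", "-ar", str(SAMPLE_RATE), "-f", "f32le", "pipe:1"]
    return args


print(" ".join(build_ffmpeg_args("audio.mp3", start_ms=10000, duration_ms=5000)))
# ffmpeg -nostdin -loglevel error -ss 10.000 -i audio.mp3 -t 5.000 -vn -ac 1 -ar 16000 -f f32le pipe:1
```

Keeping the command as a list rather than a shell string avoids quoting problems with unusual file names when it is passed to subprocess.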

Yields:

  • np.ndarray: Audio chunk as float32 array with shape (n_samples,). Values are normalized to [-1.0, 1.0] range.

Raises:

  • TypeError: If parameter types are invalid
  • ValueError: If file_path is empty or parameter values are invalid
  • FFmpegNotFoundError: If FFmpeg executable is not found in PATH
  • FileNotFoundError: If the input file does not exist
  • PermissionError: If file access is denied
  • UnsupportedFormatError: If file format is not supported or corrupted
  • FFmpegAudioError: For other FFmpeg processing errors

FFmpegAudio.read(file_path, start_ms=None, duration_ms=None, timeout_ms=300000)

Read a specific time segment from an audio file in one operation.

This method reads the entire segment into memory at once, suitable for small segments or when the full segment is needed immediately. For large files or streaming use cases, consider using stream() instead.

The output format (16kHz mono float32) is optimized for speech processing and energy detection algorithms.
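read() returns 16 kHz mono float32 samples in [-1.0, 1.0], so a basic energy detector needs only a few lines. This sketch is illustrative and not part of the library. The 20 ms frame (320 samples) and the -40 dBFS threshold are arbitrary assumptions. It works on any sequence of floats, including a NumPy array returned by read().

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 16000
FRAME_MS = 20                                   # assumed frame length
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples per frame


def frame_energy_dbfs(samples: Sequence[float], floor_db: float = -100.0) -> List[float]:
    """RMS level of each complete, non-overlapping frame in dBFS (0 dB = amplitude 1.0)."""
    levels = []
    for start in range(0, len(samples) - FRAME_SAMPLES + 1, FRAME_SAMPLES):
        frame = samples[start:start + FRAME_SAMPLES]
        rms = math.sqrt(sum(x * x for x in frame) / FRAME_SAMPLES)
        levels.append(20 * math.log10(rms) if rms > 0 else floor_db)
    return levels


def is_speech_candidate(levels: List[float], threshold_db: float = -40.0) -> List[bool]:
    """Flag frames louder than a fixed threshold (a naive energy detector)."""
    return [level > threshold_db for level in levels]
```

A trailing partial frame is ignored. Real detectors usually add smoothing or an adaptive threshold on top of per-frame levels like these.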

Parameters:

  • file_path (str): Path to audio/video file (supports all FFmpeg formats)
  • start_ms (int, optional): Start position in milliseconds. None means from beginning. If None but duration_ms is provided, defaults to 0.
  • duration_ms (int): Segment duration in milliseconds; must be > 0. The signature defaults it to None, but it is effectively required: read() raises ValueError when duration_ms is None, whether or not start_ms is given.
  • timeout_ms (int, optional): Maximum processing time in milliseconds. Defaults to 300000 (5 minutes). Set to None to disable timeout (not recommended for production).
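Taken together, the parameter rules work out as follows:

- duration_ms is always required.
- start_ms falls back to 0.
- timeout_ms is either a positive number or None.

A hypothetical validator that mirrors those documented rules could look like the sketch below. It is not the library's own code, and it omits the TypeError checks.

```python
from typing import Optional, Tuple

DEFAULT_TIMEOUT_MS = 300_000  # 5 minutes, the documented default


def check_read_args(
    start_ms: Optional[int] = None,
    duration_ms: Optional[int] = None,
    timeout_ms: Optional[int] = DEFAULT_TIMEOUT_MS,
) -> Tuple[int, int, Optional[float]]:
    """Apply the documented read() rules; return (start_ms, duration_ms, timeout_s)."""
    if duration_ms is None:
        # Covers both documented cases: both None, or start_ms without duration_ms.
        raise ValueError("duration_ms is required")
    if duration_ms <= 0:
        raise ValueError("duration_ms must be > 0")
    if start_ms is None:
        start_ms = 0
    elif start_ms < 0:
        raise ValueError("start_ms must be >= 0")
    if timeout_ms is not None and timeout_ms <= 0:
        raise ValueError("timeout_ms must be > 0 (or None to disable)")
    timeout_s = None if timeout_ms is None else timeout_ms / 1000
    return start_ms, duration_ms, timeout_s
```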

Returns:

  • np.ndarray: Audio segment as float32 array with shape (n_samples,) where n_samples = duration_ms * SAMPLE_RATE / 1000
    • dtype: float32
    • value range: [-1.0, 1.0]
    • sample rate: SAMPLE_RATE (16000 Hz)
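At 16 kHz there are exactly 16 samples per millisecond, so converting between milliseconds and array indices is integer arithmetic. These helpers are illustrative only. The documentation does not say whether a segment that extends past the end of the file is shortened or padded, so checking len() on the result is a prudent habit.

```python
SAMPLE_RATE = 16000  # Hz, FFmpegAudio.SAMPLE_RATE


def expected_samples(duration_ms: int) -> int:
    """Documented length of a read() result: duration_ms * SAMPLE_RATE / 1000."""
    return duration_ms * SAMPLE_RATE // 1000


def sample_to_ms(index: int, start_ms: int = 0) -> float:
    """Map an index in the returned array back to a position in the source file."""
    return start_ms + index * 1000 / SAMPLE_RATE


print(expected_samples(5000))      # 80000
print(sample_to_ms(16000, 10000))  # 11000.0
```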

Raises:

  • TypeError: If parameter types are invalid
  • ValueError: If parameter values are invalid:
    • start_ms < 0
    • duration_ms <= 0
    • timeout_ms <= 0
    • Both start_ms and duration_ms are None
    • start_ms is provided but duration_ms is None
  • FileNotFoundError: If the input file does not exist
  • FFmpegNotFoundError: If FFmpeg executable is not found in PATH
  • FFmpegAudioError: If FFmpeg processing fails or timeout is exceeded
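Timeout handling and error capture like this can be built on subprocess.run. Its timeout argument is in seconds, hence timeout_ms / 1000, and it kills the child process when the timeout expires. The sketch assumes this approach. SketchFFmpegError is a hypothetical stand-in that carries the same fields documented for FFmpegAudioError; it is not the library's exception class.

```python
import subprocess
import sys
from typing import List, Optional


class SketchFFmpegError(Exception):
    """Hypothetical stand-in carrying the fields FFmpegAudioError documents."""

    def __init__(self, message: str, file_path: Optional[str] = None,
                 returncode: Optional[int] = None, stderr: Optional[str] = None):
        super().__init__(message)
        self.message = message
        self.file_path = file_path
        self.returncode = returncode
        self.stderr = stderr


def run_decoder(cmd: List[str], file_path: str,
                timeout_ms: Optional[int] = 300_000) -> bytes:
    """Run a decoder command and return its stdout, or raise SketchFFmpegError."""
    timeout_s = None if timeout_ms is None else timeout_ms / 1000
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed the child at this point.
        raise SketchFFmpegError(f"timed out after {timeout_ms} ms", file_path) from exc
    if proc.returncode != 0:
        raise SketchFFmpegError(
            "decoder failed",
            file_path,
            proc.returncode,
            proc.stderr.decode("utf-8", errors="replace"),
        )
    return proc.stdout


if __name__ == "__main__":
    # Stand-in "decoder" that fails the way FFmpeg might, to show the captured fields.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('Invalid data'); sys.exit(1)"]
    try:
        run_decoder(failing, "broken.mp3")
    except SketchFFmpegError as err:
        print(err.returncode, err.stderr)  # 1 Invalid data
```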

Exceptions

FFmpegNotFoundError

Raised when FFmpeg executable is not found in system PATH.

This exception indicates that FFmpeg is either not installed or not accessible from the current environment. Users should install FFmpeg and ensure it's in PATH.

Attributes:

  • message: Human-readable error message describing the issue

FFmpegAudioError

General FFmpeg audio processing error.

Raised when FFmpeg fails for reasons other than file not found, permission denied, or unsupported format. Contains process return code and stderr for debugging.

Attributes:

  • message: Primary error message (required)
  • file_path: Path to the file that caused the error (optional)
  • returncode: FFmpeg process exit code (optional)
  • stderr: FFmpeg stderr output for debugging (optional)

UnsupportedFormatError

Raised when audio file format is unsupported or corrupted.

This exception indicates that FFmpeg cannot decode the file, either because the format is not supported or the file is corrupted/invalid.

Attributes:

  • message: Primary error message (required)
  • file_path: Path to the file that caused the error (optional)
  • returncode: FFmpeg process exit code (optional)
  • stderr: FFmpeg stderr output for debugging (optional)

Requirements

  • Python >= 3.10
  • FFmpeg (must be installed separately)
  • numpy >= 1.26.4

License

MIT License


Download files


Source Distribution

ffmpeg_audio-0.1.2.tar.gz (9.2 kB)

Uploaded Source

Built Distribution


ffmpeg_audio-0.1.2-py3-none-any.whl (4.4 kB)

Uploaded Python 3

File details

Details for the file ffmpeg_audio-0.1.2.tar.gz.

File metadata

  • Download URL: ffmpeg_audio-0.1.2.tar.gz
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for ffmpeg_audio-0.1.2.tar.gz
  • SHA256: c32f6f752957667d5e9f8583b12bd004cfab74e03adc426977a9831732b96104
  • MD5: f31306ef2feafe56142ed2e0a01341a1
  • BLAKE2b-256: 3b9d1d5e0b8395c849d706c6ac5ca2d54ca7f02347d342431d03a0e3cdd3d7e8


File details

Details for the file ffmpeg_audio-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: ffmpeg_audio-0.1.2-py3-none-any.whl
  • Size: 4.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for ffmpeg_audio-0.1.2-py3-none-any.whl
  • SHA256: 8279e4902a64b26690f6a9be5e41f9a055fe17b84bfd6f41e6477b866a4c745f
  • MD5: b8c8509bf6c032cf2fc80e06f5a88769
  • BLAKE2b-256: 47fc7764914fa76b26c5863518a4777796613e018ea34f396624e7587aed2e93

