A Python library for streaming audio/video files using FFmpeg with automatic resampling and channel mixing

ffmpeg-audio

A Python library for processing audio/video files using FFmpeg with automatic resampling and channel mixing.

Features

Streaming audio reading: Stream large audio/video files in chunks without loading everything into memory
Segment reading: Read specific time segments from audio files in one operation
Automatic resampling: Resamples all audio to a fixed 16 kHz sample rate
Channel mixing: Automatically converts to mono channel
Format support: Supports all audio/video formats that FFmpeg supports (MP3, WAV, FLAC, Opus, MP4, etc.)
Time range support: Both streaming and reading support start time and duration parameters

Installation

pip install ffmpeg-audio

Note: This package requires FFmpeg to be installed on your system. Make sure FFmpeg is available in your PATH.
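
A quick way to confirm that FFmpeg is reachable before using the library is to look it up on PATH with the standard library. This is a minimal sketch; `ffmpeg_available` is a hypothetical helper and not part of ffmpeg-audio.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the named executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found on PATH")
    else:
        print("FFmpeg not found; install it and add it to PATH")
```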

Configuration

Environment Variables

The library supports configuration through environment variables:

FFMPEG_STREAM_CHUNK_DURATION_SEC: Default chunk duration (in seconds) for streaming operations. Must be a positive integer. If unset or invalid (empty string, non-numeric string, negative number, or zero), the library falls back to 1200 seconds (20 minutes).
FFMPEG_TIMEOUT_MS: Default timeout (in milliseconds) for read operations. Must be a positive integer. If unset or invalid, the library falls back to 300000 milliseconds (5 minutes).

Example:

# Set default chunk duration to 5 minutes (300 seconds)
export FFMPEG_STREAM_CHUNK_DURATION_SEC=300

# Use in Python
from ffmpeg_audio import FFmpegAudio

# Will use 300 seconds as default chunk duration
for chunk in FFmpegAudio.stream("audio.mp3"):
    # Process chunk
    pass
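
The fallback rule described above (positive integers are accepted, everything else yields the default) can be sketched as follows. This is an illustration of the documented behavior and not the library's actual code; `positive_int_from_env` is a hypothetical helper name.

```python
import os

DEFAULT_CHUNK_DURATION_SEC = 1200  # documented default (20 minutes)


def positive_int_from_env(name: str, default: int) -> int:
    """Read a positive integer from the environment, else return the default."""
    raw = os.environ.get(name, "")
    try:
        value = int(raw.strip())
    except ValueError:
        # Unset, empty, or non-numeric values fall back to the default.
        return default
    # Zero and negative numbers are also treated as invalid.
    return value if value > 0 else default


if __name__ == "__main__":
    chunk_sec = positive_int_from_env(
        "FFMPEG_STREAM_CHUNK_DURATION_SEC", DEFAULT_CHUNK_DURATION_SEC
    )
    print(f"Effective chunk duration: {chunk_sec} s")
```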

Quick Start

Streaming Audio

from ffmpeg_audio import FFmpegAudio
import numpy as np

# Stream entire audio file in chunks
for chunk in FFmpegAudio.stream("audio.mp3"):
    # chunk is a numpy array (float32, range -1.0 ~ 1.0)
    # Process chunk here
    print(f"Chunk shape: {chunk.shape}, dtype: {chunk.dtype}")

# Stream specific time range (from 10s, duration 5s)
for chunk in FFmpegAudio.stream("audio.mp3", start_ms=10000, duration_ms=5000):
    # Process chunk
    pass

# Stream with custom chunk size (1 minute chunks)
for chunk in FFmpegAudio.stream("audio.mp3", chunk_duration_sec=60):
    # Process chunk
    pass
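
When choosing chunk_duration_sec, it can help to estimate how much memory one chunk occupies. The sketch below assumes that a full chunk holds exactly chunk_duration_sec × 16000 float32 samples (the final chunk of a file may be shorter); the helper names are hypothetical.

```python
SAMPLE_RATE = 16_000   # output sample rate documented by the library (Hz)
BYTES_PER_SAMPLE = 4   # float32


def chunk_samples(chunk_duration_sec: int) -> int:
    """Number of mono samples in one full chunk."""
    return SAMPLE_RATE * chunk_duration_sec


def chunk_bytes(chunk_duration_sec: int) -> int:
    """Approximate size in bytes of one full chunk of float32 samples."""
    return chunk_samples(chunk_duration_sec) * BYTES_PER_SAMPLE


if __name__ == "__main__":
    for seconds in (60, 1200):
        size_mb = chunk_bytes(seconds) / 1_000_000
        print(f"{seconds:>5} s chunk: {chunk_samples(seconds)} samples, {size_mb:.2f} MB")
```

Under this assumption, the default 1200-second chunk is about 76.8 MB, while a 60-second chunk is about 3.84 MB.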

Reading Audio Segments

from ffmpeg_audio import FFmpegAudio

# Read a specific time segment (from 10s to 15s)
audio_data = FFmpegAudio.read(
    file_path="audio.mp3",
    start_ms=10000,  # 10 seconds
    duration_ms=5000,  # 5 seconds
)

# Read from beginning (5 seconds from start)
audio_data = FFmpegAudio.read(
    file_path="audio.mp3",
    duration_ms=5000,  # 5 seconds from start
)

# Read entire file (no start_ms or duration_ms specified)
audio_data = FFmpegAudio.read(file_path="audio.mp3")

# audio_data is a numpy array (float32, range -1.0 ~ 1.0, 16kHz mono)
print(f"Audio shape: {audio_data.shape}, sample rate: {FFmpegAudio.SAMPLE_RATE} Hz")
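
Because the output is always 16 kHz mono, millisecond positions map directly to sample indices in the returned array. The following sketch shows the arithmetic; the helper names are hypothetical, and the exact number of samples FFmpeg returns for a segment may differ slightly from this nominal value.

```python
SAMPLE_RATE = 16_000  # matches FFmpegAudio.SAMPLE_RATE as documented


def ms_to_samples(ms: int) -> int:
    """Convert a duration in milliseconds to a nominal sample count."""
    return ms * SAMPLE_RATE // 1000


def index_in_segment(position_ms: int, segment_start_ms: int = 0) -> int:
    """Index of an absolute file position inside a segment read from segment_start_ms."""
    if position_ms < segment_start_ms:
        raise ValueError("position lies before the start of the segment")
    return ms_to_samples(position_ms - segment_start_ms)


if __name__ == "__main__":
    # A 5-second segment nominally holds 80000 samples.
    print(ms_to_samples(5000))
    # Position 12.5 s of the file, inside a segment read from 10 s.
    print(index_in_segment(12500, segment_start_ms=10000))
```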

API Reference

FFmpegAudio

Main class for processing audio/video files. All methods are static.

Constants:

FFmpegAudio.SAMPLE_RATE = 16000: Output sample rate (Hz)
FFmpegAudio.AUDIO_CHANNELS = 1: Output channel count (mono)

`FFmpegAudio.stream(file_path, start_ms=None, duration_ms=None, chunk_duration_sec=<default>)`

Stream audio file in chunks, yielding numpy arrays.

This method reads audio in chunks to minimize memory usage for large files. Each chunk is a numpy array of float32 samples in the range [-1.0, 1.0]. The generator continues until the file ends or the specified duration is reached.
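
The library's actual FFmpeg invocation is not documented here. As a rough illustration of how 16 kHz mono float32 output can be requested from FFmpeg, the sketch below builds a command line from standard FFmpeg options (`-ss`, `-t`, `-vn`, `-ac`, `-ar`, `-f f32le`). The function name and the exact option set are assumptions, not the library's implementation.

```python
def build_ffmpeg_command(file_path, start_ms=None, duration_ms=None):
    """Build an FFmpeg argument list that writes raw float32 PCM to stdout."""
    cmd = ["ffmpeg", "-nostdin", "-hide_banner", "-loglevel", "error"]
    if start_ms is not None:
        # Seek on the input side; FFmpeg accepts fractional seconds.
        cmd += ["-ss", f"{start_ms / 1000:.3f}"]
    if duration_ms is not None:
        # Limit how much of the input is read.
        cmd += ["-t", f"{duration_ms / 1000:.3f}"]
    cmd += [
        "-i", file_path,
        "-vn",            # ignore any video stream
        "-ac", "1",       # mix down to mono
        "-ar", "16000",   # resample to 16 kHz
        "-f", "f32le",    # raw 32-bit float little-endian PCM
        "pipe:1",         # write to stdout
    ]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_command("audio.mp3", start_ms=10000, duration_ms=5000)))
```

A streaming reader could then consume the process's stdout in fixed-size blocks instead of buffering the whole output.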

Parameters:

file_path (str): Path to the audio/video file (supports all FFmpeg formats)
start_ms (int, optional): Start position in milliseconds. None means from file beginning. If < 0, will be auto-corrected to None with a warning.
duration_ms (int, optional): Total duration to read in milliseconds. None means read until end. If <= 0, will be auto-corrected to None with a warning.
chunk_duration_sec (int | None): Duration of each chunk in seconds. Defaults to 1200s (20 minutes), configurable via the FFMPEG_STREAM_CHUNK_DURATION_SEC environment variable. Passing None (or omitting the argument) silently uses the default, while values <= 0 are reset to the default with a warning.
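
The auto-correction rules for these parameters can be summarized in code. This is a sketch of the documented behavior only: `normalize_stream_args` is a hypothetical helper, and the use of the `warnings` module is an assumption (the library may emit its warnings through logging instead).

```python
import warnings

DEFAULT_CHUNK_DURATION_SEC = 1200  # documented default


def normalize_stream_args(start_ms=None, duration_ms=None, chunk_duration_sec=None):
    """Apply the documented auto-corrections and return the effective values."""
    if start_ms is not None and start_ms < 0:
        warnings.warn(f"start_ms={start_ms} is negative; reading from the beginning")
        start_ms = None
    if duration_ms is not None and duration_ms <= 0:
        warnings.warn(f"duration_ms={duration_ms} is not positive; reading to the end")
        duration_ms = None
    if chunk_duration_sec is None:
        # None silently selects the default.
        chunk_duration_sec = DEFAULT_CHUNK_DURATION_SEC
    elif chunk_duration_sec <= 0:
        warnings.warn(f"chunk_duration_sec={chunk_duration_sec} is not positive; using default")
        chunk_duration_sec = DEFAULT_CHUNK_DURATION_SEC
    return start_ms, duration_ms, chunk_duration_sec


if __name__ == "__main__":
    print(normalize_stream_args(start_ms=-5, duration_ms=0, chunk_duration_sec=-1))
```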

Yields:

np.ndarray: Audio chunk as float32 array with shape (n_samples,). Values are normalized to [-1.0, 1.0] range.

Raises:

TypeError: If parameter types are invalid
ValueError: If file_path is empty, or if a parameter value is still invalid after auto-correction. The following values are auto-corrected with a warning instead of raising:
- start_ms < 0 (auto-corrected to None)
- duration_ms <= 0 (auto-corrected to None)
FFmpegNotFoundError: If FFmpeg executable is not found in PATH
FileNotFoundError: If the input file does not exist
PermissionError: If file access is denied
UnsupportedFormatError: If file format is not supported or corrupted
FFmpegAudioError: For other FFmpeg processing errors

`FFmpegAudio.read(file_path, start_ms=None, duration_ms=None, timeout_ms=<default>)`

Read audio data from a file in one operation.

This method reads audio data into memory at once. If both start_ms and duration_ms are None, it reads the entire file. For large files or streaming use cases, consider using stream() instead.

The output format (16kHz mono float32) is optimized for speech processing and energy detection algorithms.
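
As an example of the kind of energy detection this format suits, the sketch below computes the root-mean-square (RMS) energy of fixed-length frames. It is not part of the library; plain Python lists stand in for the numpy arrays the library returns, and `frame_rms` is a hypothetical helper. At 16 kHz, a 20 ms frame is 320 samples.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS energy of each consecutive frame of frame_len samples."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]  # the last frame may be shorter
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


if __name__ == "__main__":
    # One loud frame followed by one silent frame (4 samples each for brevity).
    print(frame_rms([1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0], frame_len=4))
```

Frames whose RMS exceeds a chosen threshold can be treated as active; the threshold itself depends on the recording and is left to the caller.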

Parameters:

file_path (str): Path to audio/video file (supports all FFmpeg formats)
start_ms (int, optional): Start position in milliseconds. None means from the beginning. If < 0, it is auto-corrected to None with a warning.
duration_ms (int, optional): Segment duration in milliseconds. None means read until the end of the file, starting at start_ms if one is given. If <= 0, it is auto-corrected to None with a warning. If both start_ms and duration_ms are None, the entire file is read.
timeout_ms (int): Maximum processing time in milliseconds. Defaults to 300000ms (5 minutes), configurable via FFMPEG_TIMEOUT_MS environment variable. If <= 0, will be auto-corrected to default with a warning.

Returns:

np.ndarray: Audio data as float32 array with shape (n_samples,) where n_samples depends on the audio duration
- dtype: float32
- value range: [-1.0, 1.0]
- sample rate: SAMPLE_RATE (16000 Hz)
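
For intuition about the returned values, the sketch below decodes raw 32-bit little-endian float PCM (FFmpeg's `f32le` format) into Python floats. Whether the library uses this exact intermediate format is an assumption; `decode_f32le` is a hypothetical helper built on the standard `struct` module.

```python
import struct


def decode_f32le(raw: bytes) -> list[float]:
    """Decode little-endian float32 PCM bytes into a list of samples."""
    usable = len(raw) - (len(raw) % 4)  # ignore a trailing partial sample
    count = usable // 4
    return list(struct.unpack(f"<{count}f", raw[:usable]))


if __name__ == "__main__":
    raw = struct.pack("<3f", 0.0, 0.5, -1.0)
    print(decode_f32le(raw))
```

With numpy, the equivalent step is typically a single `np.frombuffer(raw, dtype="<f4")` call.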

Raises:

TypeError: If parameter types are invalid
ValueError: If a parameter value is still invalid after auto-correction. The following values are auto-corrected with a warning instead of raising:
- start_ms < 0 (auto-corrected to None)
- duration_ms <= 0 (auto-corrected to None)
- timeout_ms <= 0 (auto-corrected to default timeout)
FileNotFoundError: If the input file does not exist
FFmpegNotFoundError: If FFmpeg executable is not found in PATH
FFmpegAudioError: If FFmpeg processing fails or timeout is exceeded

Exceptions

`FFmpegNotFoundError`

Raised when FFmpeg executable is not found in system PATH.

This exception indicates that FFmpeg is either not installed or not accessible from the current environment. Users should install FFmpeg and ensure it's in PATH.

Attributes:

message: Human-readable error message describing the issue

`FFmpegAudioError`

General FFmpeg audio processing error.

Raised when FFmpeg fails for reasons other than file not found, permission denied, or unsupported format. Contains process return code and stderr for debugging.

Attributes:

message: Primary error message (required)
file_path: Path to the file that caused the error (optional)
returncode: FFmpeg process exit code (optional)
stderr: FFmpeg stderr output for debugging (optional)

`UnsupportedFormatError`

Raised when audio file format is unsupported or corrupted.

This exception indicates that FFmpeg cannot decode the file, either because the format is not supported or the file is corrupted/invalid.

Attributes:

message: Primary error message (required)
file_path: Path to the file that caused the error (optional)
returncode: FFmpeg process exit code (optional)
stderr: FFmpeg stderr output for debugging (optional)
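
The documented attributes (message, file_path, returncode, stderr) can be folded into a single log line when handling these exceptions. The sketch below uses `getattr` so that it makes no assumption about the class hierarchy; `describe_error` and the `DemoError` stand-in are hypothetical, and stderr is assumed to be a string.

```python
def describe_error(exc: Exception) -> str:
    """Format an exception and its optional FFmpeg details as one line."""
    parts = [str(getattr(exc, "message", exc))]
    file_path = getattr(exc, "file_path", None)
    returncode = getattr(exc, "returncode", None)
    stderr = getattr(exc, "stderr", None)
    if file_path is not None:
        parts.append(f"file={file_path}")
    if returncode is not None:
        parts.append(f"exit code={returncode}")
    if stderr:
        lines = stderr.strip().splitlines()
        if lines:
            # The last stderr line is usually the most specific one.
            parts.append(f"stderr={lines[-1]}")
    return " | ".join(parts)


class DemoError(Exception):
    """Stand-in carrying the documented attributes; not the library's class."""

    def __init__(self, message, file_path=None, returncode=None, stderr=None):
        super().__init__(message)
        self.message = message
        self.file_path = file_path
        self.returncode = returncode
        self.stderr = stderr


if __name__ == "__main__":
    err = DemoError("decode failed", "a.mp3", 1, "line one\nInvalid data\n")
    print(describe_error(err))
```

In real code, the same function could be called from an `except FFmpegAudioError as exc:` block.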

Requirements

Python >= 3.10
FFmpeg (must be installed separately)
numpy >= 1.26.4

License

MIT License

Release history

1.1.0: Mar 24, 2026
1.0.1: Mar 24, 2026
1.0.0 (this version): Jan 11, 2026
0.3.0: Jan 6, 2026
0.2.0: Jan 1, 2026
0.1.4: Jan 1, 2026
0.1.3: Dec 30, 2025
0.1.2: Dec 30, 2025
0.1.1: Dec 30, 2025
0.1.0: Dec 30, 2025