Speech Prep

Audio preprocessing toolkit for speech-to-text applications using FFmpeg.

Overview

Speech Prep is a Python package that prepares audio files for speech-to-text processing. It provides tools for silence detection and removal, speed adjustment, and format conversion, all common steps for optimizing audio before transcription.

Features

Silence Detection: Automatically detect silence periods in audio files
Silence Removal: Remove leading/trailing silence to clean up recordings
Speed Adjustment: Change playback speed while maintaining audio quality
Format Conversion: Convert between different audio formats (MP3, WAV, FLAC, etc.)
Clean API: Simple, intuitive interface with method chaining support
FFmpeg Integration: Leverages the power and reliability of FFmpeg
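
As a rough illustration of how silence detection can work on top of FFmpeg, the sketch below parses the log lines that FFmpeg's silencedetect filter writes to stderr. It assumes that filter is the source of the silence data; the package's internals are not shown in this document, and parse_silencedetect and the sample log are hypothetical, not part of the speech_prep API.

```python
import re

# Match "silence_start: 1.5" and "silence_end: 3.25" in silencedetect log lines.
_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silencedetect(stderr: str) -> list:
    """Return (start, end) pairs in seconds found in silencedetect log output."""
    periods = []
    start = None
    for line in stderr.splitlines():
        match = _START.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = _END.search(line)
        if match and start is not None:
            periods.append((start, float(match.group(1))))
            start = None
    # A silence_start with no matching silence_end is ignored in this sketch.
    return periods


# Hand-written sample in the shape of silencedetect output (not a real run).
sample_log = """\
[silencedetect @ 0x55d1c0] silence_start: 0
[silencedetect @ 0x55d1c0] silence_end: 1.5 | silence_duration: 1.5
[silencedetect @ 0x55d1c0] silence_start: 42.25
[silencedetect @ 0x55d1c0] silence_end: 43 | silence_duration: 0.75
"""
print(parse_silencedetect(sample_log))  # [(0.0, 1.5), (42.25, 43.0)]
```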

Requirements

Python 3.9+
FFmpeg (must be installed and accessible via PATH)
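
Because FFmpeg has to be reachable via PATH, it can help to check for it before loading any audio. The following is a minimal sketch using only the standard library; ffmpeg_available is a hypothetical helper name, not something the package is known to provide.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the named executable can be found on PATH."""
    return shutil.which(executable) is not None


if not ffmpeg_available():
    print("FFmpeg was not found on PATH; install it before using Speech Prep.")
```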

Installation

# Install from PyPI
pip install speech-prep

# Or install from source
git clone https://github.com/dimdasci/speech-prep.git
cd speech-prep
uv sync  # or pip install -e .

Quick Start

from speech_prep import SoundFile
from pathlib import Path

# Load an audio file
audio = SoundFile(Path("recording.wav"))

if audio:
    print(f"Duration: {audio.duration:.2f} seconds")
    print(f"Format: {audio.format}")
    print(f"Silence periods detected: {len(audio.silence_periods)}")

    # Clean up the audio for speech-to-text
    cleaned = audio.strip(output_path=Path("recording_stripped.wav"))
    faster = cleaned.speed(output_path=Path("recording_stripped_fast.wav"), speed_factor=1.2)
    final = faster.convert(output_path=Path("clean.mp3"))

    print(f"Processed file saved: {final.path}")

Usage Examples

Basic Operations

from speech_prep import SoundFile
from pathlib import Path

# Load audio file
audio = SoundFile(Path("interview.wav"))

# View audio information
print(audio)  # Shows duration, format, file size, and silence periods

# Remove silence from beginning and end
cleaned = audio.strip(output_path=Path("interview_stripped.wav"))

# Remove only leading silence
cleaned = audio.strip(output_path=Path("interview_leading.wav"), trailing=False)

# Speed up audio by 50%
faster = audio.speed(output_path=Path("interview_fast.wav"), speed_factor=1.5)

# Convert format
mp3_file = audio.convert(output_path=Path("output.mp3"))
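
To make the leading/trailing options concrete, here is a sketch of how trim points could be derived from a list of detected silence periods. It assumes periods are (start, end) tuples in seconds, sorted by start time; trim_bounds and its tolerance parameter are hypothetical and only illustrate the idea, not how strip is actually implemented.

```python
def trim_bounds(periods, duration, leading=True, trailing=True, tolerance=0.05):
    """Return (start, end) in seconds of the audio to keep."""
    start, end = 0.0, duration
    if periods:
        first_start, first_end = periods[0]
        last_start, last_end = periods[-1]
        # Leading silence: the first period begins at (or very near) time zero.
        if leading and first_start <= tolerance:
            start = first_end
        # Trailing silence: the last period runs to (or very near) the end.
        if trailing and last_end >= duration - tolerance:
            end = last_start
    # If the whole file is silent the bounds cross; keep the file untouched.
    if start >= end:
        return 0.0, duration
    return start, end


periods = [(0.0, 1.5), (4.0, 4.8), (9.0, 10.0)]
print(trim_bounds(periods, duration=10.0))                  # (1.5, 9.0)
print(trim_bounds(periods, duration=10.0, trailing=False))  # (1.5, 10.0)
```

Silence in the middle of the recording, such as the (4.0, 4.8) period above, is left alone; only the edges are trimmed.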

Processing Pipeline

from speech_prep import SoundFile
from pathlib import Path

def prepare_for_transcription(input_file: Path, output_file: Path):
    """Prepare audio file for speech-to-text processing."""
    # Load the original file
    audio = SoundFile(input_file)
    if not audio:
        return None
    # Processing pipeline
    stripped = audio.strip(output_path=input_file.with_stem(input_file.stem + "_stripped"))
    faster = stripped.speed(output_path=input_file.with_stem(input_file.stem + "_stripped_fast"), speed_factor=1.1)
    processed = faster.convert(output_path=output_file)
    if processed:
        print(f"Original duration: {audio.duration:.2f}s")
        print(f"Processed duration: {processed.duration:.2f}s")
        print(f"Time saved: {audio.duration - processed.duration:.2f}s")
    return processed

# Use the pipeline
result = prepare_for_transcription(
    Path("long_meeting.wav"),
    Path("ready_for_stt.mp3")
)
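
The time saved by a pipeline like this follows from simple arithmetic: trimmed silence is subtracted, and the remainder is divided by the speed factor. The sketch below estimates the result without touching any audio; estimate_duration is a hypothetical helper and the numbers are made up for illustration.

```python
def estimate_duration(duration, leading_silence, trailing_silence, speed_factor):
    """Estimate the duration in seconds after stripping edges and changing speed."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    kept = max(duration - leading_silence - trailing_silence, 0.0)
    return kept / speed_factor


original = 600.0  # a 10-minute recording
estimated = estimate_duration(original, leading_silence=5.0,
                              trailing_silence=15.0, speed_factor=1.1)
print(f"Estimated duration: {estimated:.2f}s")
print(f"Estimated time saved: {original - estimated:.2f}s")
```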

Error Handling

from speech_prep import SoundFile, SpeechPrepError, FFmpegError
from pathlib import Path

try:
    audio = SoundFile(Path("audio.wav"))
    if audio:
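        # Pass explicit output paths, matching the signatures in the API Reference
        stripped = audio.strip(output_path=Path("audio_stripped.wav"))
        result = stripped.speed(output_path=Path("audio_fast.wav"), speed_factor=2.0)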
        print(f"Success: {result.path}")
    else:
        print("Failed to load audio file")

except FFmpegError as e:
    print(f"FFmpeg error: {e}")
    if e.stderr:
        print(f"Details: {e.stderr}")

except SpeechPrepError as e:
    print(f"Processing error: {e}")
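
The order of the except clauses above matters if FFmpegError is a subclass of SpeechPrepError, which the example suggests but this document does not state. The sketch below uses local stand-in classes (DemoSpeechPrepError and DemoFFmpegError, both hypothetical) to show why the more specific handler has to come first.

```python
class DemoSpeechPrepError(Exception):
    """Stand-in for a package-wide base exception (assumed hierarchy)."""


class DemoFFmpegError(DemoSpeechPrepError):
    """Stand-in for an FFmpeg-specific error that carries captured stderr."""

    def __init__(self, message, stderr=None):
        super().__init__(message)
        self.stderr = stderr


def describe(exc):
    """Raise exc and report which handler caught it."""
    try:
        raise exc
    except DemoFFmpegError as e:
        # Specific handler first; otherwise the base handler would catch it.
        details = f" ({e.stderr})" if e.stderr else ""
        return f"FFmpeg error: {e}{details}"
    except DemoSpeechPrepError as e:
        return f"Processing error: {e}"


print(describe(DemoFFmpegError("conversion failed", stderr="Invalid data found")))
print(describe(DemoSpeechPrepError("file not found")))
```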

Custom Parameters

from speech_prep import SoundFile
from pathlib import Path

# Custom silence detection settings
audio = SoundFile(
    Path("audio.wav"),
    noise_threshold_db=-40,    # Lower threshold than the -30 default: only quieter audio counts as silence
    min_silence_duration=0.3   # Shorter minimum silence periods
)

# Custom output paths
cleaned = audio.strip(output_path=Path("custom_output.wav"))

# Custom conversion settings
mp3 = audio.convert(
    output_path=Path("output.mp3"),
    audio_bitrate="192k"  # Custom bitrate
)
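
To get a feel for what a noise threshold in decibels means, the values can be converted to a linear amplitude ratio with the standard formula 10^(dB/20). This is general audio arithmetic rather than anything specific to the package; db_to_amplitude is a hypothetical helper.

```python
def db_to_amplitude(db: float) -> float:
    """Convert a level in dB (relative to full scale) to a linear amplitude ratio."""
    return 10 ** (db / 20)


for db in (-30, -40):
    print(f"{db} dB -> {db_to_amplitude(db):.4f} of full scale")
```

Under this formula, -30 dB is roughly 3.2% of full-scale amplitude and -40 dB is 1%.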

API Reference

SoundFile Class

Constructor

SoundFile(file_path, noise_threshold_db=-30, min_silence_duration=0.5)

Methods

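strip(output_path, leading=True, trailing=True): Remove leading and/or trailing silence
speed(output_path, speed_factor): Adjust playback speed by speed_factor (for example, 1.5 is 50% faster)
convert(output_path, audio_bitrate=None): Convert to another format, optionally with a custom bitrate (for example, "192k")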
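
For readers curious how a speed change maps onto FFmpeg, one common approach is the atempo audio filter, which changes tempo without shifting pitch. FFmpeg's documentation notes that factors above 2 skip samples and suggests chaining several atempo instances instead. The sketch below builds such a chain; whether speed uses atempo internally is an assumption, and atempo_chain is a hypothetical helper.

```python
def atempo_chain(speed_factor: float) -> str:
    """Build an FFmpeg filter string using atempo steps within [0.5, 2.0]."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    factors = []
    remaining = speed_factor
    # Peel off steps of 2.0 (or 0.5) until the remainder fits in one step.
    while remaining > 2.0:
        factors.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:
        factors.append(0.5)
        remaining /= 0.5
    factors.append(remaining)
    return ",".join(f"atempo={factor:g}" for factor in factors)


print(atempo_chain(1.5))   # atempo=1.5
print(atempo_chain(4.0))   # atempo=2,atempo=2
print(atempo_chain(0.25))  # atempo=0.5,atempo=0.5
```

The resulting string could be passed to FFmpeg's -filter:a option.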

Properties

path: Path to the audio file
duration: Duration in seconds
format: Audio format
file_size: File size in bytes
silence_periods: List of detected silence periods
median_silence: Median silence duration
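
As an illustration of how silence_periods and median_silence relate, the sketch below computes a median duration from a list of (start, end) tuples in seconds. The tuple representation and the 0.0 result for an empty list are assumptions; the package may represent periods differently.

```python
from statistics import median


def median_silence(periods):
    """Return the median silence duration in seconds, or 0.0 if there are none."""
    durations = [end - start for start, end in periods]
    return median(durations) if durations else 0.0


periods = [(0.0, 1.5), (4.0, 4.5), (9.0, 10.0)]
print(median_silence(periods))  # 1.0
```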

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built on top of the powerful FFmpeg multimedia framework

Release history

0.1.4: Jun 29, 2025
0.1.3: Jun 29, 2025 (this version)

Download files

Source Distribution

speech_prep-0.1.3.tar.gz (34.2 kB)

Uploaded Jun 29, 2025 Source

Built Distribution

speech_prep-0.1.3-py3-none-any.whl (11.8 kB)

Uploaded Jun 29, 2025 Python 3

File details

Details for the file speech_prep-0.1.3.tar.gz.

File metadata

Download URL: speech_prep-0.1.3.tar.gz
Upload date: Jun 29, 2025
Size: 34.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for speech_prep-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`f6481e7a9523ff163054e30bd16a4a6f4213bc3bfe28f18b9a47b8b808be5697`
MD5	`f3739abfb0d331051b8bd850aebb3be6`
BLAKE2b-256	`01ac6d27e1704f49570f8bd53ad2abb793f5e3fe778a27cd56c3a6cc01df9813`

File details

Details for the file speech_prep-0.1.3-py3-none-any.whl.

File metadata

Download URL: speech_prep-0.1.3-py3-none-any.whl
Upload date: Jun 29, 2025
Size: 11.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for speech_prep-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8febd216b8f16930a3a7417beec0b88c8bcee69a10b309fa7b98718209ea9cc5`
MD5	`e2f3cb477cc867a3ef5d48cddc97c252`
BLAKE2b-256	`499e068aac8ff12a7d35a05f67fcb12b4ccabba2ad1dbb773e49495e199ee5a9`

