Speech Prep

Audio preprocessing toolkit for speech-to-text applications using FFmpeg.

Overview

Speech Prep is a Python package for preparing audio files for speech-to-text processing. It provides tools for silence detection and removal, speed adjustment, and format conversion: common preprocessing steps that help optimize audio before transcription.

Features

  • Silence Detection: Automatically detect silence periods in audio files
  • Silence Removal: Remove leading/trailing silence to clean up recordings
  • Speed Adjustment: Change playback speed while maintaining audio quality
  • Format Conversion: Convert between different audio formats (MP3, WAV, FLAC, etc.)
  • Clean API: Simple, intuitive interface with method chaining support
  • FFmpeg Integration: Leverages the power and reliability of FFmpeg
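Silence detection with FFmpeg is commonly done with the `silencedetect` audio filter, which logs `silence_start` and `silence_end | silence_duration` lines to stderr. The README does not say that Speech Prep works this way internally, so treat the following as a sketch under that assumption: `parse_silence_log` is a hypothetical helper, not part of the package API, that turns such a log into `(start, end)` pairs in seconds.

```python
import re
from typing import List, Optional, Tuple

# Matches e.g. "[silencedetect @ 0x55d0] silence_start: 1.234"
_START_RE = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
# Matches e.g. "[silencedetect @ 0x55d0] silence_end: 2.5 | silence_duration: 1.266"
_END_RE = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silence_log(stderr: str, total_duration: float) -> List[Tuple[float, float]]:
    """Hypothetical helper: turn silencedetect log lines into (start, end) pairs."""
    periods: List[Tuple[float, float]] = []
    pending_start: Optional[float] = None
    for line in stderr.splitlines():
        start = _START_RE.search(line)
        if start:
            pending_start = float(start.group(1))
            continue
        end = _END_RE.search(line)
        if end and pending_start is not None:
            periods.append((pending_start, float(end.group(1))))
            pending_start = None
    # A silence that runs to the end of the file may have no silence_end line.
    if pending_start is not None:
        periods.append((pending_start, total_duration))
    return periods
```

Closing an open silence at `total_duration` matters for trailing silence, which is exactly what `strip()` is meant to remove.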

Requirements

  • Python 3.9+
  • FFmpeg (must be installed and accessible via PATH)
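Because FFmpeg must be reachable via PATH, it can help to check for it before loading any files. A minimal sketch using only the standard library; `find_ffmpeg` and `ffmpeg_version` are illustrative names, not functions provided by Speech Prep.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def ffmpeg_version(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is missing."""
    path = find_ffmpeg(executable)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

Calling `ffmpeg_version()` at startup and exiting with a clear message when it returns `None` gives a friendlier failure than an error from deep inside the processing pipeline.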

Installation

# Install from PyPI
pip install speech-prep

# Or install from source
git clone https://github.com/dimdasci/speech-prep.git
cd speech-prep
uv sync  # or pip install -e .

Quick Start

from speech_prep import SoundFile, AudioFormat
from pathlib import Path

# Load an audio file
audio = SoundFile(Path("recording.wav"))

if audio:
    print(audio)  # Shows duration, format, file size, and silence periods

    # Clean up the audio for speech-to-text
    cleaned = audio.strip(output_path=Path("recording_stripped.wav"))
    faster = cleaned.speed(output_path=Path("recording_stripped_fast.wav"), speed_factor=1.2)
    final = faster.convert(output_path=Path("clean.mp3"), target_format=AudioFormat.MP3)

    print(f"Processed file saved: {final.path}")

Usage Examples

Basic Operations

from speech_prep import SoundFile, AudioFormat
from pathlib import Path

# Load audio file
audio = SoundFile(Path("interview.wav"))

# View audio information
print(audio)  # Shows duration, format, file size, and silence periods

# Remove silence from beginning and end
cleaned = audio.strip(output_path=Path("interview_stripped.wav"))

# Remove only leading silence
cleaned = audio.strip(output_path=Path("interview_leading.wav"), trailing=False)

# Speed up audio by 50%
faster = audio.speed(output_path=Path("interview_fast.wav"), speed_factor=1.5)

# Convert format
mp3_file = audio.convert(output_path=Path("output.mp3"), target_format=AudioFormat.MP3)
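Pitch-preserving speed changes in FFmpeg are typically done with the `atempo` filter. Older FFmpeg builds accept only `atempo` values between 0.5 and 2.0, so larger or smaller factors are usually expressed as a chain of filters. The README does not state which filter `speed()` uses; `atempo_chain` below is a hypothetical helper that only illustrates the chaining technique.

```python
def atempo_chain(speed_factor: float) -> str:
    """Hypothetical helper: build an FFmpeg -af value such as 'atempo=2.0,atempo=1.5'.

    Each stage stays within [0.5, 2.0], the range every FFmpeg version accepts,
    and the product of all stages equals speed_factor.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    stages = []
    remaining = speed_factor
    while remaining > 2.0:
        stages.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:
        stages.append(0.5)
        remaining /= 0.5
    stages.append(remaining)
    return ",".join(f"atempo={s}" for s in stages)
```

For example, a 3x speed-up becomes `-af "atempo=2.0,atempo=1.5"` on the FFmpeg command line.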

Processing Pipeline

from speech_prep import AudioFormat, SoundFile
from pathlib import Path

def prepare_for_transcription(input_file: Path, output_file: Path):
    """Prepare audio file for speech-to-text processing."""

    # Load the original file
    audio = SoundFile(input_file)
    if not audio:
        return None
    # Processing pipeline
    stripped = audio.strip(output_path=input_file.with_stem(input_file.stem + "_stripped"))
    faster = stripped.speed(output_path=input_file.with_stem(input_file.stem + "_stripped_fast"), speed_factor=1.1)
    processed = faster.convert(output_path=output_file, target_format=AudioFormat.MP3)
    if processed:
        print(f"Original duration: {audio.duration:.2f}s")
        print(f"Processed duration: {processed.duration:.2f}s")
        print(f"Time saved: {audio.duration - processed.duration:.2f}s")
    return processed

# Use the pipeline
result = prepare_for_transcription(
    Path("long_meeting.wav"),
    Path("ready_for_stt.mp3")
)
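The "Time saved" figure follows from simple arithmetic: stripping removes the edge silence, and a speed change divides the remaining duration by `speed_factor`. The back-of-the-envelope sketch below uses a hypothetical helper, `estimate_processed_duration`; actual output durations may differ slightly, for example because of codec framing.

```python
def estimate_processed_duration(
    duration: float,
    leading_silence: float,
    trailing_silence: float,
    speed_factor: float,
) -> float:
    """Rough estimate in seconds: strip edge silence, then divide by the speed factor."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    speech = max(duration - leading_silence - trailing_silence, 0.0)
    return speech / speed_factor
```

A 100 s recording with 5 s of silence at each end, sped up by 1.5x, comes out at about 90 / 1.5 = 60 s.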

Error Handling

from speech_prep import SoundFile, SpeechPrepError, FFmpegError
from pathlib import Path

try:
    audio = SoundFile(Path("audio.wav"))
    if audio:
        result = audio.strip(output_path=Path("audio_stripped.wav")).speed(
            output_path=Path("audio_fast.wav"), speed_factor=2.0
        )
        print(f"Success: {result.path}")
    else:
        print("Failed to load audio file")

except FFmpegError as e:
    print(f"FFmpeg error: {e}")
    if e.stderr:
        print(f"Details: {e.stderr}")

except SpeechPrepError as e:
    print(f"Processing error: {e}")
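The `e.stderr` attribute used above suggests that errors carry FFmpeg's diagnostic output. The sketch below shows one way a wrapper can capture stderr and attach it to an exception. `ToolError` and `run_tool` are hypothetical stand-ins written for illustration; they are not Speech Prep's `FFmpegError` or its internal runner.

```python
import subprocess
from typing import List, Optional


class ToolError(Exception):
    """Hypothetical stand-in for an FFmpeg-style error that keeps stderr."""

    def __init__(self, message: str, returncode: int, stderr: Optional[str] = None):
        super().__init__(message)
        self.returncode = returncode
        self.stderr = stderr


def run_tool(cmd: List[str]) -> str:
    """Run a command and return stdout, raising ToolError with stderr on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        raise ToolError(
            f"{cmd[0]} exited with code {result.returncode}",
            returncode=result.returncode,
            stderr=result.stderr.strip() or None,
        )
    return result.stdout
```

Keeping stderr on the exception, rather than printing it at the failure site, lets callers decide whether to log, display, or discard FFmpeg's often verbose output.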

Custom Parameters

from speech_prep import AudioFormat, SoundFile
from pathlib import Path

# Custom silence detection settings
audio = SoundFile(
    Path("audio.wav"),
    noise_threshold_db=-40,    # Stricter: only audio quieter than -40 dB counts as silence (default -30)
    min_silence_duration=0.3   # Also detect silences as short as 0.3 s (default 0.5)
)

# Custom output paths
cleaned = audio.strip(output_path=Path("custom_output.wav"))

# Custom conversion settings
mp3 = audio.convert(
    output_path=Path("output.mp3"),
    target_format=AudioFormat.MP3,
    audio_bitrate="192k"  # Custom bitrate
)
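If the constructor parameters are forwarded to FFmpeg's `silencedetect` filter (an assumption; the README does not say so), they would map onto its `noise` and `d` options roughly as follows. `silencedetect_args` is a hypothetical helper that builds an analysis-only command: the audio is decoded to the `null` muxer and the detections appear on stderr.

```python
from pathlib import Path
from typing import List


def silencedetect_args(
    input_path: Path,
    noise_threshold_db: float = -30,
    min_silence_duration: float = 0.5,
) -> List[str]:
    """Hypothetical: FFmpeg arguments that log silence without writing an output file."""
    audio_filter = f"silencedetect=noise={noise_threshold_db}dB:d={min_silence_duration}"
    return [
        "ffmpeg", "-hide_banner", "-nostats",
        "-i", str(input_path),
        "-af", audio_filter,
        "-f", "null", "-",  # decode and discard; detections go to stderr
    ]
```

With `noise_threshold_db=-40` and `min_silence_duration=0.3` the filter string becomes `silencedetect=noise=-40dB:d=0.3`.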

API Reference

SoundFile Class

Constructor

SoundFile(file_path, noise_threshold_db=-30, min_silence_duration=0.5)

Methods

  • strip(output_path, leading=True, trailing=True): Remove leading and/or trailing silence
  • speed(output_path, speed_factor): Adjust playback speed
  • convert(output_path, target_format, audio_bitrate=None): Convert format
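Conceptually, `strip()` has to decide which part of the file to keep from the detected silence periods and the `leading`/`trailing` flags. The sketch below shows one plausible way to compute that window. It assumes silence periods are `(start, end)` pairs in seconds; `trim_window` and its `tolerance` parameter are hypothetical and not taken from the package.

```python
from typing import List, Tuple


def trim_window(
    duration: float,
    silence_periods: List[Tuple[float, float]],
    leading: bool = True,
    trailing: bool = True,
    tolerance: float = 0.05,
) -> Tuple[float, float]:
    """Hypothetical helper: (start, end) of the audio to keep after stripping edge silence."""
    start, end = 0.0, duration
    for silence_start, silence_end in silence_periods:
        # A silence touching the beginning of the file counts as leading silence.
        if leading and silence_start <= tolerance:
            start = max(start, silence_end)
        # A silence touching the end of the file counts as trailing silence.
        if trailing and silence_end >= duration - tolerance:
            end = min(end, silence_start)
    if start >= end:
        # The whole file is silent; keep nothing.
        return (0.0, 0.0)
    return (start, end)
```

Silences in the middle of the recording are left alone. The resulting window could then be cut with FFmpeg's `-ss` and `-to` options.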

Properties

  • path: Path to the audio file
  • duration: Duration in seconds
  • format: Audio format (AudioFormat enum)
  • file_size: File size in bytes
  • silence_periods: List of detected silence periods
  • median_silence: Median silence duration
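The README does not define how `median_silence` is computed or which unit it uses. Assuming `silence_periods` holds `(start, end)` pairs in seconds, one plausible reading is the median length of those periods, sketched here with a hypothetical standalone function.

```python
import statistics
from typing import List, Optional, Tuple


def median_silence(silence_periods: List[Tuple[float, float]]) -> Optional[float]:
    """Median length in seconds of the detected silences, or None if there are none."""
    durations = [end - start for start, end in silence_periods]
    return statistics.median(durations) if durations else None
```

A median is less sensitive than a mean to one unusually long pause, which makes it a reasonable summary of typical gap length in speech.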

AudioFormat Enum

The AudioFormat enum represents supported audio formats:

from speech_prep import AudioFormat

# Available formats
AudioFormat.MP3   # MP3 format
AudioFormat.WAV   # WAV format
AudioFormat.FLAC  # FLAC format
AudioFormat.AAC   # AAC format
AudioFormat.OGG   # OGG format
AudioFormat.M4A   # M4A format
AudioFormat.UNKNOWN  # Unknown/unsupported format
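One simple way to arrive at a value like `AudioFormat.UNKNOWN` is to map the file extension onto the enum and fall back when nothing matches. The sketch below uses `Fmt`, a hypothetical mirror of the enum, because the real member values are not documented here. Speech Prep may instead probe the file contents (for example with ffprobe), which is more reliable than trusting the extension.

```python
from enum import Enum
from pathlib import Path


class Fmt(Enum):
    """Hypothetical mirror of AudioFormat, for illustration only."""
    MP3 = "mp3"
    WAV = "wav"
    FLAC = "flac"
    AAC = "aac"
    OGG = "ogg"
    M4A = "m4a"
    UNKNOWN = "unknown"


def format_from_path(path: Path) -> Fmt:
    """Guess the format from the file extension, falling back to UNKNOWN."""
    suffix = path.suffix.lower().lstrip(".")
    for fmt in Fmt:
        if fmt is not Fmt.UNKNOWN and fmt.value == suffix:
            return fmt
    return Fmt.UNKNOWN
```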

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built on top of the powerful FFmpeg multimedia framework

Download files

Download the file for your platform.

Source Distribution

speech_prep-0.1.4.tar.gz (55.3 kB)

Uploaded Source

Built Distribution

speech_prep-0.1.4-py3-none-any.whl (12.6 kB)

Uploaded Python 3

File details

Details for the file speech_prep-0.1.4.tar.gz.

File metadata

  • Download URL: speech_prep-0.1.4.tar.gz
  • Upload date:
  • Size: 55.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for speech_prep-0.1.4.tar.gz
Algorithm Hash digest
SHA256 8887d319124f8c41e2fd1e55272b74d7425c3c0ed283cde5dc9e63be2c2234b2
MD5 414a8bdf08c737e7c2c4cd44ff8f5b19
BLAKE2b-256 00235adda32b2ae95969adbec2d8ff7f578737bf146c21e4be3cb554226a1f1b
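The published SHA256 digests can be used to check a downloaded file before installing it. A minimal standard-library sketch; `sha256_of` and `verify` are illustrative helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """True if the file's digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For the source distribution this would be `verify(Path("speech_prep-0.1.4.tar.gz"), "8887d319124f8c41e2fd1e55272b74d7425c3c0ed283cde5dc9e63be2c2234b2")`. pip can also enforce digests itself through hash-checking mode (`--require-hashes` together with `--hash=sha256:...` entries in a requirements file).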

File details

Details for the file speech_prep-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: speech_prep-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 12.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for speech_prep-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 db9117d15346cc029dd38ede1aa7ba2c00e89375458cdcadc5455dbd632e062f
MD5 fb2791409734269427a2baf6a4c3f286
BLAKE2b-256 610bc0cc005f7e4e7ddc2f9ea08908e6c86620ac259c526052867142774a5d22