Speech Prep
Audio preprocessing toolkit for speech-to-text applications using FFmpeg.
Overview
Speech Prep is a Python package that prepares audio files for speech-to-text processing. It provides tools for silence detection and removal, speed adjustment, and format conversion, all common steps for optimizing audio before transcription.
Features
- Silence Detection: Automatically detect silence periods in audio files
- Silence Removal: Remove leading/trailing silence to clean up recordings
- Speed Adjustment: Change playback speed while maintaining audio quality
- Format Conversion: Convert between different audio formats (MP3, WAV, FLAC, etc.)
- Clean API: Simple, intuitive interface with method chaining support
- FFmpeg Integration: Leverages the power and reliability of FFmpeg
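To make the silence detection feature more concrete, here is a minimal sketch of how silence periods can be recovered from FFmpeg's log output. It assumes the periods come from FFmpeg's silencedetect filter (for example `ffmpeg -i input.wav -af silencedetect=noise=-30dB:d=0.5 -f null -`), which reports `silence_start` and `silence_end` lines on stderr. Whether Speech Prep uses this filter internally is an assumption, and `parse_silence_periods` is a hypothetical helper, not part of the package.

```python
import re
from typing import List, Tuple

# Patterns for the log lines FFmpeg's silencedetect filter writes to stderr.
_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silence_periods(log: str) -> List[Tuple[float, float]]:
    """Return (start, end) pairs, in seconds, found in silencedetect output.

    Hypothetical helper for illustration; not part of the speech_prep API.
    """
    periods: List[Tuple[float, float]] = []
    start = None
    for line in log.splitlines():
        match = _START.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = _END.search(line)
        if match and start is not None:
            periods.append((start, float(match.group(1))))
            start = None
    return periods


# Sample log text in the format silencedetect prints.
sample_log = """\
[silencedetect @ 0x55d] silence_start: 0
[silencedetect @ 0x55d] silence_end: 1.25 | silence_duration: 1.25
[silencedetect @ 0x55d] silence_start: 10.5
[silencedetect @ 0x55d] silence_end: 12 | silence_duration: 1.5
"""
print(parse_silence_periods(sample_log))
```

A silence period that is still open when the file ends has no `silence_end` line in this sketch's input and is simply dropped.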
Requirements
- Python 3.9+
- FFmpeg (must be installed and accessible via PATH)
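Since FFmpeg must be reachable via PATH, a quick check before processing can save a confusing failure later. The sketch below uses only the standard library; `find_ffmpeg` is a hypothetical helper name, not something Speech Prep provides.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


path = find_ffmpeg()
if path is None:
    print("FFmpeg not found on PATH; install it before using Speech Prep.")
else:
    print(f"FFmpeg found at {path}")
```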
Installation
# Install from PyPI
pip install speech-prep
# Or install from source
git clone https://github.com/dimdasci/speech-prep.git
cd speech-prep
uv sync # or pip install -e .
Quick Start
from speech_prep import SoundFile
from pathlib import Path
# Load an audio file
audio = SoundFile(Path("recording.wav"))
if audio:
    print(f"Duration: {audio.duration:.2f} seconds")
    print(f"Format: {audio.format}")
    print(f"Silence periods detected: {len(audio.silence_periods)}")
    # Clean up the audio for speech-to-text
    cleaned = audio.strip(output_path=Path("recording_stripped.wav"))
    faster = cleaned.speed(output_path=Path("recording_stripped_fast.wav"), speed_factor=1.2)
    final = faster.convert(output_path=Path("clean.mp3"))
    print(f"Processed file saved: {final.path}")
Usage Examples
Basic Operations
from speech_prep import SoundFile
from pathlib import Path
# Load audio file
audio = SoundFile(Path("interview.wav"))
# View audio information
print(audio) # Shows duration, format, file size, and silence periods
# Remove silence from beginning and end
cleaned = audio.strip(output_path=Path("interview_stripped.wav"))
# Remove only leading silence
cleaned = audio.strip(output_path=Path("interview_leading.wav"), trailing=False)
# Speed up audio by 50%
faster = audio.speed(output_path=Path("interview_fast.wav"), speed_factor=1.5)
# Convert format
mp3_file = audio.convert(output_path=Path("output.mp3"))
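For readers curious how a speed change can be expressed in FFmpeg terms, here is a rough sketch. It assumes the tempo change is done with FFmpeg's atempo filter, which changes tempo without changing pitch. Older FFmpeg builds accept only factors between 0.5 and 2.0 per atempo instance, and the FFmpeg documentation suggests chaining several instances for larger changes. Whether Speech Prep builds its filter this way is an assumption, and `atempo_chain` is a hypothetical helper.

```python
from typing import List


def atempo_chain(speed_factor: float) -> str:
    """Build a filter string whose atempo stages multiply to speed_factor.

    Each stage is kept within [0.5, 2.0], the range every FFmpeg build accepts.
    Hypothetical helper for illustration; not part of the speech_prep API.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    stages: List[float] = []
    remaining = speed_factor
    while remaining > 2.0:   # peel off 2x stages for large speed-ups
        stages.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:   # peel off 0.5x stages for large slow-downs
        stages.append(0.5)
        remaining /= 0.5
    stages.append(remaining)
    return ",".join(f"atempo={stage:g}" for stage in stages)


print(atempo_chain(1.5))   # atempo=1.5
print(atempo_chain(3.0))   # atempo=2,atempo=1.5
print(atempo_chain(0.25))  # atempo=0.5,atempo=0.5
```

The resulting string could be passed to FFmpeg's `-filter:a` option. For the modest factors typical of speech-to-text preparation (1.1 to 1.5), a single stage is enough.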
Processing Pipeline
from speech_prep import SoundFile
from pathlib import Path
def prepare_for_transcription(input_file: Path, output_file: Path):
    """Prepare audio file for speech-to-text processing."""
    # Load the original file
    audio = SoundFile(input_file)
    if not audio:
        return None
    # Processing pipeline
    stripped = audio.strip(output_path=input_file.with_stem(input_file.stem + "_stripped"))
    faster = stripped.speed(output_path=input_file.with_stem(input_file.stem + "_stripped_fast"), speed_factor=1.1)
    processed = faster.convert(output_path=output_file)
    if processed:
        print(f"Original duration: {audio.duration:.2f}s")
        print(f"Processed duration: {processed.duration:.2f}s")
        print(f"Time saved: {audio.duration - processed.duration:.2f}s")
    return processed

# Use the pipeline
result = prepare_for_transcription(
    Path("long_meeting.wav"),
    Path("ready_for_stt.mp3"),
)
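The "time saved" figure printed by the pipeline can be estimated ahead of time with simple arithmetic: stripping removes the leading and trailing silence, and the speed step divides what remains by the speed factor. The sketch below is a back-of-the-envelope estimate under those assumptions. The function name and the sample numbers are made up for illustration.

```python
def estimate_processed_duration(
    duration: float,
    leading_silence: float,
    trailing_silence: float,
    speed_factor: float,
) -> float:
    """Estimate the duration (seconds) after stripping silence and speeding up."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    trimmed = max(duration - leading_silence - trailing_silence, 0.0)
    return trimmed / speed_factor


# Illustrative numbers: a one-hour recording with 4 s and 6 s of edge silence.
original = 3600.0
processed = estimate_processed_duration(original, 4.0, 6.0, speed_factor=1.1)
print(f"Estimated processed duration: {processed:.2f}s")
print(f"Estimated time saved: {original - processed:.2f}s")
```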
Error Handling
from speech_prep import SoundFile, SpeechPrepError, FFmpegError
from pathlib import Path
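try:
    audio = SoundFile(Path("audio.wav"))
    if audio:
        # strip() and speed() both take an output path (see the API Reference)
        stripped = audio.strip(output_path=Path("audio_stripped.wav"))
        result = stripped.speed(output_path=Path("audio_stripped_fast.wav"), speed_factor=2.0)
        print(f"Success: {result.path}")
    else:
        print("Failed to load audio file")
except FFmpegError as e:
    print(f"FFmpeg error: {e}")
    if e.stderr:
        print(f"Details: {e.stderr}")
except SpeechPrepError as e:
    print(f"Processing error: {e}")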
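The handler above reads `e.stderr` from the FFmpeg error. As a rough illustration of how an exception can carry a tool's stderr, here is a self-contained sketch built on `subprocess`. `CommandError` and `run_command` are hypothetical stand-ins, not the package's actual classes, and the demo runs the Python interpreter instead of FFmpeg so it works anywhere.

```python
import subprocess
import sys
from typing import List, Optional


class CommandError(Exception):
    """Hypothetical stand-in for an error type that carries the tool's stderr."""

    def __init__(self, message: str, stderr: Optional[str] = None):
        super().__init__(message)
        self.stderr = stderr


def run_command(args: List[str]) -> str:
    """Run a command; return stdout or raise CommandError with stderr attached."""
    try:
        completed = subprocess.run(args, capture_output=True, text=True, check=True)
    except FileNotFoundError as exc:
        raise CommandError(f"Executable not found: {args[0]}") from exc
    except subprocess.CalledProcessError as exc:
        raise CommandError(
            f"Command exited with status {exc.returncode}", stderr=exc.stderr
        ) from exc
    return completed.stdout


# A command that writes to stderr and exits with a non-zero status.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
try:
    run_command(failing)
except CommandError as e:
    print(f"Error: {e}")
    if e.stderr:
        print(f"Details: {e.stderr}")
```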
Custom Parameters
from speech_prep import SoundFile
from pathlib import Path
# Custom silence detection settings
audio = SoundFile(
    Path("audio.wav"),
    noise_threshold_db=-40,  # Lower than the default -30: only quieter audio counts as silence
    min_silence_duration=0.3,  # Shorter than the default 0.5: briefer pauses are detected
)
# Custom output paths
cleaned = audio.strip(output_path=Path("custom_output.wav"))
# Custom conversion settings
mp3 = audio.convert(
    output_path=Path("output.mp3"),
    audio_bitrate="192k",  # Custom bitrate
)
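To show what a bitrate setting such as "192k" means at the FFmpeg level, the sketch below builds (but does not run) a conversion command. `-i`, `-b:a`, and `-y` are standard FFmpeg options. The exact command Speech Prep issues is not documented here, so treat `build_convert_command` as a hypothetical illustration.

```python
from pathlib import Path
from typing import List, Optional


def build_convert_command(
    src: Path, dst: Path, audio_bitrate: Optional[str] = None
) -> List[str]:
    """Build an FFmpeg argument list converting src to dst.

    FFmpeg picks the output format from the dst extension.
    Hypothetical helper for illustration; not part of the speech_prep API.
    """
    cmd = ["ffmpeg", "-y", "-i", str(src)]  # -y: overwrite dst without prompting
    if audio_bitrate is not None:
        cmd += ["-b:a", audio_bitrate]      # target audio bitrate, e.g. "192k"
    cmd.append(str(dst))
    return cmd


print(" ".join(build_convert_command(Path("audio.wav"), Path("output.mp3"), "192k")))
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell quoting problems with file names that contain spaces.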
API Reference
SoundFile Class
Constructor
SoundFile(file_path, noise_threshold_db=-30, min_silence_duration=0.5)
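To get a feel for the `noise_threshold_db` default, it helps to convert decibels to a linear amplitude ratio with the standard formula ratio = 10^(dB/20). The sketch assumes the threshold is measured relative to full-scale amplitude, as in FFmpeg's silencedetect filter. That interpretation is an assumption about this package.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level in decibels to a linear amplitude ratio (1.0 = full scale)."""
    return 10 ** (db / 20)


for threshold in (-30, -40):
    ratio = db_to_amplitude_ratio(threshold)
    print(f"{threshold} dB -> {ratio:.4f} of full scale")
```

Under that assumption, the default of -30 corresponds to roughly 3% of full-scale amplitude, and -40 to 1%, so a lower threshold treats less of the signal as silence.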
Methods
- strip(output_path, leading=True, trailing=True): Remove silence
- speed(output_path, speed_factor): Adjust playback speed
- convert(output_path, audio_bitrate=None): Convert format
Properties
- path: Path to the audio file
- duration: Duration in seconds
- format: Audio format
- file_size: File size in bytes
- silence_periods: List of detected silence periods
- median_silence: Median silence duration
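As an illustration of how a median silence duration relates to a list of silence periods, here is a small sketch. It assumes each period is a (start, end) pair in seconds. The actual type of the items in `silence_periods` is not specified here, so the tuple layout and the sample values are assumptions.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical silence periods as (start, end) pairs in seconds.
periods: List[Tuple[float, float]] = [(0.0, 1.0), (14.5, 15.0), (42.0, 44.0)]

# Duration of each period, then the median of those durations.
durations = [end - start for start, end in periods]
median_silence = median(durations) if durations else 0.0

print(f"Durations: {durations}")
print(f"Median silence: {median_silence:.2f}s")
```

The median is less affected than the mean by a single long pause, which makes it a reasonable summary of typical pause length.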
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
- Built on top of the powerful FFmpeg multimedia framework
File details
Details for the file speech_prep-0.1.3.tar.gz.
File metadata
- Download URL: speech_prep-0.1.3.tar.gz
- Upload date:
- Size: 34.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6481e7a9523ff163054e30bd16a4a6f4213bc3bfe28f18b9a47b8b808be5697 |
| MD5 | f3739abfb0d331051b8bd850aebb3be6 |
| BLAKE2b-256 | 01ac6d27e1704f49570f8bd53ad2abb793f5e3fe778a27cd56c3a6cc01df9813 |
File details
Details for the file speech_prep-0.1.3-py3-none-any.whl.
File metadata
- Download URL: speech_prep-0.1.3-py3-none-any.whl
- Upload date:
- Size: 11.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8febd216b8f16930a3a7417beec0b88c8bcee69a10b309fa7b98718209ea9cc5 |
| MD5 | e2f3cb477cc867a3ef5d48cddc97c252 |
| BLAKE2b-256 | 499e068aac8ff12a7d35a05f67fcb12b4ccabba2ad1dbb773e49495e199ee5a9 |