A utility library for working with speech-to-text transcriptions.
stt-utils
A utility library for working with word-level speech-to-text timestamps. Its main goal is to simplify working with the timestamps generated by OpenAI's Whisper model when timestamp_granularities is set to "word".
Features
- Realign word timestamps with the full text
- Merge several transcriptions together
- (Optional dependency) Split long audio into smaller segments at moments of silence.
Installation
pip install stt-utils
With audio tools support:
pip install "stt-utils[audio]"
Usage
Python API
Align word timestamps from openai whisper
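from openai import OpenAI
from stt_utils import UnprocessedTranscription, Transcription

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Word-level timestamps require the verbose_json response format.
# "example_audio.mp3" is a placeholder path.
with open("example_audio.mp3", "rb") as audio_file:
    transcription = openai_client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"],
    )

# Wrap the raw response, then realign the word timestamps with the full text.
unprocessed_transcription = UnprocessedTranscription(**transcription.model_dump())
aligned_transcription = Transcription.from_unprocessed_transcription(unprocessed_transcription)
aligned_transcription.dump_prevew()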
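To illustrate what realigning can mean in practice: Whisper's word entries typically carry bare words, while the full text keeps punctuation and casing. The block below is a minimal sketch of one way to reattach that punctuation to timed words. It is not the stt-utils implementation; the `Word` dataclass and the `align_words` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical container for this sketch, not the stt-utils data model.
    text: str
    start: float  # seconds
    end: float    # seconds


def align_words(full_text: str, words: list[Word]) -> list[Word]:
    """Attach punctuation from full_text to each timed word.

    Walks through full_text left to right, locates each word in order,
    and widens the match to the surrounding whitespace so that adjacent
    punctuation stays with the word. Timestamps are left unchanged.
    """
    # Assumes lowercasing keeps the string length (true for ASCII text).
    lowered = full_text.lower()
    aligned = []
    cursor = 0
    for word in words:
        token = word.text.strip()
        pos = lowered.find(token.lower(), cursor)
        if pos == -1:
            # Word not found in the text: keep it as-is.
            aligned.append(word)
            continue
        # Widen to the left, but never past the previous match.
        start = pos
        while start > cursor and not full_text[start - 1].isspace():
            start -= 1
        # Widen to the right up to the next whitespace.
        end = pos + len(token)
        while end < len(full_text) and not full_text[end].isspace():
            end += 1
        aligned.append(Word(full_text[start:end], word.start, word.end))
        cursor = end
    return aligned
```

A greedy left-to-right search like this is simple but can mis-match when the word list and the full text disagree, so a production version would likely need a more tolerant matching step.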
Split an audio file into roughly 10-minute segments, cutting at moments of silence.
from pydub import AudioSegment
from stt_utils.splitter import split_audio_on_silence

audio = AudioSegment.from_file("example_audio.mp3")
segments = split_audio_on_silence(audio)
for i, segment in enumerate(segments):
    # pydub's export() defaults to format="mp3", so request WAV explicitly.
    segment.export(f"segment_{i}.wav", format="wav")
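When a long recording is split and each segment is transcribed separately, the word timestamps of every segment start again at zero. Merging the results then comes down to shifting each segment's timestamps by the total duration of the segments before it. The sketch below shows that idea in isolation. It is an assumption about the general approach, not the merge API of stt-utils; `TimedWord` and `merge_segments` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    # Hypothetical container for this sketch, not the stt-utils data model.
    text: str
    start: float  # seconds, relative to the segment
    end: float    # seconds, relative to the segment


def merge_segments(
    segments: list[tuple[float, list[TimedWord]]],
) -> list[TimedWord]:
    """Merge per-segment word lists into one timeline.

    Each item is (segment_duration_seconds, words_in_that_segment),
    given in playback order.
    """
    merged = []
    offset = 0.0
    for duration, words in segments:
        for w in words:
            # Shift segment-relative times onto the global timeline.
            merged.append(TimedWord(w.text, w.start + offset, w.end + offset))
        offset += duration
    return merged
```

With pydub, the duration of a segment in seconds can be obtained as `len(segment) / 1000`, since `len()` of an `AudioSegment` returns milliseconds.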
Command Line Interface
Split audio files from the command line:
python -m stt_utils.splitter audio.mp3 [OPTIONS]
Options:
- --segment-length FLOAT: Target segment length in seconds (default: 600)
- --segment-delta FLOAT: Allowed deviation from segment length in seconds (default: 30)
- --silence-thresh-delta INT: Silence threshold delta in dB (default: -16)
- --min-silence-len FLOAT: Minimum silence length in seconds (default: 0.5)
- --output-dir PATH: Output directory for split segments (default: current directory)
Example:
python -m stt_utils.splitter audio.mp3 --segment-length 600 --output-dir ./segments
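The segment length and segment delta options describe a target duration with a tolerance window around it. One plausible way to turn that into cut points, assuming the silent intervals have already been detected (for example with `pydub.silence.detect_silence`), is sketched below. This is an illustration of the idea only and may differ from what stt_utils.splitter actually does; `choose_cut_points` is a hypothetical function.

```python
def choose_cut_points(
    silences: list[tuple[float, float]],
    total_length: float,
    segment_length: float = 600.0,
    segment_delta: float = 30.0,
) -> list[float]:
    """Return cut times in seconds.

    silences: (start, end) pairs in seconds.
    For each target cut, pick the silence midpoint closest to the target
    within +/- segment_delta; fall back to the target itself when no
    silence lies in that window.
    """
    if segment_length <= segment_delta:
        raise ValueError("segment_length must be greater than segment_delta")

    midpoints = [(start + end) / 2 for start, end in silences]
    cuts = []
    last_cut = 0.0
    # Stop once the remainder fits into a single segment (with tolerance).
    while total_length - last_cut > segment_length + segment_delta:
        target = last_cut + segment_length
        candidates = [m for m in midpoints if abs(m - target) <= segment_delta]
        if candidates:
            cut = min(candidates, key=lambda m: abs(m - target))
        else:
            cut = target
        cuts.append(cut)
        last_cut = cut
    return cuts
```

Cutting at the midpoint of a silence keeps a little quiet padding on both sides of the cut, which tends to avoid clipping the first or last word of a segment.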
Development
It is recommended to use the uv toolset for development.
Testing
Unit tests are available in the tests/ directory. Run them with:
uv run pytest
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.