A utility library for working with speech-to-text transcriptions.

stt-utils

A utility library for working with word-level speech-to-text timestamps. Its main goal is to simplify working with the timestamps produced by OpenAI's Whisper model when timestamp_granularities is set to "word".

Features

  • Realign word timestamps with the full text
  • Merge several transcriptions together
  • (Optional dependency) Split long audio into smaller segments at moments of silence

Installation

pip install stt-utils

With audio tools support:

pip install "stt-utils[audio]"

Usage

Python API

Align word timestamps from OpenAI Whisper

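from openai import OpenAI
from stt_utils import UnprocessedTranscription, Transcription

# Assumes the official openai package; the client reads OPENAI_API_KEY from the environment.
openai_client = OpenAI()

# Word-level timestamps require the verbose_json response format.
with open("example_audio.mp3", "rb") as audio_file:
    transcription = openai_client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"],
    )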

unprocessed_transcription = UnprocessedTranscription(**transcription.model_dump())
aligned_transcription = Transcription.from_unprocessed_transcription(unprocessed_transcription)

aligned_transcription.dump_prevew()
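
To illustrate what realigning means here: word-level entries from Whisper usually come without punctuation, while the full text carries it. The sketch below shows one simple way such a realignment could work, by walking the whitespace-separated tokens of the full text and attaching each timed word to its matching token. It is only an illustration of the idea and not the implementation used by stt-utils; the Word class and the realign function are hypothetical names introduced for this example.

```python
import string
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical container for one timed word (not the stt-utils type)."""
    word: str
    start: float
    end: float


def _normalize(token: str) -> str:
    # Compare tokens case-insensitively and without surrounding ASCII punctuation.
    return token.strip(string.punctuation).lower()


def realign(text: str, words: list[Word], lookahead: int = 3) -> list[Word]:
    """Replace each timed word with the matching token from the full text."""
    tokens = text.split()
    aligned = []
    position = 0  # index of the next unmatched token in the full text
    for w in words:
        target = _normalize(w.word)
        match = None
        # Search only a few tokens ahead so one mismatch cannot derail the rest.
        for j in range(position, min(position + lookahead, len(tokens))):
            if _normalize(tokens[j]) == target:
                match = j
                break
        if match is None:
            # No matching token nearby: keep the word as it was.
            aligned.append(w)
        else:
            aligned.append(Word(tokens[match], w.start, w.end))
            position = match + 1
    return aligned
```

The timestamps are carried over unchanged; only the word text is replaced with its punctuated form from the full text.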

Split an audio file into roughly 10-minute segments, cutting at moments of silence.

from stt_utils.splitter import split_audio_on_silence
from pydub import AudioSegment

audio = AudioSegment.from_file("example_audio.mp3")

segments = split_audio_on_silence(audio)

for i, segment in enumerate(segments):
    # pydub exports MP3 by default, so set the format to match the .wav extension.
    segment.export(f"segment_{i}.wav", format="wav")
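
When each segment is transcribed separately, the word timestamps restart at zero for every segment, so merging them back requires shifting each segment's timestamps by the total duration of the segments before it. The following is a minimal sketch of that idea, not the merge API of stt-utils (which is not shown in this README); Word and merge_transcriptions are hypothetical names. With pydub, the duration of each segment is available as segment.duration_seconds.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical container for one timed word (not the stt-utils type)."""
    word: str
    start: float
    end: float


def merge_transcriptions(segments: list[tuple[float, list[Word]]]) -> list[Word]:
    """Merge per-segment words into one timeline.

    `segments` holds (duration_in_seconds, words) pairs in playback order.
    """
    merged = []
    offset = 0.0  # total duration of all segments processed so far
    for duration, words in segments:
        for w in words:
            # Shift segment-local timestamps onto the global timeline.
            merged.append(Word(w.word, w.start + offset, w.end + offset))
        offset += duration
    return merged
```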

Command Line Interface

Split audio files from the command line:

python -m stt_utils.splitter audio.mp3 [OPTIONS]

Options:

  • --segment-length FLOAT: Target segment length in seconds (default: 600)
  • --segment-delta FLOAT: Allowed deviation from segment length in seconds (default: 30)
  • --silence-thresh-delta INT: Silence threshold delta in dB (default: -16)
  • --min-silence-len FLOAT: Minimum silence length in seconds (default: 0.5)
  • --output-dir PATH: Output directory for split segments (default: current directory)

Example:

python -m stt_utils.splitter audio.mp3 --segment-length 600 --output-dir ./segments
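
To make the interplay of --segment-length and --segment-delta more concrete, here is a rough sketch of how cut points could be chosen from a list of detected silences. It rests on the assumption that a cut is placed at the silence closest to the target length within the allowed deviation; the splitter's actual algorithm may differ. The function name choose_split_points, its parameters, and the fallback to a hard cut when no silence is found are all hypothetical.

```python
def choose_split_points(
    silences: list[tuple[float, float]],
    total_length: float,
    segment_length: float = 600.0,
    segment_delta: float = 30.0,
) -> list[float]:
    """Return cut times in seconds.

    `silences` holds (start, end) pairs in seconds. Assumes
    segment_length > segment_delta so that every cut moves forward.
    """
    cuts = []
    last_cut = 0.0
    # Keep cutting while the remaining audio is longer than one allowed segment.
    while total_length - last_cut > segment_length + segment_delta:
        target = last_cut + segment_length
        low, high = target - segment_delta, target + segment_delta
        # Candidate cuts are the midpoints of silences inside the allowed window.
        candidates = [
            (start + end) / 2
            for start, end in silences
            if low <= (start + end) / 2 <= high
        ]
        if candidates:
            cut = min(candidates, key=lambda m: abs(m - target))
        else:
            # Assumed fallback: no usable silence, so cut exactly at the target.
            cut = target
        cuts.append(cut)
        last_cut = cut
    return cuts
```

With the default values, each segment produced this way is between 570 and 630 seconds long, except possibly the last one.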

Development

It is recommended to use the uv toolset for development.

Testing

Unit tests are available in the tests/ directory.

uv run pytest

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
