Stabilizing timestamps of OpenAI's Whisper outputs down to word-level.

Project description

Stabilizing Timestamps for Whisper

Description

This script modifies OpenAI's Whisper and adds more robust decoding logic on top of it to produce more accurate segment-level timestamps and to obtain word-level timestamps with extra inference.

TODO

Add function to stabilize with multiple inferences
Add word timestamping (previously only token-based)
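Going from token timestamps to word timestamps amounts to merging sub-word pieces back into whole words. The sketch below shows one way to do that. It assumes each token already has a start and end time, and that a token beginning with a space starts a new word, as is typical of Whisper's byte-level BPE vocabulary. That rule does not hold for languages written without spaces. `TimedToken`, `TimedWord` and `tokens_to_words` are hypothetical names used for illustration and are not part of stable-ts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedToken:
    text: str     # decoded token text (hypothetical shape); a leading space marks a new word
    start: float  # seconds
    end: float    # seconds


@dataclass
class TimedWord:
    text: str
    start: float
    end: float


def tokens_to_words(tokens: List[TimedToken]) -> List[TimedWord]:
    """Merge consecutive token timestamps into word timestamps.

    Assumption: a token whose text begins with a space starts a new word;
    any other token (including punctuation) continues the current word.
    """
    words: List[TimedWord] = []
    for tok in tokens:
        if words and not tok.text.startswith(" "):
            # Continuation piece: append its text and extend the word's end time.
            last = words[-1]
            last.text += tok.text
            last.end = tok.end
        else:
            # New word: store it without the leading space.
            words.append(TimedWord(tok.text.strip(), tok.start, tok.end))
    return words
```

With the tokens `" Hel"`, `"lo"` and `" world"`, the first two merge into `Hello`, which spans both token intervals.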

Setup

Option 1: Install Whisper+stable-ts (one line)

pip install git+https://github.com/jianfch/stable-ts.git

Option 2: Install Whisper (repo) and stable-ts (PyPI) separately

Install Whisper
Check that Whisper is installed correctly by running a quick test:

import whisper
model = whisper.load_model('base')
assert model.transcribe('audio.mp3').get('segments')

Install stable-ts

pip install stable-ts
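You may only want to confirm that both packages are importable, without loading a model or supplying an audio file. In that case the standard library can look for the modules directly. This is a minimal sketch. `is_installed` is a hypothetical helper, and it only verifies that the `whisper` and `stable_whisper` modules can be found. It does not check that a model actually runs.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Whisper provides the `whisper` module; stable-ts provides `stable_whisper`.
    for name in ("whisper", "stable_whisper"):
        print(f"{name}: {'installed' if is_installed(name) else 'missing'}")
```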

Command-line usage

Transcribe audio, then save the result as a JSON file.

stable-ts audio.mp3 -o audio.json

Process a JSON file of results into ASS.

stable-ts audio.json -o audio.ass
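The JSON step lets you transcribe once and re-render the result into other subtitle formats later. The sketch below illustrates that round trip in plain Python by saving and reloading a result dictionary with the standard `json` module. It assumes the result holds only JSON-serializable values (dicts, lists, strings, numbers), so convert any tensors or NumPy values first. `save_results` and `load_results` are hypothetical helpers and do not necessarily match the exact file layout the stable-ts CLI writes.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_results(results: Dict[str, Any], path: str) -> None:
    """Write a transcription result dict to a UTF-8 JSON file."""
    # ensure_ascii=False keeps non-English transcripts readable in the file.
    Path(path).write_text(json.dumps(results, ensure_ascii=False, indent=2), encoding="utf-8")


def load_results(path: str) -> Dict[str, Any]:
    """Read a transcription result dict back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```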

Transcribe multiple audio files, then process the results directly into SRT files.

stable-ts audio1.mp3 audio2.mp3 audio3.mp3 -o audio1.srt audio2.srt audio3.srt
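The example above suggests that output paths are matched to input paths by position. If you drive the CLI from Python, you can build the argument list explicitly and check that the counts match, which catches mismatches early. Passing a list instead of a shell string also avoids quoting problems with spaces in file names. This is a sketch under that positional-pairing assumption. `build_stable_ts_command` and `run_stable_ts` are hypothetical wrappers, and only the `stable-ts` command and its `-o` option come from the examples above.

```python
import subprocess
from typing import List, Sequence


def build_stable_ts_command(inputs: Sequence[str], outputs: Sequence[str]) -> List[str]:
    """Build an argv list for the stable-ts CLI from paired inputs and outputs.

    Assumption (inferred from the multi-file example): the i-th output path
    receives the result for the i-th input path.
    """
    if len(inputs) != len(outputs):
        raise ValueError(f"got {len(inputs)} inputs but {len(outputs)} outputs")
    return ["stable-ts", *inputs, "-o", *outputs]


def run_stable_ts(inputs: Sequence[str], outputs: Sequence[str]) -> None:
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(build_stable_ts_command(inputs, outputs), check=True)
```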

Show all available arguments and help.

stable-ts -h

Python usage

import stable_whisper

model = stable_whisper.load_model('base')
# modified model should run just like the regular model but accepts additional parameters
results = model.transcribe('audio.mp3')

# word-level
stable_whisper.results_to_word_srt(results, 'audio.srt')
# sentence/phrase-level
stable_whisper.results_to_sentence_srt(results, 'audio.srt')

https://user-images.githubusercontent.com/28970749/202782436-0d56140b-5d52-4f33-b32b-317a19ad32ca.mp4
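For reference, SRT is a simple text format. Each cue consists of a number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, and a blank line. The sketch below renders segments in that format. It assumes each segment is a dict with `start` and `end` keys (in seconds) and a `text` key, as in the `segments` list returned by Whisper's `transcribe`. It illustrates the format only and is not the implementation behind `results_to_sentence_srt`. `srt_timestamp` and `segments_to_srt` are hypothetical names.

```python
from typing import Any, Dict, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Dict[str, Any]]) -> str:
    """Render Whisper-style segments ('start', 'end', 'text') as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Joining with "\n" leaves the blank line SRT expects between cues.
    return "\n".join(cues)
```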

# sentence/phrase-level & word-level
stable_whisper.results_to_sentence_word_ass(results, 'audio.ass')

https://user-images.githubusercontent.com/28970749/202782412-dfa027f8-7073-4023-8ce5-285a2c26c03f.mp4
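ASS can express per-word timing inside a single subtitle line with karaoke override tags, where `{\k<n>}` highlights the following text for n centiseconds. The sketch below builds one `Dialogue` event that way. It assumes words arrive as hypothetical `(text, start, end)` tuples in seconds and that the script defines a style named `Default`. This is one standard way to encode word-level timing in ASS and not necessarily how `results_to_sentence_word_ass` styles its output. A complete .ass file also needs `[Script Info]`, `[V4+ Styles]` and `[Events]` sections.

```python
from typing import List, Tuple

# Hypothetical word shape: (text, start_seconds, end_seconds).
Word = Tuple[str, float, float]


def ass_timestamp(seconds: float) -> str:
    """Format seconds as an ASS timestamp: H:MM:SS.cc (centiseconds)."""
    total_cs = round(seconds * 100)
    hours, rem = divmod(total_cs, 360_000)
    minutes, rem = divmod(rem, 6_000)
    secs, cs = divmod(rem, 100)
    return f"{hours:d}:{minutes:02d}:{secs:02d}.{cs:02d}"


def karaoke_dialogue(words: List[Word], style: str = "Default") -> str:
    """Build one ASS Dialogue event that highlights each word in turn with {\\k} tags."""
    if not words:
        raise ValueError("need at least one word")
    line_start, line_end = words[0][1], words[-1][2]
    cursor_cs = round(line_start * 100)
    parts = []
    for i, (text, start, end) in enumerate(words):
        start_cs, end_cs = round(start * 100), round(end * 100)
        sep = " " if i else ""
        # A silent gap before the word becomes an empty \k span so later words stay in sync.
        gap = f"{{\\k{start_cs - cursor_cs}}}" if start_cs > cursor_cs else ""
        parts.append(f"{sep}{gap}{{\\k{end_cs - start_cs}}}{text}")
        cursor_cs = end_cs
    # Field order follows the common V4+ Events format:
    # Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return (
        f"Dialogue: 0,{ass_timestamp(line_start)},{ass_timestamp(line_end)},"
        f"{style},,0,0,0,,{''.join(parts)}"
    )
```

Durations are computed from rounded absolute times rather than by rounding each difference. This keeps the highlight from drifting out of sync over long lines.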

Additional Info

Although timestamps are chronological, they can still be very inaccurate depending on the model, audio, and parameters.
To produce production-ready word-level results, the model needs to be fine-tuned on a high-quality dataset of audio with word-level timestamps.
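Chronological order alone is a weak guarantee. The sketch below enforces it in the most naive way, clamping each segment so it never starts before the previous one ends. This makes the distinction concrete: the output is always ordered, but every boundary it moves is a guess rather than an alignment with the audio. `make_chronological` is a hypothetical helper, not the decoding logic stable-ts uses. It assumes Whisper-style segment dicts with `start` and `end` keys in seconds.

```python
from typing import Any, Dict, List


def make_chronological(segments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of segments whose timestamps never go backwards.

    Each start is clamped to be no earlier than the previous segment's end
    (or 0.0 for the first segment), and each end to be no earlier than its own start.
    """
    fixed: List[Dict[str, Any]] = []
    prev_end = 0.0
    for seg in segments:
        start = max(seg["start"], prev_end)
        end = max(seg["end"], start)
        # Copy the dict so the caller's segments are left untouched.
        fixed.append({**seg, "start": start, "end": end})
        prev_end = end
    return fixed
```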

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Includes slight modifications of the original work: Whisper

Project details

Release history

2.19.1 (Aug 16, 2025)
2.19.0 (Mar 25, 2025)
2.18.3 (Jan 29, 2025)
2.18.2 (Jan 16, 2025)
2.18.1 (Jan 9, 2025)
2.18.0 (Dec 28, 2024)
2.17.5 (Oct 13, 2024)
2.17.4 (Sep 12, 2024)
2.17.3 (Jun 1, 2024)
2.17.2 (May 14, 2024)
2.17.1 (May 4, 2024)
2.17.0 (May 3, 2024)
2.16.0 (Apr 14, 2024)
2.15.11 (Apr 1, 2024)
2.15.10 (Mar 26, 2024)
2.15.9 (Mar 8, 2024)
2.15.8 (Feb 28, 2024)
2.15.7 (Feb 26, 2024)
2.15.6 (Feb 12, 2024)
2.15.5 (Feb 8, 2024)
2.15.4 (Feb 2, 2024)
2.15.3 (Jan 31, 2024)
2.15.2 (Jan 28, 2024)
2.15.1 (Jan 27, 2024)
2.15.0 (Jan 27, 2024)
2.14.4 (Jan 14, 2024)
2.14.3 (Jan 11, 2024)
2.14.2 (Dec 31, 2023)
2.14.1 (Dec 29, 2023)
2.14.0 (Dec 29, 2023)
2.13.7 (Dec 8, 2023)
2.13.6, yanked (Dec 4, 2023)
2.13.5 (Nov 27, 2023)
2.13.4 (Nov 20, 2023)
2.13.3 (Nov 14, 2023)
2.13.2 (Nov 7, 2023)
2.13.1 (Oct 29, 2023)
2.13.0 (Oct 21, 2023)
2.12.3 (Oct 16, 2023)
2.12.2 (Oct 15, 2023)
2.12.1 (Oct 14, 2023)
2.12.0 (Oct 11, 2023)
2.11.7 (Oct 5, 2023)
2.11.6 (Oct 3, 2023)
2.11.5 (Oct 3, 2023)
2.11.4 (Sep 29, 2023)
2.11.3 (Sep 23, 2023)
2.11.2 (Sep 22, 2023)
2.11.1 (Sep 22, 2023)
2.11.0 (Sep 21, 2023)
2.10.1 (Sep 14, 2023)
2.10.0 (Sep 12, 2023)
2.9.0 (Aug 18, 2023)
2.8.1 (Aug 5, 2023)
2.8.0, yanked (Aug 4, 2023)
2.7.2 (Jul 28, 2023)
2.7.1 (Jul 20, 2023)
2.7.0 (Jul 13, 2023)
2.6.4 (Jun 9, 2023)
2.6.3 (Jun 9, 2023)
2.6.2 (May 9, 2023)
2.6.1 (May 9, 2023)
2.6.0 (May 8, 2023)
2.5.3 (Apr 30, 2023)
2.5.2 (Apr 28, 2023)
2.5.0 (Apr 24, 2023)
2.4.1 (Apr 18, 2023)
2.4.0 (Apr 16, 2023)
2.3.1 (Apr 8, 2023)
2.3.0 (Apr 4, 2023)
2.2.0 (Mar 30, 2023)
2.1.3 (Mar 29, 2023)
2.1.2 (Mar 28, 2023)
2.1.1 (Mar 22, 2023)
2.1.0 (Mar 21, 2023)
2.0.4 (Mar 20, 2023)
2.0.3 (Mar 19, 2023)
2.0.2 (Mar 18, 2023)
2.0.1 (Mar 17, 2023)
2.0.0 (Mar 17, 2023)
1.4.0 (Mar 10, 2023)
1.3.0 (Mar 6, 2023)
1.2.0 (Feb 26, 2023)
1.1.5 (Feb 23, 2023)
1.1.4 (Feb 16, 2023)
1.1.3 (Feb 15, 2023)
1.1.2 (Feb 15, 2023)
1.1.1 (Feb 15, 2023)
1.1.1b0, pre-release, yanked (Feb 15, 2023), this version
1.1.0 (Feb 15, 2023)
1.0.3 (Jan 18, 2023)
1.0.2 (Jan 6, 2023)
1.0.1, yanked (Nov 25, 2022)
1.0.0, yanked (Nov 20, 2022)

Download files


Source Distribution

stable-ts-1.1.1b0.tar.gz (30.4 kB)

Uploaded Feb 15, 2023 Source

File details

Details for the file stable-ts-1.1.1b0.tar.gz.

File metadata

Download URL: stable-ts-1.1.1b0.tar.gz
Upload date: Feb 15, 2023
Size: 30.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.8.15

File hashes

Hashes for stable-ts-1.1.1b0.tar.gz
Algorithm	Hash digest
SHA256	`d2c98f0ec6513a7f8be0b6b17dcc0ef4cc772f929d769ebea090cd52ff84d06c`
MD5	`f4a5e101e6c99dd2b3b1d56b7a1f825e`
BLAKE2b-256	`074c9f89c9a4661394dfe8de5cb1e6f87f3357fd3894d5fe2368a6c049a2897a`
