Stabilizing timestamps of OpenAI's Whisper outputs down to word-level.

Project description

Stabilizing Timestamps for Whisper

Description

This script modifies methods of Whisper's model to gain access to the predicted timestamp tokens of each word (token) without needing additional inference. It also stabilizes the timestamps down to the word (token) level to ensure chronology. Additionally, it can suppress gaps in speech for more accurate timestamps.
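To make "ensuring chronology" concrete, here is a minimal sketch of the idea. It is not the library's actual algorithm. It assumes each token carries a single predicted timestamp in seconds and clamps the sequence so that it never moves backwards and stays inside the segment's start and end times. The function name `enforce_chronology` and the input layout are illustrative assumptions.

```python
from typing import List


def enforce_chronology(timestamps: List[float], seg_start: float, seg_end: float) -> List[float]:
    """Clamp timestamps into [seg_start, seg_end] and make them non-decreasing."""
    stabilized = []
    previous = seg_start
    for ts in timestamps:
        # Never earlier than the previous token, never later than the segment end.
        ts = min(max(ts, previous), seg_end)
        stabilized.append(ts)
        previous = ts
    return stabilized


# Hypothetical raw predictions for one segment spanning 0.0s to 3.0s.
raw = [0.4, 0.2, 1.1, 0.9, 3.5]
print(enforce_chronology(raw, seg_start=0.0, seg_end=3.0))
```

A simple clamp like this guarantees ordering but cannot recover the true timing of a token whose prediction was wrong; it only prevents timestamps from going backwards or leaving the segment.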

TODO

Add function to stabilize with multiple inferences
Add word timestamping (previously only token based)

Dependency

Whisper

Setup

Install Whisper
Check if Whisper is installed correctly by running a quick test

import whisper
model = whisper.load_model('base')
assert model.transcribe('audio.mp3').get('segments')

Install stable-ts

pip install stable-ts

Executing script

from stable_whisper import load_model

model = load_model('base')
# modified model should run just like the regular model but with additional hyperparameters and extra data in results
results = model.transcribe('audio.mp3')
stab_segments = results['segments']
first_segment_word_timestamps = stab_segments[0]['whole_word_timestamps']

# or to get token timestamps that adhere more to the top prediction
from stable_whisper import stabilize_timestamps
stab_segments = stabilize_timestamps(results, top_focus=True)

Generate .srt with stable timestamps

# word-level 
from stable_whisper import results_to_word_srt
# after you get results from modified model
# this treats a word timestamp as end time of the word
# and combines words if their timestamps overlap
results_to_word_srt(results, 'audio.srt')  # combine_compound=True will merge words with no prepended space

# sentence/phrase-level
from stable_whisper import results_to_sentence_srt
# after you get results from modified model
results_to_sentence_srt(results, 'audio.srt')
# the video below shows output from the large model with default settings

https://user-images.githubusercontent.com/28970749/202782436-0d56140b-5d52-4f33-b32b-317a19ad32ca.mp4

# sentence/phrase-level & word-level
from stable_whisper import results_to_sentence_word_ass
# after you get results from modified model
results_to_sentence_word_ass(results, 'audio.ass')
# the video below shows output from the large model with default settings

https://user-images.githubusercontent.com/28970749/202782412-dfa027f8-7073-4023-8ce5-285a2c26c03f.mp4
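The word-level comments above describe treating each word's timestamp as its end time and merging words whose timestamps overlap. The sketch below illustrates that idea on plain `(word, end_time)` pairs. It is independent of stable-ts internals: `srt_time`, `words_to_srt`, and the input layout are hypothetical names chosen for this example, and only the `HH:MM:SS,mmm` cue format comes from the SRT convention itself.

```python
from typing import List, Tuple


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Tuple[str, float]], start: float = 0.0) -> str:
    """Build SRT text from (word, end_time) pairs.

    Each cue starts where the previous cue ended. A word whose end time does
    not advance past the previous end is merged into the previous cue.
    """
    cues = []  # each entry is [cue_start, cue_end, text]
    prev_end = start
    for word, end in words:
        if cues and end <= prev_end:
            cues[-1][2] += word  # overlapping timestamp: merge into previous cue
            continue
        cues.append([prev_end, end, word.strip()])
        prev_end = end
    blocks = [
        f"{i}\n{srt_time(s)} --> {srt_time(e)}\n{text}\n"
        for i, (s, e, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


# Hypothetical word timestamps; "every" and "one" share an end time and are merged.
sample = [(" Hello", 0.5), (" every", 0.9), ("one", 0.9), (" there", 1.25)]
print(words_to_srt(sample))
```

Because only end times are available in this layout, each cue's start is inferred from the previous cue's end, so any silence before a word is attributed to that word's cue.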

Additional Info

Since the sentence/segment-level timestamps are predicted directly, they are always more accurate and precise than word/token-level timestamps.
Although the timestamps are chronological, they can still be out of sync depending on the model and audio.
The unstable_word_timestamps are left in the results, so you may be able to find a better way to use them.
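As one illustration of how alternative predictions could be put to use, the sketch below picks, for each word, the first candidate timestamp that does not precede the previously chosen one. The input layout (a list of candidate timestamps per word, ordered from most to least likely) is an assumption made for this example, as is the name `pick_chronological`; inspect your own results to see what unstable_word_timestamps actually contains before adapting it.

```python
from typing import List


def pick_chronological(candidates: List[List[float]], start: float = 0.0) -> List[float]:
    """For each word, choose the highest-ranked candidate that keeps time moving forward."""
    chosen = []
    previous = start
    for options in candidates:
        # Fall back to the previous timestamp if no candidate is late enough.
        pick = next((ts for ts in options if ts >= previous), previous)
        chosen.append(pick)
        previous = pick
    return chosen


# Hypothetical ranked candidates for three consecutive words.
ranked = [[0.4, 0.6], [0.2, 0.8], [0.7, 1.0]]
print(pick_chronological(ranked))
```

This greedy choice keeps the top prediction whenever it is consistent with the words before it, and only falls back to lower-ranked candidates when the top one would break chronology.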

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Includes slight modifications of the original work: Whisper

Project details

This version: 1.0.0 (yanked), Nov 20, 2022

Download files


Source Distribution

stable-ts-1.0.0.tar.gz (22.6 kB view details)

Uploaded Nov 20, 2022 Source

File details

Details for the file stable-ts-1.0.0.tar.gz.

File metadata

Download URL: stable-ts-1.0.0.tar.gz
Upload date: Nov 20, 2022
Size: 22.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for stable-ts-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`63f3b79b11b2ce8a6771d519ed423b49491a6161d281a452a679d81d57fdb5dc`
MD5	`a90ea687bf6cdc03c87b8926091142ca`
BLAKE2b-256	`7c30dfceac5e5da83b6459756c18f914110a9d02f3f33bfa3f710f2f9c312dd2`
