Improve Whisper with RoPE and OpenAI's latest tokenizers
Project description
NeoWhisper
NeoWhisper improves OpenAI's Whisper by integrating Rotary Positional Embeddings (RoPE) and by adding support for more of the tokenizers published by OpenAI.
Installation
pip install neo-whisper
Requirement
pip install git+https://github.com/openai/whisper.git
Usage
Loading tokenizer
from neo_whisper import get_tokenizer
tokenizer_name = 'cl100k_base'
tokenizer = get_tokenizer(multilingual=True, language='km', task='transcribe', encoder_name=tokenizer_name)
print(tokenizer.eot)
Loading NeoWhisper model
from neo_whisper import NeoWhisper, NeoModelDimensions
dims = NeoModelDimensions(
    n_vocab=tokenizer.encoding.n_vocab,  # use the tokenizer's vocab size
    n_mels=80,
    n_audio_ctx=1500,  # or whatever context size you're training with
    n_audio_state=384,
    n_audio_head=6,
    n_audio_layer=4,
    n_text_ctx=448,
    n_text_state=384,
    n_text_head=4,
    n_text_kv_head=4,
    n_text_layer=6,
)
model = NeoWhisper(dims)
This model works like the original OpenAI Whisper model: NeoWhisper inherits from the Whisper class of openai-whisper. The difference lies in the TextDecoder, which in NeoWhisper integrates RoPE.
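To make the RoPE idea concrete, here is a minimal pure-Python sketch of a rotary embedding applied to a single query/key vector. It is a generic illustration, not NeoWhisper's actual code: the function name `rope_rotate`, the interleaved pairing of dimensions, and the base of 10000 are assumptions chosen for the example. It shows the property that makes RoPE attractive for a decoder: the attention score between a rotated query and a rotated key depends only on the distance between their positions.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle position * base**(-i/d).

    Hypothetical helper for illustration; the pairing convention and base
    used inside NeoWhisper may differ.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2-D rotation of the pair
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Same relative offset (3 positions), very different absolute positions.
score_near_start = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_shifted = dot(rope_rotate(q, 105), rope_rotate(k, 102))
print(abs(score_near_start - score_shifted) < 1e-9)
```

Because the rotation acts on pairs of dimensions, the per-head dimension has to be even; with the dimensions used above, n_text_state / n_text_head = 384 / 4 = 96, which satisfies that.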
Loading Original Whisper model
It is also possible to load the model as implemented in openai-whisper, but with a new tokenizer (such as cl100k_base).
from neo_whisper import Whisper, ModelDimensions
dims = ModelDimensions(
    n_vocab=tokenizer.encoding.n_vocab,  # use the tokenizer's vocab size
    n_mels=80,
    n_audio_ctx=1500,  # or whatever context size you're training with
    n_audio_state=384,
    n_audio_head=6,
    n_audio_layer=4,
    n_text_ctx=448,
    n_text_state=384,
    n_text_head=4,
    n_text_layer=6,
)
model = Whisper(dims)
NOTE: When using a new tokenizer, you need to train your model, because the pre-trained Whisper weights were learned with the original vocabulary.
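A small arithmetic sketch of why this is the case: the decoder's token embedding has one row per vocabulary entry, so a different vocabulary size gives a different weight shape. The helper `token_embedding_shape` is hypothetical, and the vocabulary sizes are those of the original multilingual Whisper models (51865) and of the plain cl100k_base encoding (100277); the value actually returned by `tokenizer.encoding.n_vocab` in NeoWhisper may be larger if extra special tokens are added.

```python
def token_embedding_shape(n_vocab, n_text_state):
    """Shape of a decoder token-embedding matrix: one row per token id.

    Hypothetical helper for illustration only.
    """
    return (n_vocab, n_text_state)

N_TEXT_STATE = 384  # text width used in the examples above

# Assumed reference sizes, see the paragraph above.
original_shape = token_embedding_shape(51865, N_TEXT_STATE)
cl100k_shape = token_embedding_shape(100277, N_TEXT_STATE)

print(original_shape, cl100k_shape)
# The shapes differ, so pre-trained decoder embeddings cannot be reused as-is.
print(original_shape == cl100k_shape)
```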
Train TextDecoder
When the AudioEncoder config is the same as that of the original Whisper audio encoder trained by OpenAI, we can load the pre-trained weights into the encoder and train only the text decoder.
from neo_whisper import NeoWhisper, NeoModelDimensions
import whisper
dims = NeoModelDimensions(
    n_vocab=tokenizer.encoding.n_vocab,  # use the tokenizer's vocab size
    # the audio settings below match the pre-trained "tiny" encoder loaded further down
    n_mels=80,
    n_audio_ctx=1500,
    n_audio_state=384,
    n_audio_head=6,
    n_audio_layer=4,
    n_text_ctx=448,
    n_text_state=384,
    n_text_head=4,
    n_text_kv_head=4,
    n_text_layer=6,
)
model = NeoWhisper(dims)
# load the pre-trained audio encoder weights from the original "tiny" checkpoint
model.encoder.load_state_dict(whisper.load_model("tiny").encoder.state_dict())
# freeze the encoder so that only the text decoder is trained
for p in model.encoder.parameters():
    p.requires_grad = False
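Before loading the encoder weights, it can help to confirm that the audio-related fields really match the pre-trained checkpoint. The following is a minimal sketch under stated assumptions: `encoder_config_mismatches` and `ENCODER_FIELDS` are hypothetical names, and `SimpleNamespace` objects stand in for the dimension objects so the snippet runs without downloading a model. In real use one would presumably pass `dims` and the `dims` attribute of the model returned by `whisper.load_model("tiny")`.

```python
from types import SimpleNamespace

# Fields that describe the audio encoder (hypothetical grouping for this check).
ENCODER_FIELDS = ("n_mels", "n_audio_ctx", "n_audio_state", "n_audio_head", "n_audio_layer")

def encoder_config_mismatches(dims, pretrained_dims):
    """Return {field: (ours, pretrained)} for every audio-encoder field that differs."""
    return {
        name: (getattr(dims, name), getattr(pretrained_dims, name))
        for name in ENCODER_FIELDS
        if getattr(dims, name) != getattr(pretrained_dims, name)
    }

# Stand-in for the audio settings of the "tiny" checkpoint.
tiny_like = SimpleNamespace(n_mels=80, n_audio_ctx=1500, n_audio_state=384,
                            n_audio_head=6, n_audio_layer=4)
# Same audio settings as in the example above: safe to load the encoder weights.
ours = SimpleNamespace(n_mels=80, n_audio_ctx=1500, n_audio_state=384,
                       n_audio_head=6, n_audio_layer=4)
# A wider encoder: the pre-trained weights would not fit.
wider = SimpleNamespace(n_mels=80, n_audio_ctx=1500, n_audio_state=512,
                        n_audio_head=8, n_audio_layer=4)

print(encoder_config_mismatches(ours, tiny_like))
print(encoder_config_mismatches(wider, tiny_like))
```

An empty result means the encoder configs agree; any entry names a field to fix before calling `load_state_dict`.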
TODO:
- implement decoding function for NeoWhisper and Whisper
- Colab notebook for training NeoWhisper
- implement transcription for NeoWhisper and Whisper
- benchmarking
Download files
File details
Details for the file neo_whisper-0.0.2.tar.gz.
File metadata
- Download URL: neo_whisper-0.0.2.tar.gz
- Upload date:
- Size: 15.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 31557aed340be3b4aaead85c386676c4d81833f27540a510802a1b2697720850 |
| MD5 | 742af6e3e1d6ef979614cd0da6d8f5c8 |
| BLAKE2b-256 | 69c35fbc0e33f17c0fc6f45afa65e6a71a2c42a63cd0e1c44920d5a753e6d1df |
File details
Details for the file neo_whisper-0.0.2-py3-none-any.whl.
File metadata
- Download URL: neo_whisper-0.0.2-py3-none-any.whl
- Upload date:
- Size: 213.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 036af2d275e0874258c24d2dd63e116503ee2a2c8b36035acbe1de226d70e4d1 |
| MD5 | e35cefd4c3f4a7a1842396c197b49454 |
| BLAKE2b-256 | 02f6e488c48b583f3fdd3fd28a01d0db9cc01f1a82fbd2957e4c5fe88e993b25 |