Improve Whisper with RoPE and OpenAI's latest tokenizers

NeoWhisper

Improves OpenAI's Whisper by integrating Rotary Positional Embeddings (RoPE) into the text decoder and by adding support for more of the tokenizers published by OpenAI.

Installation

pip install neo-whisper

Requirements

NeoWhisper requires openai-whisper, installed from GitHub:

pip install git+https://github.com/openai/whisper.git

Usage

Loading a tokenizer

from neo_whisper import get_tokenizer

tokenizer_name = 'cl100k_base'  # OpenAI tokenizer to use
tokenizer = get_tokenizer(multilingual=True, language='km', task='transcribe', encoder_name=tokenizer_name)  # 'km' = Khmer
print(tokenizer.eot)  # id of the end-of-text token
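The `language` and `task` arguments decide which special tokens open the decoder prompt. In openai-whisper, a multilingual tokenizer's `sot_sequence` holds the ids of `<|startoftranscript|>`, the language token and the task token, while `eot` is the id of `<|endoftext|>`. The sketch below only spells out those token strings; `sot_prefix` is a hypothetical helper, and we assume NeoWhisper's tokenizer keeps this layout even when built on another encoding such as cl100k_base.

```python
def sot_prefix(language, task):
    """Special-token strings that open a Whisper decoder prompt.

    Hypothetical helper for illustration: mirrors the order used by
    openai-whisper's multilingual tokenizer (start token, language, task).
    Timestamp-related tokens are not included.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    return ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]


print(sot_prefix("km", "transcribe"))
# ['<|startoftranscript|>', '<|km|>', '<|transcribe|>']
```

With the tokenizer loaded above, `tokenizer.sot_sequence` should then contain the ids of these three tokens, assuming NeoWhisper keeps openai-whisper's attribute.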

Loading a NeoWhisper model

from neo_whisper import NeoWhisper, NeoModelDimensions
dims = NeoModelDimensions(
    n_vocab=tokenizer.encoding.n_vocab, # use the tokenizer's vocab size
    n_mels=80,       # number of mel frequency bins in the input spectrogram
    n_audio_ctx=1500,
    n_audio_state=384,
    n_audio_head=6,
    n_audio_layer=4,
    n_text_ctx=448,
    n_text_state=384,
    n_text_head=4,
    n_text_kv_head=4,
    n_text_layer=6
)
model = NeoWhisper(dims)
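Before building the model, the text-decoder sizes can be sanity-checked. The sketch below uses a hypothetical helper, `check_text_dims`, which is not part of neo_whisper. It assumes that `n_text_kv_head` is the number of key/value heads for grouped-query attention (the README does not define it) and that RoPE is applied per attention head, so each head's size must be even.

```python
from types import SimpleNamespace


def check_text_dims(dims):
    """Return (head_dim, query_heads_per_kv_head) or raise ValueError.

    Hypothetical helper: assumes n_text_kv_head is the number of key/value
    heads in grouped-query attention, so it must divide n_text_head.
    """
    if dims.n_text_state % dims.n_text_head:
        raise ValueError("n_text_state must be divisible by n_text_head")
    head_dim = dims.n_text_state // dims.n_text_head
    if head_dim % 2:
        # RoPE rotates channels in pairs, so each head needs an even size
        raise ValueError(f"head dimension {head_dim} must be even for RoPE")
    if dims.n_text_head % dims.n_text_kv_head:
        raise ValueError("n_text_kv_head must divide n_text_head")
    return head_dim, dims.n_text_head // dims.n_text_kv_head


# Same text-decoder values as the NeoModelDimensions example above
example = SimpleNamespace(n_text_state=384, n_text_head=4, n_text_kv_head=4)
print(check_text_dims(example))  # (96, 1)
```

Since the helper only reads attributes, passing the `dims` object above directly should also work.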

NeoWhisper is used like OpenAI's original Whisper model: it inherits from the Whisper class of openai-whisper. The difference lies in its TextDecoder, which integrates RoPE.
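With RoPE, instead of adding a learned position vector to each token, the query and key vectors of each attention head are rotated, one pair of channels at a time, by an angle proportional to the token's position. The dot product of a rotated query and key then depends only on the distance between their positions. The pure-Python sketch below illustrates the common formulation (pair frequencies `base ** (-i / d)`, `base = 10000`); it is not NeoWhisper's implementation, and implementations differ in how they pair channels (adjacent channels here, split halves in some other code).

```python
import math


def rope(x, pos, base=10000.0):
    """Apply a rotary positional embedding to one head vector x at position pos.

    Channel pairs (x[0], x[1]), (x[2], x[3]), ... are rotated as 2-D points
    by angle pos * base ** (-i / d), where i is the pair's first channel index.
    """
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [1.0, 0.0, 0.5, -0.5]
k = [0.2, 0.3, -1.0, 0.4]
# Query at position 5 with key at 2, versus query at 9 with key at 6:
# the offset is 3 in both cases, so the attention scores agree.
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 9), rope(k, 6))
print(math.isclose(s1, s2))  # True
```

Because each pair is only rotated, vector norms are preserved; the sketch also shows why the head dimension has to be even.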

Loading the original Whisper model

You can also load the model as implemented in openai-whisper while using a new tokenizer (such as cl100k_base).

from neo_whisper import Whisper, ModelDimensions
dims = ModelDimensions(
    n_vocab=tokenizer.encoding.n_vocab, # use the tokenizer's vocab size
    n_mels=80,       # number of mel frequency bins in the input spectrogram
    n_audio_ctx=1500,
    n_audio_state=384,
    n_audio_head=6,
    n_audio_layer=4,
    n_text_ctx=448,
    n_text_state=384,
    n_text_head=4,
    n_text_layer=6
)
model = Whisper(dims)

NOTE: When using a new tokenizer, you need to train the model.
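OpenAI's pre-trained decoder weights are tied to Whisper's own vocabulary: the token embedding has shape `(n_vocab, n_text_state)`, so it cannot be loaded into a model built for a tokenizer with a different `n_vocab`. A quick way to see which tensors could be copied at all is to compare parameter shapes. The sketch below works on plain `{name: shape}` dictionaries (for a PyTorch model, one could be built with `{k: tuple(v.shape) for k, v in model.state_dict().items()}`); `reusable_keys` is a hypothetical helper, and the listed parameter names and the new vocabulary size are illustrative assumptions.

```python
def reusable_keys(pretrained_shapes, target_shapes):
    """Names present in both mappings with identical tensor shapes.

    Hypothetical helper: matching shapes are necessary for copying weights,
    but not sufficient for them to be useful in the new model.
    """
    return sorted(
        name for name, shape in target_shapes.items()
        if pretrained_shapes.get(name) == shape
    )


# Illustrative subset of Whisper "tiny" shapes (n_vocab=51865, n_mels=80, width 384)
pretrained = {
    "encoder.conv1.weight": (384, 80, 3),
    "decoder.token_embedding.weight": (51865, 384),
    "decoder.ln.weight": (384,),
}
# Hypothetical new model whose tokenizer has 100,000 tokens
target = {
    "encoder.conv1.weight": (384, 80, 3),
    "decoder.token_embedding.weight": (100_000, 384),
    "decoder.ln.weight": (384,),
}
print(reusable_keys(pretrained, target))
# ['decoder.ln.weight', 'encoder.conv1.weight']
```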

Training only the TextDecoder

When the AudioEncoder configuration is the same as that of an original Whisper audio encoder trained by OpenAI, you can load the pre-trained encoder weights and train only the text decoder.

from neo_whisper import NeoWhisper, NeoModelDimensions
import whisper

dims = NeoModelDimensions(
    n_vocab=tokenizer.encoding.n_vocab, # use the tokenizer's vocab size
    n_mels=80,       # number of mel frequency bins in the input spectrogram
    n_audio_ctx=1500,
    n_audio_state=384,
    n_audio_head=6,
    n_audio_layer=4,
    n_text_ctx=448,
    n_text_state=384,
    n_text_head=4,
    n_text_kv_head=4,
    n_text_layer=6
)
model = NeoWhisper(dims)
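# load the pre-trained weights of the audio encoder
# (the encoder dims above match OpenAI's "tiny" checkpoint)
model.encoder.load_state_dict(whisper.load_model("tiny", device="cpu").encoder.state_dict())
# freeze the pre-trained encoder so that only the text decoder is trained
for p in model.encoder.parameters():
    p.requires_grad = False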
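Copying the "tiny" encoder works here because the audio-encoder fields above (`n_mels`, `n_audio_ctx`, `n_audio_state`, `n_audio_head`, `n_audio_layer`) match that checkpoint. The hypothetical helper below checks a configuration against the encoder sizes of some of OpenAI's released multilingual checkpoints; the table is deliberately partial (larger models are omitted), and the helper is a convenience sketch, not part of neo_whisper.

```python
# Audio-encoder sizes of some of OpenAI's released Whisper checkpoints.
# Partial table for illustration; larger models are omitted.
OPENAI_ENCODERS = {
    "tiny":   {"n_mels": 80, "n_audio_ctx": 1500, "n_audio_state": 384,  "n_audio_head": 6,  "n_audio_layer": 4},
    "base":   {"n_mels": 80, "n_audio_ctx": 1500, "n_audio_state": 512,  "n_audio_head": 8,  "n_audio_layer": 6},
    "small":  {"n_mels": 80, "n_audio_ctx": 1500, "n_audio_state": 768,  "n_audio_head": 12, "n_audio_layer": 12},
    "medium": {"n_mels": 80, "n_audio_ctx": 1500, "n_audio_state": 1024, "n_audio_head": 16, "n_audio_layer": 24},
}


def matching_checkpoints(**dims):
    """Names of checkpoints whose encoder fields all equal the given dims.

    Hypothetical helper; extra keyword arguments (e.g. text-decoder fields)
    are ignored, since only the encoder has to match.
    """
    return [
        name for name, enc in OPENAI_ENCODERS.items()
        if all(dims.get(key) == value for key, value in enc.items())
    ]


print(matching_checkpoints(
    n_mels=80, n_audio_ctx=1500, n_audio_state=384, n_audio_head=6,
    n_audio_layer=4, n_text_state=384, n_text_layer=6,
))  # ['tiny']
```

If NeoModelDimensions is a dataclass like openai-whisper's ModelDimensions (an assumption), `matching_checkpoints(**dataclasses.asdict(dims))` would check the configuration above directly.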

TODO:

  • implement a decoding function for NeoWhisper and Whisper
  • Colab notebook for training NeoWhisper
  • implement transcription for NeoWhisper and Whisper
  • benchmarking

Download files

Download the file for your platform.

Source Distribution

neo_whisper-0.0.2.tar.gz (15.5 kB)

Uploaded Source

Built Distribution

neo_whisper-0.0.2-py3-none-any.whl (213.1 kB)

Uploaded Python 3

File details

Details for the file neo_whisper-0.0.2.tar.gz.

File metadata

  • Download URL: neo_whisper-0.0.2.tar.gz
  • Upload date:
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for neo_whisper-0.0.2.tar.gz
Algorithm Hash digest
SHA256 31557aed340be3b4aaead85c386676c4d81833f27540a510802a1b2697720850
MD5 742af6e3e1d6ef979614cd0da6d8f5c8
BLAKE2b-256 69c35fbc0e33f17c0fc6f45afa65e6a71a2c42a63cd0e1c44920d5a753e6d1df

File details

Details for the file neo_whisper-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: neo_whisper-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 213.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for neo_whisper-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 036af2d275e0874258c24d2dd63e116503ee2a2c8b36035acbe1de226d70e4d1
MD5 e35cefd4c3f4a7a1842396c197b49454
BLAKE2b-256 02f6e488c48b583f3fdd3fd28a01d0db9cc01f1a82fbd2957e4c5fe88e993b25
