Improve Whisper with RoPE and OpenAI's latest tokenizers

Project description

NeoWhisper

NeoWhisper improves OpenAI's Whisper by integrating Rotary Positional Embeddings (RoPE) and adding support for more of the tokenizers published by OpenAI.

Installation

pip install neo-whisper

Requirement

pip install git+https://github.com/openai/whisper.git

Usage

Loading tokenizer

from neo_whisper import get_tokenizer
tokenizer_name = 'cl100k_base'
tokenizer = get_tokenizer(multilingual=True, language='km', task='transcribe', encoder_name=tokenizer_name)
print(tokenizer.eot)

Loading NeoWhisper model

from neo_whisper import NeoWhisper, NeoModelDimensions
dims = NeoModelDimensions(
    n_vocab=tokenizer.encoding.n_vocab, # use the tokenizer's vocab size
    n_mels=80,       # number of mel bins in the input spectrogram
    n_audio_ctx=1500, # or whatever context size you're training with
    n_audio_state=384,
    n_audio_head=6,
    n_audio_layer=4,
    n_text_ctx=448,
    n_text_state=384,
    n_text_head=4,
    n_text_kv_head=4,
    n_text_layer=6
)
model = NeoWhisper(dims)
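
The dimensions passed to NeoModelDimensions have to be consistent with each other. The sketch below is a plain-Python sanity check and is not part of neo_whisper; `check_dims` is a hypothetical helper. It assumes that each attention head receives n_state / n_head channels, that RoPE rotates channels in pairs (so the text head size should be even), and that n_text_kv_head is the number of key/value heads in a grouped-query attention layout, which would require it to divide n_text_head. That last point is an assumption based only on the parameter name.

```python
def check_dims(n_audio_state, n_audio_head, n_text_state, n_text_head, n_text_kv_head):
    """Return a list of problems found in a set of model dimensions (empty if none)."""
    problems = []
    if n_audio_state % n_audio_head:
        problems.append("n_audio_state must be divisible by n_audio_head")
    if n_text_state % n_text_head:
        problems.append("n_text_state must be divisible by n_text_head")
    elif (n_text_state // n_text_head) % 2:
        # RoPE rotates channels two at a time, so each head needs an even size.
        problems.append("text head size must be even for RoPE")
    # Assumption: query heads are shared evenly among key/value heads.
    if n_text_head % n_text_kv_head:
        problems.append("n_text_head must be divisible by n_text_kv_head")
    return problems

# Values from the NeoModelDimensions example above.
print(check_dims(384, 6, 384, 4, 4))  # → []
```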

NeoWhisper works like the original OpenAI Whisper model: it inherits from the Whisper class of openai-whisper. The difference lies in the TextDecoder, which integrates RoPE in NeoWhisper.
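
To illustrate what RoPE does, here is a minimal pure-Python sketch of the common formulation (interleaved channel pairs, base 10000). It is not NeoWhisper's implementation, and the function names are made up for this example. Each pair of channels is rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on the distance between their positions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive channel pairs of `vec` by angles proportional to `pos`."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # lower channel pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Same offset (3) between query and key, different absolute positions.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
print(abs(score_a - score_b) < 1e-9)  # → True
```

Some implementations pair channel i with channel i + d/2 instead of interleaving neighbours; the relative-position property is the same in both layouts.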

Loading Original Whisper model

It is also possible to load the model implemented in openai-whisper with a new tokenizer (such as cl100k_base).

from neo_whisper import Whisper, ModelDimensions
dims = ModelDimensions(
    n_vocab=tokenizer.encoding.n_vocab, # use the tokenizer's vocab size
    n_mels=80,       # number of mel bins in the input spectrogram
    n_audio_ctx=1500, # or whatever context size you're training with
    n_audio_state=384,
    n_audio_head=6,
    n_audio_layer=4,
    n_text_ctx=448,
    n_text_state=384,
    n_text_head=4,
    n_text_layer=6
)
model = Whisper(dims)

NOTE: When using a new tokenizer, you need to train your model.
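
One way to see why training is needed: the decoder's token embedding table has one row per vocabulary entry, so a different vocabulary size gives a table of a different shape, and the same token ID refers to different text under each tokenizer. The sketch below only does the arithmetic. The vocabulary sizes are used for illustration (51865 for the multilingual openai-whisper checkpoints, 100277 for plain cl100k_base in tiktoken); the tokenizer returned by get_tokenizer may report a different n_vocab, so read the real value from tokenizer.encoding.n_vocab.

```python
def embedding_shape(n_vocab, n_text_state):
    """Shape of a token embedding table: one row of size n_text_state per token."""
    return (n_vocab, n_text_state)

n_text_state = 384
original = embedding_shape(51865, n_text_state)   # illustrative: Whisper multilingual
new = embedding_shape(100277, n_text_state)       # illustrative: plain cl100k_base

print(original == new)  # → False

# Extra embedding parameters that have no pre-trained values to start from.
extra_params = (new[0] - original[0]) * n_text_state
print(extra_params)
```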

Train TextDecoder

When the AudioEncoder configuration matches that of the original Whisper audio encoder trained by OpenAI, we can load the pre-trained weights into the encoder and train only the text decoder. To build the model with the AudioEncoder of OpenAI Whisper, pass neo_encoder=False when initializing NeoWhisper (by default, neo_encoder=True).

from neo_whisper import NeoWhisper, NeoModelDimensions
import whisper

dims = NeoModelDimensions(
    n_vocab=tokenizer.encoding.n_vocab, # use the tokenizer's vocab size
    n_mels=80,       # number of mel bins in the input spectrogram
    n_audio_ctx=1500, # or whatever context size you're training with
    n_audio_state=384,
    n_audio_head=6,
    n_audio_layer=4,
    n_text_ctx=448,
    n_text_state=384,
    n_text_head=4,
    n_text_kv_head=4,
    n_text_layer=6
)
model = NeoWhisper(dims, neo_encoder=False)
# load the pre-trained weights of the audio encoder
model.encoder.load_state_dict(whisper.load_model("tiny").encoder.state_dict())
# freeze the pre-trained weights
for p in model.encoder.parameters():
    p.requires_grad = False
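
Loading the encoder weights only works when the audio-related dimensions match the checkpoint. The sketch below is a plain-Python check one could run before loading; `EncoderDims` and `mismatched_fields` are hypothetical stand-ins and not part of neo_whisper or openai-whisper. The reference values are the audio encoder dimensions of the openai-whisper "tiny" checkpoint, which are the same as in the example above.

```python
from dataclasses import asdict, dataclass

@dataclass
class EncoderDims:
    """Hypothetical stand-in holding only the audio encoder dimensions."""
    n_mels: int
    n_audio_ctx: int
    n_audio_state: int
    n_audio_head: int
    n_audio_layer: int

# Audio encoder dimensions of the openai-whisper "tiny" checkpoint.
WHISPER_TINY = EncoderDims(80, 1500, 384, 6, 4)

def mismatched_fields(mine, pretrained):
    """Names of the encoder dimensions that differ from the pre-trained ones."""
    theirs = asdict(pretrained)
    return [name for name, value in asdict(mine).items() if value != theirs[name]]

print(mismatched_fields(EncoderDims(80, 1500, 384, 6, 4), WHISPER_TINY))  # → []
# A larger encoder cannot reuse the "tiny" weights.
print(mismatched_fields(EncoderDims(80, 1500, 512, 8, 6), WHISPER_TINY))
```

An empty list means the encoder state dict should have compatible shapes; any listed field points at the dimension to change or at a different checkpoint to load.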

Transcription

We can use a trained model for transcription in the same way as with the openai-whisper package from PyPI. The only difference is that you must specify tokenizer_name: the tokenizer used for transcription must be the same tokenizer that was used to train the model, so tokenizer_name must be passed to transcribe.

import torch
from neo_whisper import (
    get_tokenizer,
    NeoWhisper,
    NeoModelDimensions,
    transcribe
)
tokenizer_name = 'cl100k_base'
tokenizer = get_tokenizer(multilingual=True, task='transcribe', encoder_name=tokenizer_name)
dims = NeoModelDimensions(
    n_vocab=tokenizer.encoding.n_vocab, # use the tokenizer's vocab size
    n_mels=80,       # number of mel bins in the input spectrogram
    n_audio_ctx=1500, # or whatever context size you're training with
    n_audio_state=384,
    n_audio_head=6,
    n_audio_layer=4,
    n_text_ctx=448,
    n_text_state=384,
    n_text_head=4,
    n_text_kv_head=4,
    n_text_layer=6
)
model = NeoWhisper(dims, neo_encoder=False) # if you use neo_encoder, specify accordingly
best_model_params_path = "path/to/your/weights.pt"
model.load_state_dict(torch.load(best_model_params_path))

# audio_array is the audio to transcribe; it is not defined in this snippet
result = transcribe(model, audio_array, verbose=True, tokenizer_name=tokenizer_name)
print(result['text'])
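
Because the tokenizer name has to match between training and transcription, one option is to store it next to the weights. The sketch below is only a suggestion using the standard library; `save_run_config`, `load_run_config`, and the sidecar file name are hypothetical and not part of neo_whisper.

```python
import json
import tempfile
from pathlib import Path

def save_run_config(path, tokenizer_name, neo_encoder):
    """Write the settings that must match between training and transcription."""
    config = {"tokenizer_name": tokenizer_name, "neo_encoder": neo_encoder}
    Path(path).write_text(json.dumps(config))

def load_run_config(path):
    """Read the settings back as a dict."""
    return json.loads(Path(path).read_text())

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "weights.json"  # hypothetical sidecar next to weights.pt
    save_run_config(config_path, "cl100k_base", neo_encoder=False)
    config = load_run_config(config_path)

print(config["tokenizer_name"])  # → cl100k_base
```

At transcription time, the loaded values could then be passed as encoder_name to get_tokenizer, as neo_encoder to NeoWhisper, and as tokenizer_name to transcribe.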

TODO:

implement decoding function for NeoWhisper and Whisper
implement transcription for NeoWhisper and Whisper
Colab notebook for training NeoWhisper
benchmarking

Project details

Release history

0.1.10 (Mar 29, 2026)
0.1.9 (Mar 14, 2026)
0.1.8 (Feb 20, 2026)
0.1.7 (Feb 20, 2026)
0.1.6 (Feb 20, 2026)
0.1.5 (Feb 19, 2026)
0.1.4 (Feb 19, 2026)
0.1.3 (Jan 18, 2026)
0.1.2 (Jan 17, 2026)
0.1.1 (Jan 10, 2026)
0.1.0 (Jan 8, 2026)
0.0.3 (Dec 17, 2025), this version
0.0.2 (Dec 14, 2025)
0.0.1 (Dec 14, 2025)

