CTC beam search decoder for speech recognition.

pyctcdecode

A fast and feature-rich CTC beam search decoder for speech recognition, written in Python. It offers n-gram (kenlm) language model support similar to DeepSpeech and adds many new features, such as byte pair encoding support for modern architectures like Nvidia's Conformer-CTC or Facebook's Wav2Vec2.

pip install .

Main Features:

  • 🔥 hotword boosting
  • 🤖 handling of BPE vocabulary
  • 👥 multi-LM support for 2+ models
  • 🕒 stateful LM for realtime decoding
  • ✨ native frame index annotation of words
  • 💨 fast runtime, comparable to C++ implementation
  • 🐍 easy to modify Python code

Quick Start:

import kenlm
from pyctcdecode import build_ctcdecoder

labels = [" ", "b", "u", "g"]  # tokens as they appear in logits
kenlm_model = kenlm.Model("/my/dir/kenlm_model.binary")  # load kenlm model

decoder = build_ctcdecoder(
    labels,
    kenlm_model,
    alpha=0.5,  # tuned on a val set
    beta=1.0,  # tuned on a val set
)
# logits: output matrix of the acoustic model for one utterance (e.g. a NumPy array)
text = decoder.decode(logits)  # decode via shallow fusion
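To make the roles of `alpha` and `beta` more concrete, here is a minimal sketch of the conventional shallow fusion score, where the acoustic log-probability is combined with a weighted language model score and a per-word bonus. The function `fused_score` and the example beams are hypothetical and only illustrate the idea; they are not pyctcdecode's internal implementation.

```python
def fused_score(acoustic_logp, lm_logp, n_words, alpha=0.5, beta=1.0):
    """Combine acoustic and LM log-probabilities for one hypothesis.

    alpha weights the language model; beta adds a bonus per word,
    which counteracts the LM's preference for short outputs.
    """
    return acoustic_logp + alpha * lm_logp + beta * n_words


# Hypothetical beams: (text, acoustic log-prob, LM log-prob)
beams = [
    ("bug", -1.2, -2.0),
    ("bugg", -1.0, -6.0),
]

# Pick the hypothesis with the highest fused score.
best = max(beams, key=lambda b: fused_score(b[1], b[2], len(b[0].split())))
print(best[0])
```

In this toy example the acoustically stronger but implausible hypothesis loses once the language model score is taken into account.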

If the vocabulary is BPE-based, adjust the labels and set the is_bpe flag (merging of tokens for the LM is handled automatically):

labels = ["<unk>", "▁bug", "s", "▁bunny"]

decoder = build_ctcdecoder(
    labels,
    kenlm_model,
    is_bpe=True,
)
text = decoder.decode(logits)
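As a rough illustration of what merging BPE tokens into words means, the sketch below joins sentencepiece-style tokens, where a leading "▁" marks the start of a new word. The helper `merge_bpe_tokens` is hypothetical and written only for this example; the decoder performs its own merging internally.

```python
def merge_bpe_tokens(tokens, word_boundary="▁"):
    """Join BPE tokens into words; a leading boundary marker starts a new word."""
    words = []
    for token in tokens:
        if token.startswith(word_boundary):
            # New word: strip the boundary marker.
            words.append(token[len(word_boundary):])
        elif words:
            # Continuation piece: append to the current word.
            words[-1] += token
        else:
            # Sequence starts with a continuation piece.
            words.append(token)
    return words


print(merge_bpe_tokens(["▁bug", "s", "▁bunny"]))  # ['bugs', 'bunny']
```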

Improve domain specificity by adding hotwords during inference:

hotword_list = ["looney tunes", "anthropomorphic"]
text = decoder.decode(logits, hotword_list=hotword_list)
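The idea behind hotword boosting can be sketched as adding a score bonus to hypotheses that contain one of the hotwords. The example below is a deliberately simplified, post-hoc rescoring of finished candidates; the function `hotword_bonus`, its `weight` parameter, and the candidate scores are assumptions made for illustration, and the decoder itself applies boosting during the beam search rather than afterwards.

```python
def hotword_bonus(text, hotwords, weight=10.0):
    """Return a bonus proportional to the number of hotword occurrences in text."""
    lowered = text.lower()
    return weight * sum(lowered.count(h.lower()) for h in hotwords)


hotword_list = ["looney tunes", "anthropomorphic"]

# Hypothetical candidates: (text, score before boosting)
candidates = [("loony tombs", -4.0), ("looney tunes", -4.5)]

# Rank candidates by boosted score, best first.
rescored = sorted(
    candidates,
    key=lambda c: c[1] + hotword_bonus(c[0], hotword_list),
    reverse=True,
)
print(rescored[0][0])
```

Without the bonus the first candidate would win; with it, the hypothesis containing the hotword is preferred.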

Batch support via multiprocessing:

from multiprocessing import Pool

with Pool() as pool:
    text_list = decoder.decode_batch(logits_list, pool)

Use pyctcdecode for a production Conformer-CTC model:

import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
  model_name='stt_en_conformer_ctc_small'
)
logits = asr_model.transcribe(["my_file.wav"], logprobs=True)[0].cpu().detach().numpy()

decoder = build_ctcdecoder(asr_model.decoder.vocabulary, is_bpe=True)
text = decoder.decode(logits)

The tutorials folder contains many well-documented notebook examples on how to run speech recognition from scratch using pretrained models from Nvidia's NeMo and Huggingface/Facebook's Wav2Vec2.

For more details on how to use all of pyctcdecode's features, have a look at our main tutorial.

Why pyctcdecode?

The flexibility of using Python allows us to implement various new features while keeping runtime competitive through little tricks like caching and beam pruning. When comparing pyctcdecode's runtime and accuracy to standard C++ decoders, we see favorable trade-offs between speed and accuracy (see code here).
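Beam pruning is one of the tricks that keeps a Python beam search fast: after each step, only a limited number of hypotheses survive. The sketch below shows one common variant, under the assumption that beams are simple (text, score) pairs. The function `prune_beams` and its parameters `beam_width` and `prune_logp` are hypothetical names chosen for this example.

```python
def prune_beams(beams, beam_width=2, prune_logp=-10.0):
    """Keep at most beam_width beams whose score is within prune_logp of the best."""
    if not beams:
        return []
    # Sort by score, best first.
    ranked = sorted(beams, key=lambda b: b[1], reverse=True)
    best_score = ranked[0][1]
    # Drop beams that fall too far behind the best one.
    kept = [b for b in ranked if b[1] >= best_score + prune_logp]
    return kept[:beam_width]


beams = [("bug", -1.0), ("bugs", -2.5), ("bag", -15.0), ("bun", -3.0)]
print(prune_beams(beams))  # [('bug', -1.0), ('bugs', -2.5)]
```

A smaller `beam_width` or a tighter `prune_logp` trades accuracy for speed, which is the kind of trade-off the comparison above refers to.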

Python also allows us to do nifty things like hotword support (at inference time) with only a few lines of code.

The full beam results contain the language model state to enable real-time inference, as well as word-based logit indices (frames) to calculate timing and confidence scores of individual words natively through the decoding process.
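Given per-word frame indices, word timings follow from simple arithmetic once the duration of one output frame is known. The sketch below assumes the frame annotations are available as (word, (start_frame, end_frame)) pairs and that each frame covers 0.04 seconds of audio; both the data layout and the frame duration are assumptions for this example and depend on the decoder output and the acoustic model used.

```python
def frames_to_seconds(word_frames, frame_duration_s=0.04):
    """Convert (word, (start_frame, end_frame)) pairs to (word, start_s, end_s)."""
    return [
        (word, round(start * frame_duration_s, 3), round(end * frame_duration_s, 3))
        for word, (start, end) in word_frames
    ]


# Hypothetical frame annotations for two decoded words.
word_frames = [("bugs", (5, 20)), ("bunny", (25, 50))]

for word, start_s, end_s in frames_to_seconds(word_frames):
    print(f"{word}: {start_s:.2f}s - {end_s:.2f}s")
```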

Additional features such as BPE vocabulary support, as well as examples of pyctcdecode as part of a full speech recognition pipeline, can be found in the tutorials section.

License:

Licensed under the Apache 2.0 License. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Copyright 2021-present Kensho Technologies, LLC. The present date is determined by the timestamp of the most recent commit in the repository.

Download files

Source Distribution

pyctcdecode-0.0.1.tar.gz (34.8 kB view details)

Uploaded Source

Built Distribution

pyctcdecode-0.0.1-py2.py3-none-any.whl (35.3 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file pyctcdecode-0.0.1.tar.gz.

File metadata

  • Download URL: pyctcdecode-0.0.1.tar.gz
  • Upload date:
  • Size: 34.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.5.0.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.10

File hashes

Hashes for pyctcdecode-0.0.1.tar.gz
Algorithm Hash digest
SHA256 7a548f9c86f058ec3e5db305a99e953d409ea9862b94662bb9d5df145aa655f4
MD5 8e456b6db7a99497224399ad775e42cb
BLAKE2b-256 279613dbdece6e079135c8e7a4e04bb5acebd7c7bce6170002ff62bbab4df471

File details

Details for the file pyctcdecode-0.0.1-py2.py3-none-any.whl.

File metadata

  • Download URL: pyctcdecode-0.0.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 35.3 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.5.0.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.10

File hashes

Hashes for pyctcdecode-0.0.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 b4663b4f4faa4a234e5eeb40f62c22d757f185b3ffef88bfc07306418e58dff1
MD5 8039cce209fa18296a478efc61bcbb06
BLAKE2b-256 ec707cfd9fdf002c9fdda8b89d4e5eaac23df842973684bc16bb8bd26de18d5c
