pyctcdecode

CTC beam search decoder for speech recognition.

These details have been verified by PyPI

Maintainers

andrew-titus gkucsko Kensho lopez86 mshulman

Project description

A fast and feature-rich CTC beam search decoder for speech recognition written in Python, providing n-gram (kenlm) language model support similar to PaddlePaddle's decoder, but incorporating many new features such as byte pair encoding and real-time decoding to support models like Nvidia's Conformer-CTC or Facebook's Wav2Vec2.

pip install pyctcdecode

Main Features:

🔥 hotword boosting
🤖 handling of BPE vocabulary
👥 multi-LM support for 2+ models
🕒 stateful LM for real-time decoding
✨ native frame index annotation of words
💨 fast runtime, comparable to C++ implementation
🐍 easy-to-modify Python code

Quick Start:

import kenlm
from pyctcdecode import build_ctcdecoder

# load trained kenlm model
kenlm_model = kenlm.Model("/my/dir/kenlm_model.arpa")

# specify alphabet labels as they appear in logits
labels = [
    " ", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l",
    "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]

# prepare decoder and decode logits via shallow fusion
decoder = build_ctcdecoder(
    labels,
    kenlm_model,
    alpha=0.5,  # tuned on a val set
    beta=1.0,  # tuned on a val set
)
# logits: numpy array of shape (time_steps, vocabulary_size) from the acoustic model
text = decoder.decode(logits)
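To make the roles of `alpha` and `beta` more concrete, here is a minimal sketch of shallow-fusion scoring in its common textbook form: the acoustic log-probability is combined with a weighted language model log-probability and a per-word bonus. The function names and the candidate scores are hypothetical and chosen for illustration; this is not pyctcdecode's internal implementation.

```python
def fused_score(acoustic_logp, lm_logp, n_words, alpha=0.5, beta=1.0):
    """Combine acoustic and LM log-probabilities for one candidate transcript.

    alpha weights the language model; beta is a per-word bonus that
    counteracts the LM's preference for shorter outputs.
    """
    return acoustic_logp + alpha * lm_logp + beta * n_words


def pick_best(candidates, alpha=0.5, beta=1.0):
    """Return the text of the highest-scoring candidate.

    Each candidate is (text, acoustic_logp, lm_logp); values are made up.
    """
    best = max(
        candidates,
        key=lambda c: fused_score(c[1], c[2], len(c[0].split()), alpha, beta),
    )
    return best[0]


# Hypothetical candidates: the acoustically stronger one is unlikely under the LM.
candidates = [
    ("bugs bunny", -4.0, -6.0),
    ("bug sunny", -3.5, -12.0),
]
print(pick_best(candidates, alpha=0.5))  # LM tips the balance: "bugs bunny"
print(pick_best(candidates, alpha=0.0))  # acoustic score only: "bug sunny"
```

Setting `alpha=0.0` disables the language model contribution in this sketch, which is a quick way to see how much the LM changes the ranking.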

If the vocabulary is BPE-based, pyctcdecode will recognize that and handle token merging automatically.

(Note: the LM itself has no notion of this and is still word-based.)

labels = ["<unk>", "▁bug", "s", "▁bunny"]

decoder = build_ctcdecoder(
    labels,
    kenlm_model,
)
text = decoder.decode(logits)
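As a rough illustration of what token merging means for the labels above, the sketch below joins BPE pieces into words using the `▁` word-start marker. The helper `merge_bpe_tokens` is hypothetical and only mimics the idea; pyctcdecode's own merging happens inside the beam search.

```python
def merge_bpe_tokens(tokens, word_marker="▁"):
    """Join BPE pieces into a space-separated string of words.

    A token starting with the marker opens a new word; any other token
    is appended to the word currently being built.
    """
    words = []
    for token in tokens:
        if token.startswith(word_marker):
            words.append(token[len(word_marker):])
        elif words:
            words[-1] += token
        else:
            # A continuation piece with no word open yet starts one.
            words.append(token)
    return " ".join(words)


print(merge_bpe_tokens(["▁bug", "s", "▁bunny"]))  # bugs bunny
```

Because the merged output is made of whole words, a word-based language model can score it directly.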

Improve domain specificity by adding important contextual words ("hotwords") during inference:

hotwords = ["looney tunes", "anthropomorphic"]
text = decoder.decode(
    logits,
    hotwords=hotwords,
    hotword_weight=10.0,
)
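The effect of `hotword_weight` can be pictured with a simplified scoring sketch: each word of a candidate transcript that matches a hotword adds a fixed bonus to the candidate's score. The function `hotword_bonus` is a hypothetical illustration under that assumption; the library's actual matching logic is more elaborate and is not reproduced here.

```python
def hotword_bonus(text, hotwords, hotword_weight=10.0):
    """Return a score bonus proportional to the number of hotword matches.

    Multi-word hotwords are split into single words, so "looney tunes"
    boosts both "looney" and "tunes" in this simplified version.
    """
    hot_unigrams = {w for phrase in hotwords for w in phrase.lower().split()}
    matches = sum(1 for w in text.lower().split() if w in hot_unigrams)
    return hotword_weight * matches


hotwords = ["looney tunes", "anthropomorphic"]
print(hotword_bonus("looney tunes cartoon", hotwords))  # 20.0
print(hotword_bonus("loony tunes cartoon", hotwords))   # 10.0
```

A larger weight makes candidates containing hotwords more likely to survive the beam search, at the risk of hallucinating them where they were not spoken.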

(Note: pyctcdecode contains several free hyperparameters that can strongly influence error rate and wall time. Default values for these parameters were (merely) chosen in order to yield good performance for one particular use case. For best results, especially when working with languages other than English, users are encouraged to perform a hyperparameter optimization study on their own data.)
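One simple way to run such a study is a grid search over `alpha` and `beta` that minimizes word error rate on a validation set. The sketch below is an assumption-laden outline: `decode_fn` is a hypothetical callable that you would implement to build a decoder with the given parameters and decode one sample, and `val_set` is a list of `(logits, reference_text)` pairs.

```python
from itertools import product


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def grid_search(decode_fn, val_set, alphas, betas):
    """Return (mean_wer, alpha, beta) for the best parameter pair.

    decode_fn(logits, alpha, beta) -> str is a hypothetical helper.
    """
    best = None
    for alpha, beta in product(alphas, betas):
        wer = sum(
            word_error_rate(ref, decode_fn(logits, alpha, beta))
            for logits, ref in val_set
        ) / len(val_set)
        if best is None or wer < best[0]:
            best = (wer, alpha, beta)
    return best
```

For larger search spaces, a random search or a dedicated optimization library is usually more efficient than an exhaustive grid.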

Batch support via multiprocessing:

from multiprocessing import Pool

with Pool() as pool:
    text_list = decoder.decode_batch(pool, logits_list)

Use pyctcdecode for a pretrained Conformer-CTC model:

import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
  model_name='stt_en_conformer_ctc_small'
)
logits = asr_model.transcribe(["my_file.wav"], logprobs=True)[0]

decoder = build_ctcdecoder(asr_model.decoder.vocabulary)
decoder.decode(logits)
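For intuition about what any CTC decoder has to do before a language model enters the picture, the sketch below shows greedy (best-path) CTC decoding: take the most likely label per frame, collapse consecutive repeats, and drop blanks. This is a baseline for illustration only, not what pyctcdecode does (it runs a beam search); the labels, the blank symbol `""`, and the scores are assumptions made up for the example.

```python
def greedy_ctc_decode(logit_rows, labels, blank=""):
    """Best-path CTC decoding over a list of per-frame score rows."""
    # Index of the highest-scoring label in each frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in logit_rows]
    output = []
    prev = None
    for idx in best:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and labels[idx] != blank:
            output.append(labels[idx])
        prev = idx
    return "".join(output)


labels = ["", "a", "b"]  # "" is the CTC blank in this toy vocabulary
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.2, 0.7],    # b
    [0.2, 0.1, 0.7],    # b (repeat, collapsed)
]
print(greedy_ctc_decode(frames, labels))  # aab
```

A beam search keeps several such paths alive at once, which is what allows language model and hotword scores to change the final result.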

The tutorials folder contains many well-documented notebook examples showing how to run speech recognition using pretrained models from Nvidia's NeMo and Hugging Face/Facebook's Wav2Vec2.

For more details on how to use all of pyctcdecode's features, have a look at our main tutorial.

Why pyctcdecode?

In scientific computing, there’s often a tension between a language’s performance and its ease of use for prototyping and experimentation. Although C++ is the conventional choice for CTC decoders, we decided to try building one in Python. This choice allowed us to easily implement experimental features, while keeping runtime competitive through optimizations like caching and beam pruning.

We compare the performance of pyctcdecode to an industry standard C++ decoder at various beam widths (shown as inline annotations), allowing us to visualize the trade-off of word error rate (y-axis) vs runtime (x-axis). For beam widths of 10 or greater, pyctcdecode yields strictly superior performance, with lower error rates in less time (see code here).
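Beam pruning, one of the optimizations mentioned above, can be sketched in a few lines: after each step, keep only the best few candidates and discard any whose score falls too far below the current best. The function and parameter names here (`prune_beams`, `beam_width`, `prune_logp`) are illustrative assumptions, and the code is a generic sketch rather than the library's internals.

```python
def prune_beams(beams, beam_width=3, prune_logp=-10.0):
    """Keep at most beam_width beams within prune_logp of the best score.

    beams is a list of (text, log_score) pairs; prune_logp is negative,
    so best_score + prune_logp is the lowest score still kept.
    """
    if not beams:
        return []
    ranked = sorted(beams, key=lambda b: b[1], reverse=True)
    best_score = ranked[0][1]
    return [b for b in ranked[:beam_width] if b[1] >= best_score + prune_logp]


beams = [("a", -1.0), ("b", -2.0), ("c", -20.0), ("d", -3.0)]
print(prune_beams(beams, beam_width=4))
# [('a', -1.0), ('b', -2.0), ('d', -3.0)]
```

Smaller beam widths and tighter thresholds reduce runtime at the cost of possibly discarding the candidate that would have scored best in the end, which is the trade-off the comparison above visualizes.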

The use of Python allows us to easily implement features like hotword support with only a few lines of code.

pyctcdecode can return either a single transcript, or the full results of the beam search algorithm. The latter provides the language model state to enable real-time inference as well as word-based logit indices (frames) to enable word-based timing and confidence score calculations natively through the decoding process.
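As an example of how word-based frame indices can be turned into timings, the sketch below converts `(word, (start_frame, end_frame))` pairs into seconds. Both the data layout and the frame duration are assumptions for illustration: the duration of one logit frame depends on the acoustic model's stride, and you should check the structure of the beam search output in the version you use.

```python
def frames_to_seconds(word_frames, seconds_per_frame):
    """Convert (word, (start_frame, end_frame)) pairs to second offsets."""
    return [
        (word, start * seconds_per_frame, end * seconds_per_frame)
        for word, (start, end) in word_frames
    ]


# Hypothetical frame annotations and a hypothetical 40 ms frame duration.
word_frames = [("bugs", (2, 10)), ("bunny", (12, 25))]
for word, start_s, end_s in frames_to_seconds(word_frames, 0.04):
    print(f"{word}: {start_s:.2f}s to {end_s:.2f}s")
```

The same frame ranges can be used to slice the logit matrix per word, for instance to average token probabilities into a word-level confidence score.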

Additional features such as BPE vocabulary, as well as examples of pyctcdecode as part of a full speech recognition pipeline, can be found in the tutorials section.

Resources:

License:

Licensed under the Apache 2.0 License. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Project details

Release history

0.5.0: Jan 20, 2023
0.4.0: Jul 19, 2022
0.3.0: Jan 13, 2022
0.2.1: Dec 15, 2021 (this version)
0.2.0: Dec 2, 2021
0.1.1: Oct 1, 2021
0.1.0: Jun 12, 2021
0.0.1: Jun 8, 2021

Download files

Source Distribution

pyctcdecode-0.2.1.tar.gz (208.7 kB)

Uploaded Dec 15, 2021 Source

Built Distribution

pyctcdecode-0.2.1-py2.py3-none-any.whl (43.7 kB)

Uploaded Dec 15, 2021 Python 2, Python 3

File details

Details for the file pyctcdecode-0.2.1.tar.gz.

File metadata

Download URL: pyctcdecode-0.2.1.tar.gz
Upload date: Dec 15, 2021
Size: 208.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.6

File hashes

Hashes for pyctcdecode-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`6e58428c603b6d5a849af6ff3269faeca87e56ca52ee0a9df684b8f72246931e`
MD5	`87cec675abb918f9b221971528eda9e7`
BLAKE2b-256	`9579cb5fa2147b6b8376702896aaca597552ab998d5542672f07d1177fdebc3e`

File details

Details for the file pyctcdecode-0.2.1-py2.py3-none-any.whl.

File metadata

Download URL: pyctcdecode-0.2.1-py2.py3-none-any.whl
Upload date: Dec 15, 2021
Size: 43.7 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.6

File hashes

Hashes for pyctcdecode-0.2.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`b037ccc7a58dadd16d02c4082a6b12a9e1cd47eaa80569901353b1cf8cd1c9a4`
MD5	`23ef32281822dd48215c64910368cf33`
BLAKE2b-256	`7d2c7c13e01264d78676cfa23b3c06c4a6ada62b3a99ec4586c87bafa107d522`
