optimized-transducer

Introduction

This project implements the optimization techniques proposed in "Improving RNN Transducer Modeling for End-to-End Speech Recognition" to reduce the memory consumed when computing the transducer loss.

How does it differ from the RNN-T loss from torchaudio

Actually, the implementation is based on torchaudio, so the two are functionally equivalent, i.e., they produce the same output for the same input.

However, this project is more memory efficient and potentially faster (TODO: This needs some benchmarks)

How does it differ from warp-transducer

I don't have much experience with warp-transducer. But I know that warp-transducer produces different gradients for CPU and CUDA when using the same input. See https://github.com/HawkAaron/warp-transducer/issues/93

optimized_transducer uses less memory than warp-transducer.

Installation

You can install it via pip:

pip install optimized_transducer

Installation FAQ

What operating systems are supported?

It has been tested on Ubuntu 18.04. It should also work on macOS and other Unix-like systems. It may work on Windows, though it has not been tested there.

How to display the installation log?

Use

pip install --verbose optimized_transducer

How to reduce installation time?

Use

export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -j to make, so that make runs the build jobs in parallel.

Which versions of PyTorch are supported?

It has been tested on PyTorch >= 1.5.0. It may work on PyTorch < 1.5.0, though that has not been tested.

How to install a CPU version of `optimized_transducer`?

Use

export OT_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF"
export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF to cmake.

What Python versions are supported?

Python >= 3.6 is known to work. It may work with Python 2.7, though that has not been tested.

Where to get help if I have problems with the installation?

Please file an issue at https://github.com/csukuangfj/optimized_transducer/issues and describe your problem there.

Usage

optimized_transducer expects the output of the joint network to have shape (sum_all_TU, V) rather than (N, T, U, V). It is a concatenation of 2-D tensors: (T_1 * U_1, V), (T_2 * U_2, V), ..., (T_N * U_N, V). Note: (T_1 * U_1, V) is just a 3-D tensor of shape (T_1, U_1, V) reshaped to 2-D.
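The layout described above can be sketched in plain Python. This is only an illustration of the shape arithmetic, not part of the library: the helper names `concat_layout` and `row_index` are hypothetical, the lengths are made-up values, and it assumes U_i = target_lengths[i] + 1 and a row-major reshape of each (T_i, U_i, V) tensor, as in the usage code further down.

```python
from itertools import accumulate


def concat_layout(logit_lengths, target_lengths):
    """Return (row_offsets, sum_all_TU) for the concatenated logits.

    Utterance i contributes T_i * U_i rows, where T_i = logit_lengths[i]
    and U_i = target_lengths[i] + 1.
    """
    rows = [t * (u + 1) for t, u in zip(logit_lengths, target_lengths)]
    # Utterance i starts after all rows of utterances 0..i-1.
    offsets = [0] + list(accumulate(rows))[:-1]
    return offsets, sum(rows)


def row_index(offsets, target_lengths, i, t, u):
    """Row holding frame t and label position u of utterance i."""
    U_i = target_lengths[i] + 1
    return offsets[i] + t * U_i + u


logit_lengths = [4, 3]   # T_1, T_2 (illustrative values)
target_lengths = [2, 1]  # so U_1 = 3 and U_2 = 2
offsets, sum_all_TU = concat_layout(logit_lengths, target_lengths)
print(offsets, sum_all_TU)
```

With these lengths the first utterance occupies 4 * 3 rows and the second 3 * 2 rows, so the second utterance starts right after the rows of the first.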

Suppose your original joint network looks somewhat like the following:

import torch
import torchaudio

encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network

encoder_out = encoder_out.unsqueeze(2) # Now encoder_out is (N, T, 1, D)
decoder_out = decoder_out.unsqueeze(1) # Now decoder_out is (N, 1, U, D)

x = encoder_out + decoder_out # x is of shape (N, T, U, D)
activation = torch.tanh(x)

logits = linear(activation) # linear is an instance of `nn.Linear`.

loss = torchaudio.functional.rnnt_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
)

You need to change it to the following:

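import torch
import optimized_transducer

encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network

# Drop the padding: keep T_i frames and U_i = target_lengths[i] + 1 decoder steps per utterance
encoder_out_list = [encoder_out[i, :logit_lengths[i], :] for i in range(N)]
decoder_out_list = [decoder_out[i, :target_lengths[i]+1, :] for i in range(N)]

# Each element is (T_i, 1, D) + (1, U_i, D), which broadcasts to (T_i, U_i, D)
x = [e.unsqueeze(1) + d.unsqueeze(0) for e, d in zip(encoder_out_list, decoder_out_list)]
x = [p.reshape(-1, D) for p in x] # each element is now (T_i * U_i, D)
x = torch.cat(x) # x is of shape (sum_all_TU, D)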

activation = torch.tanh(x)
logits = linear(activation) # linear is an instance of `nn.Linear`.

loss = optimized_transducer.transducer_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
)
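A rough way to see why the second version needs less memory is to count the elements of the logits tensor in both layouts. The sketch below is plain Python with made-up lengths and vocabulary size; the function names are hypothetical, and it only counts logits elements. It does not measure the actual memory used by either library.

```python
def padded_numel(logit_lengths, target_lengths, vocab_size):
    """Elements in padded logits of shape (N, T, U, V)."""
    N = len(logit_lengths)
    T = max(logit_lengths)
    U = max(target_lengths) + 1
    return N * T * U * vocab_size


def concatenated_numel(logit_lengths, target_lengths, vocab_size):
    """Elements in concatenated logits of shape (sum_all_TU, V)."""
    sum_all_TU = sum(t * (u + 1) for t, u in zip(logit_lengths, target_lengths))
    return sum_all_TU * vocab_size


logit_lengths = [100, 60, 30]  # illustrative values
target_lengths = [20, 10, 5]   # illustrative values
V = 500                        # illustrative vocabulary size

padded = padded_numel(logit_lengths, target_lengths, V)
concat = concatenated_numel(logit_lengths, target_lengths, V)
print(padded, concat, concat / padded)
```

The more the utterance lengths in a batch differ, the more padding the (N, T, U, V) layout carries, and the larger the gap between the two counts.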

For more usages, please refer to

Release history

1.4: Jun 10, 2022
1.3: Jan 20, 2022
1.2: Dec 31, 2021
1.1: Dec 30, 2021
1.0: Dec 28, 2021 (this version)
0.9.1: Dec 28, 2021
0.9: Dec 28, 2021
0.0.1: Dec 23, 2021

Download files

Source Distribution

optimized_transducer-1.0.tar.gz (38.6 kB), uploaded Dec 28, 2021

Hashes for optimized_transducer-1.0.tar.gz

Algorithm	Hash digest
SHA256	`3f3b9cadeab35aee3d84ce26949167fa3f5679dd3141079b93a689e91595920d`
MD5	`2726a75a11cd9eeecfea689ed92d2fca`
BLAKE2b-256	`0c1b5deab90a308e1a69b4c048eab26c5a65bb71c8466924beca183273514f59`
