k2-sherpa


Introduction

An ASR server framework in Python, supporting both streaming and non-streaming recognition.

CPU-bound tasks, such as neural network computation, are implemented in C++, while IO-bound tasks, such as socket communication, are implemented in Python.

Caution: For offline ASR, we assume the model was trained with pruned stateless RNN-T from icefall and comes from a directory like pruned_transducer_statelessX, where X >= 2. For streaming ASR, we assume the model was trained with pruned_stateless_emformer_rnnt2.

For offline ASR, we provide a Colab notebook that shows how to start the server, how to start the client, and how to decode test-clean of LibriSpeech.

For the streaming ASR, we provide a YouTube demo, showing you how to use it. See https://www.youtube.com/watch?v=z7HgaZv5W0U

Installation

First, you have to install PyTorch and torchaudio. PyTorch 1.10 is known to work. Other versions may also work.

Second, clone this repository and install its Python dependencies:

git clone https://github.com/k2-fsa/sherpa
cd sherpa
pip install -r ./requirements.txt

Third, install the C++ extension of sherpa. You can use one of the following methods.

Option 1: Use `pip` (supports Linux/macOS/Windows)

pip install --verbose k2-sherpa

# or install the latest version from GitHub
pip install --verbose git+https://github.com/k2-fsa/sherpa

Option 2: Build from source with `setup.py` (supports Linux/macOS/Windows)

python3 setup.py install

Option 3: Build from source with `cmake` (supports Linux/macOS/Windows)

mkdir build
cd build
cmake ..
make -j
export PYTHONPATH=$PWD/../sherpa/python:$PWD/lib:$PYTHONPATH

Usage

First, check that sherpa has been installed successfully:

python3 -c "import sherpa; print(sherpa.__version__)"

It should print the version of sherpa.
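The same check can be run from a script that should fail gracefully when the extension is missing. This is a minimal sketch: `installed_version` is a hypothetical helper, and it assumes only that the module exposes `__version__`, as the command above does.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Not every module defines __version__; fall back to a placeholder.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = installed_version("sherpa")
    if version is None:
        print("sherpa is not importable; check the installation steps above")
    else:
        print(f"sherpa {version}")
```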

Streaming ASR with pruned stateless Emformer RNN-T

Start the server

To start the server, you need to first generate two files:

(1) The torch script model file. You can use export.py --jit=1 in pruned_stateless_emformer_rnnt2 from icefall.
(2) The BPE model file. You can find it in data/lang_bpe_XXX/bpe.model in icefall, where XXX is the number of BPE tokens used in the training.

With the above two files ready, you can start the server with the following command:

./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_server.py \
  --port 6006 \
  --max-batch-size 50 \
  --max-wait-ms 5 \
  --nn-pool-size 1 \
  --nn-model-filename ./path/to/exp/cpu_jit.pt \
  --bpe-model-filename ./path/to/data/lang_bpe_500/bpe.model

You can use ./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_server.py --help to view the help message.
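To build some intuition for --max-batch-size and --max-wait-ms, the sketch below shows the usual dynamic-batching pattern: wait for one request, then keep collecting until the batch is full or the wait budget runs out. It assumes the two flags have this common meaning; it is not sherpa's actual implementation, and `collect_batch` and `demo` are hypothetical names.

```python
import asyncio


async def collect_batch(queue, max_batch_size, max_wait_ms):
    """Wait for one item, then gather more until the batch is full
    or max_wait_ms has elapsed since the first item arrived."""
    batch = [await queue.get()]
    loop = asyncio.get_running_loop()
    deadline = loop.time() + max_wait_ms / 1000
    while len(batch) < max_batch_size:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break
        try:
            item = await asyncio.wait_for(queue.get(), timeout=remaining)
        except asyncio.TimeoutError:
            break  # wait budget spent; run with a partial batch
        batch.append(item)
    return batch


async def demo():
    queue = asyncio.Queue()
    for request_id in range(7):
        queue.put_nowait(request_id)
    first = await collect_batch(queue, max_batch_size=5, max_wait_ms=5)
    second = await collect_batch(queue, max_batch_size=5, max_wait_ms=5)
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Under this reading, a larger batch size improves throughput when many clients are connected, while a smaller wait time bounds the extra latency a lone request pays.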

We provide a pretrained model using the LibriSpeech dataset at https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01

The following shows how to use the above pretrained model to start the server.

git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01

./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_server.py \
  --port 6006 \
  --max-batch-size 50 \
  --max-wait-ms 5 \
  --nn-pool-size 1 \
  --nn-model-filename ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/exp/cpu_jit-epoch-39-avg-6-use-averaged-model-1.pt \
  --bpe-model-filename ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/data/lang_bpe_500/bpe.model

Start the client

We provide two clients at present:

(1) ./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py It shows how to decode a single sound file.
(2) ./sherpa/bin/pruned_stateless_emformer_rnnt2/web You can record your speech in real-time within a browser and send it to the server for recognition.

streaming_client.py

./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py --help

./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py \
  --server-addr localhost \
  --server-port 6006 \
  ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/test_wavs/1221-135766-0001.wav
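Streaming recognition generally means the client sends audio in small pieces rather than as one file. The sketch below only illustrates that chunking step; it does not reproduce the protocol used by streaming_client.py. The helper name, the 16 kHz sample rate, and the 0.1 second chunk length are assumptions made for illustration.

```python
def split_into_chunks(samples, chunk_size):
    """Split a sequence of audio samples into consecutive chunks.

    The last chunk may be shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


if __name__ == "__main__":
    sample_rate = 16000      # assumed sample rate in Hz
    chunk_seconds = 0.1      # assumed chunk duration
    chunk_size = int(sample_rate * chunk_seconds)

    one_second_of_silence = [0.0] * sample_rate
    chunks = split_into_chunks(one_second_of_silence, chunk_size)
    print(f"{len(chunks)} chunks of up to {chunk_size} samples each")
```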

Web client

cd ./sherpa/bin/pruned_stateless_emformer_rnnt2/web
python3 -m http.server 6008

Then open your browser and go to http://localhost:6008/record.html. You will see a UI like the following screenshot.

(Screenshot of the web client.)

Click the button Record.

Now you can speak and you will get recognition results from the server in real-time.

Caution: For the web client, the server port is hard-coded to 6006. If your server uses a different port, edit ./sherpa/bin/pruned_stateless_emformer_rnnt2/web/record.js and replace 6006 with that port.

Caution: http://0.0.0.0:6008/record.html and http://127.0.0.1:6008/record.html won't work. You have to use localhost; otherwise the browser will not let you use your microphone, because the page is not served over https, which requires a certificate.

Offline ASR

Start the server

To start the server, you need to first generate two files:

(1) The torch script model file. You can use export.py --jit=1 in pruned_transducer_statelessX from icefall.
(2) The BPE model file. You can find it in data/lang_bpe_XXX/bpe.model in icefall, where XXX is the number of BPE tokens used in the training.

With the above two files ready, you can start the server with the following command:

sherpa/bin/offline_server.py \
  --port 6006 \
  --num-device 0 \
  --max-batch-size 10 \
  --max-wait-ms 5 \
  --feature-extractor-pool-size 5 \
  --nn-pool-size 1 \
  --nn-model-filename ./path/to/exp/cpu_jit.pt \
  --bpe-model-filename ./path/to/data/lang_bpe_500/bpe.model

You can use ./sherpa/bin/offline_server.py --help to view the help message.

We provide a pretrained model using the LibriSpeech dataset at https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13

The following shows how to use the above pretrained model to start the server.

git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13

sherpa/bin/offline_server.py \
  --port 6006 \
  --num-device 0 \
  --max-batch-size 10 \
  --max-wait-ms 5 \
  --feature-extractor-pool-size 5 \
  --nn-pool-size 1 \
  --nn-model-filename ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/cpu_jit.pt \
  --bpe-model-filename ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model

Start the client

After starting the server, you can use the following command to start the client:

./sherpa/bin/offline_client.py \
    --server-addr localhost \
    --server-port 6006 \
    /path/to/foo.wav \
    /path/to/bar.wav

You can use ./sherpa/bin/offline_client.py --help to view the usage message.

The following shows how to use the client to send some test wave files to the server for recognition.

sherpa/bin/offline_client.py \
  --server-addr localhost \
  --server-port 6006 \
  icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav \
  icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav \
  icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav
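When there are many files to send, the command line can be assembled from a script. The sketch below uses only the two flags shown above (--server-addr and --server-port); `build_client_command` is a hypothetical helper, and the directory in the usage part is the pretrained model checkout from the previous section.

```python
import shlex
from pathlib import Path


def build_client_command(wav_paths, server_addr="localhost", server_port=6006):
    """Build the argument list for offline_client.py from a list of wave files."""
    wav_paths = [str(p) for p in wav_paths]
    if not wav_paths:
        raise ValueError("at least one wave file is required")
    return [
        "./sherpa/bin/offline_client.py",
        "--server-addr", server_addr,
        "--server-port", str(server_port),
        *wav_paths,
    ]


if __name__ == "__main__":
    wav_dir = Path("icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs")
    wavs = sorted(wav_dir.glob("*.wav"))
    if wavs:
        # Print the command; pass the list to subprocess.run() to execute it.
        print(shlex.join(build_client_command(wavs)))
    else:
        print(f"no wave files found in {wav_dir}")
```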

RTF test

We provide a demo ./sherpa/bin/decode_manifest.py to decode the test-clean dataset from the LibriSpeech corpus.

It creates 50 connections to the server using websockets and sends audio files to the server for recognition.
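The pattern of spreading many files over a fixed number of concurrent connections can be sketched with asyncio as follows. This is an illustration only, not the code of decode_manifest.py: `recognize` is a hypothetical placeholder for sending one file over a websocket and awaiting its transcript.

```python
import asyncio


async def recognize(path):
    """Hypothetical placeholder for one request to the server."""
    await asyncio.sleep(0)  # stands in for network I/O
    return f"transcript of {path}"


async def run_with_workers(paths, num_workers=50):
    """Process all paths with at most num_workers requests in flight."""
    queue = asyncio.Queue()
    for index, path in enumerate(paths):
        queue.put_nowait((index, path))
    results = [None] * len(paths)

    async def worker():
        # Each worker models one connection that handles files one at a time.
        while True:
            try:
                index, path = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            results[index] = await recognize(path)

    await asyncio.gather(*(worker() for _ in range(num_workers)))
    return results


if __name__ == "__main__":
    files = [f"utt{i}.wav" for i in range(120)]
    transcripts = asyncio.run(run_with_workers(files, num_workers=50))
    print(len(transcripts), "files processed")
```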

At the end, it will display the RTF and the WER.

To give you an idea of the performance of the pretrained model, the Colab notebook shows the following results:

RTF: 0.0094
total_duration: 19452.481 seconds (5.40 hours)
processing time: 183.305 seconds (0.05 hours)
%WER = 2.06

Errors: 112 insertions, 93 deletions, 876 substitutions, over 52576 reference words (51607 correct)
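The reported figures follow from the raw numbers above, using the standard definitions: RTF is processing time divided by audio duration, and WER is the number of word errors divided by the number of reference words. A short check:

```python
# Values copied from the results above.
total_duration = 19452.481   # seconds of audio
processing_time = 183.305    # seconds spent decoding
insertions, deletions, substitutions = 112, 93, 876
reference_words = 52576

# Real-time factor: decoding time per second of audio.
rtf = processing_time / total_duration

# Word error rate in percent.
wer = 100 * (insertions + deletions + substitutions) / reference_words

# Correct words are the reference words that were neither deleted nor substituted.
correct = reference_words - deletions - substitutions

print(f"RTF: {rtf:.4f}")
print(f"%WER = {wer:.2f}")
print(f"correct words: {correct}")
```

An RTF of 0.0094 means that one hour of audio takes roughly 34 seconds to decode in this setup.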

If you have a GPU with more memory (e.g., 32 GB), you can get an even lower RTF.

Project details

Release history

1.2 (Mar 10, 2023)
1.0 (Nov 4, 2022)
0.9.1 (Sep 19, 2022)
0.8 (Sep 15, 2022)
0.7 (Aug 21, 2022)
0.6 (Jul 3, 2022)
0.5 (Jun 11, 2022)
0.4 (Jun 7, 2022), this version
0.3 (Jun 4, 2022)
0.2 (May 26, 2022)
0.1 (May 24, 2022)
0.0.2 (May 24, 2022)
0.0.1 (May 24, 2022)

Download files

Source Distribution

k2-sherpa-0.4.tar.gz (179.2 kB)

Uploaded Jun 7, 2022 Source

File details

Details for the file k2-sherpa-0.4.tar.gz.

File metadata

Download URL: k2-sherpa-0.4.tar.gz
Upload date: Jun 7, 2022
Size: 179.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.12

File hashes

Hashes for k2-sherpa-0.4.tar.gz
Algorithm	Hash digest
SHA256	`a37ba0527b18a90f1124a003b48647f69550188f3721ac6c99161495dcaf9d65`
MD5	`afc13e7f509e4f6ab88488edcc07bf70`
BLAKE2b-256	`e87ff75f80dbaac4ffd036cb42d2067bea547cc5c77bbf7b8ebac4795a5c579a`
