deepspeech-server

server for mozilla deepspeech

Key Features

This is an HTTP server that can be used to test the Coqui STT project (the successor of the Mozilla DeepSpeech project). You need an environment with DeepSpeech or Coqui STT installed to run this server.

This code uses the Coqui STT 1.0 APIs.

Installation

The server is available on PyPI, so you can install it with pip:

pip3 install deepspeech-server

You can also install deepspeech-server from source:

python3 setup.py install

Note that Python 3.6 is the minimum version required to run the server.
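The version requirement above can be checked before installing. The following is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of deepspeech-server.

```python
import sys

# Minimum interpreter version stated in the text above.
MINIMUM_PYTHON = (3, 6)


def meets_minimum(version_info=None):
    """Return True if the given (or the running) interpreter is new enough.

    Hypothetical helper for illustration; not part of deepspeech-server.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if __name__ == "__main__":
    if not meets_minimum():
        sys.exit("deepspeech-server needs Python 3.6 or newer")
```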

Starting the server

deepspeech-server --config config.yaml

What is an STT model?

The quality of the speech-to-text engine depends heavily on the models it loads at runtime. A model holds the trained parameters that determine how the engine turns audio into text.

How to use a specific STT model

You can use Coqui STT without training a model yourself. Pre-trained models are available from the Coqui Model Zoo (make sure the STT Models tab is selected):

https://coqui.ai/models

Once you’ve downloaded a pre-trained model, make a copy of the sample configuration file. Edit the “model” and “scorer” fields in your new file so that they match the downloaded files:

cp config.sample.yaml config.yaml
$EDITOR config.yaml

Lastly, start the server:

deepspeech-server --config config.yaml
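If you prefer to script the edit step instead of using an editor, the sketch below writes a small configuration file for a downloaded model. It is an illustration under assumptions: `render_config` and `write_config` are hypothetical helpers, not part of deepspeech-server, and the key names are taken from the sample configuration shown in the Server configuration section. Values are written as double-quoted scalars so that paths with unusual characters stay valid YAML.

```python
import json
from pathlib import Path


def render_config(model, scorer=None, host="0.0.0.0", port=8080):
    """Return the text of a minimal config file (hypothetical helper).

    json.dumps produces a double-quoted string, which YAML also accepts.
    """
    lines = ["coqui:", "  model: " + json.dumps(str(model))]
    if scorer is not None:
        # The scorer is optional, so only emit it when one was given.
        lines.append("  scorer: " + json.dumps(str(scorer)))
    lines += [
        "server:",
        "  http:",
        "    host: " + json.dumps(host),
        "    port: " + str(int(port)),
    ]
    return "\n".join(lines) + "\n"


def write_config(path, model, scorer=None):
    """Write the rendered configuration to the given path."""
    Path(path).write_text(render_config(model, scorer), encoding="utf-8")
```

For example, `write_config("config.yaml", "coqui-1.0.tflite", "huge-vocabulary.scorer")` produces a file that can be passed to the server as its configuration.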

Server configuration

The server is configured with a YAML file, passed with the “--config” argument. It has the following structure:

coqui:
  model: coqui-1.0.tflite
  scorer: huge-vocabulary.scorer
  beam_width: 500
server:
  http:
    host: "0.0.0.0"
    port: 8080
    request_max_size: 1048576
log:
  level:
    - logger: deepspeech_server
      level: DEBUG

The configuration file contains several sections and sub-sections.

coqui section configuration

The “coqui” section configures the Coqui STT engine:

model: The trained Coqui STT model to load. Must be a tflite (TensorFlow Lite) file.

scorer: [Optional] The scorer file. Use this to tune the STT to understand certain phrases better.

lm_alpha: [Optional] The alpha hyperparameter of the scorer.

lm_beta: [Optional] The beta hyperparameter of the scorer.

beam_width: [Optional] The width of the beam search used for decoding. Decoding time grows with this value.
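The field list above can be turned into a quick sanity check to run before starting the server. This is a sketch, not the server's own validation: `check_coqui_section` is a hypothetical helper, it takes the already-parsed “coqui” section as a dict, and checking the file extension is only a heuristic for the tflite requirement.

```python
# Keys documented for the "coqui" section above.
REQUIRED_KEYS = {"model"}
OPTIONAL_KEYS = {"scorer", "lm_alpha", "lm_beta", "beam_width"}


def check_coqui_section(section):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration; not part of deepspeech-server.
    """
    problems = []
    present = set(section)
    for key in sorted(REQUIRED_KEYS - present):
        problems.append("missing required key: " + key)
    for key in sorted(present - REQUIRED_KEYS - OPTIONAL_KEYS):
        problems.append("undocumented key: " + key)
    model = section.get("model")
    # Heuristic only: the text requires a tflite file, so check the suffix.
    if model is not None and not str(model).endswith(".tflite"):
        problems.append("model should be a .tflite file: " + str(model))
    return problems
```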

http section configuration

request_max_size: The maximum payload size accepted by the server, in bytes. Defaults to 1048576 (1 MiB). A request with a larger payload is rejected with a “413: Request Entity Too Large” error.

host: The listen address of the HTTP server.

port: The listening port of the HTTP server.
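To avoid a 413 response, a client can compare the size of an audio file with the limit before sending it. The sketch below does that and also builds a silent WAV in memory to show roughly how much audio fits under the default limit. The audio format (mono, 16-bit PCM, 16 kHz) is an assumption made for the example, the helper names are hypothetical, and treating a payload exactly at the limit as accepted follows the wording above rather than a tested server behaviour.

```python
import io
import wave

# Default limit quoted above (1 MiB).
DEFAULT_REQUEST_MAX_SIZE = 1048576


def fits_request_limit(payload_size, request_max_size=DEFAULT_REQUEST_MAX_SIZE):
    """Return True if a payload of this many bytes is not above the limit."""
    return payload_size <= request_max_size


def silent_wav(seconds, sample_rate=16000):
    """Build an in-memory WAV of silence: mono, 16-bit PCM (assumed format)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()
```

Under these assumptions one second of audio takes about 32 kB, so a 30 second clip fits under the default limit while a 40 second clip does not. Raise request_max_size in the configuration if you need to send longer recordings.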

log section configuration

The log section can be used to set the log levels of the server. This section contains a list of log entries. Each log entry contains the name of a logger and its level. Both follow the conventions of the Python logging module.
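As an illustration of what those conventions mean, the sketch below applies a list of log entries with the standard logging module. It shows the convention only and is not the server's actual code; `apply_log_levels` is a hypothetical name, and the entries are assumed to be already parsed into dicts with the “logger” and “level” keys from the sample configuration.

```python
import logging


def apply_log_levels(entries):
    """Set the level of each named logger (hypothetical helper).

    Each entry is a dict such as {"logger": "deepspeech_server", "level": "DEBUG"}.
    """
    for entry in entries:
        # setLevel accepts level names such as "DEBUG" as well as integers.
        logging.getLogger(entry["logger"]).setLevel(entry["level"])


apply_log_levels([{"logger": "deepspeech_server", "level": "DEBUG"}])
```

Because logger names are hierarchical, a level set on deepspeech_server also applies to child loggers whose names start with "deepspeech_server." unless they set their own level.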

Using the server

Inference is performed through HTTP POST requests, for example with the following curl command:

curl -X POST --data-binary @testfile.wav http://localhost:8080/stt
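The same request can be sent from Python with only the standard library. This is a minimal sketch mirroring the curl command above: the function names are hypothetical, the host and port default to the values in the sample configuration, and decoding the response body as UTF-8 text is an assumption, since the response format is not described here.

```python
import urllib.request


def build_stt_request(audio_bytes, host="localhost", port=8080):
    """Build a POST request carrying the raw audio bytes as its body."""
    url = "http://{}:{}/stt".format(host, port)
    return urllib.request.Request(url, data=audio_bytes, method="POST")


def transcribe(wav_path, host="localhost", port=8080, timeout=60):
    """Send a WAV file to a running server and return the response body.

    Assumption: the body is decoded as UTF-8 text.
    """
    with open(wav_path, "rb") as wav_file:
        request = build_stt_request(wav_file.read(), host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(transcribe("testfile.wav"))` sends the same file as the curl example.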

Release history

3.0.1: Jul 12, 2022 (this version)
2.2.0: Jul 11, 2021
2.1.0: May 26, 2020
2.0.0: Jan 30, 2020
1.1.0: Jun 24, 2019
1.0.0: Sep 26, 2018
0.6.0: Sep 3, 2018
0.5.0: Aug 27, 2018
0.4.1: Feb 27, 2018
0.4.0: Dec 22, 2017
0.3.0: Dec 7, 2017
0.2.1: Dec 5, 2017
0.2.0: Dec 5, 2017

Source Distribution

File: deepspeech-server-3.0.1.tar.gz
Upload date: Jul 12, 2022
Size: 12.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for deepspeech-server-3.0.1.tar.gz
Algorithm	Hash digest
SHA256	`cf0c1dd785bfa6f8e7ba290ba0ceee7c3d6451f5cf8c1269f94f782611b21605`
MD5	`e417cd34020360c886608d31d7a4761b`
BLAKE2b-256	`874b4abea371bfd34621594e2383fca9c9fca2e54a941574a2421d972da4ab29`