Speech Recognition API

Simple but extensible API for Speech Recognition.

Installation

From pip:

pip install "speech_recognition_api[all]"

From git:

git clone https://github.com/asai95/speech-recognition-api.git
cd speech-recognition-api
pip install -r requirements.txt

Usage

Simple dev server:

python -m speech_recognition_api

Gunicorn:

gunicorn "speech_recognition_api:create_app()" -k uvicorn.workers.UvicornWorker -w 1 -b 127.0.0.1:8888

Celery worker:

celery -A speech_recognition_api.extra.celery_bus worker

Huey worker:

huey_consumer speech_recognition_api.extra.huey_bus.huey

Description

This project aims to simplify building and deploying applications that require speech recognition.

It is designed to run as a microservice, so it does not handle concerns such as authentication or rate limiting.

It is, however, designed to be extensible in three major areas:

  • Models
  • File Storages
  • Message Buses

There are two types of APIs available.

Synchronous API

This API is designed for simple workloads where the machine running the server is also capable of running the model. You will probably want to limit the payload size for these routes.
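One way to enforce such a limit in application code is to read the upload in chunks and stop as soon as a threshold is exceeded, rather than buffering an arbitrarily large file first. The sketch below assumes the upload is available as a binary file-like object; `MAX_UPLOAD_BYTES`, `read_limited` and `PayloadTooLarge` are hypothetical names, not part of this project. In production the limit is often enforced at a reverse proxy instead (for example nginx's `client_max_body_size` directive).

```python
import io
from typing import BinaryIO

# Hypothetical limit for illustration; pick a value that suits your model and hardware.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MiB
CHUNK_SIZE = 64 * 1024


class PayloadTooLarge(Exception):
    """Raised when an upload exceeds the configured size limit."""


def read_limited(stream: BinaryIO, max_bytes: int = MAX_UPLOAD_BYTES) -> bytes:
    """Read the whole stream, failing fast once more than max_bytes have arrived."""
    buffer = io.BytesIO()
    total = 0
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:  # end of stream
            break
        total += len(chunk)
        if total > max_bytes:
            raise PayloadTooLarge(f"upload exceeds {max_bytes} bytes")
        buffer.write(chunk)
    return buffer.getvalue()
```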

Routes:

POST /sync/v1/transcribe

Accepts an audio file. Supported file types depend on the model in use.

Returns an object containing the transcription, as described by the route's response model.
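A minimal standard-library client sketch for this route. The server address matches the Gunicorn example above (`127.0.0.1:8888`); the multipart form field name `file` is an assumption, so check the route definition for the actual parameter name. The function only builds the request, which keeps it testable without a running server.

```python
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://127.0.0.1:8888"  # matches the Gunicorn example above


def build_transcribe_request(
    audio_path: str, route: str = "/sync/v1/transcribe", field_name: str = "file"
) -> urllib.request.Request:
    """Build a multipart/form-data POST request that uploads one audio file.

    field_name="file" is an assumption, not confirmed by the project docs.
    """
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        BASE_URL + route,
        data=head + path.read_bytes() + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the body with `json.load`. The same builder works for the async upload route by passing `route="/async/v1/transcribe"`.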

Asynchronous API

This API processes files asynchronously: it creates tasks that are then processed by separate workers. A typical client flow is:

  • Create a task and receive its task ID.
  • Use this task ID to poll periodically until the task is completed.

Routes:

POST /async/v1/transcribe

Accepts an audio file. Supported file types depend on the model in use.

Returns an object containing the async task ID, as described by the route's response model.

GET /async/v1/transcribe/{task_id}

Returns an object with the task status and the transcription (if available), as described by the route's response model.
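The client-side polling flow can be sketched as follows. The JSON field name `status` and the terminal values in `TERMINAL_STATUSES` are assumptions; the real names are defined by the project's response model. The status fetcher is injectable, so the loop can be exercised with a fake instead of a live server.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "http://127.0.0.1:8888"  # matches the Gunicorn example above

# Assumption: check the project's response model for the real status values.
TERMINAL_STATUSES = {"completed", "failed"}


def http_fetch_status(task_id: str) -> Dict[str, Any]:
    """GET /async/v1/transcribe/{task_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/async/v1/transcribe/{task_id}") as resp:
        return json.load(resp)


def wait_for_transcription(
    task_id: str,
    fetch_status: Callable[[str], Dict[str, Any]] = http_fetch_status,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status(task_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} s")
        sleep(interval)
```

A fixed interval keeps the sketch simple; exponential backoff is a common alternative when transcriptions take a long time.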

The async API also requires a running worker to do the actual transcription (see the Celery and Huey worker commands under Usage).
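The division of labour between the API and the worker can be illustrated with a toy in-process message bus built on `queue.Queue`. This is not the project's implementation (judging by the worker commands under Usage, it ships Celery and Huey integrations); `InMemoryBus` and its methods are hypothetical, and the task payload is assumed to be a reference to a file already saved through the configured storage.

```python
import queue
import threading
import uuid
from typing import Callable, Dict, Optional


class InMemoryBus:
    """Toy message bus: the API side enqueues tasks, a worker thread processes them."""

    def __init__(self) -> None:
        self._tasks: queue.Queue = queue.Queue()
        self._results: Dict[str, str] = {}
        self._lock = threading.Lock()

    def send_task(self, file_id: str) -> str:
        """API side: enqueue work and return a task ID immediately."""
        task_id = uuid.uuid4().hex
        self._tasks.put((task_id, file_id))
        return task_id

    def get_result(self, task_id: str) -> Optional[str]:
        """API side: return the transcription, or None if it is not ready yet."""
        with self._lock:
            return self._results.get(task_id)

    def run_worker(self, transcribe: Callable[[str], str]) -> None:
        """Worker side: process tasks until the None sentinel from stop() arrives."""
        while True:
            item = self._tasks.get()
            if item is None:
                break
            task_id, file_id = item
            text = transcribe(file_id)  # a real worker would fetch the file from storage first
            with self._lock:
                self._results[task_id] = text

    def stop(self) -> None:
        self._tasks.put(None)
```

Because the API only enqueues and looks up results, it stays responsive while the expensive model runs elsewhere; a real bus adds persistence and lets workers run on other machines.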

Configuring

Configuration is read from a .env file or from environment variables; environment variables take precedence.

The main variables required for the API and worker to run are:

  • MODEL - model class path (this class performs the actual audio-to-text conversion)
  • STORAGE - storage class path (in the async API, this class uploads and downloads files)
  • MESSAGE_BUS - message bus class path (in the async API, this class sends tasks to remote workers and gets the results back from them)
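The precedence rule can be illustrated with a small standard-library sketch. It is not the project's loader (which may rely on a settings library), the function names are hypothetical, and the .env parser is deliberately simplified.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Deliberately minimal: no multi-line values, escapes, or ${VAR} expansion.
    """
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(
    keys: Iterable[str], dotenv_text: str, environ: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Resolve each key from the environment first, falling back to the .env file."""
    env = os.environ if environ is None else environ
    file_values = parse_dotenv(dotenv_text)
    settings: Dict[str, str] = {}
    for key in keys:
        if key in env:
            settings[key] = env[key]  # environment variables take precedence
        elif key in file_values:
            settings[key] = file_values[key]  # otherwise use the .env value
    return settings
```

For example, `load_settings(["MODEL", "STORAGE", "MESSAGE_BUS"], Path(".env").read_text())` resolves the three main variables.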

These classes are imported lazily, only when first used.
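Resolving a class from a dotted path on first use is commonly done with `importlib.import_module`. The sketch below shows the general technique; `import_class` is a hypothetical helper, and the project's own loader may differ.

```python
import importlib
from functools import lru_cache
from typing import Any


@lru_cache(maxsize=None)
def import_class(path: str) -> Any:
    """Import 'package.module.ClassName' on first use and cache the result."""
    module_path, _, class_name = path.rpartition(".")
    if not module_path:
        raise ImportError(f"{path!r} is not a dotted 'module.ClassName' path")
    module = importlib.import_module(module_path)  # the import happens only here
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"module {module_path!r} has no attribute {class_name!r}") from exc
```

Typical use would be `model_cls = import_class(settings["MODEL"])`; heavy model dependencies are then loaded only by the processes that actually need them.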

Each class may require its own variables; refer to the config.py of the specific module for the configuration reference.

Built-in classes:

Models:

Storages:

Message Buses:

Extending

It is easy to extend the API by adding models, storages and message buses.

To do so, create a class that implements the corresponding interface:

Then set the corresponding config variable (MODEL, STORAGE or MESSAGE_BUS) to the path of your class, and that's it!
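The interface definitions did not survive in this copy of the README, so the sketch below uses a hypothetical `ModelInterface` with a hypothetical `transcribe` method purely to show the pattern; the real base classes and method signatures are in the project's source.

```python
import io
from abc import ABC, abstractmethod
from typing import BinaryIO


class ModelInterface(ABC):
    """Hypothetical stand-in for the project's model interface."""

    @abstractmethod
    def transcribe(self, audio: BinaryIO) -> str:
        """Return the transcription of the given audio file."""


class EchoModel(ModelInterface):
    """Toy model that 'transcribes' UTF-8 text files; useful only for wiring tests."""

    def transcribe(self, audio: BinaryIO) -> str:
        return audio.read().decode("utf-8").strip()


if __name__ == "__main__":
    print(EchoModel().transcribe(io.BytesIO(b"hello world\n")))
```

With such a class in a hypothetical package, the configuration would be a line like `MODEL=my_package.models.EchoModel`. A dummy model like this can help test the API and worker wiring without loading a real speech model.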

I suggest distributing new modules through PyPI so that other people can reuse them.

Download files

Source Distribution

speech-recognition-api-0.1.1.tar.gz (15.8 kB)

Built Distribution

speech_recognition_api-0.1.1-py3-none-any.whl (27.0 kB)

Hashes for speech-recognition-api-0.1.1.tar.gz:

  • SHA256: f53c54a9c3328876721fc9f3e9f5f80b2ea4afc9f86a929552f75e73b2d5c257
  • MD5: 6e12698f4497d20f74bfa7b61590a60c
  • BLAKE2b-256: 74fd57eb039e2a5176dd6e182f135f58500c3e94fc9ec8c884489be8100b69de

Hashes for speech_recognition_api-0.1.1-py3-none-any.whl:

  • SHA256: b19ce26bb2331884d9b9905e35d447c7fb16bc786358c142c96788276147ee84
  • MD5: 2736d772e461529dab4fc75695d6d657
  • BLAKE2b-256: b9f44b2104cbdf9c21a67332ee1737d69fcb52e097f3c380537057778ae8f531
