at16k

Pronounced "at sixteen k".

Try out the interactive demo here.

What is at16k?

at16k is a Python library for automatic speech recognition (ASR), also known as speech-to-text conversion. The goal of this project is to provide the community with a production-quality speech-to-text library.

Installation

It is recommended that you install at16k in a virtual environment.

Prerequisites

  • Python >= 3.6
  • TensorFlow 1.14
  • SciPy (for reading wav files)

Install via pip

$ pip install at16k

Install from source

Requires: poetry

$ git clone https://github.com/at16k/at16k.git
$ cd at16k
$ poetry env use python3.6
$ poetry install

Download models

Currently, three models are available for speech to text conversion.

  • en_8k (trained on English audio recorded at 8 kHz, supports offline ASR)
  • en_16k (trained on English audio recorded at 16 kHz, supports offline ASR)
  • en_16k_rnnt (trained on English audio recorded at 16 kHz, supports real-time ASR)

To download all the models:

$ python -m at16k.download all

Alternatively, you can download only the model you need. For example:

$ python -m at16k.download en_8k
$ python -m at16k.download en_16k
$ python -m at16k.download en_16k_rnnt

By default, the models will be downloaded and stored at <HOME_DIR>/.at16k. To override the default, set the environment variable AT16K_RESOURCES_DIR. For example:

$ export AT16K_RESOURCES_DIR=/path/to/my/directory

Keep this environment variable set whenever you use at16k, whether through the command line, the library, or the REST API.
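
The lookup rule described above can be sketched in Python. This is only an illustration of the documented behaviour, not the code at16k runs internally; resolve_resources_dir is a hypothetical helper name.

```python
import os
from pathlib import Path


def resolve_resources_dir() -> Path:
    """Return the directory where at16k models are expected to live.

    Mirrors the documented rule: AT16K_RESOURCES_DIR wins if it is set,
    otherwise fall back to <HOME_DIR>/.at16k. Hypothetical helper, not
    part of the at16k API.
    """
    override = os.environ.get("AT16K_RESOURCES_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".at16k"
```

Calling resolve_resources_dir() before starting a long job is a cheap way to confirm that the shell you are in points at the directory where the models were downloaded.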

Preprocessing audio files

at16k accepts wav files with the following specs:

  • Channels: 1
  • Bits per sample: 16
  • Sample rate: 8000 Hz (en_8k) or 16000 Hz (en_16k and en_16k_rnnt)
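
One way to check a file against these specs is to read its header with Python's standard wave module. The following is a minimal sketch and not part of at16k; check_wav and EXPECTED_SAMPLE_RATE are hypothetical names, and the 16000 Hz entry for en_16k_rnnt is assumed from that model's description.

```python
import wave

# Sample rates per model. The en_16k_rnnt entry is an assumption based on
# the model being trained on 16 kHz audio.
EXPECTED_SAMPLE_RATE = {"en_8k": 8000, "en_16k": 16000, "en_16k_rnnt": 16000}


def check_wav(path, model):
    """Return a list of spec violations found in the WAV header (empty if none)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected 1 channel, found {wav.getnchannels()}")
        if wav.getsampwidth() != 2:  # 2 bytes per sample == 16 bits
            problems.append(
                f"expected 16 bits per sample, found {8 * wav.getsampwidth()}"
            )
        expected_rate = EXPECTED_SAMPLE_RATE[model]
        if wav.getframerate() != expected_rate:
            problems.append(
                f"expected {expected_rate} Hz, found {wav.getframerate()} Hz"
            )
    return problems
```

An empty list means the header matches the specs for the chosen model; anything else is a hint that the file should be converted first.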

Use ffmpeg to convert your audio/video files to an acceptable format. For example:

# For 8 kHz (-c:a pcm_s16le selects 16-bit PCM samples)
$ ffmpeg -i <input_file> -ar 8000 -ac 1 -c:a pcm_s16le <output_file>

# For 16 kHz
$ ffmpeg -i <input_file> -ar 16000 -ac 1 -c:a pcm_s16le <output_file>
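
If you need to convert many files, the same conversion can be driven from Python. The sketch below is an illustration, not something at16k ships: build_ffmpeg_command and convert are hypothetical names, ffmpeg is assumed to be on your PATH, and the sketch asks for 16-bit PCM explicitly with -c:a pcm_s16le.

```python
import subprocess


def build_ffmpeg_command(input_file, output_file, sample_rate):
    """Build an ffmpeg argument list for mono, 16-bit PCM WAV output."""
    if sample_rate not in (8000, 16000):
        raise ValueError("at16k models expect 8000 or 16000 Hz audio")
    return [
        "ffmpeg",
        "-i", str(input_file),
        "-ar", str(sample_rate),  # resample to the model's sample rate
        "-ac", "1",               # downmix to a single channel
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM samples
        str(output_file),
    ]


def convert(input_file, output_file, sample_rate):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    command = build_ffmpeg_command(input_file, output_file, sample_rate)
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.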

Usage

at16k supports two modes for performing ASR: offline and real-time. It also comes with a command-line utility for quickly trying out different models and use cases.

Here are a few examples:

# Offline ASR, 8 kHz sampling rate
$ at16k-convert -i <path_to_wav_file> -m en_8k

# Offline ASR, 16 kHz sampling rate
$ at16k-convert -i <path_to_wav_file> -m en_16k

# Real-time ASR, 16 kHz sampling rate, from a file, beam decoding
$ at16k-convert -i <path_to_wav_file> -m en_16k_rnnt -d beam

# Real-time ASR, 16 kHz sampling rate, from mic input, greedy decoding (requires pyaudio)
$ at16k-convert -m en_16k_rnnt -d greedy

If the at16k-convert binary is not available for some reason, replace it with:

$ python -m at16k.bin.speech_to_text ...
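
The command-line utility can also be called from a Python script. The following is a rough sketch that uses only the flags shown in the examples above (-i, -m, -d) and falls back to the module form when at16k-convert is not on the PATH. build_at16k_command and run_at16k are hypothetical names, and nothing is assumed about the format of the transcript the tool prints.

```python
import shutil
import subprocess
import sys


def build_at16k_command(model, wav_path=None, decoder=None):
    """Build the argument list for the at16k command-line utility.

    Omitting wav_path leaves out -i, which the examples above use for
    mic input with the real-time model.
    """
    if shutil.which("at16k-convert"):
        command = ["at16k-convert"]
    else:
        # Fallback documented for when the binary is unavailable.
        command = [sys.executable, "-m", "at16k.bin.speech_to_text"]
    if wav_path is not None:
        command += ["-i", str(wav_path)]
    command += ["-m", model]
    if decoder is not None:
        command += ["-d", decoder]
    return command


def run_at16k(model, wav_path=None, decoder=None):
    """Run the utility; its output goes straight to the terminal."""
    subprocess.run(build_at16k_command(model, wav_path, decoder), check=True)
```

For example, run_at16k("en_16k_rnnt", "speech.wav", "beam") corresponds to the beam decoding example above, where speech.wav is a placeholder file name.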

Library API

Check this file for examples on how to use at16k as a library.

Limitations

Audio files should be shorter than 30 seconds when using en_8k and shorter than 15 seconds when using en_16k. No error is raised if a file exceeds these limits, but the transcript may contain errors and missing text.
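
Because over-long files fail silently, it can help to check the duration yourself before transcribing. Here is a minimal sketch using the standard wave module; wav_duration_seconds, exceeds_limit and MAX_DURATION_SECONDS are hypothetical names, and the limits are the ones quoted above.

```python
import wave

# Duration limits quoted in this section, in seconds.
MAX_DURATION_SECONDS = {"en_8k": 30.0, "en_16k": 15.0}


def wav_duration_seconds(path):
    """Return the duration of a WAV file computed from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def exceeds_limit(path, model):
    """Return True if the file is not shorter than the limit for the model."""
    return wav_duration_seconds(path) >= MAX_DURATION_SECONDS[model]
```

Files that exceed the limit would need to be split into shorter segments before transcription; how best to choose the split points is outside the scope of this sketch.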

License

This software is distributed under the MIT license.

Acknowledgements

We would like to thank the Google TensorFlow Research Cloud (TFRC) program for providing access to Cloud TPUs.
