
at16k is a Python library for automatic speech recognition (speech-to-text conversion).



Pronounced as at sixteen k


What is at16k?

at16k is a Python library for automatic speech recognition (speech-to-text conversion). The goal of this project is to provide the community with a production-quality speech-to-text library.


It is recommended that you install at16k in a virtual environment.


Requirements

  • Python >= 3.6
  • TensorFlow = 1.14
  • SciPy (for reading wav files)
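The version requirements can be checked before installing. The following is a minimal sketch and not part of at16k: the helper names version_tuple and check_environment are hypothetical, and it assumes that "TensorFlow = 1.14" means any 1.14.x release.

```python
import itertools


def version_tuple(text):
    """Convert a version string such as '1.14.0rc1' to a tuple of ints, here (1, 14, 0)."""
    parts = []
    for piece in text.split("."):
        # Keep only the leading digits of each dot-separated piece.
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment(python_version, tensorflow_version):
    """Return a list of problems; an empty list means the requirements are met.

    python_version is a tuple like sys.version_info,
    tensorflow_version is a string like tensorflow.__version__.
    """
    problems = []
    if tuple(python_version[:2]) < (3, 6):
        problems.append("Python >= 3.6 is required")
    # Assumption: any 1.14.x release satisfies "TensorFlow = 1.14".
    if version_tuple(tensorflow_version)[:2] != (1, 14):
        problems.append("TensorFlow 1.14 is required")
    return problems
```

In an environment where TensorFlow is installed, calling check_environment(sys.version_info, tensorflow.__version__) returns an empty list when both requirements are satisfied.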

Install via pip

$ pip install at16k

Install from source

Requires: poetry

$ git clone
$ poetry env use python3.6
$ poetry install

Download models

Currently, two models are available for speech to text conversion.

  • en_8k (trained on English audio recorded at 8 kHz)
  • en_16k (trained on English audio recorded at 16 kHz)

To download all the models:

$ python -m all

Alternatively, you can download only the model you need. For example:

$ python -m en_8k
$ python -m en_16k

Preprocessing audio files

at16k accepts wav files with the following specs:

  • Channels: 1
  • Bits per sample: 16
  • Sample rate: 8000 (en_8k) or 16000 (en_16k)
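A file can be checked against these specs before it is passed to at16k. The following is a minimal sketch and not part of at16k: it uses Python's standard wave module rather than SciPy, and the names check_wav and EXPECTED_SAMPLE_RATE are hypothetical.

```python
import wave

# Sample rates from the specs above, keyed by model name.
EXPECTED_SAMPLE_RATE = {"en_8k": 8000, "en_16k": 16000}


def check_wav(path, model_name):
    """Return a list of spec violations for the wav file at path (empty if it conforms)."""
    expected_rate = EXPECTED_SAMPLE_RATE[model_name]
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        bits_per_sample = wav.getsampwidth() * 8  # sample width is reported in bytes
        sample_rate = wav.getframerate()

    problems = []
    if channels != 1:
        problems.append(f"expected 1 channel, found {channels}")
    if bits_per_sample != 16:
        problems.append(f"expected 16 bits per sample, found {bits_per_sample}")
    if sample_rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz for {model_name}, found {sample_rate} Hz")
    return problems
```

An empty result means the file matches the specs for the chosen model; otherwise each entry names one mismatch to fix during conversion.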

Use ffmpeg to convert your audio/video files to an acceptable format. For example:

# For 8 kHz
$ ffmpeg -i <input_file> -ar 8000 -ac 1 -acodec pcm_s16le <output_file>

# For 16 kHz
$ ffmpeg -i <input_file> -ar 16000 -ac 1 -acodec pcm_s16le <output_file>
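When many files need converting, the ffmpeg call can be scripted. The following is a minimal sketch and not part of at16k: the function names ffmpeg_command and convert are hypothetical, it assumes ffmpeg is on the PATH, and it requests 16-bit PCM explicitly with -acodec pcm_s16le.

```python
import subprocess

# Target sample rate for each model, as listed in the specs.
SAMPLE_RATE = {"en_8k": 8000, "en_16k": 16000}


def ffmpeg_command(input_file, output_file, model_name):
    """Build the ffmpeg argument list for mono, 16-bit PCM output at the model's sample rate."""
    if model_name not in SAMPLE_RATE:
        raise ValueError(f"unknown model: {model_name}")
    return [
        "ffmpeg",
        "-i", input_file,
        "-ar", str(SAMPLE_RATE[model_name]),  # output sample rate in Hz
        "-ac", "1",                           # one audio channel
        "-acodec", "pcm_s16le",               # 16-bit little-endian PCM
        output_file,
    ]


def convert(input_file, output_file, model_name):
    """Run ffmpeg and raise subprocess.CalledProcessError if it exits with an error."""
    subprocess.run(ffmpeg_command(input_file, output_file, model_name), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.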


There are three ways to invoke the at16k speech-to-text converter.

Command line

at16k-convert -i <input_wav_file> -m <model_name>

or

python -m at16k.bin.speech_to_text -i <input_wav_file> -m <model_name>

Library API

from at16k.api import SpeechToText

# One-time initialization
STT = SpeechToText('en_16k') # or en_8k

# Run STT on an audio file, returns a dict

Check the API Docs for details on how to use the API.

REST API server

at16k-serve -p <port> -m <model_name>

or

python -m at16k.bin.serve -p <port> -m <model_name>

Lastly, via Docker:

$ docker pull at16k/at16k:0.1.2
$ docker run -it at16k/at16k:0.1.2 -p <port> -m <model_name>

Check API Docs for details on how to use the REST API.


The duration of your audio file should be less than 30 seconds when using en_8k, and less than 15 seconds when using en_16k. No error is raised if the duration exceeds these limits, but the transcript may contain errors and missing text.
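Because no error is raised for long files, it can help to check the duration up front and split longer recordings. The following is a minimal sketch and not part of at16k: it uses Python's standard wave module, the names MAX_SECONDS, wav_duration_seconds and split_wav are hypothetical, and the default chunk length of one second under the limit is an arbitrary choice. Splitting at fixed positions can cut a word in half, so the transcripts of adjacent chunks may need manual review.

```python
import wave

# Duration limits stated above, in seconds, keyed by model name.
MAX_SECONDS = {"en_8k": 30, "en_16k": 15}


def wav_duration_seconds(path):
    """Return the duration of a wav file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def split_wav(path, model_name, chunk_seconds=None):
    """Write consecutive chunks shorter than the model's limit and return their paths."""
    limit = MAX_SECONDS[model_name]
    if chunk_seconds is None:
        chunk_seconds = limit - 1  # stay strictly below the limit
    if not 0 < chunk_seconds < limit:
        raise ValueError(f"chunk_seconds must be between 0 and {limit}")

    chunk_paths = []
    with wave.open(path, "rb") as source:
        params = source.getparams()
        frames_per_chunk = int(chunk_seconds * source.getframerate())
        index = 0
        while True:
            frames = source.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = f"{path}.part{index:03d}.wav"
            with wave.open(chunk_path, "wb") as chunk:
                # Same channels, sample width and rate; the frame count is fixed up on close.
                chunk.setparams(params)
                chunk.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Each returned path can then be transcribed separately and the results joined in order.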


This software is distributed under the MIT license.


We would like to thank the Google TensorFlow Research Cloud (TFRC) program for providing access to Cloud TPUs.


Files for at16k, version 0.1.3

  • at16k-0.1.3-py3-none-any.whl (19.6 kB), file type: Wheel, Python version: py3
  • at16k-0.1.3.tar.gz (17.9 kB), file type: Source
